CodeFari: November 2014

RDBMS database best design practice never suggests us to keep duplicate records in a table, and If we are working on database schema/table design, we must create the primary key for each. But, many scenarios arise where duplicate records may insert in table, suppose we are working on the Development Database, and data is loaded from different sources where duplicate rows are possible. When we process this data to another database server (PreProd/ Prod), the data should be clean. Therefore, we need to delete the duplicate data.

Here I am creating a duplicate records scenario for the employee table, use the below query.

CREATE TABLE TEMP_EMPLOYEE

(

empId INT IDENTITY(1,1),

FName VARCHAR(100),

LName VARCHAR(100),

Salary DECIMAL(18,2),

Country VARCHAR(100)

)

INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)

VALUES ('Dilip','Singh',20000,'IND')

INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)

VALUES ('Raj','Kumar',30000,'USA')

INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)

VALUES ('Anil','Singh',25000,'AUS')

INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)

VALUES ('Dilip','Singh',20000,'IND')

INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)

VALUES ('Dilip','Singh',20000,'IND')

SELECT * FROM TEMP_EMPLOYEE

--Before the deleting duplicate records, you must check it exists in your table or not

--We can check duplicate records exist in the table using bellow query

SELECT FName,LName,Salary,Country FROM TEMP_EMPLOYEE

GROUP BY FName, LName, Salary, Country HAVING COUNT(*)>1

-- To remove Duplicate records from the table you can use below query

DELETE FROM TEMP_EMPLOYEE WHERE empId NOT IN ( SELECT MAX(empId) FROM TEMP_EMPLOYEE GROUP BY FName,LName,Salary,Country )

--**Use above query, when you want to keep the latest record

--Otherwise you can use below query

DELETE FROM TEMP_EMPLOYEE WHERE empId NOT IN ( SELECT MIN(empId) FROM TEMP_EMPLOYEE GROUP BY FName,LName,Salary,Country )

-- you can remove duplicate records by using Common Table Expression query given below

;WITH CTE AS (

SELECT FName,LName,Salary,Country,ROW_NUMBER() OVER (PARTITION BY FName,LName,Salary,Country ORDER BY FName,LName,Salary,Country) AS [Rank] FROM TEMP_EMPLOYEE

)

DELETE FROM CTE WHERE [Rank]>1

SELECT * FROM TEMP_EMPLOYEE

DROP TABLE TEMP_EMPLOYEE

Lookup transformation is responsible to join columns to the data flow/data set, by accessing additional information in a related table based on values in common columns.

Problem : We have a table EMP which data are fetching by OLEDB Source, this table contains column FName, LName, Salary and Country. In the flow we need to add column empAddress and empPhone of Employees which information are in EMPDETAILS table.

Solution : We would use Lookup Transformation. Now see below the use of lookup transformation..

Create table EMP(

Id int identity(1,1),

FName VARCHAR(50),

LName VARCHAR(50),

Salary DECIMAL(18,2),

Country VARCHAR(50)

)

insert into EMP (FName,LName,Salary,Country)

Values('Dilip','Singh',20000,'IND')

insert into EMP (FName,LName,Salary,Country)

Values('Viru','Verma',40000,'USA')

insert into EMP (FName,LName,Salary,Country)

Values('Anil','Singh',30000,'UK')

insert into EMP (FName,LName,Salary,Country)

Values('Raju','Shukla',25000,'IND')

insert into EMP (FName,LName,Salary,Country)

Values('Raj','Kushwaha',20000,'AUS')

insert into EMP (FName,LName,Salary,Country)

Values('Salim','Sekh',26000,'PAK')

insert into EMP (FName,LName,Salary,Country)

Values('Nirmit','Katihar',27000,'AUS')

insert into EMP (FName,LName,Salary,Country)

Values('Dilip','Singh',23000,'AUS')

select * from EMP

Create Table EMPDETAILS

(

Id int identity(1,2),

empId int not null,

empAddress varchar(200),

empPhone Varchar(20)

)