Delete duplicate records from table in sql server


RDBMS database best design practice never suggests us to keep duplicate records in a table, and If we are working on database schema/table design, we must create the primary key for each. But, many scenarios arise where duplicate records may insert in table, suppose we are working on the Development Database, and data is loaded from different sources where duplicate rows are possible. When we process this data to another database server (PreProd/ Prod), the data should be clean. Therefore, we need to delete the duplicate data.

Here I am creating a duplicate records scenario for the employee table, use the below query.
CREATE TABLE TEMP_EMPLOYEE
(
empId INT IDENTITY(1,1),
FName VARCHAR(100),
LName VARCHAR(100),
Salary DECIMAL(18,2),
Country VARCHAR(100)
)
INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)
VALUES ('Dilip','Singh',20000,'IND')
INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)
VALUES ('Raj','Kumar',30000,'USA')
INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)
VALUES ('Anil','Singh',25000,'AUS')
INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)
VALUES ('Dilip','Singh',20000,'IND')
INSERT INTO TEMP_EMPLOYEE(FName,LName,Salary,Country)
VALUES ('Dilip','Singh',20000,'IND')
SELECT * FROM TEMP_EMPLOYEE

--Before the deleting duplicate records, you must check it exists in your table or not
--We can check duplicate records exist in the table using bellow query

SELECT FName,LName,Salary,Country FROM TEMP_EMPLOYEE
GROUP BY FNameLNameSalaryCountry HAVING COUNT(*)>1

-- To remove Duplicate records from the table you can use below query

DELETE FROM TEMP_EMPLOYEE WHERE empId NOT IN ( SELECT MAX(empId) FROM TEMP_EMPLOYEE GROUP BY FName,LName,Salary,Country )
--**Use above query, when you want to keep the latest record

--Otherwise you can use below query

DELETE FROM TEMP_EMPLOYEE WHERE empId NOT IN ( SELECT MIN(empId) FROM TEMP_EMPLOYEE GROUP BY FName,LName,Salary,Country )

-- you can remove duplicate records by using Common Table Expression query given below

;WITH CTE AS (
SELECT FName,LName,Salary,Country,ROW_NUMBER() OVER (PARTITION BY FName,LName,Salary,Country ORDER BY FName,LName,Salary,Country) AS [Rank] FROM TEMP_EMPLOYEE
)
DELETE FROM CTE WHERE [Rank]>1

SELECT * FROM TEMP_EMPLOYEE
DROP TABLE TEMP_EMPLOYEE

Related Posts

What is the Use of isNaN Function in JavaScript? A Comprehensive Explanation for Effective Input Validation

In the world of JavaScript, input validation is a critical aspect of ensuring that user-provided data is processed correctly. One indispensa...