Here’s a question that probably most SQL Server DBAs or developers had to answer at some point in their career: how do you remove duplicate rows from a table? Maybe that duplicate data came in through an error in some ETL process or maybe it was an error in a front-end application that allowed end users to submit data multiple times. At any rate – the problem that caused duplicate data is now fixed and we have to cleanup the duplicates from our table. (This is also a perfect scenario / question for a SQL Server job interview because it will allow you to quickly tell how a candidate approaches SQL problems that are not exactly trivial – we’ll see why in a little bit.)
Let’s set up a table and insert some sample data into it:
CREATE TABLE [dbo].[SampleData]( [Id] [int] IDENTITY(1,1) NOT NULL, [Column1] [varchar](50) NOT NULL, [Column2] [int] NOT NULL, [CreatedDate] [datetime] NOT NULL, CONSTRAINT [PK_SampleData] PRIMARY KEY CLUSTERED ( [Id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO ALTER TABLE [dbo].[SampleData] ADD CONSTRAINT [DF_SampleData_CreatedDate] DEFAULT (getdate()) FOR [CreatedDate] GO INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30) INSERT INTO SampleData (Column1, Column2) VALUES ('Value C', 90) INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30) INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50) INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30) INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50) INSERT INTO SampleData (Column1, Column2) VALUES ('Value C', 90) INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50) INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)