How to delete duplicate rows from a table in SQL Server

Here’s a question that most SQL Server DBAs or developers have probably had to answer at some point in their careers: how do you remove duplicate rows from a table? Maybe that duplicate data came in through an error in some ETL process, or maybe a bug in a front-end application allowed end users to submit the same data multiple times. At any rate – the problem that caused the duplicates is now fixed and we have to clean up our table. (This is also a perfect scenario / question for a SQL Server job interview because it quickly shows you how a candidate approaches SQL problems that are not exactly trivial – we’ll see why in a little bit.)

Let’s set up a table and insert some sample data into it:

CREATE TABLE [dbo].[SampleData](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Column1] [varchar](50) NOT NULL,
	[Column2] [int] NOT NULL,
	[CreatedDate] [datetime] NOT NULL,
 CONSTRAINT [PK_SampleData] PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[SampleData] ADD  CONSTRAINT [DF_SampleData_CreatedDate]  DEFAULT (getdate()) FOR [CreatedDate]
GO

INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value C', 90)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value C', 90)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)
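
With the sample data in place, one common approach is to number the rows within each group of duplicates and delete everything except the first row of each group. The sketch below assumes that a row counts as a duplicate when Column1 and Column2 match and that we want to keep the row with the lowest Id – adjust the PARTITION BY and ORDER BY lists if your definition of “duplicate” is different:

-- First, check which value combinations actually have duplicates
SELECT Column1, Column2, COUNT(*) AS DuplicateCount
FROM dbo.SampleData
GROUP BY Column1, Column2
HAVING COUNT(*) > 1;

-- Number each row within its (Column1, Column2) group and
-- delete every row except the one with the lowest Id
WITH NumberedRows AS
(
    SELECT Id,
           ROW_NUMBER() OVER (PARTITION BY Column1, Column2 ORDER BY Id) AS RowNum
    FROM dbo.SampleData
)
DELETE FROM NumberedRows
WHERE RowNum > 1;

Run against the sample data above, this should leave three rows – one each for ‘Value A’, ‘Value B’ and ‘Value C’ (Ids 1, 2 and 4). The DELETE against the CTE works because the CTE references a single base table, so SQL Server removes the corresponding rows from dbo.SampleData itself.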


Data caretaker and/or data interpreter – what exactly is the role of a DBA anyway?

There is a nagging thought that has made its way to the back of my mind over the last few weeks. I’ve been watching or listening to a variety of technical presentations / podcasts – mostly on topics related to SQL Server – and after a while I started to see a pattern: the majority of topics and discussions were purely technical in nature and dealt directly with the SQL Server database engine itself … how to keep it running, how to make sure performance was good, how to take care of backups / high availability and so on. It seemed to me that not much thought was given to the actual data that was supposed to be passing through this database engine – the very reason we’re doing all these things with SQL Server in the first place.

I got the impression that many DBAs or consultants who were involved in these discussions were more than happy to spend hours arguing the finer points of the new cardinality estimator in SQL Server 2014 (for example) but when it came time to actually try to make sense of the data passing through their hands they’d rather not have anything to do with it. To them data is just a black box – they’ll take care of it but they’re not interested in trying to make sense of it … that’s somebody else’s job.

So I started to wonder – what exactly is the role of a DBA anyway? Are DBAs strictly caretakers of data (making sure that it’s always there for use) or are they supposed to be much more than that – interpreters of data, helping the business make sense of all these bits and bytes that are getting collected all over the place?

Sometimes it seems to me that DBAs feel safe speaking the geek language of database engine specifications and features, but they’re afraid to venture outside of that world, where they might actually run into users trying to make sense of all that data. The attitude seems to be: that’s not our job – that’s for application developers and business/data analysts.

Is that really the case?

I came across the article below (from a few years ago) where the author argues pretty much the same thing – that the DBA role should probably evolve to where DBAs get more and more involved in understanding the relevance of the data they’re managing:

Does the Role of the DBA Need to Evolve?

That article generated quite a few comments on the SQLServerCentral.com site – http://www.sqlservercentral.com/Forums/Topic1219839-263-1.aspx

What do you think of all this? Are DBAs strictly data caretakers or should they be more than that? Does it depend maybe on the size of the business/company where these DBAs work? Do DBAs really get to spend all their time having fun with technical aspects of the engine itself and not be bothered by users who need help understanding and fixing their data?

Building A Complete Data Solution Using The Microsoft BI Stack

Microsoft has a full-featured platform for handling the data needs of most companies. It starts with a strong relational database backend for transactional processing and data warehousing (SQL Server), followed by an analytical processing and data mining tool (SSAS), a solid integration service to handle various data migration tasks (SSIS) and finally a good product for various reporting needs (SSRS). On top of all that we have a variety of Excel add-ons for business intelligence and data mining, and lately the very well-received Power BI platform for self-service data visualization and business intelligence.

This is all great. The problem is that many of these products often end up being used in isolation instead of being combined so that their various strengths add up to a true data management solution – one that enables the business to get maximum benefit from its own data.

There are all sorts of resources online that will attempt to make you an expert in any one of the products mentioned above … but not much guidance on how they’re supposed to work together.

Many technologists start with SQL Server on the OLTP side – we process daily transactions for the business and we make sure that create / update / delete operations are as fast as possible or else customers will go to a competitor.

So now we have all this data that’s accumulating in a relational database and somebody wants to run reports against it. No problem – we know how to do that so we slap together some reports in SSRS – problem solved … except that as time goes on (and we get more and more data) those reports seem to get slower and slower. And no … please don’t run that report that gives us a summary for the whole business for the last 3 years … you know which one … the one that joins 30 tables, takes 10 minutes to run and slows down the entire database.

What’s the problem here? To use the famous quote – “If you only have a hammer, you tend to see every problem as a nail.” We know SQL Server very well, but a database schema normalized for transaction processing is not the ideal backend for business intelligence and analysis.

Somebody then mentions the term “data warehouse” … and OLAP … and slowly changing dimensions. Oh no – more technology to learn. How exactly do we go from point A (our transactional database) to point B (some sort of data store that business analysts can query with their fancy graphical tools to build all the nice charts they’re looking for)?
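
To make point B a little more concrete, here is a minimal sketch of what a dimensional model might look like for that “summary for the whole business” report. The table and column names (DimDate, DimProduct, FactSales and so on) are made up for illustration – they are not taken from the article series below:

-- A hypothetical star schema: wide, denormalized dimension tables
-- describing business entities, plus a narrow fact table holding
-- the measures and the surrogate keys that link everything together.
CREATE TABLE dbo.DimDate (
    DateKey      int      NOT NULL PRIMARY KEY,   -- e.g. 20150630
    CalendarDate date     NOT NULL,
    [Year]       smallint NOT NULL,
    [Month]      tinyint  NOT NULL
);

CREATE TABLE dbo.DimProduct (
    ProductKey  int          IDENTITY(1,1) NOT NULL PRIMARY KEY,
    ProductName varchar(100) NOT NULL,
    Category    varchar(50)  NOT NULL
);

CREATE TABLE dbo.FactSales (
    DateKey     int           NOT NULL REFERENCES dbo.DimDate (DateKey),
    ProductKey  int           NOT NULL REFERENCES dbo.DimProduct (ProductKey),
    Quantity    int           NOT NULL,
    SalesAmount decimal(18,2) NOT NULL
);

-- The "whole business for the last 3 years" summary becomes a couple of
-- joins and a GROUP BY instead of a 30-table join against the OLTP schema.
SELECT d.[Year], p.Category, SUM(f.SalesAmount) AS TotalSales
FROM dbo.FactSales AS f
JOIN dbo.DimDate    AS d ON d.DateKey = f.DateKey
JOIN dbo.DimProduct AS p ON p.ProductKey = f.ProductKey
GROUP BY d.[Year], p.Category;

Getting data from the transactional database into tables shaped like these (and keeping history with slowly changing dimensions along the way) is exactly the kind of work SSIS is meant for – and it’s what the data mart load parts of the series below walk through.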

We need to spend more time looking at how we can integrate all these technologies (most of which we get for free once we purchase SQL Server) in order to satisfy the various data needs for the business.

I recently came across a very good series of hands-on articles (source code included) that explores all these products and creates a nice picture of how they all fit together.

Part 1 – Introduction and OLTP Database Analysis

Part 2 – OLAP Database Objects Modeling

Part 3 – Data Mart Load Approach and Coding

Part 4 – Data Mart Load Using SSIS

Part 5 – Defining A SSAS Project And Its Dimensions

Part 6 – Refining SSAS Data Warehouse Dimensions

Part 7 – Cube Creation, Cube Deployment and Data Validation

Part 8 – Creating A Sample SSRS Report

I hope you find all these useful.

Installing SQL Server 2016 Community Technology Preview 2 – step by step screenshots

Microsoft recently announced the first public Community Technology Preview (CTP2) for SQL Server 2016. It is available for download from this location:

https://www.microsoft.com/en-us/evalcenter/evaluate-sql-server-2016

If you haven’t had a chance to try it out yet, below are step-by-step screenshots from a standard SQL Server 2016 CTP2 installation with all options selected. Many new features are coming in SQL Server 2016, but from what I’ve seen so far the installation process is very similar to what we’ve seen before in SQL Server 2014.

Some quick observations:

  • SQL Server 2016 can’t be installed on anything below Windows Server 2012 (server-side) or Windows 8 (desktop-side). This is to be expected but still … good to know. Many .NET developers, for example, still run Windows 7 as their development environment, so SQL Server 2016 won’t install there (whereas SQL Server 2014 installs just fine on Windows 7). Here are the full technical requirements for SQL Server 2016 – https://msdn.microsoft.com/en-us/library/ms143506%28v=sql.130%29.aspx
  • I installed SQL Server 2016 CTP2 on a newly created virtual machine running Windows Server 2012 R2 and was quite surprised to see, among the requirements for SQL Server 2016, Oracle Java JRE 7 Update 51 or higher. I’m not sure yet which component of SQL Server 2016 depends on that – this one will be interesting to watch as more people start trying out the software. I ended up getting the latest Oracle JRE from this location – http://www.oracle.com/technetwork/java/javase/downloads/index.html.
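
Once setup completes, a quick query from SSMS or sqlcmd confirms exactly what got installed – just a convenience check, not part of the installer itself:

-- Confirm the build, release level and edition of the new instance
SELECT @@VERSION                        AS VersionString,
       SERVERPROPERTY('ProductVersion') AS ProductVersion, -- 13.0.x for SQL Server 2016
       SERVERPROPERTY('ProductLevel')   AS ProductLevel,   -- reports the CTP level for preview builds
       SERVERPROPERTY('Edition')        AS Edition;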

Enjoy!