SQL Server | OdeToData

List of technology podcasts

August 27, 2015 by Cristian Satnic

I’m a full believer in the term “Automobile University” that was coined by the well-known motivational speaker Zig Ziglar – the idea being that time spent in traffic can and should be used to educate oneself on a variety of subjects. As such, I have quite a few technology / IT podcasts that I subscribe to and I make sure that my audio player always has plenty of interesting episodes available in the queue.

I already shared the list below with plenty of friends and co-workers who know that I listen to a variety of podcasts so I figured I should probably just create a blog post with this information for future reference.

At the time of this post all podcasts mentioned below appear to still be active – kudos and many thanks to all these authors who keep creating solid technical content for all of us to enjoy.

In the list below I link to the actual podcast sites. If you want to actually subscribe to them you should be able to find them in the iTunes store or wherever else you grab your podcasts from. If you know of any other ones in the various categories listed below please add them in the comments.

Cloud Computing

Big Data

IBM Big Data & Analytics

Databases (SQL Server)

Development (.NET, Javascript)

Infrastructure, Networking, Enterprise Tech

Security

I should also mention the amazing list of shows / podcasts from the TWiT network. Leo Laporte and his amazing crew create some awesome content – way faster than I can possibly consume it. You’ll probably find some interesting topics there as well.

How to delete duplicate rows from a table in SQL Server

August 9, 2015 by Cristian Satnic

Here’s a question that probably most SQL Server DBAs or developers had to answer at some point in their career: how do you remove duplicate rows from a table? Maybe that duplicate data came in through an error in some ETL process or maybe it was an error in a front-end application that allowed end users to submit data multiple times. At any rate – the problem that caused duplicate data is now fixed and we have to cleanup the duplicates from our table. (This is also a perfect scenario / question for a SQL Server job interview because it will allow you to quickly tell how a candidate approaches SQL problems that are not exactly trivial – we’ll see why in a little bit.)

Let’s set up a table and insert some sample data into it:

CREATE TABLE [dbo].[SampleData](
	[Id] [int] IDENTITY(1,1) NOT NULL,
	[Column1] [varchar](50) NOT NULL,
	[Column2] [int] NOT NULL,
	[CreatedDate] [datetime] NOT NULL,
 CONSTRAINT [PK_SampleData] PRIMARY KEY CLUSTERED 
(
	[Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [dbo].[SampleData] ADD  CONSTRAINT [DF_SampleData_CreatedDate]  DEFAULT (getdate()) FOR [CreatedDate]
GO

INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value C', 90)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value A', 30)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value C', 90)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)
INSERT INTO SampleData (Column1, Column2) VALUES ('Value B', 50)

Data caretaker and/or data interpreter – what exactly is the role of a DBA anyway?

July 30, 2015 by Cristian Satnic

There is a nagging thought that made its way to the back of my mind in the last few weeks. I’ve been watching or listening to a variety of technical presentations / podcasts – mostly on topics related to SQL Server – and after a while I started to see a pattern: the majority of topics and discussions were purely technical in nature and dealt directly with the SQL Server database engine itself … how to keep it running, how to make sure performance was good, how to take care of backups / high availability and so on. It seemed to me that not much thought was given to the actual data that was supposed to be passing through this database engine – the very reason for why we’re doing all these things we’re doing with SQL Server.

I got the impression that many DBAs or consultants who were involved in these discussions were more than happy to spend hours arguing the finer points of the new cardinality estimator in SQL Server 2014 (for example) but when it came time to actually try to make sense of the data passing through their hands they’d rather not have anything to do with it. To them data is just a black box – they’ll take care of it but they’re not interested in trying to make sense of it … that’s somebody else’s job.

So I started to wonder – what exactly is the role of a DBA anyway? Are DBAs strictly caretakers of data (making sure that it’s always there for use) or are they supposed to be much more than that – interpreters of data, helping the business make sense of all these bits and bytes that are getting collected all over the place?

Sometimes it seems to me that DBAs feel safe speaking the geek language of database engine specifications and features but they’re afraid to venture outside of that world where they might actually run into users trying to make sense of all that data. We don’t think that’s our job – that’s for application developers and business/data analysts.

Is that really the case?

I came across the article below (from a few years ago) where the author argues pretty much the same thing – that the DBA role should probably evolve to where DBAs get more and more involved in understanding the relevance of the data they’re managing:

Does the Role of the DBA Need to Evolve?

That article generated quite a few comments on the SQLServerCentral.com site – http://www.sqlservercentral.com/Forums/Topic1219839-263-1.aspx

What do you think of all this? Are DBAs strictly data caretakers or should they be more than that? Does it depend maybe on the size of the business/company where these DBAs work? Do DBAs really get to spend all their time having fun with technical aspects of the engine itself and not be bothered by users who need help understanding and fixing their data?

Building A Complete Data Solution Using The Microsoft BI Stack

July 26, 2015 by Cristian Satnic

Microsoft has a full-featured platform for handling the data needs of most companies. It starts with a strong relational database backend for transactional processing and data warehousing (SQL Server), followed by an analytical processing and data mining tool (SSAS), a solid integration service to handle various data migration tasks (SSIS) and finally a good product for various reporting needs (SSRS). On top of all that we have a variety of Excel add-ons for business intelligence and data mining and lately the very well received Power BI platform for self-service data visualization and business intelligence.

This is all great. The problem is that often many of these products end up being used by themselves instead of combining their various strengths to truly build a data management solution that will enable the business to use its own data for maximum benefits.

There are all sorts of resources online that will attempt to make one an expert in any of the products mentioned above … but not that much guidance in terms of really explaining how they’re supposed to work together.

Many technologists start with SQL Server on the OLTP side – we process daily transactions for the business and we make sure that create / update / delete operations are as fast as possible or else customers will go to a competitor.

So now we have all this data that’s accumulating in a relational database and somebody wants to run reports against it. No problem – we know how to do that so we slap together some reports in SSRS – problem solved … except that as time goes on (and we get more and more data) those reports seem to get slower and slower. And no … please don’t run that report that gives us a summary for the whole business for the last 3 years … you know which one … the one that joins 30 tables, takes 10 minutes to run and slows down the entire database.

What’s the problem here? To use the famous quote – “If you only have a hammer, you tend to see every problem as a nail.” We know SQL Server very well but the relational database model for transaction processing is not the ideal backend for business intelligence and analysis.

Somebody then mentions the term “data warehouse” .. and OLAP … and slowly changing dimensions. Oh no – more technology to learn. How exactly do we go from point A (our transactional database) to point B (some sort of data store that business analysts can query with their fancy graphical tools to build all the nice charts they’re looking for).

We need to spend more time looking at how we can integrate all these technologies (most of which we get for free once we purchase SQL Server) in order to satisfy the various data needs for the business.

I recently came across a very good series of hands-on articles (source code included) that explores all these products and creates a nice picture of how they all fit together.

Part 1 – Introduction and OLTP Database Analysis

Part 2 – OLAP Database Objects Modeling

Part 3 – Data Mart Load Approach and Coding

Part 4 – Data Mart Load Using SSIS

Part 5 – Defining A SSAS Project And Its Dimensions

Part 6 – Refining SSAS Data Warehouse Dimensions

Part 7 – Cube Creation, Cube Deployment and Data Validation

Part 8 – Creating A Sample SSRS Report

I hope you find all these useful.