Strategy and tips for performance troubleshooting in SQL Server

There was probably a time in its early days when SQL Server could be considered a simple database engine – we’d run queries with SQL and we were probably pleasantly surprised when stuff just worked. Much has changed since then. Over time SQL Server has evolved into a complex relational database management system (RDBMS). If there is some task that involves data manipulation then SQL Server most likely has a component to handle that – data queries with SQL, multi-dimensional analysis, statistical analysis with R, reporting, integration with other data sources, ETL and so on.

On one hand it is amazing that given all this complexity there are plenty of workloads that work just well enough on their own (no full-time DBA needed) if enough hardware resources are made available.

On the other hand, there are certainly times when performance troubleshooting is required – either because SQL Server’s response got worse over time or because it’s just not at the level where it needs to be to keep users happy.

With all these system components fighting for limited hardware and infrastructure resources how is a DBA supposed to troubleshoot performance problems? When angry users are on the phone or when your boss is looking over your shoulder demanding quick results – that is hardly the time to come up to speed on the finer points of waits statistics analysis.

Below is a general strategy with some specific tips and tools to guide you on the road to a better SQL Server performance.

1. Understand the environment and see how SQL Server is configured (the sanity check)

This first step is very important especially when you have to troubleshoot an environment you’re not familiar with (for example when you’re a performance consultant) or when you work at a company where many people can make changes to the SQL Server environment and you’re not really sure if what was true last month about the configuration is still true today. It doesn’t make any sense to spend time looking for bad queries only to discover eventually that somebody made changes to the SQL Server memory configuration, confused megabytes with kilobytes and now SQL Server is running with much less memory available for its own use.

Here are some scripts to help in this discovery phase:

Brent Ozar and company have an awesome set of scripts in their “First Responder Kit” that’s now available on GitHub:

https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit

The main one that you want to get familiar with is sp_Blitz but certainly look over the other ones as well. sp_Blitz performs various checks in configuration settings and it will alert you when it discovers anything that’s not considered a best practice.

The other set of very useful diagnostic scripts are those provided by Glenn Berry:

http://www.sqlskills.com/blogs/glenn/category/dmv-queries/

He usually updates them every month and targets specific features in the various SQL Server versions available.

If all looks good with the environment and configuration then it’s time to move to the next step.

2. What is SQL Server doing right now?

There are probably many commercial tools that offer all sorts of monitoring capabilities but the one free tool that every SQL Server DBA should be familiar with is sp_WhoIsActive (this is my ‘don’t leave SQL Server without it’ tool):

http://whoisactive.com/

It is an awesome stored procedure by Adam Machanic that shows all sorts of relevant data about what exactly is going on at the transaction level when the stored procedure is executed. It has tons of flags and options – all meant to customize the output in order to facilitate a performance troubleshooting session.

Adam wrote a series of blog posts explaining all its various features – it’s well worth reading these in advance so you’re familiar with them before you need to use them under pressure:

http://sqlblog.com/blogs/adam_machanic/archive/tags/month+of+monitoring/default.aspx

It’s possible that when you’re looking strictly at what’s going on in the present you may not be able to determine why SQL Server is hurting. In that case you’ll probably need to go to the next step:

3. What is SQL Server waiting on during a certain time interval? (wait statistics analysis)

SQL Server is a complex system but it’s also a pretty good patient – it keeps detailed track of many vital statistics but you need to know where to look for that data. SQL queries, until they finish their execution, go through many different kinds of waits: waiting for CPU resources, waiting for data to be moved to memory from disk, waiting for data that other queries have locks on and so on. An analysis of all these waits over time can offer critical insight into areas where SQL Server is spending time waiting instead of getting those queries processed. Below are some resources that should help you with wait stats analysis:

http://www.sqlskills.com/blogs/paul/wait-statistics-or-please-tell-me-where-it-hurts/ – Wait statistics, or please tell me where it hurts (this is the classic Paul Randal article on the topic of wait stats)

Be sure to check out his other articles on this topic:

http://www.sqlskills.com/blogs/paul/category/wait-stats/

Paul Randal recently created an online waits library documenting the various wait types seen in SQL Server:

https://www.sqlskills.com/help/waits/

Brent Ozar also has a very good article & script on wait stats as part of the “First Responder Kit”:

https://www.brentozar.com/responder/triage-wait-stats-in-sql-server/

4. Query Store

If you’re fortunate enough to work with SQL Server 2016 or Azure SQL Database then you certainly need to become familiar with the Query Store:

https://msdn.microsoft.com/en-us/library/dn817826.aspx – Monitoring Performance By Using the Query Store

This is SQL Server’s new black box. It’s what will enable you to understand query performance problems in the recent past when you were not directly monitoring SQL Server. And certainly make sure Query Store is enabled and recording – otherwise it won’t do you much good 😉

Additional resources

https://www.youtube.com/watch?v=rf17jQRcfjI – The Server is Down What will you do

https://www.red-gate.com/library/troubleshooting-sql-server-a-guide-for-accidental-dbas – Troubleshooting SQL Server: A Guide for Accidental DBAs (free ebook)

https://www.red-gate.com/library/performance-tuning-with-sql-server-dynamic-management-views – Performance Tuning With SQL Server Dynamic Management Views (free ebook)