Here’s the scenario: you are a SQL Server DBA and you manage at least one database server that’s used by custom applications created by in-house developers using Entity Framework. From your experience you believe that the database server is powerful enough (hardware-wise) to handle the applications using it but the fact is that the database server is actually struggling to keep up. You often see high CPU usage, high network traffic and a much larger volume of queries than you’d expect from the applications using it. Management wants to know what the problem is: is the server not powerful enough from a hardware point of view? Is it mis-configured? Is the application not properly coded?
You have a suspicion that the application code is not as tight as it could be but you’re not really a .NET developer and you don’t really know much about Entity Framework. What would you look for from the SQL Server side in order to make recommendations to management (and the development team maybe) about what steps to take to improve performance?
If your first instinct is to say that Entity Framework is evil, that all ORM (object-relational mapping) tools should be banned and that your databases can only be queried using stored procedures … then you’re probably on the wrong path and you’re not likely to make many friends with that approach.
Entity Framework is a tool – a very powerful tool I would say – in the .NET development stack. It enables developers to write applications faster by focusing on business logic and business models instead of having to worry about the low-level plumbing necessary to get data in and out of a database. The problem with Entity Framework (when it comes to performance) is that often it hides the database layer so well that developers forget that their various object manipulations end up generating all sorts of queries against the database – queries that they would probably be a lot more careful with if they actually had to write them from scratch in SQL code.
As is the case with most powerful tools Entity Framework has quite a set of instructions and best practices that must be followed in order to get the best performance out of it. The best single-page collection of Entity Framework performance tips (in my opinion) can be found at the location below:
Performance Considerations for Entity Framework 4, 5, and 6
That page is full of information and it can be quite intimidating. Much of it though applies to best-practices that belong in the .NET layer and would not directly be visible to a SQL Server DBA. If you want to see from the database side if best-practices were followed what are some of the tell-tale signs you should look for?
Some observations before we get started:
- I mentioned that these tips would be for SQL Server DBAs because Entity Framework apps usually use SQL Server as a database backend. The truth is that probably most of these also apply to applications that use Entity Framework with other types of databases.
- As any good IT troubleshooter will know one trick to solving IT problems is to try to isolate the actual issue from background noise. In the context of what we’re trying to do here you need to be able to run the offending application in a way that you can easily study its SQL Server activity – either against a SQL Server where you’re the sole active user or maybe against a staging system with low activity. You would then proceed to use the application through its UI and record its SQL activity for later analysis using tools such as SQL Server Profiler, Extended Events or any other third-party tool you prefer.
What should you then look for?
SELECT * FROM TABLE_NAME
Entity Framework will not exactly issue SELECT * FROM commands – what it will do though is have explicit SELECT statements that include ALL columns in a particular table. If you see that most SQL queries are selecting all columns this way (especially from large tables when it appears that the UI is not using all that data) then you know that developers got a little sloppy with their code. It’s very easy in Entity Framework to bring back all columns – it takes more work and thought to build a custom LINQ projection query to select only the needed columns.
The famous N+1 problem
Here’s how to spot this pattern: say that in the UI you go to a page that shows a grid of data with 100 rows. You would typically expect the application to be able to return all that grid data with a single query. Instead, from the SQL activity you captured you see an initial query that gets ‘most’ of the data and then … surprise … 100 other queries that all look very similar (usually with some different values in the WHERE clause).
In Entity Framework it’s very easy to fall into this trap that deals with loading related data. By default Entity Framework in an application has a setting called Lazy Loading set to true – you’re probably guessing already what this does. Let’s say you have one query where you request data for the grid in our example. If you display fields related to that data that were not directly included in the main query Entity Framework (out of the goodness of its heart) will just quietly go and grab the additional data from the database on a need-to-have basis. As the code loops over the various rows being displayed you’ll see additional queries being executed for each row. For N rows of data we then end up with N + 1 queries (or possibly even worse). This is mentioned in section ‘8 Loading Related Entities’ on the page above. The solution is to use Eager Loading (and possibly disable Lazy Loading) as a way to be very explicit about the data that is needed for display at the time when the original query is executed.
Implicit type conversions and unused indexes in queries against VARCHAR fields
This one might be harder to detect but in the ‘perfect storm’ kind of scenarios it can bring a server to its knees with high CPU usage and heavy IO activity. Here’s what’s going on: let’s say that you see queries that in the WHERE clause search against a VARCHAR column in a table (for example a GUID that’s used as a primary key). Nothing really wrong with that – the table has an index on that VARCHAR column so all should be well … except that it’s not. Simple queries from Entity Framework against that table take much longer than expected. You look at the query plan and you see that the index on that column is not being used and that SQL Server is warning you that an implicit type conversion took place.
What exactly is going on there? As you look at the original query more carefully you see that Entity Framework is passing that parameter as a NVARCHAR value. SQL Server must convert it to match the data type in the table and in the process it will not make use of that index. The problem comes from the fact that in Entity Framework all string fields by default are considered to be NVARCHAR types when queries are created. If you have VARCHAR fields in the database then the developers must specifically declare the column types in the Entity Framework data model as VARCHAR. This way Entity Framework will use the proper data types when running the query and all will be well again.
Missing caching layer for metadata / lookup tables
This is not exactly an Entity Framework problem but rather an application architecture issue that’s easy to overlook when using an ORM tool. Let’s say that you have data in the database that does not change often – such as the list of US states. Once the application gets it once it should hold on to it and use it from some sort of memory cache layer whenever it needs it. Instead, you see that queries are being made against such tables all over the place. The fastest query to execute is the one that does not need to execute at all. Developers should implement some sort of caching layer in the application and store such data locally next to the application for long term use.
So there you have it. I’ve outlined here a few usage patterns along with a very detailed page to help you, as a SQL Server DBA, determine if the Entity Framework application hitting one of your databases is probably lacking some of the performance best-practices mentioned above. Sometimes just throwing more hardware at the problem might be cheaper in the short term but as it usually happens bad application code combined with enough user activity will eventually bring even powerful servers to their knees.