Foliotek Developer Blog

Highly Inefficient Linq Queries that Break Database Indexing

I have really appreciated using Linq for the past few years to query databases. The old access methods were far more tedious and time consuming. Linq is really great.

Yet, I recently spent some time analyzing our database usage with SQL Server Profiler. Some of the queries showed an alarming number of reads. My first assumption was that adding a simple index could solve the issue. However, the Database Engine Tuning Adviser offered no help.

I then began to really look at the query that had been generated, and that’s when I noticed the real problem.

Original Linq Statement (Pre-compiled query)

 private static Func<modeldatacontext getanswerparams="" iorderedqueryable="">>  
 GetAnswersListQuery = System.Data.Linq.CompiledQuery.Compile(  
 (ModelDataContext DataContext, GetAnswerParams ps) =></modeldatacontext>

 (from a in DataContext.Answers  
 where a.UserID == ps.UserID  
 && (ps.PublishedFormID == -1 || a.PublishedFormID == ps.PublishedFormID)  
 select a)  
 );  
 }  

Take note of this particular part of the query:

    && (ps.PublishedFormID -1 || a.PublishedFormID ps.PublishedFormID)

Basically, an answer may or may not be associated with a particular Published Form. There is an index on the User ID and Published Form ID columns of the Answers table. However, combining a comparison of the input value (ps.PublishedFormID) to the vaule of -1 along with a comparison on the column in the database prevents SQL Server from using that index at all. Thus, it is relegated to only one index.

In this case, if the input Published Form ID is -1, then that will also be what was saved for the Published Form ID of the Answer table as well, so this comparison needlessly breaks the index possibilities. This is not always the case, of course.

Changing the query slightly can solve the issue:

 private static Func<modeldatacontext getanswerparams="" iorderedqueryable="">>  
 GetAnswersListQuery = System.Data.Linq.CompiledQuery.Compile(  
 (ModelDataContext DataContext, GetAnswerParams ps) =></modeldatacontext>

 (from a in DataContext.Answers  
 where a.UserID == ps.UserID  
 && (a.PublishedFormID == -1 || a.PublishedFormID == ps.PublishedFormID)  
 select a)  
 );  
 }  

This small change allows the Published Form ID index on the Answer table to be used, even though it may be -1. In other cases like this, it may be necessary to simply break the query into multiple queries. Take this as an example:

&& ((ps.PublishedFormID == -1 && a.ParentID = ps.ParentID) || (ps.PublishedFormID > 0 && a.PublishedFormID == ps.PublishedFormID)

This also breaks the indexing, of course. In this case, it may be actually better to have two distinct queries, one of which is run if the passed in Published Form ID is -1 and one of which is run if the Published Form ID is not -1.

Realized Improvements

For one particular page load that called this query multiple times, I was catching any query that had more than 1,000 reads. My results showed that the new way resulted in* 0.7% as many queries* going over the threshold, utilizing 0.8% of the CPU and executing 0.6% of the reads. This was a pretty intense page, but that made the optimization all the more important. Here are the raw stats:

Approach      Total Queries Run      Total CPU Used      Total Reads      
Old68733,9943,478,705
New528121,907
So, Linq is awesome, and writing pre-compiled queries is great, but this doesn’t automatically create the most optimized SQL possible. To achieve that, careful attention must be given to the way the Linq query is written, and then you just have to test the outcomes to know for sure.