Foliotek Development Blog

SQL Server Query Optimization – CPU Impact of Datatype Mismatching

Tuesday, March 13th, 2012

I’ve been identifying “expensive” queries in our database lately and came across one that was particularly interesting. One query showed 1400 CPU, which is certainly high. Why in the world would a query require that much processing? It was a pretty simple query, too:



SELECT [t0].[ItemID], [t0].[UserID], [t0].[ItemTypeID], etc.
FROM [dbo].[Items] AS [t0]
WHERE (CONVERT(NVarChar, [t0].[ItemID])) IN (@p0, @p1, @p2, @p3)
ORDER BY [t0].[Name]

This just takes a list of Item IDs and returns the matching Items; the “Order By” clause doesn’t add much cost. Notice the CONVERT statement in the WHERE clause. It does not convert the input data to the datatype of the database column, but rather converts the datatype of the database column to match the datatype of the input variables.

The problem was a datatype mismatch. Item ID is an integer, but, through carelessness, I had passed in a list of strings (in C#). This query (generated by Linq) appears to have converted every Item ID in the index to an nvarchar–all to retrieve, in this case, four rows from the Item table. This was a very small thing to correct–simply convert the list of strings to a list of integers in C# prior to using it in the Linq query. Failing to match types in the database had huge consequences: a little carelessness resulted in a very substantial CPU load.
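
For illustration, here is a minimal sketch of the corrected approach. It is not the exact production code: the ModelDataContext and Item type names and the method itself are assumptions, while the Items, ItemID, and Name names come from the query above.


// Sketch of the fix: parse the string IDs into integers first, so the generated
// SQL compares the int ItemID column against int parameters and the index on
// ItemID can be used. Requires System.Collections.Generic and System.Linq.
public static List<Item> GetItemsByIds(ModelDataContext db, List<string> itemIdStrings)
{
    List<int> itemIds = itemIdStrings.Select(int.Parse).ToList();

    return (from i in db.Items
            where itemIds.Contains(i.ItemID)
            orderby i.Name
            select i).ToList();
}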

After the simple correction, this query didn’t even show up in the SQL Profiler results, even when I reduced the threshold substantially. So, 1400+ CPU to 0 CPU.

For more information on how I identified this query, see A Beginner’s Guide to SQL Server Query Optimization.

SQL Server Database Optimization – a Beginner’s Guide

Thursday, March 1st, 2012

Over the past month, I’ve been analyzing our SQL Server database activity to identify and resolve any significantly inefficient queries. Through the process I’ve learned a few things that may help other people who are doing the same work. Let me know what you think.

Optimization Strategy

  • Focus on Reads and CPU.

If a query has a high number of reads, then it is likely that either a table is not indexed properly or a query is written in such a way that appropriate indexes cannot be used. If the CPU is particularly high, a variety of factors could be at issue.

  • Identify BOTH expensive individual queries AND expensive aggregate queries.

When I started, I just focused on identifying the queries with the highest read counts and CPU. That’s certainly helpful if very expensive queries can be significantly optimized. However, an expensive query that rarely runs is probably not as problematic as a moderately expensive query that runs very often.

For instance, one query may run about once an hour and have 100,000 reads and 500 CPU. Yet another query may only have 25,000 reads and 125 CPU, but run 100 times an hour. The cumulative hourly expense would therefore be 2,500,000 reads and 12,500 CPU – a much higher expense than the query that only runs once an hour.

Optimization Process

  • Create a Baseline for your database

As you achieve performance improvements, you will want to compare them with how the system performed prior to the improvements being made. Having a baseline will allow you to quantify the improvements made in real numbers. Seeing significant improvements is very satisfying and well worth the time.

  • Create a baseline copy of the database – This will allow you to compare how new indexes impact performance.
  • Store a run of SQL Profiler as a baseline – Profiler allows you to write results to a database table. Name it “Baseline” or something like that. I write to a test database to isolate that work from development. It may be good to do two runs – one that catches all queries with at least 5,000 reads and one that catches all queries with at least 100 CPU, over a time period of 30 minutes. (These numbers certainly depend on the size and structure of your particular database. You may want to do some preliminary runs to figure out the best numbers for your database.)
  1. From SQL Server Management Studio => Tools => SQL Server Profiler
  2. Select database
  3. Click “Save to Table” on the General tab. Select the development database. Table name = “Baseline” (or Baseline_Reads_5000 and Baseline_CPU_100).
  4. Events Selection tab: TSQL – SQL:BatchCompleted and Stored Procedures – RPC:Completed (if you use stored procedures)
  5. Column Filters:
    • Run 1: Reads – Greater than or equal – 5000 (check “Exclude rows that do not contain values”)
    • Run 2: CPU – Greater than or equal – 100 (check “Exclude rows that do not contain values”)
  • Identify Expensive Queries

Query your baseline run of SQL Profiler. The following query could be of great help. It aggregates the total reads and total CPU, allowing you to sort by those columns. This lets you see which queries are the most expensive from an aggregate perspective. Some queries that would not stand out individually may still rank near the top of these sorts because they run so often, which really emphasizes the importance of tackling them.

The key grouping is on “SUBSTRING(Textdata, 1, 150)”. Each query that is run typically has its parameter values appended to the end of the query. Comparing just the first part of the query text allows identical queries with different parameters to be grouped together to provide aggregate results. You may need to play with the number of characters from time to time.

I also group by DatabaseName, since multiple databases may be stored on the same server.


SELECT DatabaseName,
        COUNT(*) AS [Query Count],
        SUM(CPU) AS [CPU Query Total],
        AVG(CPU) AS [CPU Per Query],
        SUM(Reads) AS [Reads Query Total],
        AVG(Reads) AS [Reads Per Query],
        SUBSTRING(Textdata, 1, 150) AS Query
FROM My_Database
GROUP BY DatabaseName, SUBSTRING(Textdata, 1, 150)
ORDER BY [Reads Query Total] DESC

This is another query that is a little more complex, but it adds some interesting information. Specifically, it gets the totals across all queries (that met the read or CPU threshold) for query counts, Reads, and CPU, then computes the percentages for each query.


WITH [Total All Queries] AS
(
        SELECT COUNT(*) AS [Query Count],
                SUM(CPU) AS [Total CPU],
                SUM(Reads) AS [Total Reads]
                FROM My_Database
)
SELECT DatabaseName,
        COUNT(*) AS [Query Count],
        CAST(COUNT(*) * 100.0 / (SELECT TOP 1 [Query Count] FROM [Total All Queries]) AS DECIMAL(5,2)) AS [Query Count Percentage],
        SUM(CPU) AS [CPU Query Total],
        AVG(CPU) AS [CPU Per Query],
        CAST(SUM(CPU) * 100.0 / (SELECT TOP 1 [Total CPU] FROM [Total All Queries]) AS DECIMAL(5,2)) AS [CPU Query Percentage],
        SUM(Reads) AS [Reads Query Total],
        AVG(Reads) AS [Reads Per Query],
        CAST(SUM(Reads) * 100.0 / (SELECT TOP 1 [Total Reads] FROM [Total All Queries]) AS DECIMAL(5,2)) AS [Reads Query Percentage],
        SUBSTRING(Textdata, 1, 150) AS Query
FROM My_Database
GROUP BY DatabaseName, SUBSTRING(Textdata, 1, 150)
ORDER BY [Reads Query Total] DESC

After you have run this and identified a query for optimization, you’ll need to query the baseline table again to get the entire query text at issue. Identify something unique in the query and use that in a LIKE comparison. (In my experience, I have had to compare on individual terms.)


SELECT CPU, Reads, DatabaseName, TextData AS Query, *
FROM Baseline_Reads_5000
WHERE Baseline_Reads_5000.Textdata like '%part of query%'
ORDER BY Baseline_Reads_5000.CPU DESC

  • Analyze Expensive Queries in Database Engine Tuning Advisor

You can copy and paste queries from your Baseline database into a new query window, then select the query, right-click, and select “Analyze Query in Database Engine Tuning Advisor”. This will show any recommendations for new indexes or statistics that would help optimize the query. It also gives you a projected percentage of improvement.

If it recommends a new index and a few fairly small statistics that are projected to yield a big improvement, then you’re probably in good shape. If it recommends very large indexes to achieve small gains, or if it doesn’t offer any recommendations, then you have more work to do.

Sometimes a particular column is indexed on one table, but your query is looking for it on a different table. Changing the query to use an index that already exists on the column in a different table can give you substantial improvements without needing to create any new indexes.

Sometimes the query is just not well written. In those cases, you may need to break up the query or think of a different method to obtain the same results.

  • Profile the Results of the Optimizations

Sometimes you can see a very significant difference by just running a query in SQL Server Management Studio before and after, especially for very expensive queries. With moderately expensive queries, though, it may be helpful to obtain more precise numbers.

It is possible to run SQL Profiler and restrict the results to your machine. Alternatively, you can isolate a test database and run the query there. If you’re using Linq, then you will need to get the newly generated SQL to compare, or just run it in a development environment that you can isolate.

For this run of SQL Profiler, you don’t have to write your results to a database table, since the rows should be minimal enough to view onscreen. It can still be nice to have for later reference, though, so you can name the table something like “Optimize Registrations”.

Still, you can see the difference your changes make right in the Profiler window. Sometimes the first run will show up, but the changes are so significant that the second run (with the new index or whatever) will be below the threshold you set. So, for these runs, it’s good to set a lower threshold of reads and/or CPU.

Additionally, watching the impact of a specific event on the database can identify redundant queries–queries that are run multiple times to get the same data. Storing the query results the first time can provide very significant decreases in database usage. Also, every access of the database requires some overhead, so accessing the database a hundred times to get a small piece of information may be more expensive than accessing it once to get a hundred small pieces of information.
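
As a rough illustration of that first point, results that are needed repeatedly during a single page load can be cached the first time they are fetched. The example below is purely hypothetical (ModelDataContext and the Items table are borrowed from the other posts on this page, and GetItemName is an invented helper); the point is simply that the database is hit once per distinct value instead of once per call.


// Hypothetical sketch: cache a lookup for the duration of a page load so the
// same row is not queried over and over. Requires System.Collections.Generic
// and System.Linq.
private readonly Dictionary<int, string> itemNameCache = new Dictionary<int, string>();

private string GetItemName(ModelDataContext db, int itemId)
{
    string name;
    if (!itemNameCache.TryGetValue(itemId, out name))
    {
        // Only hit the database the first time this ItemID is requested.
        name = db.Items.Where(i => i.ItemID == itemId).Select(i => i.Name).Single();
        itemNameCache[itemId] = name;
    }
    return name;
}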

Final Notes

  • Read and CPU Thresholds

After you’ve optimized your most expensive queries, you may need to lower these thresholds to get more results. However, at some point the gains you achieve will not be worth the amount of effort it takes. At that point, set a reminder to come back in a few months and run another profile to see if any new queries have been introduced that are not properly optimized.

  • Complex Queries

Not all expensive queries can be resolved by adding indexes. Sometimes a query is just too complex and needs to be broken up into smaller queries. For instance, a long OR condition like “(@value1 = -1 AND IndexedColumn2 = @value2) OR (@value1 > -1 AND IndexedColumn1 = @value1)” can prevent the indexes on those columns from being used at all.

  • Random Activity Profiling and Planned Activity Profiling

Random Activity Profiling: When you first start out, it’s probably good to just profile the system as people are using it. This will measure the random activity of your users. It may not be best to start profiling during your highest volume time, since you don’t want to overload the system, so pick an average load time at first. Eventually, though, it would be good to profile high activity times.

Planned Activity Profiling: During a less active time, or in a development environment, it is a good idea to do some profiling of specific actions that may not happen to be performed by users in a particular profiling window. They may, however, occur occasionally during your peak hours, and if they involve very expensive queries, they can really slow down everyone’s experience. So, profiling a period of time in which rare but potentially expensive queries are explicitly run can be critical.

Good luck!

P.S. You may also find this post helpful: Highly Inefficient Linq Queries that Break Database Indexing

For more tips on managing a SQL Server Database, see Managing a SQL Server Database: Tips for Beginners

Highly Inefficient Linq Queries that Break Database Indexing

Thursday, February 23rd, 2012

I have really appreciated using Linq for the past few years to query databases. The old access methods were far more tedious and time consuming. Linq is really great.

Yet, I recently spent some time analyzing our database usage with SQL Server Profiler. Some of the queries showed an alarming number of reads. My first assumption was that adding a simple index could solve the issue. However, the Database Engine Tuning Advisor offered no help.

I then began to really look at the query that had been generated, and that’s when I noticed the real problem.

Original Linq Statement (Pre-compiled query)


private static Func<ModelDataContext, GetAnswerParams, IQueryable<Answer>>
    GetAnswersListQuery = System.Data.Linq.CompiledQuery.Compile(
          (ModelDataContext DataContext, GetAnswerParams ps) =>

                (from a in DataContext.Answers
                 where a.UserID == ps.UserID
                     && (ps.PublishedFormID == -1 || a.PublishedFormID == ps.PublishedFormID)
                 select a)
                 );

Take note of this particular part of the query:

    && (ps.PublishedFormID == -1 || a.PublishedFormID == ps.PublishedFormID)

Basically, an answer may or may not be associated with a particular Published Form. There is an index on the User ID and Published Form ID columns of the Answers table. However, combining a comparison of the input value (ps.PublishedFormID) to the value of -1 with a comparison on the column in the database prevents SQL Server from using that index at all. Thus, it is relegated to only one index.

In this case, if the input Published Form ID is -1, then -1 is also what was saved in the Published Form ID column of the Answers table, so this comparison needlessly breaks the index possibilities. This is not always the case, of course.

Changing the query slightly can solve the issue:


private static Func<ModelDataContext, GetAnswerParams, IQueryable<Answer>>
    GetAnswersListQuery = System.Data.Linq.CompiledQuery.Compile(
          (ModelDataContext DataContext, GetAnswerParams ps) =>

                (from a in DataContext.Answers
                 where a.UserID == ps.UserID
                     && (a.PublishedFormID == -1 || a.PublishedFormID == ps.PublishedFormID)
                 select a)
                 );

This small change allows the Published Form ID index on the Answer table to be used, even though it may be -1. In other cases like this, it may be necessary to simply break the query into multiple queries. Take this as an example:

    && ((ps.PublishedFormID == -1 && a.ParentID == ps.ParentID)
        || (ps.PublishedFormID > 0 && a.PublishedFormID == ps.PublishedFormID))

This also breaks the indexing, of course. In this case, it may actually be better to have two distinct queries: one that is run if the passed-in Published Form ID is -1 and one that is run if it is not, as sketched below.
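
As a rough sketch (reusing the names from the example above, and assuming GetAnswerParams carries the ParentID shown in the snippet), the split might look something like this:


// Sketch only: one compiled query per case, so each WHERE clause compares
// indexed columns directly against parameters.
private static Func<ModelDataContext, GetAnswerParams, IQueryable<Answer>>
    GetAnswersByParentQuery = System.Data.Linq.CompiledQuery.Compile(
          (ModelDataContext DataContext, GetAnswerParams ps) =>
                from a in DataContext.Answers
                where a.UserID == ps.UserID
                    && a.ParentID == ps.ParentID
                select a);

private static Func<ModelDataContext, GetAnswerParams, IQueryable<Answer>>
    GetAnswersByPublishedFormQuery = System.Data.Linq.CompiledQuery.Compile(
          (ModelDataContext DataContext, GetAnswerParams ps) =>
                from a in DataContext.Answers
                where a.UserID == ps.UserID
                    && a.PublishedFormID == ps.PublishedFormID
                select a);

// The caller decides which compiled query to run based on the parameter value.
private static IQueryable<Answer> GetAnswers(ModelDataContext db, GetAnswerParams ps)
{
    return ps.PublishedFormID == -1
        ? GetAnswersByParentQuery(db, ps)
        : GetAnswersByPublishedFormQuery(db, ps);
}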

Realized Improvements

For one particular page load that called this query multiple times, I was catching any query that had more than 1,000 reads. My results showed that the new way resulted in 0.7% as many queries going over the threshold, utilizing 0.8% of the CPU and executing 0.6% of the reads. This was a pretty intense page, but that made the optimization all the more important. Here are the raw stats:

Approach    Total Queries Run    Total CPU Used    Total Reads
Old         687                  33,994            3,478,705
New         5                    281               21,907

So, Linq is awesome, and writing pre-compiled queries is great, but this doesn’t automatically create the most optimized SQL possible. To achieve that, careful attention must be given to the way the Linq query is written, and then you just have to test the outcomes to know for sure.

Automatic Update of Datamodel In Linq

Friday, August 6th, 2010

One of the issues that we came across fairly quickly when converting some projects to Linq was how unusable the default Linq to SQL utility was. The interface worked extremely well for 2 or 3 tables, but quickly became unmanageable with much more than that.

We began looking for other solutions, and discovered a command line utility called SQLMetal that can be used to generate a DataModel from a connection string and/or an XML file.

The solution we settled on uses SQLMetal to generate XML markup from the database, then uses XSL Transformations to make desired property/model changes (Private properties, Delay loaded, etc), and then uses SQLMetal to generate a code file from this XML file.

To start, we created a batch file called updateModel.bat and placed it in the project:


"C:Program FilesMicrosoft SDKsWindowsv6.0AbinSQLMetal.exe" /conn:"CONNECTIONSTRING" 
      /timeout:0 /namespace:MODELNAMESPACE /context:DATACONTEXTNAME /language:csharp /pluralize 
      /dbml:"%~dp0DataModel.dbml"
"%~msxsl.exe" "%~dp0DataModel.dbml" "%~dp0ModifyDbml.xslt" -o "%~dp0DataModel.dbml"
"C:Program FilesMicrosoft SDKsWindowsv6.0AbinSQLMetal.exe" /code:"%~dp0DataModel.cs"
    /namespace:MODELNAMESPACE /context:DATACONTEXTNAME /language:csharp 
    /pluralize %DataModel.dbml

The output of SQLMetal in the first line of this file is an XML file called DataModel.dbml that looks something like this:


  <Table Name="dbo.Person" Member="Persons">
    <Type Name="Person">
      <Column Name="PersonID" Type="System.Int32" DbType="Int NOT NULL IDENTITY" IsPrimaryKey="true" IsDbGenerated="true" CanBeNull="false"></Column>
      <Column Name="AddressID" Type="System.Int32" DbType="Int NOT NULL" CanBeNull="false"></Column>
      <Column Name="Email" Type="System.String" DbType="VarChar(255) NOT NULL" CanBeNull="false"></Column
      <Column Name="NameFirst" Type="System.String" DbType="VarChar(255) NOT NULL" CanBeNull="false"></Column>
      <Column Name="NameLast" Type="System.String" DbType="VarChar(255) NOT NULL" CanBeNull="false"></Column>
<Association Name="FK_Person_Address" Member="Address" ThisKey="AddressID" OtherKey="AddressID" Type="Address" IsForeignKey="true"></Association>
    </Type>
  </Table>

The second line of this script uses a utility called msxsl.exe (note that this requires MSXML). This program uses a file called ModifyDbml.xslt to perform an XSL transformation on the DataModel.dbml file.


<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
								xmlns:dbml="http://schemas.microsoft.com/linqtosql/dbml/2007"
								xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
								xmlns:msxsl="urn:schemas-microsoft-com:xslt"
								exclude-result-prefixes="msxsl">
  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="dbml:Database/dbml:Table/dbml:Type[@Name = 'Person']/dbml:Column[@Name = 'AddressID']">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()" />
       <xsl:attribute name="AccessModifier">Private</xsl:attribute>
    </xsl:copy>
  </xsl:template>

<xsl:template match="@* | node()">
     <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>


This will make it so the “AddressID” property of “Person” is a private property – and this is where all such DataModel changes should be stored. Note that any changes made directly to the DataModel after this point will be lost each time the files are generated.

The final line of this script generates a DataModel.cs file from the updated XML file.

Finally, we looked for a way to call this script from within Visual Studio to update the DataModel. To do this, we went to Tools -> External Tools -> Add and configured a new external tool with the appropriate arguments.

This allows you to highlight the updateModel.bat from Visual Studio and go to “Tools->CMD Prompt” to update your DataModel.

Using the Web.Config connection string with LINQ to SQL

Friday, June 11th, 2010

When updating a project to use LINQ to SQL, I found an issue with deploying to multiple environments. Each environment (development, staging, live) had its own database. Since I had the .dbml in another assembly, it was only reading from the app.config of the assembly it resided in. I was storing the database connection string in the web.config of the project, so I thought it would be nice to just use that instead of the app.config.

The first thing I needed to do was to keep the .dbml file from reading from the app.config. After opening up the .dbml file, I opened the properties window for the file. In the properties window, there is a setting for “Connection”. In the “Connection” dropdown I selected “(None)”. That keeps the .dbml file from accessing the app.config for the database connection string.


The "Connection" setting in the .dbml Properties

Now I needed to get my MainDataContext to use the Web.Config connection string. For this, I created a partial class for MainDataContext with a constructor that passes the connection string from the Web.Config to the base DataContext constructor.


public partial class MainDataContext
{
    public MainDataContext()
        : base(System.Configuration.ConfigurationManager.ConnectionStrings["Database.connection.string.from.web.config"].ToString(), mappingSource)
    {
        OnCreated();
    }
}
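
With that in place, any code that creates the context through the parameterless constructor picks up the Web.Config connection string. As a quick, purely illustrative check of which database is being used:


// Hypothetical check: the parameterless constructor now pulls the connection
// string named "Database.connection.string.from.web.config" out of Web.Config.
using (var db = new MainDataContext())
{
    Console.WriteLine(db.Connection.ConnectionString);
}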

Now when I deploy to different environments, the DataContext connects to the correct database instead of the one specified in the app.config.