Foliotek Developer Blog

SQL Query Optimization for Reporting: a Different Approach

Synopsis: Not Just Faster Queries, but Fewer

Several months ago, we identified that some of the reports on our site were simply taking far too long to run (sometimes 30 minutes to an hour). Users were getting frustrated, and it made the system annoying to use. Consequently, we began investigating how we could best optimize the report queries. (We were using SQL Server with LINQ.) What we eventually found was that what we needed was not just faster queries, but fewer.

The Old Approach

Here’s an example of how the reports were being generated:

  • Query the database for all the 1st Level Options associated with the selected Item.
  • Bind this to a repeater.
  • For each repeater item/1st Level Option (on data bind), query the database for all the different 2nd Level Options associated with the current 1st Level Option.
  • Bind this to a repeater.
  • For each 2nd Level Option (on data bind), query the database to get all the 3rd Level Options.
  • Bind this to a repeater.
  • For each 3rd Level Option, compute any necessary values for the report.

Our initial approach was to identify which of the queries were taking the longest and optimize them as best we could (computed queries, adding indexes, etc.). While we did see marginal improvements, what we really needed was something that would give us drastic improvements, not just marginal ones.

The Problem

What we realized was that no matter how optimized the inner queries were, each of them was being run hundreds or even thousands of times, so there was a real limit to how much improvement we could ever gain this way. In fact, some of the inner queries would run only to find that there was no data to return, making the time spent on the query a complete waste.

The New Approach

The new approach we took was to get all of the data we would need for the report at the very beginning. So, we went from hundreds or thousands of queries (depending on the report data) to 3-5 queries that returned all the data we would need. Then, in code, we simply filtered these results as we needed them.
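Sketched in Python (with invented data standing in for the 3-5 up-front result sets), the new shape keeps the same nested loops, but each level is fetched once as a flat list and the nesting happens with in-memory filters instead of queries:

```python
# Three up-front "queries" (flat lists of rows); no database access after this.
level1 = [{"id": "a", "item": "item"}, {"id": "b", "item": "item"}]
level2 = [{"id": "a1", "parent": "a"}, {"id": "a2", "parent": "a"}, {"id": "b1", "parent": "b"}]
level3 = [{"id": "x", "parent": "a2"}, {"id": "y", "parent": "b1"}]

def run_report(item):
    rows = []
    # Same nested structure as before, but each "query" is an in-memory filter.
    for l1 in [r for r in level1 if r["item"] == item]:
        for l2 in [r for r in level2 if r["parent"] == l1["id"]]:
            for l3 in [r for r in level3 if r["parent"] == l2["id"]]:
                rows.append((l1["id"], l2["id"], l3["id"]))
    return rows
```

However many options the report has, the database is hit exactly three times here. In practice you would also index each list by parent ID (e.g. a dictionary of lists) so the filters aren't linear scans.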

The Results

Admittedly, the first time I ran the report after implementing the new approach (and fixing the errors that arose), I thought something was wrong. What had been taking 30 minutes had just taken a few seconds and was giving the same data. Thinking something weird had happened, I reran the report from the start and found once again that it completed in just a few seconds.

The Conclusion

So, what we needed in this case was not just faster queries, but fewer queries. Before, we were running a few queries that returned very small amounts of data and doing this hundreds, thousands, or even tens of thousands of times. Now, we run a few queries that return a lot of data only once apiece. We continued to implement this strategy across all of our reports, and the results were quite significant in every case, making our system more responsive and easier to use.

Note: We wondered how the system would respond to collecting such a large amount of data up front, but we've experienced no problems in this regard. In fact, since the queries finish so quickly, the extra memory is only held for a few seconds before the report is generated and the memory is released.

An Example

[sourcecode lang="csharp"]
protected void repListItemDataBound(object sender, System.Web.UI.WebControls.RepeaterItemEventArgs e)
{
    if (e.Item.ItemType == ListItemType.Item || e.Item.ItemType == ListItemType.AlternatingItem)
    {
        CurrentSection = (dac.ScoringGuideSection)e.Item.DataItem;

        Repeater repEvaluators = (Repeater)e.Item.FindControl("repEvaluators");
        // EvaluationScorers was run once and contains all scorers on any section in the report.
        repEvaluators.DataSource = EvaluationScorers;
        repEvaluators.DataBind();

        Repeater repSubSections = (Repeater)e.Item.FindControl("repSubSections");
        // ScoringGuideSections contains all sections in the report and is filtered here
        // to get only the subsections of this particular main section.
        repSubSections.DataSource = ScoringGuideSections.Where(s => s.ParentID == CurrentSection.SectionID);
        repSubSections.DataBind();
    }
}
[/sourcecode]


A Programmer's Guide to Performance Tuning SQL Server Queries

First off – I am not a DBA. What follows are just some general tools and tips I've found to help write better queries, find queries that are performing poorly, and get some ideas on how you might improve performance.

Here’s the general procedure I use to find and fix the poorly performing “low hanging fruit” in an SQL Server driven database application.

  1. Do you know a query you want to improve? If so, skip to step 3.
  2. Use SQL Profiler to identify candidate queries:
     • Fire up SQL Profiler. You can find it under 'SQL ….->Performance Tools' in your Start menu, or you can open it from the 'Tools' menu of SQL Management Studio.
     • Start a new trace. Use the Events Selection tab to filter for queries you care about. I usually check 'Stored Procedures->RPC:Completed' and 'TSQL->SQL:BatchCompleted'.
     • I also use the column filters to limit to the ApplicationName/DatabaseName/LoginName/HostName I care about and the performance characteristics I'm looking for (a good place to start is >100 cpu, >50 duration, or >5000 reads; adjust as necessary to find the worst performers in each category).
     • Run the trace until you see some queries you are interested in. Pause or stop it and select the query text from the detail window.
  3. Copy your identified query into SQL Management Studio. Right-click to turn on 'Execution Plan' and 'Client Statistics' if they aren't on already. Assuming you've selected a SELECT query that won't change data, run it to get a feel for the amount of time you want to improve. Look at the execution plan and the client statistics to see which parts of the query cost the most – you might identify ways to rewrite the query to improve performance. Common ways to improve are to shrink the amount of data you return by limiting the columns selected, etc.
  4. If you can't find a way to improve performance with a rewrite (or want to improve it even more), you can often improve performance by adding SQL statistics or indexes. You might be able to guess appropriate indexes based on the execution plan/client statistics – but it's just as easy to let SQL identify candidates for you with the Database Engine Tuning Advisor:
     • Highlight the query, right-click, and select 'Analyze Query in Database Engine Tuning Advisor'.
     • Click the play button ('Start Analysis').
     • Look over the recommended statistics and indexes, especially the size column. Indexes are a size/speed tradeoff – you want to make sure you are getting enough benefit out of the extra index space. Statistics generally have a lower impact on both speed and storage space (they are stored in a blob in an internal table). NOTE: I've generally found the 'Estimated improvement' number useless. I've applied an estimated 2% improvement that realized a 2x speed benefit.
     • Uncheck any you don't want, and then click 'Save Recommendations' to create a .sql file you can apply to the server to create the indexes and statistics.

Converting Names to Mixed Case/Camel Case in SQL Server

I was recently given the task of converting names in a database to mixed case. Many of them were already in mixed case, so I wanted to leave those alone and just focus on the ones that were in all caps or all lowercase. No automated conversion could be perfect, but this got me pretty close. To view a post on determining mixed case in SQL Server, click here.

Here’s my strategy:

  • If the name is already in mixed case, trust that it's right.
  • If the name is not in mixed case, then do the following:
     • Trim the name to eliminate white space
     • Account for names that start with "Mc" like "McDavid"
     • Account for names with apostrophes like "O'Reilly"
     • Account for hyphenated names (married names) like "Anderson-Johnson"
     • Account for multiple-word names like "La Russa"
     • Make sure suffixes included in the names field are capitalized appropriately
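For comparison, here's roughly the same strategy as a Python sketch. This is a simplified illustration, not a translation of the T-SQL function: it covers the trust-mixed-case check, word-boundary capitalization (spaces, hyphens, apostrophes), the "Mc" prefix, and the II/III/IV suffixes, but skips special cases like "MacDonald":

```python
import re

def camel_case_name(name):
    """Simplified sketch of the strategy; the T-SQL version handles more cases."""
    name = name.strip()
    # If the name is already mixed case, trust that it's right.
    if name[:1].isupper() and name != name.upper() and name != name.lower():
        return name
    # Capitalize the first letter after each boundary: start, space, hyphen, apostrophe.
    name = re.sub(r"(^|[ \-'])([a-z])",
                  lambda m: m.group(1) + m.group(2).upper(), name.lower())
    # Names starting with "Mc": McDavid, not Mcdavid.
    name = re.sub(r"\bMc([a-z])", lambda m: "Mc" + m.group(1).upper(), name)
    # Suffixes like II/III/IV should be fully capitalized.
    name = re.sub(r" (Ii+|Iv)$", lambda m: " " + m.group(1).upper(), name)
    return name
```

The first check is the same trick the SQL uses: a name with an uppercase first letter that is neither all caps nor all lowercase is assumed to be correct already.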

Here’s the code:

[sourcecode lang="sql"]
CREATE FUNCTION [dbo].[GetCamelCaseName]
(
@Name varchar(50)
)
RETURNS VARCHAR(50) WITH SCHEMABINDING
AS
BEGIN
-- Declare the return variable here
DECLARE @NameCamelCase VARCHAR(50)

-- This is determining whether or not the name is in mixed case already: the 1st character is uppercase
-- and the third is lower (because the 2nd could be an apostrophe). To do this, you have to cast the
-- character as varbinary and compare it with the upper case of the character cast as varbinary.

IF (CAST(SUBSTRING(@Name, 1, 1) as varbinary) = CAST(SUBSTRING(UPPER(@Name), 1, 1) as varbinary)
AND ((CAST(SUBSTRING(@Name, 2, 1) as varbinary) = CAST(SUBSTRING(LOWER(@Name), 2, 1) as varbinary)
AND SUBSTRING(@Name, 2, 1) != '''')
OR
(CAST(SUBSTRING(@Name, 4, 1) as varbinary) = CAST(SUBSTRING(LOWER(@Name), 4, 1) as varbinary)
AND SUBSTRING(@Name, 2, 1) = '''')))

BEGIN
SELECT @NameCamelCase = RTRIM(LTRIM(@Name))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' sr' as nvarchar(max)), CAST(' Sr' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' jr' as nvarchar(max)), CAST(' Jr' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' ii' as nvarchar(max)), CAST(' II' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' iii' as nvarchar(max)), CAST(' III' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' DE ' as nvarchar(max)), CAST(' de ' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST('macdonald' as nvarchar(max)), CAST('MacDonald' as nvarchar(max)))

IF (@NameCamelCase LIKE '% iv') -- avoid changing "Ivan" to "IVan"
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' iv' as nvarchar(max)), CAST(' IV' as nvarchar(max)))

IF ((@NameCamelCase = 'i') OR (@NameCamelCase = 'ii') OR (@NameCamelCase = 'iii') OR (@NameCamelCase = 'iv'))
SELECT @NameCamelCase = UPPER(@NameCamelCase)

RETURN @NameCamelCase

END

ELSE

BEGIN

SELECT @NameCamelCase = RTRIM(LTRIM(@Name))

-- "Mc-"
SELECT @NameCamelCase =
CASE
WHEN @Name LIKE 'mc%'
THEN UPPER(SUBSTRING(@Name, 1, 1)) + LOWER(SUBSTRING(@Name, 2, 1)) + UPPER(SUBSTRING(@Name, 3, 1)) + LOWER(SUBSTRING(@Name, 4, 47))
ELSE
UPPER(SUBSTRING(@Name, 1, 1)) + LOWER(SUBSTRING(@Name, 2, 49))
END

-- Apostrophes
SELECT @NameCamelCase =
CASE
WHEN @NameCamelCase LIKE '%''%'
THEN SUBSTRING(@NameCamelCase, 1, CHARINDEX('''', @NameCamelCase) - 1) + '''' + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX('''', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX('''', @NameCamelCase) + 2, 50)
ELSE
@NameCamelCase
END

-- Hyphenated names (do it twice to account for double hyphens)
SELECT @NameCamelCase =
CASE
WHEN @NameCamelCase LIKE '%-%'
THEN SUBSTRING(@NameCamelCase, 1, CHARINDEX('-', @NameCamelCase) - 1) + '^' + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX('-', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX('-', @NameCamelCase) + 2, 50)
ELSE
@NameCamelCase
END

SELECT @NameCamelCase =
CASE
WHEN @NameCamelCase LIKE '%-%'
THEN SUBSTRING(@NameCamelCase, 1, CHARINDEX('-', @NameCamelCase) - 1) + '^' + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX('-', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX('-', @NameCamelCase) + 2, 50)
ELSE
@NameCamelCase
END

SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST('^' as nvarchar(max)), CAST('-' as nvarchar(max)))

-- Multiple word names (do it twice to account for three word names)
SELECT @NameCamelCase =
CASE
WHEN @NameCamelCase LIKE '% %'
THEN SUBSTRING(@NameCamelCase, 1, CHARINDEX(' ', @NameCamelCase) - 1) + '?' + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX(' ', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX(' ', @NameCamelCase) + 2, 50)
ELSE
@NameCamelCase
END

SELECT @NameCamelCase =
CASE
WHEN @NameCamelCase LIKE '% %'
THEN SUBSTRING(@NameCamelCase, 1, CHARINDEX(' ', @NameCamelCase) - 1) + '?' + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX(' ', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX(' ', @NameCamelCase) + 2, 50)
ELSE
@NameCamelCase
END

SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST('?' as nvarchar(max)), CAST(' ' as nvarchar(max)))

-- Names in parentheses
SELECT @NameCamelCase =
CASE
WHEN @NameCamelCase LIKE '%(%'
THEN SUBSTRING(@NameCamelCase, 1, CHARINDEX('(', @NameCamelCase) - 1) + '(' + UPPER(SUBSTRING(@NameCamelCase, CHARINDEX('(', @NameCamelCase) + 1, 1)) + SUBSTRING(@NameCamelCase, CHARINDEX('(', @NameCamelCase) + 2, 50)
ELSE
@NameCamelCase
END

SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' sr' as nvarchar(max)), CAST(' Sr' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' jr' as nvarchar(max)), CAST(' Jr' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' ii' as nvarchar(max)), CAST(' II' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' iii' as nvarchar(max)), CAST(' III' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' DE ' as nvarchar(max)), CAST(' de ' as nvarchar(max)))
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST('macdonald' as nvarchar(max)), CAST('MacDonald' as nvarchar(max)))

IF (@NameCamelCase LIKE '% iv') -- avoid changing "Ivan" to "IVan"
SELECT @NameCamelCase = REPLACE(CAST(@NameCamelCase as nvarchar(max)), CAST(' iv' as nvarchar(max)), CAST(' IV' as nvarchar(max)))

IF ((@NameCamelCase = 'i') OR (@NameCamelCase = 'ii') OR (@NameCamelCase = 'iii') OR (@NameCamelCase = 'iv'))
SELECT @NameCamelCase = UPPER(@NameCamelCase)

-- Return the result of the function
RETURN ISNULL(@NameCamelCase, '')

END

RETURN ISNULL(@NameCamelCase, '')

END
[/sourcecode]


SQL Select Multiple Columns into Comma-Separated List

I needed to select a field from multiple rows in one table into one column in another table. I wanted the one column to be a comma-separated list as well.

One way to do this would be to use cursors, but the following post showed an easier way using XML PATH: http://bytes.com/topic/sql-server/answers/431513-getting-data-multiple-rows-into-one-column.

I ended up doing something like this:

[sourcecode lang="sql"]

SELECT State.Name,
(SELECT City.Name AS [text()]
FROM City
WHERE City.StateID = State.StateID AND City.Population > 100000
FOR XML PATH('')) AS LargeCities
FROM State

[/sourcecode]

Of course, this just gives me the list of city names all run together, like this: "DenverBoulderPuebloColorado Springs".

I wanted to separate these by commas, so I changed the inner select to "SELECT City.Name + ', ' AS [text()]", but this gives me an extra comma on the end, like this: "Denver, Boulder, Pueblo, Colorado Springs, ".

Of course, I could have used SUBSTRING("the inner select statement", 0, LEN("the inner select statement") - 2) and been fine, but I didn't want to rerun the query, and it would be harder to read anyway.

What I ended up doing was inverting the positions of "City.Name" and ", ". This gave me the following: ", Denver, Boulder, Pueblo, Colorado Springs".

Then, all I had to do with the SUBSTRING command was this: SUBSTRING("the inner select statement", 2, 10000). The "2" as the starting position got me past the first comma, and there was no trailing comma to worry about, since each city was preceded by a comma rather than followed by one. So, the final query looks like this:

[sourcecode lang="sql"]

SELECT State.Name,
SUBSTRING((SELECT ', ' + City.Name AS [text()]
FROM City
WHERE City.StateID = State.StateID AND City.Population > 100000
FOR XML PATH('')), 2, 10000) AS LargeCities
FROM State

[/sourcecode]

and would give the following results:

State Name     LargeCities
Colorado       Denver, Boulder, Pueblo, Colorado Springs

This strategy could be applied to regular code as well when building a comma-separated list.

[sourcecode lang="csharp"]

string list = "";
for (int i = 0; i < items.Count; i++)
{
    list += ", " + items[i].Name;
}
list = list.Substring(2);

[/sourcecode]


Unexpected benefits of Precompilation of LINQ

I once had a manager who told me: "I can solve any maintenance problem by adding a layer of abstraction. I can solve any performance problem by removing a layer of abstraction."

I think LINQ to SQL is a wonderful way to abstract the persistence layer – elegant, easy to use, easy to manipulate, and easy to maintain. Instead of writing SQL, which amounts to telling the database "how to retrieve" the data, you manipulate an expression tree that gets closer to specifying "what data I want". The upside of this is huge – you can change the expression tree at any level of your code and let .NET decide how to best write the SQL at the last possible moment, which effectively gets rid of intermediate results and inefficiently written SQL. Unfortunately, this abstraction does indeed cause a performance hit – the translation/compilation of the tree to SQL – and it's probably much bigger than you would think. See http://peterkellner.net/2009/05/06/linq-to-sql-slow-performance-compilequery-critical/ to see what I mean. In my analysis (using ANTS Profiler), when using uncompiled LINQ, the performance hit is usually about 80% compilation and only 20% retrieving the data! Thankfully, .NET does allow you to precompile a LINQ query and save the compilation to use over and over again.
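The trade-off is the same one regular expression libraries expose with their compile step. As a loose Python analogy (regex compilation standing in for LINQ's expression-tree-to-SQL translation; this illustrates the pattern, not LINQ itself):

```python
import re

# Pay the parse/compile cost once, at load time...
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

def is_email(text):
    # ...then every call reuses the compiled object instead of re-translating
    # the pattern. CompiledQuery.Compile gives the same shape of win for a
    # LINQ query: translate once, invoke many times.
    return EMAIL_RE.fullmatch(text) is not None
```

The hot path pays only the "run" cost; the "translate" cost is amortized across every call.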

Your natural tendency when hearing those kinds of numbers might be to precompile every single LINQ query you write. There's a big downside to doing that, though – you lose the ability to manipulate the compiled query in other parts of your code. Another downside is that the precompilation code itself is fairly ugly and hard to read/maintain.

I'm a big believer in avoiding "premature optimization". What happens if you precompile everything, and in a version or two Microsoft resolves the issue and caches compilations for you behind the scenes? You have written a ton of ugly code that breaks a major benefit of LINQ to SQL and is totally unnecessary.

Instead, I recommend you go after the low-hanging fruit first – precompile the most frequently accessed queries in your application and the ones that gain no benefit from manipulating the expression tree. In the applications I work on, there is a perfect case that fits both of these: the "get" method that returns the LINQ object representation of a single row in the database. These are hit quite often, and there is absolutely no case where the expression tree is further refined.

The old way it was written:

[sourcecode lang="csharp"]
public static Item Get(int itemid) {
    return (from i in DataContext.Items
            where i.ItemID == itemid
            select i).First();
}

[/sourcecode]

The new way with Precompiled LINQ:

[sourcecode lang="csharp"]
private static Func<ModelDataContext, int, Item> GetQuery = CompiledQuery.Compile(
    (ModelDataContext DataContext, int itemid) =>

        (from i in DataContext.Items
         where i.ItemID == itemid
         select i).First()

);

public static Item Get(int itemid) {
    return GetQuery.Invoke(DataContext, itemid);
}

[/sourcecode]

Applying this fairly simple change to the application, I'd estimate we got 80%+ of the benefit of compiled LINQ, at the expense of a few extra lines of code per object/table and with absolutely no loss of the tree manipulation we use elsewhere.