Foliotek Developer Blog

Page Level Web Application Database Optimization

Do you know what queries run for a single page load in your application? If not, perhaps you should.

It is a question I wish I had asked myself a long time ago, but I didn’t know to ask it.

Over the past several years, I have enjoyed the challenge of database query optimization. I work for a small company, and no one else seems to have much of an interest in this sort of thing, so that means I get to do it. It is certainly interesting to see how a small rewrite of a query or a new index can achieve significant results. However, I recently began looking at optimization from a different perspective–that of the individual page.

How many queries are required to load a single page? How many should it take?

To identify expensive queries in need of optimization, I would run a SQL Profiler session that caught queries with more than 1,000, 5,000, or 10,000 Reads, or more than 500 CPU. One day, however, I decided to see every query that ran on a single page load, so I set the Reads threshold all the way down to 1. I also pointed the web application running on my machine at a test database and restricted the Profiler session accordingly.

From SQL Server Management Studio:

  • Tools > SQL Server Profiler
  • Enter Database credentials, then click Connect
  • Event Selection tab
  • Check “Show all events” and “Show all columns”
  • Column Filters - DatabaseName: TestDBName
  • HostName: ComputerName (ws###)
  • Reads: 1
  • Run

After clicking “Run” and loading the first page, I was stunned.

One page was running far more queries than I had ever imagined. But why? I started looking through the results. The column “TextData” shows the first 100 or so characters, and then the window on the bottom shows the full query. Suffice it to say that there were many opportunities for optimization.

What I Learned

Repetitive Queries

The first thing I noticed was that many page loads contained redundant queries. The page would query the database for the User or Company multiple times, sometimes one request right after another. None of this was intended; we had simply never realized how much database load certain coding patterns were generating.

Some of the most important things to optimize are side menus, top menus, and footers–things that are displayed on every page. Any inefficiency in these is hit every time anyone loads any page, which means even a small one can have a huge impact. It also means that removing one pays off just as quickly. If 100 pages are hit every second, and 20 queries can be eliminated from each page load, you have just eliminated 2,000 queries a second.

Many of these queries are small, to be sure; it takes little, if any, CPU and only a few reads to retrieve a User from the User table. The cumulative traffic, however, can be significant.

In many cases, I realized that instead of fetching the user inside a method that only needed one property from the row, I could pass the user in, or even just pass in the property itself. Alternatively, I could create a page-level (or control-level [.ascx]) variable for the User, Company, or other records used in many places, as in the lazily loaded property below.

// Backing field caches the user for the lifetime of the page (or control).
private DataAccess.User _currentUser;

// Lazily loads the current user the first time it is needed,
// so repeated accesses during the same page load reuse a single query.
public DataAccess.User CurrentUser
{
    get
    {
        if (_currentUser == null)
            _currentUser = DataAccess.User.Get(LoggedInUserID);
        return _currentUser;
    }
}
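
The other refactor mentioned above, passing in the value a method needs instead of letting it look the user up again, might look something like this sketch (the method and property names here are hypothetical, not from our codebase):

// Before: the method runs an extra query just to read one property.
private string GetGreeting()
{
    DataAccess.User user = DataAccess.User.Get(LoggedInUserID); // redundant query
    return "Welcome, " + user.FirstName;
}

// After: the caller passes the value in, so no extra query is needed.
private string GetGreeting(string firstName)
{
    return "Welcome, " + firstName;
}

// Usage from the page, reusing the cached CurrentUser property above:
// lblGreeting.Text = GetGreeting(CurrentUser.FirstName);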

Unused Queries

It took some time, but I matched every query to the line of code that generated it. I found a few places where a query was called, but then the result was never used. This, of course, is bound to happen in a large application as modifications are introduced. But again, removing these for every page load can certainly help.

Unnecessary Queries

I also encountered a few places where unnecessary queries were being run. For instance, upon login, the system should check the expiration date of the user’s registration. That was the intent of the code. However, at some point, the code was moved and ended up getting run on every page load. So, the code was necessary on login, but not on every page. I never would have known it ran on every page had I not seen it for myself in the SQL Profiler session.
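
A sketch of that kind of fix, with hypothetical names rather than our actual code: the check belongs in the login handler, not in a base page or master page that runs on every request.

// Runs once, at login time, instead of on every page load.
protected void Login_Click(object sender, EventArgs e)
{
    // (Authenticate, RegistrationExpiration, and Expired.aspx are illustrative names.)
    DataAccess.User user = DataAccess.User.Authenticate(txtUsername.Text, txtPassword.Text);
    if (user == null)
        return; // invalid credentials

    if (user.RegistrationExpiration < DateTime.Now)
    {
        Response.Redirect("~/Expired.aspx");
        return;
    }

    // ... continue with the normal login flow (set session, redirect, etc.)
}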

Final Thoughts

It is, indeed, rewarding to identify a very expensive query and optimize it down to something very small. Yet I’ve learned that not all weeds are gigantic; sometimes there are millions of smaller ones, and it is better to get rid of both.


Web Application Functional Regression Testing Using Selenium

At Foliotek, we use a rapid development methodology. Typically, a new item will go from definition through coding to release in a month’s time (bucketed along with other new items for the month). A bugfix will nearly always be released within a week of the time it was reported. In fact, we are currently experimenting with a methodology that will allow us to test and deploy new items individually as well – which means that a new (small) item can go from definition to release in as little as a week, too.

Overall, this kind of workflow is great for us, and great for our customers. We don’t need to wait a year to change something to make our product more compelling, and customers don’t have to wait a year to get something they want implemented. We also avoid the shock of suddenly introducing a year’s worth of development to all our customers all at once – a handful of minor changes every month (or week) is much easier to cope with.

However, it also means that Foliotek is never exactly the same as it was the week before. Every time something changes, there is some risk that something breaks. We handle this risk in two ways:

  1. We test extremely thoroughly
  2. We fix any problems that arise within about a week (severe problems usually the same day)

At first, we did all testing manually. This is the best way to test, assuming you have enough good testers with enough time to do it well. Good testers can’t be just anyone – they have to have a thorough knowledge of how the system should work, they have to care that it does work perfectly, and they have to have a feel for how they might try to break things. Having enough people like this with enough time to do testing is expensive.

Over time, two related things happened. One was that we added more developers to the project and started building more, faster. The other was that the system was growing bigger and more complex.

As more people developed on it and the system grew more complex, our testing needs grew exponentially. The rise in complexity and in the number of people developing led to much, much more potential for side-effects – problems where one change affects a different (but subtly related) subsystem. Side-effects by their nature are impossible to predict. The only way to catch them was to test EVERYTHING any time ANYTHING changed.

We didn’t have enough experienced testers to do that every month (new development release), let alone every week (bugfix release).

To deal with that, we started by writing a manual regression test script to run through each week. While this didn’t free up any time overall, it did mean that once the test was written well, anyone could execute it. This was doable because we had interns who had to be around to help handle support calls anyway – and they were only intermittently busy. In their free time they could execute the tests.

Another route we could have gone would have been to write automated unit tests (http://en.wikipedia.org/wiki/Unit_testing). Basically, these are tiny contracts the developers would write that say something like “calling the Add function on the User class with name Luke will result in the User database table having a new row with name Luke”. Each time the project is built, the contracts are verified. This is great for projects like code libraries and APIs, where the product of the project IS the result of each function. For a web application, though, the product is the complex interaction of functions and how they produce an on-screen behavior. There are lots of ways that the individual functions could all be correct and the behavior still fails. It is also very difficult, or even impossible, to test the client-side parts of a web application – JavaScript, AJAX, CSS, etc. Unit testing would cost a non-trivial amount (building and maintaining the tests) for a trivial gain.
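
We did not go this route, but for illustration, a contract like the one described above might look roughly like the following C# test (a sketch only; NUnit and the GetByName helper are assumptions, not part of our codebase):

using NUnit.Framework;

[TestFixture]
public class UserTests
{
    [Test]
    public void Add_WithNameLuke_CreatesUserRowNamedLuke()
    {
        // Act: call the function under test ("the Add function on the User class").
        DataAccess.User.Add("Luke");

        // Assert: the contract is that the User table now contains a row named Luke.
        // (GetByName is an assumed lookup helper, used here only for the example.)
        Assert.IsNotNull(DataAccess.User.GetByName("Luke"));
    }
}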

Eventually, we discovered the Selenium project (http://seleniumhq.org/download/). The idea of Selenium is basically to take our manual regression test scripts, and create them such that a computer can automatically run the tests in a browser (pretty much) just like a human tester would. This allows us to greatly expand our regression test coverage, and run it for every single change we make and release.

Here are the Selenium tools we use and what we use them for:

  • Selenium IDE (http://release.seleniumhq.org/selenium-ide/): A Firefox plugin that lets you quickly create tests using a ‘record’ function that builds them out of your clicks, lets you manually edit them to make your tests more complex, and runs them in Firefox.
  • Selenium RC (http://selenium.googlecode.com/files/selenium-remote-control-1.0.3.zip): A Java application that will take the tests you create with Selenium IDE and run them in multiple browsers (Firefox, IE, Chrome, etc.). It runs from the command line, so it’s fairly easy to automate test runs into build actions, etc. as well (a rough sketch of driving it from code follows this list).
  • Sauce RC (http://saucelabs.com/downloads): A fork of RC that adds a web UI on top of the command line interface. It’s useful for quickly debugging tests that don’t execute properly in non-Firefox browsers. It also integrates with SauceLabs – a service that lets you run your tests in the cloud on multiple operating systems and browsers (for a fee).
  • BrowserMob (http://browsermob.com/performance-testing): An online service that will take your Selenium scripts and use them to generate real user traffic on your site. Essentially, it spins up many real machines and instances of Firefox at once to run your test – each just as you would do locally – for a fee. It costs less than $10 to test up to 25 “real browser users”, which can actually map to many more users than that, since the automated test doesn’t have to think between clicks. It gets expensive quickly to test more users than that.
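
For a rough idea of what a recorded test looks like when driven from code, here is a minimal sketch using the Selenium RC .NET client (the URL, locators, and credentials are made up; the next post covers the real details):

using Selenium; // Selenium RC .NET client (included in the RC download)

class LoginSmokeTest
{
    static void Main()
    {
        // Connect to a running Selenium RC server (java -jar selenium-server.jar).
        ISelenium selenium = new DefaultSelenium("localhost", 4444, "*firefox", "http://localhost/");
        selenium.Start();

        // Roughly what the IDE records: open the page, fill in the form, submit.
        selenium.Open("/Login.aspx");
        selenium.Type("id=txtUsername", "testuser");
        selenium.Type("id=txtPassword", "password");
        selenium.Click("id=btnLogin");
        selenium.WaitForPageToLoad("30000");

        // Verify we landed somewhere that looks logged in.
        System.Console.WriteLine(selenium.IsTextPresent("Welcome") ? "PASS" : "FAIL");

        selenium.Stop();
    }
}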

Selenium is a huge boon for us. We took the manual tests that would occupy a tester for as much as a day, and made it possible to run those same tests with minimal interaction in a half hour or less. We’ll be able to cover more test cases, and run them more often – even running them as development occurs to catch issues earlier.

In my next post, I’ll talk about the details of how you build tests, run them, maintain them, etc. with the tools mentioned above. See it here: Selenium Tips and Tricks

Also, for Selenium 2 with ASP.NET Web Forms, see Simplifying C# Selenium 2 Tests for ASP.NET WebForms