Foliotek Developer Blog

Managing a SQL Server Database: Tips for Beginners

The company I work for is a web development company that uses SQL Server for its databases. The company is not quite big enough to have a DBA, so each of the developers takes a varying amount of responsibility to manage the databases. Over time, I have gravitated perhaps more than the others toward this responsibility and have learned several things that have really helped to maintain them well. If you are in the same position, I hope these things can be a benefit to you as well.

1. Index Fragmentation

Over time, indexes inevitably get fragmented and need to be rebuilt. A table with GUID’s that are not sequential will fragment really quickly if there are a good amount of inserts regularly. So, how do you know if you need to rebuilding your indexes? Fortunately, you can run a simple query to get this information. It should be noted that you need not worry about fragmentation on indexes that are very small. The following query gets the fragmentation statistics of indexes with a (memory) page count greater than 50 and a fragmentation greater than 25 percent. (Both of these variables may be adjusted.)

SELECT sys.objects.name AS [Table Name],  
 sys.indexes.name AS [Index Name],  
 partition_number AS [Partition],  
 CAST(avg_fragmentation_in_percent AS DECIMAL (4,2)) AS [Fragmentation Percentage],  
 CONVERT(decimal(18,2), page_count * 8 / 1024.0) AS [Total Index Size (MB)],  
 page_count AS [Page Count]  
 FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED') IndexStats  
 INNER JOIN sys.objects  
 ON sys.objects.object_id = IndexStats.object_id  
 INNER JOIN SYS.indexes  
 ON SYS.indexes.object_id = SYS.OBJECTS.object_id  
 AND IndexStats.index_id = sys.indexes.index_id  
 WHERE avg_fragmentation_in_percent > 25 AND Indexes.index_id > 0  
 AND page_count > 50

So, how did it look? If you have index fragmentation percentages over 90 for indexes that use a lot of pages, perhaps more than 500, then you probably need to set up a SQL Agent job to run weekly to rebuild the indexes. For more information on how to set this up, see SQL Server Optimization Script with Index Rebuilding to Reduce Fragmentation

2. Identifying and Optimizing Expensive Queries

If your database has never been analyzed or profiled, there may be some queries that are really inefficient and may be impacting your overall site performance. Oftentimes, when new tables are created, indexes are added that are thought to be necessary. However, only real world experiences will reveal if an additional index is needed or if a query needs to be rewritten to leverage indexes that are already built. To identify expensive queries, use the SQL Server Profiler tool from the Tools menu in SQL Server Management Studio.

For starters, I would suggest running it for five minutes, setting a threshold to ignore queries with less than 25 reads. Make sure you write the results to a database table. That way you can run queries on them and perform some careful analysis. Oftentimes, adding the right indexes or restructuring a few queries can drastically reduce the load on your database.

For more details on how to do this, see SQL Server Database Optimization ? a Beginner?s Guide.

3. Running Reports to Identify Largest Tables and Indexes

Do you know what your largest tables are? Do you know which table requires the most space for its indexes? Well, it would be really good to know both of those things, and it’s pretty easy to find that information. Right-click on your database, scroll down to “Reports”, then “Standard Report”, and then “Disk Usage by Top Tables”. This will provide you with some great information. It comes initially sorted by the amount of reserved space used per table. This is a combination of the space for the data and the indexes combined. You also can see the number of records in each table and the amount of unused space.

Knowing which of your tables is the largest and which table requires the most space for its indexes can help direct your efforts in analyzing how your database. Sometimes you can find an index that is really large but is unnecessary. I found a column in a table that was very large, but was no longer being used or referenced anywhere in our system. I was able to simply set all the values of the column to blanks and recoup quite a bit of space, and, again, this was for data that we were not using at all anymore. This will often highly log tables that contain data that is no longer necessary as well. See later in this post on removing such tables.

4. Cleaning up log tables

Oftentimes a database will have tables for errors, actions, emails, and various other things like that. However, it is rarely necessary to store all of this data in the database forever. An entry in the error log may be of interest for a month or two, but after that amount of time has passed, it has either been resolved or was not important enough to deal with. Furthermore, as the database changes, it becomes more and more difficult to really recreate what caused the problem.

If you have a good database backup plan–keeping copies of daily, weekly, monthly, and yearly backups, then you can probably remove old rows. If you keep every yearly backup, then should be fine to remove all rows older than one year. The January 1st backup will contain all of the errors for the previous year, so removing older rows only removes them from the current version of the database. You can still get back to older rows by restoring annual backups if your really need to do so.

For other tables, such as action logs, it may be necessary to be able to go back as much as a year, but if there is no business reason for being able to go back farther than that, you can probably remove rows older than perhaps two months. Your monthly backups will have a record that you can retrieve if necessary.

These types of tables can grow to be fairly large over time. If the tables are indexed, then your index rebuilding jobs will have to deal with them and basic inserts will become slower as well. Consequently, taking a careful look at tables like these and then scheduling a monthly SQL Agent job to remove old rows can really serve to keep your database lean and efficient.

5. Identifying and Removing Unused Indexes

There is probably a good chance that you have some indexes in your database that are not being used. These unused indexes can sometimes be rather large. This results in increased database size and backup storage, decreased insert time, and increased time to rebuild the indexes when they get too fragmented. So, it’s probably good to run a query to see if you have any unused indexes that are medium to large. You probably don’t need to worry about small indexes that are not currently being used.

There are actually two queries that you can run. There is an index usage table that you can reference and see when certain values are set to zero. Those indexes are not being used. However, there are also some indexes that, for some reason, never seem to even show up in that table. So, you have to run two different queries:

Indexes Not In Index Usage Table

-- GET UNUSED INDEXES THAT DO **NOT** APPEAR IN THE INDEX USAGE STATS TABLE  
 DECLARE @dbid INT  
 SELECT @dbid = DB_ID(DB_NAME())

SELECT Databases.Name AS [Database],  
 object_name(Indexes.object_id) AS [Table],  
 Indexes.NAME AS [Index],  
 Indexes.INDEX_ID,  
 PhysicalStats.page_count as [Page Count],  
 CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Size (MB)],  
 CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Frag. %]  
 FROM SYS.INDEXES Indexes  
 INNER JOIN SYS.OBJECTS Objects ON Indexes.OBJECT_ID = Objects.OBJECT_ID  
 LEFT JOIN sys.dm_db_index_physical_stats(@dbid, null, null, null, null) PhysicalStats  
 ON PhysicalStats.object_id = Indexes.object_id  
 AND PhysicalStats.index_id = indexes.index_id  
 INNER JOIN sys.databases Databases  
 ON Databases.database_id = PhysicalStats.database_id  
 WHERE Objects.type = 'U' -- User Table  
 AND Indexes.is_primary_key = 0  
 AND Indexes.type = 2 -- Nonclustered indexes  
 AND Indexes.INDEX_ID NOT IN (  
 SELECT UsageStats.INDEX_ID  
 FROM SYS.DM_DB_INDEX_USAGE_STATS UsageStats  
 WHERE UsageStats.OBJECT_ID = Indexes.OBJECT_ID  
 AND Indexes.INDEX_ID = UsageStats.INDEX_ID  
 AND DATABASE_ID = @dbid)

Unused Indexes In the Index Usage Table

-- GET UNUSED INDEXES THAT APPEAR IN THE INDEX USAGE STATS TABLE  
 DECLARE @MinimumPageCount int  
 SET @MinimumPageCount = 500

SELECT Databases.name AS [Database],  
 object_name(Indexes.object_id) AS [Table],  
 Indexes.name AS [Index],  
 PhysicalStats.page_count as [Page Count],  
 CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Size (MB)],  
 CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Frag. (%)],  
 ParititionStats.row_count AS [Row Count],  
 CONVERT(decimal(18,2),  
 (PhysicalStats.page_count * 8.0 * 1024) / ParititionStats.row_count)  
 AS [Index Size/Row (Bytes)],  
 UsageStats.last_user_scan,  
 UsageStats.last_user_seek  
 FROM sys.dm_db_index_usage_stats UsageStats  
 INNER JOIN sys.indexes Indexes  
 ON Indexes.index_id = UsageStats.index_id  
 AND Indexes.object_id = UsageStats.object_id  
 INNER JOIN SYS.databases Databases  
 ON Databases.database_id = UsageStats.database_id  
 INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)  
 AS PhysicalStats  
 ON PhysicalStats.index_id = UsageStats.Index_id  
 AND PhysicalStats.object_id = UsageStats.object_id  
 INNER JOIN SYS.dm_db_partition_stats ParititionStats  
 ON ParititionStats.index_id = UsageStats.index_id  
 AND ParititionStats.object_id = UsageStats.object_id  
 WHERE ((UsageStats.user_scans = 0 AND UsageStats.user_seeks = 0)  
 OR (UsageStats.last_user_scan AND (UsageStats.last_user_seek -- ignore indexes with less than a certain number of pages of memory  
 AND PhysicalStats.page_count > @MinimumPageCount  
 -- Exclude primary keys, which should not be removed  
 AND Indexes.is_primary_key = 0

For more information on this, see Identifying Unused Indexes in a SQL Server Database

6. Identifying and Removing Duplicate or Near Duplicate Indexes

In addition to removing unused indexes, you can also remove duplicate indexes or near-duplicate indexes. Obviously, a duplicate index is unnecessary, but sometimes a near-duplicate index can be removed without any adverse effects. I have found two indexes that used all of the same columns in the index itself and only differed in the columns that were “included” in the index (but not indexed). If there are two indexes that only differ by one included column and the index is large, then you may be able to remove the simpler index. Of course, it’s a good practice to save a script that can quickly regenerate the index if you find that it is actually necessary after all.

Ok, so here’s the query:

WITH IndexSummary AS  
 (

SELECT DISTINCT sys.objects.name AS [Table Name],  
 sys.indexes.name AS [Index Name],  
 SUBSTRING((SELECT ', ' + sys.columns.Name as [text()]  
 FROM sys.columns  
 INNER JOIN sys.index_columns  
 ON sys.index_columns.column_id = sys.columns.column_id  
 AND sys.index_columns.object_id = sys.columns.object_id  
 WHERE sys.index_columns.index_id = sys.indexes.index_id  
 AND sys.index_columns.object_id = sys.indexes.object_id  
 AND sys.index_columns.is_included_column = 0  
 0RDER BY sys.columns.name  
 FOR XML Path('')), 2, 10000) AS [Indexed Column Names],  
 ISNULL(SUBSTRING((SELECT ', ' + sys.columns.Name as [text()]  
 FROM sys.columns  
 INNER JOIN sys.index_columns  
 ON sys.index_columns.column_id = sys.columns.column_id  
 AND sys.index_columns.object_id = sys.columns.object_id  
 WHERE sys.index_columns.index_id = sys.indexes.index_id  
 AND sys.index_columns.object_id = sys.indexes.object_id  
 AND sys.index_columns.is_included_column = 1  
 0RDER BY sys.columns.name  
 FOR XML Path('')), 2, 10000), '') AS [Included Column Names],  
 sys.indexes.index_id, sys.indexes.object_id  
 FROM sys.indexes  
 INNER JOIN SYS.index_columns  
 ON sys.indexes.index_id = SYS.index_columns.index_id  
 AND sys.indexes.object_id = sys.index_columns.object_id  
 INNER JOIN sys.objects  
 ON sys.OBJECTS.object_id = SYS.indexES.object_id  
 WHERE sys.objects.type = 'U'  
 )

SELECT IndexSummary.[Table Name],  
 IndexSummary.[Index Name],  
 IndexSummary.[Indexed Column Names],  
 IndexSummary.[Included Column Names],  
 PhysicalStats.page_count as [Page Count],  
 CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Size (MB)],  
 CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragment %]  
 FROM IndexSummary  
 INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)  
 AS PhysicalStats  
 ON PhysicalStats.index_id = IndexSummary.index_id  
 AND PhysicalStats.object_id = IndexSummary.object_id  
 WHERE (SELECT COUNT(*) as Computed  
 FROM IndexSummary Summary2  
 WHERE Summary2.[Table Name] = IndexSummary.[Table Name]  
 AND Summary2.[Indexed Cols] = IndexSummary.[Indexed Cols]) > 1  
 0RDER BY [Table Name], [Index Name], [Indexed Column Names], [Included Column Names]

For more information on this, see Query to Identify Duplicate or Redundant Indexes in SQL Server

7. Checking Identity Columns for Overflow Risk

If your database is using integer identity columns, then the largest ID that can be stored is 2,147,483,647 (2.1 billion). Most tables never come close to overflowing, but it is definitely better to make sure than to not be sure and experience an overflow one day. So, here’s an easy query to quickly check:

SELECT sys.tables.name AS [Table Name],  
 last_value AS [Last Value],  
 CASE (MAX_LENGTH)  
 WHEN 1 THEN 'TINYINT'  
 WHEN 2 THEN 'SMALLINT'  
 WHEN 4 THEN 'INT'  
 WHEN 8 THEN 'BIGINT'  
 ELSE 'UNKNOWN'  
 END AS DataType,  
 CAST(cast(last_value as int) / 2147483647.0 * 100.0 AS DECIMAL(5,2))  
 AS [Percentage of ID's Used]  
 FROM sys.identity_columns  
 INNER JOIN sys.tables  
 ON sys.identity_columns.object_id = sys.tables.object_id

For more information, see Checking Integer Identity Columns in SQL Server for Overflow Risk

Conclusion

Alright, so there are some tips that I hope you will find helpful. Good luck!


An Obscure Performance Pitfall for Test Accounts and Improperly Indexed Database Tables

I recently came across a very problematic pitfall that test accounts could easily fall into and be completely unaware of the issue for quite some time. ?The pitfall has to do with improperly indexed tables, but test accounts that reference some of the first rows in the tables.

Test accounts for system testers are often some of the first accounts that are created in a system, and appear, therefore, very early in the user tables. ?So, when a test account logs in, the user table or other related table could be very poorly indexed, causing the database to revert to a table search (reading all rows) rather than using an appropriate index. ?However, since the test account is one of the first accounts, the performance is actually pretty good. ?The reads are very low as well, so this issue can go on undetected for quite awhile, even though real user experience could be pretty bad.

An inevitable complaint will come in. ?The system testers will test by logging in themselves, perhaps from home or coffee shops or at the slow connection at their parents’ house. ?Yet, they find that the system does pretty well–nothing like what the user called to complain about. ?It’s easy to chalk up the one user’s experience to a bad connection, a virus-filled computer, or the old PEBCAK issue, since the issue seems to be unrepeatable.

I came across this in our system recently after aggregating some SQL Server Profiler results. ?A particular table that had been used quite some time ago, but is no longer even available for newer users due to better options, was showing a substantial amount of reads per query. ?The system was checking that table to see if the logged in user had created anything using this old functionality. ?For a test user that had actually used it long ago, the system was able to find a match very quickly, even though I realized that the table had not been indexed on the User ID. ?Yet, since the rows for the test account were pretty early, the query did not take too long at all. ?However, for newer users, the system ended up reading the entire table every time to see if there were any matches, but never finding any since the functionality was no longer available to new users.

So, the Pitfall was that the test accounts were created early on and were not representative of the newest user. *In fact, the experience was quite a bit different. ?My specific test showed that an early test account could retrieve the appropriate row in *7 reads, while a user with no matches generated *6,465 *reads–every time. ?This isn’t a whole lot compared to other complex queries, but when it’s aggregated for every user that logs in, it actually can take a good deal of resources and is a completely unnecessary usage of the database.

Creating the index on User ID reduced the amount reads for a query with no matches down to just 8. ?Aggregated across a lot of users, this was a significant reduction in total reads.

So, this is a pretty obscure pitfall. ?After realizing what was going on, it seemed pretty obvious, but it certainly was not obvious before.

My recommendation, then, is to make sure new test accounts are created periodically, so that the test experience actually represents the experience of newer users to the system.


SQL Server Optimization Script with Index Rebuilding to Reduce Fragmentation

Overview

I recently completed the SQL Server 2008 Essential Training on Lynda.com and learned about the Standard Reports feature in SQL Server Management Studio. ?From looking at these reports, I observed that several of our indexes on various databases had become quite fragmented and were spread out over sometimes thousands of (memory) pages.

I searched for some information online and was directed to a query that would identify indexes that were 30% or more fragmented and rebuild them all. ?I noticed, though, that some indexes on very small tables weren’t seeing any improvement, so I added another parameter to filter out indexes with a small amount of pages.

The results were pretty significant. ?One database size was actually reduced by over 30%, since the fragmented indexes were taking up so much memory and were so inefficiently stored.

SQL Server Standard Reports

If you haven’t seen the SQL Standard Reports before, do the following:

  • Right click on your database in SQL Server Management Studio
  • Mouse over “Reports” and then “Standard Reports”
  • Select a report.

The two reports I found most useful were “Index Physical Statistics” *and *“Disk Usage by Top Tables”. ?On the “Index Physical Statistics” report, you will see recommendations for each index that will say 1) nothing, 2) reorganize, or 3) rebuild. ?The reorganize is apparently quicker and, from what I saw, more geared towards slightly fragmented indexes. ?The rebuild is for more highly fragmented ones. ?Of particular note on this report is the number of pages the indexes are taking up. When it numbers in the thousands with a high percentage of fragmentation, it really needs to be rebuilt and will probably yield some significant improvements in performance and memory requirements.

On the “Disk Usage by Top Tables” report you can see which tables in your database are taking up the most space, which as the most records, and how much memory is being allocated towards storage for indexes. ?**From what I could tell, the storage for the primary index seems to be included in the Data (KB) column rather than the Indexes (KB) column.

*NOTE: *Before you start rebuilding any indexes, I would recommend that you do the following for comparison purposes:

  • Save these reports (or export them), so you can tell how significantly the rebuild improved your database by reducing fragmentation and database size.
  • Write a fairly comp

Also, if your indexes are highly fragmented and you don’t know when they were last rebuilt, it may be better to begin with higher values for the fragmentation and page count, so you don’t make your database inaccessible for too long. ?You might start at 95% and 1000 pages and then just walk down bit by bit.

Index Rebuild Script

So, here’s the script I ran to identify and rebuild fragmented indexes with a high page count (you can adjust the fragmentation percentage and page count as desired). I definitely borrowed most of this from somewhere else, but now I can’t find where that was. My apologies for that.

 BEGIN TRY  
 BEGIN TRAN

– Ensure a USE statement has been executed first.

SET NOCOUNT ON;

DECLARE @objectid int;  
 DECLARE @indexid int;  
 DECLARE @partitioncount bigint;  
 DECLARE @schemaname nvarchar(130);  
 DECLARE @objectname nvarchar(130);  
 DECLARE @indexname nvarchar(130);  
 DECLARE @partitionnum bigint;  
 DECLARE @partitions bigint;  
 DECLARE @frag float;  
 DECLARE @pagecount int;  
 DECLARE @command nvarchar(4000);

DECLARE @page_count_minimum smallint  
 SET @page_count_minimum = 50

DECLARE @fragmentation_minimum float  
 SET @fragmentation_minimum = 30.0

– Conditionally select tables and indexes from the sys.dm_db_index_physical_stats function  
 – and convert object and index IDs to names.

SELECT object_id AS objectid,  
 index_id AS indexid,  
 partition_number AS partitionnum,  
 avg_fragmentation_in_percent AS frag,  
 page_count AS page_count  
 INTO #work_to_do  
 FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, ‘LIMITED’)  
 WHERE avg_fragmentation_in_percent > @fragmentation_minimum  
 AND index_id > 0  
 AND page_count > @page_count_minimum;

– Declare the cursor for the list of partitions to be processed.  
 DECLARE partitions CURSOR FOR SELECT * FROM #work_to_do;

– Open the cursor.  
 OPEN partitions;

– Loop through the partitions.  
 WHILE (1=1)  
 BEGIN;  
 FETCH NEXT  
 FROM partitions  
 INTO @objectid, @indexid, @partitionnum, @frag, @pagecount;

IF @@FETCH_STATUS

SELECT @objectname = QUOTENAME(o.name), @schemaname = QUOTENAME(s.name)  
 FROM sys.objects AS o  
 JOIN sys.schemas as s ON s.schema_id = o.schema_id  
 WHERE o.object_id = @objectid;

SELECT @indexname = QUOTENAME(name)  
 FROM sys.indexes  
 WHERE object_id = @objectid AND index_id = @indexid;

SELECT @partitioncount = count (*)  
 FROM sys.partitions  
 WHERE object_id = @objectid AND index_id = @indexid;

SET @command = N'ALTER INDEX ' + @indexname + N' ON ' + @schemaname + N'.' + @objectname + N' REBUILD';

IF @partitioncount > 1  
 SET @command = @command + N’ PARTITION=’ + CAST(@partitionnum AS nvarchar(10));

EXEC (@command);

PRINT N’Rebuilding index ‘ + @indexname + ‘ on table ‘ + @objectname;  
 PRINT N’ Fragmentation: ‘ + CAST(@frag AS varchar(15));  
 PRINT N’ Page Count: ‘ + CAST(@pagecount AS varchar(15));  
 PRINT N’ ‘;  
 END;

– Close and deallocate the cursor.  
 CLOSE partitions;  
 DEALLOCATE partitions;

– Drop the temporary table.  
 DROP TABLE #work_to_do;  
 COMMIT TRAN

END TRY  
 BEGIN CATCH  
 ROLLBACK TRAN  
 PRINT ‘ERROR ENCOUNTERED’  
 END CATCH

SQL Server Agent Job

In order to prevent my indexes from ever getting too fragmented again, I set up a job under SQL Server Agent that runs once a month. ?The frequency that is best for your database will depend on its size and how fast it grows.

Final Comparison

After you’ve set all this up, don’t forget to run the Standard Reports again and compare them with the reports you ran before you did anything. ?You can also rerun the query ?you wrote and see if you can detect any improvement.

OK. ?That’s it. ?Good luck.

For more tips os optimizing SQL Server database queries, see A Beginner’s Guide to SQL Server Database Optimization

Script to identify indexes that should be rebuilt.

P.S. To simply identify indexes that should be rebuilt, run just a portion of the script:

– This query identifies indexes on a database that need to be rebuilt.  
 –  
 – The recommendation is to rebuilt queries whose fragmentation exceeds 30.0%  
 – and who page count exceeds 50 pages. These parameters may be adjusted below.

DECLARE @page_count_minimum smallint  
 DECLARE @fragmentation_minimum float

SET @page_count_minimum = 50  
 SET @fragmentation_minimum = 30

SELECT  
 sys.objects.name AS Table_Name,  
 sys.indexes.name AS Index_Name,  
 avg_fragmentation_in_percent AS frag,  
 page_count AS page_count,  
 sys.dm_db_index_physical_stats.object_id AS objectid,  
 sys.dm_db_index_physical_stats.index_id AS indexid,  
 partition_number AS partitionnum  
 FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, ‘LIMITED’)  
 inner join sys.objects  
 on sys.objects.object_id = sys.dm_db_index_physical_stats.object_id  
 inner join sys.indexes  
 on sys.indexes.index_id = sys.dm_db_index_physical_stats.index_id  
 and sys.indexes.object_id = sys.dm_db_index_physical_stats.object_id  
 WHERE avg_fragmentation_in_percent > @fragmentation_minimum  
 AND sys.dm_db_index_physical_stats.index_id > 0  
 AND page_count > @page_count_minimum  
 ORDER BY page_count DESC

For more tips on managing a SQL Server Database, see Managing a SQL Server Database: Tips for Beginners