Foliotek Developer Blog

Query to Identify Duplicate or Redundant Indexes in SQL Server

I recently noticed that a particular table had two indexes that were nearly identical. In fact, the indexed columns were exactly the same. The only difference was in the included columns. This made me wonder if there were any other indexes that had duplicates or near duplicates, so I developed a query to get results from the entire database. The query returns the following columns:

  • Table Name
  • Index Name
  • Indexed Column Names
  • Included Column Names
  • Page Count (for the Index)
  • Total Index Size
  • Fragmentation (of the Index)

This made it really easy to see not only which indexes had duplicates or near duplicates, but also how large those indexes were. This allowed me to focus my time on the larger indexes, which have a bigger impact in not only storage, but also rebuilding and reorganization tasks.

So, here it is:

WITH IndexSummary AS  
 (

SELECT DISTINCT sys.objects.name AS [Table Name],  
 sys.indexes.name AS [Index Name],  
 SUBSTRING((SELECT ', ' + sys.columns.Name as [text()]  
 FROM sys.columns  
 INNER JOIN sys.index_columns  
 ON sys.index_columns.column_id = sys.columns.column_id  
 AND sys.index_columns.object_id = sys.columns.object_id  
 WHERE sys.index_columns.index_id = sys.indexes.index_id  
 AND sys.index_columns.object_id = sys.indexes.object_id  
 AND sys.index_columns.is_included_column = 0  
 ORDER BY sys.columns.name  
 FOR XML Path('')), 2, 10000) AS [Indexed Column Names],  
 ISNULL(SUBSTRING((SELECT ', ' + sys.columns.Name as [text()]  
 FROM sys.columns  
 INNER JOIN sys.index_columns  
 ON sys.index_columns.column_id = sys.columns.column_id  
 AND sys.index_columns.object_id = sys.columns.object_id  
 WHERE sys.index_columns.index_id = sys.indexes.index_id  
 AND sys.index_columns.object_id = sys.indexes.object_id  
 AND sys.index_columns.is_included_column = 1  
 ORDER BY sys.columns.name  
 FOR XML Path('')), 2, 10000), '') AS [Included Column Names],  
 sys.indexes.index_id, sys.indexes.object_id  
 FROM sys.indexes  
 INNER JOIN SYS.index_columns  
 ON sys.indexes.index_id = SYS.index_columns.index_id  
 AND sys.indexes.object_id = sys.index_columns.object_id  
 INNER JOIN sys.objects  
 ON sys.OBJECTS.object_id = SYS.indexES.object_id  
 WHERE sys.objects.type = 'U'  
 )

SELECT IndexSummary.[Table Name],  
 IndexSummary.[Index Name],  
 IndexSummary.[Indexed Column Names],  
 IndexSummary.[Included Column Names],  
 PhysicalStats.page_count as [Page Count],  
 CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Size (MB)],  
 CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragment %]  
 FROM IndexSummary  
 INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)  
 AS PhysicalStats  
 ON PhysicalStats.index_id = IndexSummary.index_id  
 AND PhysicalStats.object_id = IndexSummary.object_id  
 WHERE (SELECT COUNT(*) as Computed  
 FROM IndexSummary Summary2  
 WHERE Summary2.[Table Name] = IndexSummary.[Table Name]  
 AND Summary2.[Indexed Cols] = IndexSummary.[Indexed Cols]) > 1  
 ORDER BY [Table Name], [Index Name], [Indexed Column Names], [Included Column Names]  

Table Name Index Indexed Cols Included Cols Pages Size (MB) Frag % MyTable Indx1 Col1 Col2, Col3 123 0.96 8.94 MyTable Indx2 Col1 Col2, Col3 123 0.96 8.94

For more tips on managing a SQL Server Database, see Managing a SQL Server Database: Tips for Beginners


Checking Integer Identity Columns in SQL Server for Overflow Risk

A very common practice is to use a column of type integer for the primary key on a table with auto increment. This works great in most cases. However, there is a limitation to this that should be kept in mind. The largest value that an integer column can hold is 2,147,483,647 (2.1 billion). So, if you have a database that will contain the names of everyone living in Wyoming and Montana, you’ll be fine. However, if your database will contain the names of everyone living in China and India, you will definitely have an overflow issue.

So, how can you tell what your risk level is? You could go through each table and select the max ID field, but that would take too long unless your database is very small. Yes, I did start doing it this way myself, but quickly realized this was not going to be very fun. So, I came up with this query that provides the Table Name, the Last Value used in the identity column, the Data Type of the column, and the percentage of valid integers that have been used (for the integer data type).

Running this on my database, I was able to see that we do have one table that has an identity column that has reached 50 million. This is only about 2.3% of the way to the largest integer, so we have some time to deal with it. Still, it is nice to know that we are safe for a while.

 SELECT sys.tables.name AS [Table Name],  
 last_value AS [Last Value],  
 CASE (MAX_LENGTH)  
 WHEN 1 THEN 'TINYINT'  
 WHEN 2 THEN 'SMALLINT'  
 WHEN 4 THEN 'INT'  
 WHEN 8 THEN 'BIGINT'  
 ELSE 'UNKNOWN'  
 END AS DataType,  
 CAST(cast(last_value as int) / 2147483647.0 * 100.0 AS DECIMAL(5,2))  
 AS [Percentage of ID's Used]  
 FROM sys.identity_columns  
 INNER JOIN sys.tables  
 ON sys.identity_columns.object_id = sys.tables.object_id  
 ORDER BY last_value DESC  

For more tips on managing a SQL Server Database, see Managing a SQL Server Database: Tips for Beginners


Index Recommendations from the Database Engine Tuning Adviser should not be accepted blindly

The Problem

I ran a SQL Profiler session on one of our SQL databases toda to catch any expensive queries and see if they could be optimized. One query that showed up was one that selected the top 30 rows from one table where a certain bit column was set to true, sorted by oldest first. The table being queried had close to 100,000 rows in it. Each query took about 6,000 reads.

The query looked something like this:

 SELECT TOP (30) *  
 FROM TheTable  
 WHERE BitColumn = 1  
 ORDER BY DateColumn  

The Suggested Solution

My inclination was that all I needed to do was add a very small index on the column of type BIT being used in the WHERE clause. However, I went ahead and ran the query through the Database Engine Tuning Adviser, just to see what it came up with.

The index recommendation suggested an index on three columns, one of which was, in fact, the one in the WHERE clause. The Date column was also included along with the table’s Primary ID. The estimated improvement was 99%, which was great and what I had expected.

However, the suggested index also had every other table column in its INCLUDE statement. The result was that the new index was projected to take up 68.5 MB of space. That’s quite a bit of space for an index on a bit column.

The Employed Solution

Consequently, I manually went in to SQL Server Management Studio and just added a simple index on the column used in the WHERE clause. Then I checked the size of the new index and found that it was only 1.75 MB–quite significantly less than 68.5 MB.

The query now requires only 14 reads, compared with the nearly 6,000 it used before, and I haven’t added 68 MB of index storage to the database.

Final Remark

So, there are certainly times where the recommendations from the Database Tuning Adviser would probably be fine. Each case is different, of course. In this case, adding 68 MB of index storage just was not worth it. Adding a simple index manually gave the optimization we needed without really impacting the size of the database. Of course, 68 MB is not really a lot, but adding that 10 or 20 or 50 times when it is not necessary would certainly begin to unnecessarily impact it.

So, the Database Tuning Adviser is definitely helpful, but should probably be used more as a guide rather than as something not to be questioned.


Identifying Unused Indexes in a SQL Server Database

One of our databases began timing out during our weekly index rebuild job, so I began to look into what indexes were getting rebuilt. After looking at the Index Usage Report, I noticed that some of them were not even being used. Some of them were very large indexes as well.

Obviously, an index that is never used doesn’t really need to be rebuilt to improve performance, so I removed a few of them. Then I began to look more closely at how I could identify other indexes that were not being used, aside from looking through every index on the SQL Report.

Unused Indexes with no reads in the Index Usage Stats Table

From research I did online, I saw that the system table sys.dmdbindexusagestats contains usage statistics for indexes. There are two key columns on this table that are of particular interest, namely userscans **and **userseeks. If both of these values are zero, then the system probably isn’t using the index.

It is probably a good idea to at least look at a few other columns as well, including userlookups, systemseeks, systemscans, and systemlookups.

I created a query that identifies unused indexes based on this criteria and outputs them in a pretty nice format. I was unable to find any other queries online that output everything in as nice of a format as this, so I hope you find this helpful.

 -- GET UNUSED INDEXES THAT APPEAR IN THE INDEX USAGE STATS TABLE  
 DECLARE @MinimumPageCount int  
 SET @MinimumPageCount = 500

SELECT Databases.name AS [Database],  
 object_name(Indexes.object_id) AS [Table],  
 Indexes.name AS [Index],  
 PhysicalStats.page_count as [Page_Count],  
 CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Size (MB)],  
 CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Frag %],  
 ParititionStats.row_count AS [Row Count],  
 CONVERT(decimal(18,2), (PhysicalStats.page_count * 8.0 * 1024)  
 / ParititionStats.row_count) AS [Index Size/Row (Bytes)]  
 FROM sys.dm_db_index_usage_stats UsageStats  
 INNER JOIN sys.indexes Indexes  
 ON Indexes.index_id = UsageStats.index_id  
 AND Indexes.object_id = UsageStats.object_id  
 INNER JOIN SYS.databases Databases  
 ON Databases.database_id = UsageStats.database_id  
 INNER JOIN sys.dm_db_index_physical_stats (DB_ID(),NULL,NULL,NULL,NULL)  
 AS PhysicalStats  
 ON PhysicalStats.index_id = UsageStats.Index_id  
 and PhysicalStats.object_id = UsageStats.object_id  
 INNER JOIN SYS.dm_db_partition_stats ParititionStats  
 ON ParititionStats.index_id = UsageStats.index_id  
 and ParititionStats.object_id = UsageStats.object_id  
 WHERE UsageStats.user_scans = 0  
 AND UsageStats.user_seeks = 0  
 -- ignore indexes with less than a certain number of pages of memory  
 AND PhysicalStats.page_count > @MinimumPageCount  
 -- Exclude primary keys, which should not be removed  
 AND Indexes.type_desc != ‘CLUSTERED’  
 ORDER BY [Page_Count] DESC

This query includes the following helpful information on the unused indexes:

  • Table Name
  • Index Name
  • Page Count
  • Total Index Size (MB)
  • Fragmentation (%)
  • Row Count
  • Index Size/Row (Bytes)

Unused Indexes with no entry in the Index Usage Stats Table

For part of my research on other sites, I noticed that there are some indexes that somehow never get listed in the dmdbindexusagestats table. So, running this first query is not really enough. In fact, the SQL Index Usage Report doesn’t list all the indexes either. So, there is another query that I created to identify these types of indexes. I joined on several tables again to make the output more useful.

 -- GET UNUSED INDEXES THAT DO **NOT** APPEAR IN THE INDEX USAGE STATS TABLE  
 DECLARE @dbid INT  
 SELECT @dbid = DB_ID(DB_NAME())

SELECT Databases.Name AS [Database],  
 Objects.NAME AS [Table],  
 Indexes.NAME AS [Index],  
 Indexes.INDEX_ID,  
 PhysicalStats.page_count as [Page Count],  
 CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Index Size (MB)],  
 CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragmentation (%)]  
 FROM SYS.INDEXES Indexes  
 INNER JOIN SYS.OBJECTS Objects ON Indexes.OBJECT_ID = Objects.OBJECT_ID  
 LEFT JOIN sys.dm_db_index_physical_stats(@dbid, null, null, null, null) PhysicalStats  
 ON PhysicalStats.object_id = Indexes.object_id  
 AND PhysicalStats.index_id = indexes.index_id  
 INNER JOIN sys.databases Databases  
 ON Databases.database_id = PhysicalStats.database_id  
 WHERE Objects.type = ‘U’ — Is User Table  
 AND Indexes.is_primary_key = 0  
 AND Indexes.INDEX_ID NOT IN (  
 SELECT UsageStats.INDEX_ID  
 FROM SYS.DM_DB_INDEX_USAGE_STATS UsageStats  
 WHERE UsageStats.OBJECT_ID = Indexes.OBJECT_ID  
 AND Indexes.INDEX_ID = UsageStats.INDEX_ID  
 AND DATABASE_ID = @dbid)  
 ORDER BY PhysicalStats.page_count DESC,  
 Objects.NAME,  
 Indexes.INDEX_ID,  
 Indexes.NAME ASC

When to Remove an Index

Just because an index is not being used does not necessarily mean it should be removed. The queries above ignore the clustered indexes on the tables, since there are very few times, if any, when a table should not have any index at all. There may be other reasons to keep an index around as well. Those decisions are best made by those that have a knowledge of how the data is used and will be used in the future.

Conclusion

I was able to use these two queries to get rid of a lot of dead weight in several of our databases. In one of them, in particular, I was able to remove almost 1 GB of space. That’s 1 GB less for every backup, so it adds up over time.

One last note… After removing indexes, it’s probably a good idea to run SQL Profiler, just to be safe, on the database to catch any queries that may be expensive and somehow used an index that should not have been deleted. See my other post on SQL Optimization – A Beginner’s Guide

For more tips on managing a SQL Server Database, see Managing a SQL Server Database: Tips for Beginners


jQuery Mobile Code Snippets for Visual Studio

I’ve been recently learning jQuery Mobile and am really impressed with the technology. It is a powerful and robust tool to quickly create content for mobile devices. However, with it being so new to me and with it being so robust, I find it difficult to remember all of the syntax for the many different things you can do with it. It occurred to me that code snippets could be a great way to relieve this burden, so I began creating my own to do this.

As an example, using my jqmPage code snippet (by typing

<div data-role="page" data-theme="THEME" id="PAGENAME">  
    <div data-position="fixed" data-role="header" data-theme="a">
        <h1>HEADER</h1>
    </div>
    <div data-role="content" data-theme="a"></div>
    <div class="ui-bar" data-position="fixed" data-role="footer" data-theme="a">
        <h4>FOOTER</h4>
    </div>
</div>  

Then, if I wanted to, for instance, add a slider, all I have to do is use my jqmSlider snippet (by typing

<fieldset data-role="fieldcontain">  
    <label for="SLIDER1">Slider Input:</label>  
    <input type="range" name="slider" id="SLIDER1" min="MIN" max="MAX"
           value="INITIALVALUE" data-highlight="false" data-mini="false" />
</fieldset>  

Then, to add a text field, I just have to type <jqmText + tab + tab, and I get this:

<div class="fieldcontain">  
    <label for="txtNAME1">Text Input:</label>  
    <input data-mini="false" id="txtNAME1" name="name" placeholder="PLACEHOLDER" type="TEXT" value=""></input>
</div>  

The Code Snippet Tooltip for the input type identifies the valid options.

I’ve included all my current snippet files below. They’ll probably change a little bit as I continue to learn jQuery Mobile better, and you’ll probably want to make changes of your own. Nonetheless, this is a decent start.

jQuery Mobile Code Snippets