Foliotek Development Blog

Managing a SQL Server Database: Tips for Beginners

by John Pasquet

The company I work for is a web development company that uses SQL Server for its databases. The company is not quite big enough to have a DBA, so each of the developers takes a varying amount of responsibility to manage the databases. Over time, I have gravitated perhaps more than the others toward this responsibility and have learned several things that have really helped to maintain them well. If you are in the same position, I hope these things can be a benefit to you as well.

1. Index Fragmentation

Over time, indexes inevitably get fragmented and need to be rebuilt. A table with GUID’s that are not sequential will fragment really quickly if there are a good amount of inserts regularly. So, how do you know if you need to rebuilding your indexes? Fortunately, you can run a simple query to get this information. It should be noted that you need not worry about fragmentation on indexes that are very small. The following query gets the fragmentation statistics of indexes with a (memory) page count greater than 50 and a fragmentation greater than 25 percent. (Both of these variables may be adjusted.)


SELECT	sys.objects.name AS [Table Name],
	sys.indexes.name AS [Index Name],
	partition_number AS [Partition],
	CAST(avg_fragmentation_in_percent AS DECIMAL (4,2)) AS [Fragmentation Percentage],
	CONVERT(decimal(18,2), page_count * 8 / 1024.0) AS [Total Index Size (MB)],
	page_count AS [Page Count]
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED') IndexStats
	INNER JOIN sys.objects
		ON sys.objects.object_id = IndexStats.object_id
	INNER JOIN SYS.indexes
		ON SYS.indexes.object_id = SYS.OBJECTS.object_id
			AND IndexStats.index_id = sys.indexes.index_id
WHERE avg_fragmentation_in_percent > 25 AND Indexes.index_id > 0
	AND page_count > 50

So, how did it look? If you have index fragmentation percentages over 90 for indexes that use a lot of pages, perhaps more than 500, then you probably need to set up a SQL Agent job to run weekly to rebuild the indexes. For more information on how to set this up, see SQL Server Optimization Script with Index Rebuilding to Reduce Fragmentation

2. Identifying and Optimizing Expensive Queries

If your database has never been analyzed or profiled, there may be some queries that are really inefficient and may be impacting your overall site performance. Oftentimes, when new tables are created, indexes are added that are thought to be necessary. However, only real world experiences will reveal if an additional index is needed or if a query needs to be rewritten to leverage indexes that are already built. To identify expensive queries, use the SQL Server Profiler tool from the Tools menu in SQL Server Management Studio.

For starters, I would suggest running it for five minutes, setting a threshold to ignore queries with less than 25 reads. Make sure you write the results to a database table. That way you can run queries on them and perform some careful analysis. Oftentimes, adding the right indexes or restructuring a few queries can drastically reduce the load on your database.

For more details on how to do this, see SQL Server Database Optimization – a Beginner’s Guide.

3. Running Reports to Identify Largest Tables and Indexes

Do you know what your largest tables are? Do you know which table requires the most space for its indexes? Well, it would be really good to know both of those things, and it’s pretty easy to find that information. Right-click on your database, scroll down to “Reports”, then “Standard Report”, and then “Disk Usage by Top Tables”. This will provide you with some great information. It comes initially sorted by the amount of reserved space used per table. This is a combination of the space for the data and the indexes combined. You also can see the number of records in each table and the amount of unused space.

Knowing which of your tables is the largest and which table requires the most space for its indexes can help direct your efforts in analyzing how your database. Sometimes you can find an index that is really large but is unnecessary. I found a column in a table that was very large, but was no longer being used or referenced anywhere in our system. I was able to simply set all the values of the column to blanks and recoup quite a bit of space, and, again, this was for data that we were not using at all anymore. This will often highly log tables that contain data that is no longer necessary as well. See later in this post on removing such tables.

4. Cleaning up log tables

Oftentimes a database will have tables for errors, actions, emails, and various other things like that. However, it is rarely necessary to store all of this data in the database forever. An entry in the error log may be of interest for a month or two, but after that amount of time has passed, it has either been resolved or was not important enough to deal with. Furthermore, as the database changes, it becomes more and more difficult to really recreate what caused the problem.

If you have a good database backup plan–keeping copies of daily, weekly, monthly, and yearly backups, then you can probably remove old rows. If you keep every yearly backup, then should be fine to remove all rows older than one year. The January 1st backup will contain all of the errors for the previous year, so removing older rows only removes them from the current version of the database. You can still get back to older rows by restoring annual backups if your really need to do so.

For other tables, such as action logs, it may be necessary to be able to go back as much as a year, but if there is no business reason for being able to go back farther than that, you can probably remove rows older than perhaps two months. Your monthly backups will have a record that you can retrieve if necessary.

These types of tables can grow to be fairly large over time. If the tables are indexed, then your index rebuilding jobs will have to deal with them and basic inserts will become slower as well. Consequently, taking a careful look at tables like these and then scheduling a monthly SQL Agent job to remove old rows can really serve to keep your database lean and efficient.

5. Identifying and Removing Unused Indexes

There is probably a good chance that you have some indexes in your database that are not being used. These unused indexes can sometimes be rather large. This results in increased database size and backup storage, decreased insert time, and increased time to rebuild the indexes when they get too fragmented. So, it’s probably good to run a query to see if you have any unused indexes that are medium to large. You probably don’t need to worry about small indexes that are not currently being used.

There are actually two queries that you can run. There is an index usage table that you can reference and see when certain values are set to zero. Those indexes are not being used. However, there are also some indexes that, for some reason, never seem to even show up in that table. So, you have to run two different queries:

Indexes Not In Index Usage Table


-- GET UNUSED INDEXES THAT DO **NOT** APPEAR IN THE INDEX USAGE STATS TABLE
DECLARE @dbid INT
SELECT @dbid = DB_ID(DB_NAME())

SELECT	Databases.Name AS [Database],
	object_name(Indexes.object_id) AS [Table],
	Indexes.NAME AS [Index],
	Indexes.INDEX_ID,
	PhysicalStats.page_count as [Page Count],
	CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Size (MB)],
	CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Frag. %]
FROM SYS.INDEXES Indexes
	INNER JOIN SYS.OBJECTS Objects ON Indexes.OBJECT_ID = Objects.OBJECT_ID
	LEFT JOIN sys.dm_db_index_physical_stats(@dbid, null, null, null, null) PhysicalStats
		ON PhysicalStats.object_id = Indexes.object_id
                     AND PhysicalStats.index_id = indexes.index_id
	INNER JOIN sys.databases Databases
		ON Databases.database_id = PhysicalStats.database_id
WHERE Objects.type = 'U' -- User Table
	AND Indexes.is_primary_key = 0
	AND Indexes.type = 2	-- Nonclustered indexes
	AND Indexes.INDEX_ID NOT IN (
		SELECT UsageStats.INDEX_ID
		FROM SYS.DM_DB_INDEX_USAGE_STATS UsageStats
		WHERE UsageStats.OBJECT_ID = Indexes.OBJECT_ID
			AND   Indexes.INDEX_ID = UsageStats.INDEX_ID
			AND   DATABASE_ID = @dbid)

Unused Indexes In the Index Usage Table


-- GET UNUSED INDEXES THAT APPEAR IN THE INDEX USAGE STATS TABLE
DECLARE @MinimumPageCount int
SET @MinimumPageCount = 500

SELECT	Databases.name AS [Database],
	object_name(Indexes.object_id) AS [Table],
	Indexes.name AS [Index],
	PhysicalStats.page_count as [Page Count],
	CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Size (MB)],
	CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Frag. (%)],
	ParititionStats.row_count AS [Row Count],
	CONVERT(decimal(18,2),
		(PhysicalStats.page_count * 8.0 * 1024) / ParititionStats.row_count)
			AS [Index Size/Row (Bytes)],
	UsageStats.last_user_scan,
	UsageStats.last_user_seek
FROM sys.dm_db_index_usage_stats UsageStats
	INNER JOIN sys.indexes Indexes
		ON Indexes.index_id = UsageStats.index_id
			AND Indexes.object_id = UsageStats.object_id
	INNER JOIN SYS.databases Databases
		ON Databases.database_id = UsageStats.database_id
	INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)
			AS PhysicalStats
		ON PhysicalStats.index_id = UsageStats.Index_id
			AND PhysicalStats.object_id = UsageStats.object_id
	INNER JOIN SYS.dm_db_partition_stats ParititionStats
		ON ParititionStats.index_id = UsageStats.index_id
			AND ParititionStats.object_id = UsageStats.object_id
WHERE ((UsageStats.user_scans = 0 AND UsageStats.user_seeks = 0)
		OR (UsageStats.last_user_scan < DATEADD(year, -1, getdate())
			AND (UsageStats.last_user_seek < DATEADD(year, -1, getdate()))))
	-- ignore indexes with less than a certain number of pages of memory
	AND PhysicalStats.page_count > @MinimumPageCount
	-- Exclude primary keys, which should not be removed
	AND Indexes.is_primary_key = 0	

For more information on this, see Identifying Unused Indexes in a SQL Server Database

6. Identifying and Removing Duplicate or Near Duplicate Indexes

In addition to removing unused indexes, you can also remove duplicate indexes or near-duplicate indexes. Obviously, a duplicate index is unnecessary, but sometimes a near-duplicate index can be removed without any adverse effects. I have found two indexes that used all of the same columns in the index itself and only differed in the columns that were “included” in the index (but not indexed). If there are two indexes that only differ by one included column and the index is large, then you may be able to remove the simpler index. Of course, it’s a good practice to save a script that can quickly regenerate the index if you find that it is actually necessary after all.

Ok, so here’s the query:


WITH IndexSummary AS
(

SELECT DISTINCT sys.objects.name AS [Table Name],
	sys.indexes.name AS [Index Name],
	SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
		FROM sys.columns
			INNER JOIN sys.index_columns
				ON sys.index_columns.column_id = sys.columns.column_id
				AND sys.index_columns.object_id = sys.columns.object_id
		WHERE sys.index_columns.index_id = sys.indexes.index_id
			AND sys.index_columns.object_id = sys.indexes.object_id
			AND sys.index_columns.is_included_column = 0
		0RDER BY sys.columns.name
	FOR XML Path('')), 2, 10000) AS [Indexed Column Names],
	ISNULL(SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
		FROM sys.columns
			INNER JOIN sys.index_columns
			ON sys.index_columns.column_id = sys.columns.column_id
			AND sys.index_columns.object_id = sys.columns.object_id
		WHERE sys.index_columns.index_id = sys.indexes.index_id
			AND sys.index_columns.object_id = sys.indexes.object_id
			AND sys.index_columns.is_included_column = 1
		0RDER BY sys.columns.name
		FOR XML Path('')), 2, 10000), '') AS [Included Column Names],
	sys.indexes.index_id, sys.indexes.object_id
FROM sys.indexes
	INNER JOIN SYS.index_columns
		ON sys.indexes.index_id = SYS.index_columns.index_id
			AND sys.indexes.object_id = sys.index_columns.object_id
	INNER JOIN sys.objects
		ON sys.OBJECTS.object_id = SYS.indexES.object_id
WHERE sys.objects.type = 'U'
)

SELECT IndexSummary.[Table Name],
	IndexSummary.[Index Name],
	IndexSummary.[Indexed Column Names],
	IndexSummary.[Included Column Names],
	PhysicalStats.page_count as [Page Count],
	CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Size (MB)],
	CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragment %]
FROM IndexSummary
	INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)
	   AS PhysicalStats
		ON PhysicalStats.index_id = IndexSummary.index_id
			AND PhysicalStats.object_id = IndexSummary.object_id
WHERE (SELECT COUNT(*) as Computed
		FROM IndexSummary Summary2
		WHERE Summary2.[Table Name] = IndexSummary.[Table Name]
			AND Summary2.[Indexed Cols] = IndexSummary.[Indexed Cols]) > 1
0RDER BY [Table Name], [Index Name], [Indexed Column Names], [Included Column Names]

For more information on this, see Query to Identify Duplicate or Redundant Indexes in SQL Server

7. Checking Identity Columns for Overflow Risk

If your database is using integer identity columns, then the largest ID that can be stored is 2,147,483,647 (2.1 billion). Most tables never come close to overflowing, but it is definitely better to make sure than to not be sure and experience an overflow one day. So, here’s an easy query to quickly check:


SELECT sys.tables.name AS [Table Name],
	last_value AS [Last Value],
	CASE (MAX_LENGTH)
		WHEN 1 THEN 'TINYINT'
		WHEN 2 THEN 'SMALLINT'
		WHEN 4 THEN 'INT'
		WHEN 8 THEN 'BIGINT'
		ELSE 'UNKNOWN'
	END AS DataType,
	CAST(cast(last_value as int) / 2147483647.0 * 100.0 AS DECIMAL(5,2))
	    AS [Percentage of ID's Used]
FROM sys.identity_columns
	INNER JOIN sys.tables
		ON sys.identity_columns.object_id = sys.tables.object_id

For more information, see Checking Integer Identity Columns in SQL Server for Overflow Risk

Conclusion

Alright, so there are some tips that I hope you will find helpful. Good luck!


Query to Identify Duplicate or Redundant Indexes in SQL Server

by John Pasquet

I recently noticed that a particular table had two indexes that were nearly identical. In fact, the indexed columns were exactly the same. The only difference was in the included columns. This made me wonder if there were any other indexes that had duplicates or near duplicates, so I developed a query to get results from the entire database. The query returns the following columns:

  • Table Name
  • Index Name
  • Indexed Column Names
  • Included Column Names
  • Page Count (for the Index)
  • Total Index Size
  • Fragmentation (of the Index)

This made it really easy to see not only which indexes had duplicates or near duplicates, but also how large those indexes were. This allowed me to focus my time on the larger indexes, which have a bigger impact in not only storage, but also rebuilding and reorganization tasks.

So, here it is:


WITH IndexSummary AS
(

SELECT DISTINCT sys.objects.name AS [Table Name],
	sys.indexes.name AS [Index Name],
	SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
		FROM sys.columns
			INNER JOIN sys.index_columns
				ON sys.index_columns.column_id = sys.columns.column_id
				AND sys.index_columns.object_id = sys.columns.object_id
		WHERE sys.index_columns.index_id = sys.indexes.index_id
			AND sys.index_columns.object_id = sys.indexes.object_id
			AND sys.index_columns.is_included_column = 0
		0RDER BY sys.columns.name
	FOR XML Path('')), 2, 10000) AS [Indexed Column Names],
	ISNULL(SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
		FROM sys.columns
			INNER JOIN sys.index_columns
			ON sys.index_columns.column_id = sys.columns.column_id
			AND sys.index_columns.object_id = sys.columns.object_id
		WHERE sys.index_columns.index_id = sys.indexes.index_id
			AND sys.index_columns.object_id = sys.indexes.object_id
			AND sys.index_columns.is_included_column = 1
		0RDER BY sys.columns.name
		FOR XML Path('')), 2, 10000), '') AS [Included Column Names],
	sys.indexes.index_id, sys.indexes.object_id
FROM sys.indexes
	INNER JOIN SYS.index_columns
		ON sys.indexes.index_id = SYS.index_columns.index_id
			AND sys.indexes.object_id = sys.index_columns.object_id
	INNER JOIN sys.objects
		ON sys.OBJECTS.object_id = SYS.indexES.object_id
WHERE sys.objects.type = 'U'
)

SELECT IndexSummary.[Table Name],
	IndexSummary.[Index Name],
	IndexSummary.[Indexed Column Names],
	IndexSummary.[Included Column Names],
	PhysicalStats.page_count as [Page Count],
	CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Size (MB)],
	CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragment %]
FROM IndexSummary
	INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)
	   AS PhysicalStats
		ON PhysicalStats.index_id = IndexSummary.index_id
			AND PhysicalStats.object_id = IndexSummary.object_id
WHERE (SELECT COUNT(*) as Computed
		FROM IndexSummary Summary2
		WHERE Summary2.[Table Name] = IndexSummary.[Table Name]
			AND Summary2.[Indexed Cols] = IndexSummary.[Indexed Cols]) > 1
0RDER BY [Table Name], [Index Name], [Indexed Column Names], [Included Column Names]

* For some bizarre reason, the 0RDER BY clauses caused some formatting issues, so I replaces the “O” with a zero.

Table Name	Index	Indexed Cols	Included Cols	Pages	Size (MB)	Frag %
My_Table	Indx_1	   Col1		Col2, Col3	123	0.96		8.94
My_Table	Indx_2	   Col1		Col2, Col3	123	0.96		8.94

For more tips on managing a SQL Server Database, see Managing a SQL Server Database: Tips for Beginners


Checking Integer Identity Columns in SQL Server for Overflow Risk

by John Pasquet

A very common practice is to use a column of type integer for the primary key on a table with auto increment. This works great in most cases. However, there is a limitation to this that should be kept in mind. The largest value that an integer column can hold is 2,147,483,647 (2.1 billion). So, if you have a database that will contain the names of everyone living in Wyoming and Montana, you’ll be fine. However, if your database will contain the names of everyone living in China and India, you will definitely have an overflow issue.

So, how can you tell what your risk level is? You could go through each table and select the max ID field, but that would take too long unless your database is very small. Yes, I did start doing it this way myself, but quickly realized this was not going to be very fun. So, I came up with this query that provides the Table Name, the Last Value used in the identity column, the Data Type of the column, and the percentage of valid integers that have been used (for the integer data type).

Running this on my database, I was able to see that we do have one table that has an identity column that has reached 50 million. This is only about 2.3% of the way to the largest integer, so we have some time to deal with it. Still, it is nice to know that we are safe for a while.


SELECT sys.tables.name AS [Table Name],
	last_value AS [Last Value],
	CASE (MAX_LENGTH)
		WHEN 1 THEN 'TINYINT'
		WHEN 2 THEN 'SMALLINT'
		WHEN 4 THEN 'INT'
		WHEN 8 THEN 'BIGINT'
		ELSE 'UNKNOWN'
	END AS DataType,
	CAST(cast(last_value as int) / 2147483647.0 * 100.0 AS DECIMAL(5,2))
	    AS [Percentage of ID's Used]
FROM sys.identity_columns
	INNER JOIN sys.tables
		ON sys.identity_columns.object_id = sys.tables.object_id
ORDER BY last_value DESC

For more tips on managing a SQL Server Database, see Managing a SQL Server Database: Tips for Beginners


Index Recommendations from the Database Engine Tuning Adviser should not be accepted blindly

by John Pasquet

The Problem

I ran a SQL Profiler session on one of our SQL databases toda to catch any expensive queries and see if they could be optimized. One query that showed up was one that selected the top 30 rows from one table where a certain bit column was set to true, sorted by oldest first. The table being queried had close to 100,000 rows in it. Each query took about 6,000 reads.

The query looked something like this:


    SELECT TOP (30) *
    FROM TheTable
    WHERE BitColumn = 1
    ORDER  BY DateColumn

The Suggested Solution

My inclination was that all I needed to do was add a very small index on the column of type BIT being used in the WHERE clause. However, I went ahead and ran the query through the Database Engine Tuning Adviser, just to see what it came up with.

The index recommendation suggested an index on three columns, one of which was, in fact, the one in the WHERE clause. The Date column was also included along with the table’s Primary ID. The estimated improvement was 99%, which was great and what I had expected.

However, the suggested index also had every other table column in its INCLUDE statement. The result was that the new index was projected to take up 68.5 MB of space. That’s quite a bit of space for an index on a bit column.

The Employed Solution

Consequently, I manually went in to SQL Server Management Studio and just added a simple index on the column used in the WHERE clause. Then I checked the size of the new index and found that it wasonly 1.75 MB–quite significantly less than 68.5 MB.

The query now requires only 14 reads, compared with the nearly 6,000 it used before, and I haven’t added 68 MB of index storage to the database.

Final Remark

So, there are certainly times where the recommendations from the Database Tuning Adviser would probably be fine. Each case is different, of course. In this case, adding 68 MB of index storage just was not worth it. Adding a simple index manually gave the optimization we needed without really impacting the size of the database. Of course, 68 MB is not really a lot, but adding that 10 or 20 or 50 times when it is not necessary would certainly begin to unnecessarily impact it.

So, the Database Tuning Adviser is definitely helpful, but should probably be used more as a guide rather than as something not to be questioned.


Identifying Unused Indexes in a SQL Server Database

by John Pasquet

One of our databases began timing out during our weekly index rebuild job, so I began to look into what indexes were getting rebuilt. After looking at the Index Usage Report, I noticed that some of them were not even being used. Some of them were very large indexes as well.

Obviously, an index that is never used doesn’t really need to be rebuilt to improve performance, so I removed a few of them. Then I began to look more closely at how I could identify other indexes that were not being used, aside from looking through every index on the SQL Report.

Unused Indexes with no reads in the Index Usage Stats Table

From research I did online, I saw that the system table sys.dm_db_index_usage_stats contains usage statistics for indexes. There are two key columns on this table that are of particular interest, namely user_scans and user_seeks. If both of these values are zero, then the system probably isn’t using the index.

It is probably a good idea to at least look at a few other columns as well, including user_lookups, system_seeks, system_scans, and system_lookups.

I created a query that identifies unused indexes based on this criteria and outputs them in a pretty nice format. I was unable to find any other queries online that output everything in as nice of a format as this, so I hope you find this helpful.


-- GET UNUSED INDEXES THAT APPEAR IN THE INDEX USAGE STATS TABLE
DECLARE @MinimumPageCount int
SET @MinimumPageCount = 500

SELECT	Databases.name AS [Database],
	object_name(Indexes.object_id) AS [Table],
	Indexes.name AS [Index],
	PhysicalStats.page_count as [Page_Count],
	CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Size (MB)],
	CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Frag %],
	ParititionStats.row_count AS [Row Count],
	CONVERT(decimal(18,2), (PhysicalStats.page_count * 8.0 * 1024)
		/ ParititionStats.row_count) AS [Index Size/Row (Bytes)]
FROM sys.dm_db_index_usage_stats UsageStats
	INNER JOIN sys.indexes Indexes
		ON Indexes.index_id = UsageStats.index_id
			AND Indexes.object_id = UsageStats.object_id
	INNER JOIN SYS.databases Databases
		ON Databases.database_id = UsageStats.database_id
	INNER JOIN sys.dm_db_index_physical_stats (DB_ID(),NULL,NULL,NULL,NULL)
			AS PhysicalStats
		ON PhysicalStats.index_id = UsageStats.Index_id
			and PhysicalStats.object_id = UsageStats.object_id
	INNER JOIN SYS.dm_db_partition_stats ParititionStats
		ON ParititionStats.index_id = UsageStats.index_id
			and ParititionStats.object_id = UsageStats.object_id
WHERE UsageStats.user_scans = 0
	AND UsageStats.user_seeks = 0
	-- ignore indexes with less than a certain number of pages of memory
	AND PhysicalStats.page_count > @MinimumPageCount
	-- Exclude primary keys, which should not be removed
	AND Indexes.type_desc != 'CLUSTERED'
ORDER BY [Page_Count] DESC

This query includes the following helpful information on the unused indexes:

  • Table Name
  • Index Name
  • Page Count
  • Total Index Size (MB)
  • Fragmentation (%)
  • Row Count
  • Index Size/Row (Bytes)

Unused Indexes with no entry in the Index Usage Stats Table

For part of my research on other sites, I noticed that there are some indexes that somehow never get listed in the dm_db_index_usage_stats table. So, running this first query is not really enough. In fact, the SQL Index Usage Report doesn’t list all the indexes either. So, there is another query that I created to identify these types of indexes. I joined on several tables again to make the output more useful.


-- GET UNUSED INDEXES THAT DO **NOT** APPEAR IN THE INDEX USAGE STATS TABLE
DECLARE @dbid INT
SELECT @dbid = DB_ID(DB_NAME())

SELECT	Databases.Name AS [Database],
	Objects.NAME AS [Table],
	Indexes.NAME AS [Index],
	Indexes.INDEX_ID,
	PhysicalStats.page_count as [Page Count],
	CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Index Size (MB)],
	CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragmentation (%)]
FROM SYS.INDEXES Indexes
	INNER JOIN SYS.OBJECTS Objects ON Indexes.OBJECT_ID = Objects.OBJECT_ID
	LEFT JOIN sys.dm_db_index_physical_stats(@dbid, null, null, null, null) PhysicalStats
		ON PhysicalStats.object_id = Indexes.object_id
                     AND PhysicalStats.index_id = indexes.index_id
	INNER JOIN sys.databases Databases
		ON Databases.database_id = PhysicalStats.database_id
WHERE Objects.type = 'U' -- Is User Table
	AND Indexes.is_primary_key = 0
	AND Indexes.INDEX_ID NOT IN (
			SELECT UsageStats.INDEX_ID
			FROM SYS.DM_DB_INDEX_USAGE_STATS UsageStats
			WHERE UsageStats.OBJECT_ID = Indexes.OBJECT_ID
				AND   Indexes.INDEX_ID = UsageStats.INDEX_ID
				AND   DATABASE_ID = @dbid)
ORDER BY PhysicalStats.page_count DESC,
	Objects.NAME,
        Indexes.INDEX_ID,
        Indexes.NAME ASC

When to Remove an Index

Just because an index is not being used does not necessarily mean it should be removed. The queries above ignore the clustered indexes on the tables, since there are very few times, if any, when a table should not have any index at all. There may be other reasons to keep an index around as well. Those decisions are best made by those that have a knowledge of how the data is used and will be used in the future.

Conclusion

I was able to use these two queries to get rid of a lot of dead weight in several of our databases. In one of them, in particular, I was able to remove almost 1 GB of space. That’s 1 GB less for every backup, so it adds up over time.

One last note… After removing indexes, it’s probably a good idea to run SQL Profiler, just to be safe, on the database to catch any queries that may be expensive and somehow used an index that should not have been deleted. See my other post on SQL Optimization – A Beginner’s Guide

For more tips on managing a SQL Server Database, see Managing a SQL Server Database: Tips for Beginners