# Foliotek Development Blog

## Computing Color-Based Representation of Performance

Friday, February 6th, 2015

### Initial Work

I have been working on trying to visually represent numerical data of how a particular user/student has performed, and I thought I would share some of my thoughts. There are many different scales in our system with which a student may be scored. For instance, a metric may have 5 levels of scoring:

• Does not meet expectations: 1 point
• Approaches expectations: 2 points
• Meets expectations: 3 points
• Exceeds expectations: 4 points
• Exemplary: 5 points

The maximum amount of points is 5 and the minimum is 1. One way to represent this is to show a miniature rubric of the levels with the student’s level selected. This just a simple table with a td for each level. Mousing over the td shows the level’s name. This table is specified at a 100 px width.

However, just having a colored indicator for what was selected seemed a little bland, so I began trying to envision how an appropriate color could be added. Red and Green are commonly used to associate bad to good, so I decided to try to compute appropriate greens and reds to represent the student’s score. The all green (#0F0) and all red (#F00) seem a little too much, so I thought I would set the range between what I will refer to as “strong green” and “strong red” (#0A0 and #A00).

For anyone not familiar with this representation, it is the RGB/Red Green Blue value in hexadecimal with an integer range of 0 to 255 for each. “A” = 160 an “F” = 255.

##### The Math

So, how do you represent the different levels by color? Essentially, the amount of green will start at zero and increase to “A” (160) and the amount of red will start at “A” (160) and decrease to zero. The blue will remain at zero no matter what.

In the case above, I initially came up with this computation:

Green = ( (Points Earned) / (Max Points Possible) ) * 160
= ( 4 / 5 ) * 160
= 80% green
= 128 (80 in hex)

Red = 160 – Green
= 32 (20 in hex)

Computed Color = #280 (20% of max red, 80% of max green, no blue)

##### The Mathematical Error

However, this does not work completely right, specifically due to the fact that the scale does not begin at zero. Thus, while the highest score will compute to the maximum green, the lowest score will not compute to no green, but to 20% of the maximum amount of green.

To fix this, I realized I needed to essentially make the minimum score equal to zero. This can be easily done by subtracting the minimum points possible from BOTH the Points Earned and the Max Points Possible.

Green = ( (Points Earned) – (Min Points Possible) ) / ( (Max Points Possible) – (Min Points Possible) ) * 160
= ( (4 – 1) / (5 – 1) ) * 160
= ( 3 / 4 ) * 160
= 75% green
= 120 (78 in hex)

Red = 160 – Green
= 40 (28 in hex)

Computed Color = #280 (20% of max red, 80% of max green, no blue)

Thus, there are five possibilities: 0% Green, 25% Green, 50% Green, 75% Green and 100% Green. The inclusive zero is the key here.

### The Colorful Grid

This allows for a more colorful grid. I have provided one grid with a single score selected and then one with each of them, just to show the different colors:

### Expanding to Aggregate Scores

We also needed a way to represent aggregate data that would not have specific set of defined levels. For instance, if a user had 20 scores from varying metrics, how could this be represented?

What I did was again computed the total number of points and the total earned points. Then I again adjusted for the minimum number of points possible in cases where the metrics do not start at zero.

I enlarged the width of the table for this to stretch across the entire screen. I’ll make it 500 px for this example. This table also has just three cells, with the first and third being essentially spacers and the second cell representing the score. I realized I only needed to compute the width of the first cell. The second cell will be a fixed width of 30 px, and then the third cell will take up the remaining space, if any.

The math for computing the color will still be the same. The key thing is to compute the size of the first column. If the table width is 500 px and the width of the score cell is 30 px, then there is only a total range of 0 to 470 pixels where that cell can be placed on the range. When it is placed at 470 pixels, that is the maximum it can be, as it will fill the remainder of the space on the table.

##### The Math for Placement

So, let’s assume that the total number of points a student received was 215 out of a total possible of 250. The minimum points earned will be 15. Adjusting the scale, this would indicate 200 out of 235 points earned. We can easily see, then, that our math should give us a width of the first cell of 400 pixels, since 235 is half of 470. (Yeah, I made it easy. I know. But this way we can validate the math.)

Placement = ( (Total Points Earned) – (Min Points Possible) ) / ( (Max Points Possible) – (Min Points Possible) ) * 470
= ( 215 – 15 ) / ( 250 – 15 ) * 470
= 200 / 235 * 470
= 400

Green = ( (Points Earned) – (Min Points Possible) ) / ( (Max Points Possible) – (Min Points Possible) ) * 160
= ( (215 – 15) / (250 – 15) ) * 160
= ( 200 / 235 ) * 160
= 85% green
= 136 (88 in hex)

Red = 160 – Green
= 24 (18 in hex)

Thus, the table looks like this:

##### Conclusion

So, I thought that was pretty interesting, but it gives a fairly clear visual representation of a student’s performance.

## Page Level Web Application Database Optimization

Monday, January 19th, 2015

### Do you know what queries are required to run for a single page load in your application? If not, perhaps you should.

That is a great question I wish I would have asked myself a long time ago, but didn’t know to ask it.

Over the past several years, I have enjoyed the challenge of database query optimization. I work for a small company, and no one else seems to have much of an interest in this sort of thing, so that means I get to do it. It is certainly interesting to see how a small rewrite of a query or a new index can achieve significant results. However, I recently began looking at optimization from a different perspective–that of the individual page.

How many queries are required to load a single page? How many should it take?

To identify expensive queries that need optimization, I ran a SQL Profiler session that would catch queries with Reads greater than 1000, 5000 or maybe 10000. Or maybe more than 500 CPU. However, one day I decided to see all the queries that ran on a given page load, and I set the Read threshold all the way down to 1. I also set up the web application running on my machine to point to a test database and restricted the Profiler session accordingly.

From SQL Server Management Studio:

• Tools > SQL Server Profiler
• Enter Database credentials, then click Connect
• Event Selection tab
• Check “Show all events” and “Show all columns”
• Column Filters
• DatabaseName: TestDBName
• HostName: ComputerName (ws###)
• Run

After clicking “run” and loading the first page, I was literally stunned.

One page was running far more queries than I had ever imagined. But why? I started looking through the results. The column “TextData” shows the first 100 or so characters, and then the window on the bottom shows the full query. Suffice it to say that there were many opportunities for optimization.

## What I Learned

#### Repetitive Queries

The first thing I noticed was that on many of the page loads, there were many redundant queries. The page would query the database for the User or Company multiple times, sometimes one right after the other. None of this was intended. The impact of certain ways of doing things, however, was never realized in terms of the load on the database.

One of the biggest things to optimize are things like side menus, top menus, or footers–things that are displayed on every page. Any inefficiencies in these will be hit every time anyone loads any page. That also means, though, that any increased inefficiency can have a huge impact. If 100 pages are hit every second, and there are 20 queries that can be eliminated from each page load, you have just eliminated 1000 queries a second.

Many of these queries are small, to be sure. It takes very little if any CPU and only a few reads to retrieve a User from the User table. However, the cumulative increased traffic can be significant.

In many cases, I realized that I could just pass the user to a method that accessed a particular property on the row instead of getting the user inside the method itself. Or I could just pass in the property itself. Alternatively, I could create a page-level (or control-level [.ascx]) variable for User or Company or other tables used in many places.

```

private DataAccess.User _currentUser;
public DataAccess.UserCurrentOrganization
{
get
{
if (_currentUser== null)
_currentUser= DataAccess.User.Get(LoggedInUserID);
return _currentUser;
}
}

```

#### Unused Queries

It took some time, but I matched every query to the line of code that generated it. I found a few places where a query was called, but then the result was never used. This, of course, is bound to happen in a large application as modifications are introduced. But again, removing these for every page load can certainly help.

#### Unnecessary Queries

I also encountered a few places where unnecessary queries were being run. For instance, upon login, the system should check the expiration date of the user’s registration. That was the intent of the code. However, at some point, the code was moved and ended up getting run on every page load. So, the code was necessary on login, but not on every page. I never would have known it ran on every page had I not seen it for myself in the SQL Profiler session.

## Final Thoughts

It is, indeed, rewarding to identify a very expensive query and optimize it down to something very small. Yet, I’ve learned that there are not only gigantic weeds. Sometimes there are millions of smaller weeds. It is better to get rid of both.

## Simplifying Manual Insertion of Static Data from one Database to another

Thursday, September 5th, 2013

### Problem Description

We’ve all run into the issue. You have created a look-up table on one database, perhaps a test database. Now it’s time to take it live, and you need to port all the rows from the table in the test database into the corresponding table on the live database. Sure, you could use SQL Data Compare or various tools like that, but this table only has 10 rows, so is it really worth the trouble?

In my case, I had to port data into two live database instances, so I didn’t want to have to go through a long process. We do have SQL Compare tools, but need to upgrade them for SQL Server 2012, and I didn’t want to have to wait.

Of course, if the databases were on the same server, then copying between servers is pretty simple. However, if the databases are separated by type (i.e. Oracle and SQL Server), then that is much more difficult.

Let me use a standard User table as an example for simplicity, although that really does not fit with what you would use this for.

### The Brute Force Method

What I initially did was to just select everything from one table, paste it into a new query window in SQL Server Management Studio, then add commas and tick marks to insert each row manually.

```
SELECT * FROM Users
__________________________________________________________________________
UserID Email FirstName MiddleName LastName IsImpeached
1 geo...@presidents.gov George   Washington 0
3 tho...@presidents.gov Thomas   Jefferson 0
5 jame...@presidents.gov James   Monroe	0
6 joh...@presidents.gov John Q. Adams 0
7 and...@presidents.gov Andrew Jackson 0
8 mar...@presidents.gov Martin Van Buren 0
9 will...@presidents.gov William Henry Harrison	0

```

Next, I started creating the INSERT statement. This necessitated the insertion of a lot of commas and ticks and parentheses. This is tedious, to be sure, but it does work, and it isn’t all that bad for a limited number of rows.

```
INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(1, 'geo...@presidents.gov', 'George', '', 'Washington', 0)
__________________________________________________________________________
3 tho...@presidents.gov Thomas   Jefferson 0
5 jame...@presidents.gov James   Monroe	0
6 joh...@presidents.gov John Q. Adams 0
7 and...@presidents.gov Andrew Jackson 0
8 mar...@presidents.gov Martin Van Buren 0
9 will...@presidents.gov William Henry Harrison	0

```

After than, I could copy the Insert statement from the beginning to the first parenthesis after the VALUES keyword and then add all the commas and ticks. Still tedious, but not all that bad if you’re just wanting to complete the task manually.

### A Slightly Less Taxing Brute Force Method

It occurred to me somewhere in this process that there is a fairly easy way to make this brute force method slightly less taxing, and I don’t know why this hadn’t occurred to me before. Perhaps it is one of those things that does not seem obvious until you see it for the first time. Anyway, the main effort comes in adding all the parentheses, commas and ticks, so why not have the system generate that for us?

All I did was to modify the original select query as follows:

```
SELECT '(' + CAST(UserID AS VARCHAR(25)) + ', ''' + Email + ''', ''' + FirstName + ''', ''' + MiddleName + ''', ''' + LastName + ''', ' + CAST(IsImpeached AS CHAR(1)) + ')'
FROM Users
__________________________________________________________________________
(1, 'geo...@presidents.gov', 'George', '', 'Washington', 0)
(2, 'j...@presidents.gov', 'John', '', 'Adams', 0)
(3, 'tho...@presidents.gov', 'Thomas', '', 'Jefferson', 0)
(4, 'jam...@presidents.gov', 'James', '', 'Madison', 0)
(5, 'jame...@presidents.gov', 'James', '', 'Monroe', 0)
(6, 'joh...@presidents.gov', 'John', 'Q.', 'Adams', 0)
(7, 'and...@presidents.gov', 'Andrew', '', 'Jackson', 0)
(8, 'mar...@presidents.gov', 'Martin', '', 'Van Buren', 0)
(9, 'will...@presidents.gov', 'William', 'Henry', 'Harrison', 0)

```

As you can see, this reduces the effort for the brute force method quite significantly, even to the point that it might not be able to be called the brute force method anymore. All that has to be done is to copy the the part of the INSERT statement that precedes the actual values.

### Now I’m Just Being Lazy

Someone once suggested that innovation sometimes comes just because we are lazy. I know what that means–we’re not really lazy, we just want to do more interesting things, so we automate the simple. It just occurred to me that this could be simplified even more by including the “INSERT INTO… VALUES prefix” into the SELECT statement. This tactic eliminates all manual addition of commas, parentheses, ticks and everything else. Then all I have to do is copy all the rows, paste them into a query window and run it.

```
SELECT 'INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached )
VALUES (' +
CAST(UserID AS VARCHAR(25)) + ', ''' + Email + ''', ''' + FirstName + ''', '''
+ MiddleName + ''', ''' + LastName + ''', ' + CAST(IsImpeached AS CHAR(1)) + ')'
+ '
'
FROM Users
__________________________________________________________________________
INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(1, 'geo...@presidents.gov', 'George', '', 'Washington', 0)

INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(2, 'j...@presidents.gov', 'John', '', 'Adams', 0)

INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(3, 'tho...@presidents.gov', 'Thomas', '', 'Jefferson', 0)

INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(4, 'jam...@presidents.gov', 'James', '', 'Madison', 0)

INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(5, 'jame...@presidents.gov', 'James', '', 'Monroe', 0)

INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(6, 'joh...@presidents.gov', 'John', 'Q.', 'Adams', 0)

INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(7, 'and...@presidents.gov', 'Andrew', '', 'Jackson', 0)

INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(8, 'mar...@presidents.gov', 'Martin', '', 'Van Buren', 0)

INSERT INTO Users (UserID, Email, FirstName, LastName, MiddleName, IsImpeached)
VALUES	(9, 'will...@presidents.gov', 'William', 'Henry', 'Harrison', 0)

```

### Conclusion

So, obviously, if you have SQL Data Comparison tools the automate this, that is preferable. However, if that is tedious to set up, or if you don’t have access to such tools, this can make copying rows for small tables much less tedious.

##### Note

You will probably have to add “SET IDENTITY_INSERT Users ON’ at the beginning and “SET IDENTITY_INSERT Users OFF” at the end.

## SQL Query Not Returning All Results When Matching on Multiple Search Terms – Space Character Issue

Friday, June 28th, 2013

## The Problem

I ran into an obscure issue today with a SQL Query that seemed to be failing to match on multiple words. For instance, it would match correctly on “search” but not on “search terms”. The query returned some matches but not all.

After a good bit of investigation, I finally realized that some of the data being matched on contained non-standard space characters, specifically an encoded non-breaking space. The way I originally detected this was to simply replace spaces with the pound sign (#).

```
SELECT Name,
REPLAC E(TheName, ' ', '#')
FROM MyTable

/* not sure why, but REPLACE was causing formatting issues,
so I added the _ afterwards, and it fixed it. */

```

From the results, I saw that not all the spaces were being replaced correctly. I was getting something like this:

Name with spaces replaced     Name
Name of table row Name of#table#row

Notice that the first space was not replaced, while the others were. Once I noticed this, I cast the Name column as a varbinary:

```

SELECT CAST(Name as varbinary(250))
FROM MyTable

```

This yielded the following value:

```
N  a  m  e     o  f     t  a  b  l  e     r  o  w
0x4E 61 6D 65 20 6F 66 A0 74 61 62 6C 65 A0 72  6F 77

```

Notice that the last two spaces are the code A0 (the non-breaking space) rather than 20 (standard space).

## The Solution

Once I was able to find this, the solution was pretty easy. Since the non-breaking spaces were not intended, all I had to do was change them to standard spaces, and I accomplished this with the following query (Character 160 is the non-breaking space):

```
UPDATE MyTable
SET Name = REPLAC E(Name, CHAR(160), ' ')

```

After this, the matching on multiple terms worked as desired.

## Managing a SQL Server Database: Tips for Beginners

Tuesday, April 23rd, 2013

The company I work for is a web development company that uses SQL Server for its databases. The company is not quite big enough to have a DBA, so each of the developers takes a varying amount of responsibility to manage the databases. Over time, I have gravitated perhaps more than the others toward this responsibility and have learned several things that have really helped to maintain them well. If you are in the same position, I hope these things can be a benefit to you as well.

## 1. Index Fragmentation

Over time, indexes inevitably get fragmented and need to be rebuilt. A table with GUID’s that are not sequential will fragment really quickly if there are a good amount of inserts regularly. So, how do you know if you need to rebuilding your indexes? Fortunately, you can run a simple query to get this information. It should be noted that you need not worry about fragmentation on indexes that are very small. The following query gets the fragmentation statistics of indexes with a (memory) page count greater than 50 and a fragmentation greater than 25 percent. (Both of these variables may be adjusted.)

```

SELECT	sys.objects.name AS [Table Name],
sys.indexes.name AS [Index Name],
partition_number AS [Partition],
CAST(avg_fragmentation_in_percent AS DECIMAL (4,2)) AS [Fragmentation Percentage],
CONVERT(decimal(18,2), page_count * 8 / 1024.0) AS [Total Index Size (MB)],
page_count AS [Page Count]
FROM sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL , NULL, 'LIMITED') IndexStats
INNER JOIN sys.objects
ON sys.objects.object_id = IndexStats.object_id
INNER JOIN SYS.indexes
ON SYS.indexes.object_id = SYS.OBJECTS.object_id
AND IndexStats.index_id = sys.indexes.index_id
WHERE avg_fragmentation_in_percent > 25 AND Indexes.index_id > 0
AND page_count > 50

```

So, how did it look? If you have index fragmentation percentages over 90 for indexes that use a lot of pages, perhaps more than 500, then you probably need to set up a SQL Agent job to run weekly to rebuild the indexes. For more information on how to set this up, see SQL Server Optimization Script with Index Rebuilding to Reduce Fragmentation

## 2. Identifying and Optimizing Expensive Queries

If your database has never been analyzed or profiled, there may be some queries that are really inefficient and may be impacting your overall site performance. Oftentimes, when new tables are created, indexes are added that are thought to be necessary. However, only real world experiences will reveal if an additional index is needed or if a query needs to be rewritten to leverage indexes that are already built. To identify expensive queries, use the SQL Server Profiler tool from the Tools menu in SQL Server Management Studio.

For starters, I would suggest running it for five minutes, setting a threshold to ignore queries with less than 25 reads. Make sure you write the results to a database table. That way you can run queries on them and perform some careful analysis. Oftentimes, adding the right indexes or restructuring a few queries can drastically reduce the load on your database.

For more details on how to do this, see SQL Server Database Optimization ? a Beginner?s Guide.

## 3. Running Reports to Identify Largest Tables and Indexes

Do you know what your largest tables are? Do you know which table requires the most space for its indexes? Well, it would be really good to know both of those things, and it’s pretty easy to find that information. Right-click on your database, scroll down to “Reports”, then “Standard Report”, and then “Disk Usage by Top Tables”. This will provide you with some great information. It comes initially sorted by the amount of reserved space used per table. This is a combination of the space for the data and the indexes combined. You also can see the number of records in each table and the amount of unused space.

Knowing which of your tables is the largest and which table requires the most space for its indexes can help direct your efforts in analyzing how your database. Sometimes you can find an index that is really large but is unnecessary. I found a column in a table that was very large, but was no longer being used or referenced anywhere in our system. I was able to simply set all the values of the column to blanks and recoup quite a bit of space, and, again, this was for data that we were not using at all anymore. This will often highly log tables that contain data that is no longer necessary as well. See later in this post on removing such tables.

## 4. Cleaning up log tables

Oftentimes a database will have tables for errors, actions, emails, and various other things like that. However, it is rarely necessary to store all of this data in the database forever. An entry in the error log may be of interest for a month or two, but after that amount of time has passed, it has either been resolved or was not important enough to deal with. Furthermore, as the database changes, it becomes more and more difficult to really recreate what caused the problem.

If you have a good database backup plan–keeping copies of daily, weekly, monthly, and yearly backups, then you can probably remove old rows. If you keep every yearly backup, then should be fine to remove all rows older than one year. The January 1st backup will contain all of the errors for the previous year, so removing older rows only removes them from the current version of the database. You can still get back to older rows by restoring annual backups if your really need to do so.

For other tables, such as action logs, it may be necessary to be able to go back as much as a year, but if there is no business reason for being able to go back farther than that, you can probably remove rows older than perhaps two months. Your monthly backups will have a record that you can retrieve if necessary.

These types of tables can grow to be fairly large over time. If the tables are indexed, then your index rebuilding jobs will have to deal with them and basic inserts will become slower as well. Consequently, taking a careful look at tables like these and then scheduling a monthly SQL Agent job to remove old rows can really serve to keep your database lean and efficient.

## 5. Identifying and Removing Unused Indexes

There is probably a good chance that you have some indexes in your database that are not being used. These unused indexes can sometimes be rather large. This results in increased database size and backup storage, decreased insert time, and increased time to rebuild the indexes when they get too fragmented. So, it’s probably good to run a query to see if you have any unused indexes that are medium to large. You probably don’t need to worry about small indexes that are not currently being used.

There are actually two queries that you can run. There is an index usage table that you can reference and see when certain values are set to zero. Those indexes are not being used. However, there are also some indexes that, for some reason, never seem to even show up in that table. So, you have to run two different queries:

### Indexes Not In Index Usage Table

```

-- GET UNUSED INDEXES THAT DO **NOT** APPEAR IN THE INDEX USAGE STATS TABLE
DECLARE @dbid INT
SELECT @dbid = DB_ID(DB_NAME())

SELECT	Databases.Name AS [Database],
object_name(Indexes.object_id) AS [Table],
Indexes.NAME AS [Index],
Indexes.INDEX_ID,
PhysicalStats.page_count as [Page Count],
CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Size (MB)],
CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Frag. %]
FROM SYS.INDEXES Indexes
INNER JOIN SYS.OBJECTS Objects ON Indexes.OBJECT_ID = Objects.OBJECT_ID
LEFT JOIN sys.dm_db_index_physical_stats(@dbid, null, null, null, null) PhysicalStats
ON PhysicalStats.object_id = Indexes.object_id
AND PhysicalStats.index_id = indexes.index_id
INNER JOIN sys.databases Databases
ON Databases.database_id = PhysicalStats.database_id
WHERE Objects.type = 'U' -- User Table
AND Indexes.is_primary_key = 0
AND Indexes.type = 2	-- Nonclustered indexes
AND Indexes.INDEX_ID NOT IN (
SELECT UsageStats.INDEX_ID
FROM SYS.DM_DB_INDEX_USAGE_STATS UsageStats
WHERE UsageStats.OBJECT_ID = Indexes.OBJECT_ID
AND   Indexes.INDEX_ID = UsageStats.INDEX_ID
AND   DATABASE_ID = @dbid)

```

### Unused Indexes In the Index Usage Table

```

-- GET UNUSED INDEXES THAT APPEAR IN THE INDEX USAGE STATS TABLE
DECLARE @MinimumPageCount int
SET @MinimumPageCount = 500

SELECT	Databases.name AS [Database],
object_name(Indexes.object_id) AS [Table],
Indexes.name AS [Index],
PhysicalStats.page_count as [Page Count],
CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Total Size (MB)],
CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Frag. (%)],
ParititionStats.row_count AS [Row Count],
CONVERT(decimal(18,2),
(PhysicalStats.page_count * 8.0 * 1024) / ParititionStats.row_count)
AS [Index Size/Row (Bytes)],
UsageStats.last_user_scan,
UsageStats.last_user_seek
FROM sys.dm_db_index_usage_stats UsageStats
INNER JOIN sys.indexes Indexes
ON Indexes.index_id = UsageStats.index_id
AND Indexes.object_id = UsageStats.object_id
INNER JOIN SYS.databases Databases
ON Databases.database_id = UsageStats.database_id
INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)
AS PhysicalStats
ON PhysicalStats.index_id = UsageStats.Index_id
AND PhysicalStats.object_id = UsageStats.object_id
INNER JOIN SYS.dm_db_partition_stats ParititionStats
ON ParititionStats.index_id = UsageStats.index_id
AND ParititionStats.object_id = UsageStats.object_id
WHERE ((UsageStats.user_scans = 0 AND UsageStats.user_seeks = 0)
OR (UsageStats.last_user_scan < DATEADD(year, -1, getdate())
AND (UsageStats.last_user_seek < DATEADD(year, -1, getdate()))))
-- ignore indexes with less than a certain number of pages of memory
AND PhysicalStats.page_count > @MinimumPageCount
-- Exclude primary keys, which should not be removed
AND Indexes.is_primary_key = 0

```

For more information on this, see Identifying Unused Indexes in a SQL Server Database

## 6. Identifying and Removing Duplicate or Near Duplicate Indexes

In addition to removing unused indexes, you can also remove duplicate indexes or near-duplicate indexes. Obviously, a duplicate index is unnecessary, but sometimes a near-duplicate index can be removed without any adverse effects. I have found two indexes that used all of the same columns in the index itself and only differed in the columns that were “included” in the index (but not indexed). If there are two indexes that only differ by one included column and the index is large, then you may be able to remove the simpler index. Of course, it’s a good practice to save a script that can quickly regenerate the index if you find that it is actually necessary after all.

Ok, so here’s the query:

```

WITH IndexSummary AS
(

SELECT DISTINCT sys.objects.name AS [Table Name],
sys.indexes.name AS [Index Name],
SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
FROM sys.columns
INNER JOIN sys.index_columns
ON sys.index_columns.column_id = sys.columns.column_id
AND sys.index_columns.object_id = sys.columns.object_id
WHERE sys.index_columns.index_id = sys.indexes.index_id
AND sys.index_columns.object_id = sys.indexes.object_id
AND sys.index_columns.is_included_column = 0
0RDER BY sys.columns.name
FOR XML Path('')), 2, 10000) AS [Indexed Column Names],
ISNULL(SUBSTRING((SELECT ', ' +  sys.columns.Name as [text()]
FROM sys.columns
INNER JOIN sys.index_columns
ON sys.index_columns.column_id = sys.columns.column_id
AND sys.index_columns.object_id = sys.columns.object_id
WHERE sys.index_columns.index_id = sys.indexes.index_id
AND sys.index_columns.object_id = sys.indexes.object_id
AND sys.index_columns.is_included_column = 1
0RDER BY sys.columns.name
FOR XML Path('')), 2, 10000), '') AS [Included Column Names],
sys.indexes.index_id, sys.indexes.object_id
FROM sys.indexes
INNER JOIN SYS.index_columns
ON sys.indexes.index_id = SYS.index_columns.index_id
AND sys.indexes.object_id = sys.index_columns.object_id
INNER JOIN sys.objects
ON sys.OBJECTS.object_id = SYS.indexES.object_id
WHERE sys.objects.type = 'U'
)

SELECT IndexSummary.[Table Name],
IndexSummary.[Index Name],
IndexSummary.[Indexed Column Names],
IndexSummary.[Included Column Names],
PhysicalStats.page_count as [Page Count],
CONVERT(decimal(18,2), PhysicalStats.page_count * 8 / 1024.0) AS [Size (MB)],
CONVERT(decimal(18,2), PhysicalStats.avg_fragmentation_in_percent) AS [Fragment %]
FROM IndexSummary
INNER JOIN sys.dm_db_index_physical_stats (DB_ID(), NULL, NULL, NULL, NULL)
AS PhysicalStats
ON PhysicalStats.index_id = IndexSummary.index_id
AND PhysicalStats.object_id = IndexSummary.object_id
WHERE (SELECT COUNT(*) as Computed
FROM IndexSummary Summary2
WHERE Summary2.[Table Name] = IndexSummary.[Table Name]
AND Summary2.[Indexed Cols] = IndexSummary.[Indexed Cols]) > 1
0RDER BY [Table Name], [Index Name], [Indexed Column Names], [Included Column Names]

```

For more information on this, see Query to Identify Duplicate or Redundant Indexes in SQL Server

## 7. Checking Identity Columns for Overflow Risk

If your database is using integer identity columns, then the largest ID that can be stored is 2,147,483,647 (2.1 billion). Most tables never come close to overflowing, but it is definitely better to make sure than to not be sure and experience an overflow one day. So, here’s an easy query to quickly check:

```

SELECT sys.tables.name AS [Table Name],
last_value AS [Last Value],
CASE (MAX_LENGTH)
WHEN 1 THEN 'TINYINT'
WHEN 2 THEN 'SMALLINT'
WHEN 4 THEN 'INT'
WHEN 8 THEN 'BIGINT'
ELSE 'UNKNOWN'
END AS DataType,
CAST(cast(last_value as int) / 2147483647.0 * 100.0 AS DECIMAL(5,2))
AS [Percentage of ID's Used]
FROM sys.identity_columns
INNER JOIN sys.tables
ON sys.identity_columns.object_id = sys.tables.object_id

```

For more information, see Checking Integer Identity Columns in SQL Server for Overflow Risk

## Conclusion

Alright, so there are some tips that I hope you will find helpful. Good luck!