Get indexes usage for all collections of all databases of your MongoDB instance

One possible area of ​​improvement when you encounter bottlenecks when inserting or updating in MongoDB is to limit the number of indexes on your collections.

Even though indexes are good for queries against collections, they add an extra step when inserting or updating, which may take some time. Given this fact, it may be wise to delete unused indexes.

So, today, I’m going to give you a Mongo shell script that I use to get statistics on indexes usage for all collections on a MongoDB instance and to identify those that are not used.

What does the script do?

The script will give you the following informations for each collection of each database:

  • Database name
  • Collection name
  • Index name
  • Date from which statistics has been gathered (basically, the startup date of your MongoDB instance or the date of index creation)
  • Number of time the index has been used since the gathering start date

The output is in CSV format.

Consideration

The script is based on the aggregation pipeline stage “$indexStats” which have some restrictions.

  • It exists only since MongoDB 3.2
  • Before MongoDB 3.2.3, the number of time the index has been used does not take into account “$match” or “mapReduce” operations.
  • The statistics does not include internal operations like deletion via TTL Indexes or chunk split and migration operations (for sharded cluster environment).

If you are working in a sharded cluster environment, the script must be run while logged in through mongos to take into account statistics for all shards.

Because the use of indexes depends on the activities of your database, you must make sure that the load is representative of the normal activity of your database before making any decision.

Script

First, you need to connect to your mongo instance (or mongos) with a user which have privileges that include indexStats action.

mongo --host localhost:27017 --authenticationDatabase admin -u admin -p admin

Then, just run this script and wait for the end.

db.adminCommand('listDatabases').databases.forEach(function(e){
	if ((e.name == "admin" || e.name == "config" || e.name == "local")) return;
	var database=e.name;
	context=db.getSiblingDB(database);
	context.getCollectionNames().forEach(function(collection){
		records=context.getCollection(collection).aggregate( 
				[ 
					{ $indexStats: { } },
					{
						"$group" :
						{
							_id : { name: "$name"},
							accesses:{$sum:"$accesses.ops"},
							since:{$min:"$accesses.since"},
						}
					},
					{
						"$project": 
						{
							_id:0,
							name:"$_id.name",
							since:{ $dateToString: { format: "%Y-%m-%d-%H:%M:%S", date: "$since" } },
							accesses:"$accesses",
						}
					}
				]
			).forEach(function(index){
				idx=index.name;
				since=index.since;
				accesses=index.accesses;
				print(database+";"+collection+";"+idx+";"+since+";"+accesses);
			});
	});
});

Output will look like:

arotest;crunchbase;_id_hashed;2018-12-06-13:37:35;0
arotest;crunchbase;_id_;2018-12-06-13:37:35;0
arotest;crunchbase;offices.location_2dsphere;2018-12-06-13:37:35;0
arotest;enron_messages;_id_hashed;2018-12-06-13:37:35;0
arotest;enron_messages;_id_;2018-12-06-13:37:35;1
arotest;pokemon;testunique;2018-12-06-13:37:35;0
arotest;pokemon;_id_hashed;2018-12-06-13:37:35;0
arotest;pokemon;_id_;2018-12-06-13:37:35;0
arotest;test;_id_hashed;2018-12-06-13:37:35;0
arotest;test;verified_1;2018-12-06-13:37:35;0
arotest;test;_id_;2018-12-06-13:37:35;0
arotest;testrestore;_id_;2018-12-06-13:37:40;0

The columns order is:”database name”;”collection name”;”index name”;”gathering start date”, “number of operations that use the index”.

(This is a test environment so indexes are not often used …)

Explanations

There is not much to explain about the results, but you may have noticed the use of “$group” in the script.

In fact, this is only useful in sharded cluster environment because “$indexStats” returns one json document per shard. With “$group”, we can aggregate the data and get only one json document for each index.

“$project” is only used for date formatting. If the date format does not suits your needs, you can modify it with the “$dateToString” format specifiers.

 

I hope you find this script useful. Stay tuned for more DBA stuff!

Leave a Reply

Your email address will not be published. Required fields are marked *