How to deal with MongoDB shell output in your shell scripts – Part 1/2: produce valid JSON output

Today, I will show you how to easily deal with MongoDB shell output in your shell (bash, ksh, …) scripts.

For readybility purpose, this topic will be divided into 2 posts. This part will focus on how to produce valid JSON output from your MongoDB shell using the “–eval” option. Second part will focus on how to use “jq” to manipulate JSON data.

Why is it not that simple?

Because you can’t just do:

mongo --quiet --eval "printjson(db.getSiblingDB('mydatabase').mycollection.find())"

or

mongo --quiet --eval "print(db.getSiblingDB('mydatabase').mycollection.find())"

If you execute this command on your MongoDB installation, you will get an unexpected result! (I will not show the output because it is not very interesting. If you are curious, you can try it by yourself).

In fact, when you run these commands, you tell MongoDB to print the cursor details, not the documents the cursor retrieves.

Ok, but how can I get JSON output from my query?

Built-in methods

MongoDB provides you 2 built-in methods which allow you to return all the documents from a cursor:

  • shellPrint()
  • toArray()

In our case, “shellPrint()” is not possible because we want JSON format but the function return the results as strings.

The “toArray()” method is more suitable because it returns an array of JSON objects that contains all the documents from the cursor.

Let’s give it a try with my “pokemon” collection from my “aro” database:

mongo --quiet --eval "db.getSiblingDB('arotest').pokemon.find().toArray()"

Output:

[
        {
                "_id" : ObjectId("58f56171ee9d4bd5e610d6b7"),
                "id" : 116,
                "num" : "116",
                "name" : "Horsea",
                "img" : "http://www.serebii.net/pokemongo/pokemon/116.png",
                "type" : [
                        "Water"
                ],
                "height" : "0.41 m",
                "weight" : "8.0 kg",
                "candy" : "Horsea Candy",
                "candy_count" : 50,
                "egg" : "5 km",
                "spawn_chance" : 1.13,
                "avg_spawns" : 113,
                "spawn_time" : "02:53",
                "multipliers" : [
                        2.23
                ],
                "weaknesses" : [
                        "Electric",
                        "Grass"
                ],
                "next_evolution" : [
                        {
                                "num" : "117",
                                "name" : "Seadra"
                        }
                ]
        }
]

Looks great, but did you notice the format of the “_id” field in the output? Because ObjectId(“58f56171ee9d4bd5e610d6b7”) is not surrounded by double quotes, that is not valid JSON! (you can use the excellent jsonlint if you need to validate JSON data).

We will see in the last part how to handle this.

Custom function

There are many ways to get JSON output with a custom function. I will only show you my favorite:

mongo --quiet --eval "db.getSiblingDB('arotest').pokemon.find().forEach(printjson)"

Output:

{
        "_id" : ObjectId("58f56171ee9d4bd5e610d6b7"),
        "id" : 116,
        "num" : "116",
        "name" : "Horsea",
        "img" : "http://www.serebii.net/pokemongo/pokemon/116.png",
        "type" : [
                "Water"
        ],
        "height" : "0.41 m",
        "weight" : "8.0 kg",
        "candy" : "Horsea Candy",
        "candy_count" : 50,
        "egg" : "5 km",
        "spawn_chance" : 1.13,
        "avg_spawns" : 113,
        "spawn_time" : "02:53",
        "multipliers" : [
                2.23
        ],
        "weaknesses" : [
                "Electric",
                "Grass"
        ],
        "next_evolution" : [
                {
                        "num" : "117",
                        "name" : "Seadra"
                }
        ]
}

Basically, the “forEach” function will iterate throught all records returned by the cursor. Then, the “printjson” function print the record in JSON format.

The main difference with the “toArray()” method is that we do not get an array of JSON documents but JSON documents directly (notice the absence of the brackets in the output). That is more convenient to parse with “jq”.

Unfortunately, for the same reason we saw with the “toArray ()” method, the function does not produce valid JSON! We will focus on that point on the next part.

How to get valid JSON

This is a crucial point if you need to parse your JSON with “jq”. If you give to “jq” invalid JSON data, you will get an error like this one:

parse error: Invalid numeric literal at line 2, column 19

In my example, the only problematic field was “_id”. You have two ways to handle this.

The lazy way

Because “_id” is the problem, you can just exclude the field from the output of your query:

mongo --quiet --eval "db.getSiblingDB('arotest').pokemon.find({},{_id:0}).toArray()"

or

mongo --quiet --eval "db.getSiblingDB('arotest').pokemon.find({},{_id:0}).forEach(printjson)"

If you don’t need to manipulate “_id” in your script, it is the simplest solution!

Now, if you need to manipulate this field, you will have to add a little piece of javascript in your “forEach” function.

The “hard” way

In fact, that’s not that hard. We just have to convert the “ObjectId” javascript object to a valid JSON string.

You have two ways to achieve this. Both does not produce the same result.

First way

mongo --quiet --eval 'db.getSiblingDB("arotest").pokemon.find().forEach(function(results){results._id=results._id.valueOf();printjson(results)})'

Output:

{
        "_id" : "58f56171ee9d4bd5e610d6b7",
        "id" : 116,
        "num" : "116",
        "name" : "Horsea",
        "img" : "http://www.serebii.net/pokemongo/pokemon/116.png",
        "type" : [
                "Water"
        ],
        "height" : "0.41 m",
        "weight" : "8.0 kg",
        "candy" : "Horsea Candy",
        "candy_count" : 50,
        "egg" : "5 km",
        "spawn_chance" : 1.13,
        "avg_spawns" : 113,
        "spawn_time" : "02:53",
        "multipliers" : [
                2.23
        ],
        "weaknesses" : [
                "Electric",
                "Grass"
        ],
        "next_evolution" : [
                {
                        "num" : "117",
                        "name" : "Seadra"
                }
        ]
}

Here, we use the javascript method “valueOf ()” to get the primitive value of the field “_id”. In other words, we extract the value of the object “ObjectId” and convert it to a string. As you can see, the field “_id” no longer contains “ObjectId”.

Second way

mongo --quiet --eval 'db.getSiblingDB("arotest").pokemon.find().forEach(function(results){results._id=results._id.toString();printjson(results)})'

Output:

{
        "_id" : "ObjectId(\"58f56171ee9d4bd5e610d6b7\")",
        "id" : 116,
        "num" : "116",
        "name" : "Horsea",
        "img" : "http://www.serebii.net/pokemongo/pokemon/116.png",
        "type" : [
                "Water"
        ],
        "height" : "0.41 m",
        "weight" : "8.0 kg",
        "candy" : "Horsea Candy",
        "candy_count" : 50,
        "egg" : "5 km",
        "spawn_chance" : 1.13,
        "avg_spawns" : 113,
        "spawn_time" : "02:53",
        "multipliers" : [
                2.23
        ],
        "weaknesses" : [
                "Electric",
                "Grass"
        ],
        "next_evolution" : [
                {
                        "num" : "117",
                        "name" : "Seadra"
                }
        ]
}

Here, we use the javascript method “toString()” to get a string representing the object. As you can see, “ObjectId” is still present in the “_id” field but this time, as a string. It is now a valid JSON output.

Both ways are good. Just choose the one that best suits your needs.

You now know how to produce valid JSON from your MongoDB shell with the “-eval” option. But most often, you have to manipulate the JSON output in order to extract specific information. This is what I will show you in my next post.

Leave a Reply

Your email address will not be published. Required fields are marked *