How to deal with MongoDB shell output in your shell scripts – Part 2/2: parsing JSON with jq

If you read my last post, you now know how to get a valid JSON output from your MongoDB shell.

But raw JSON is often useless in your scripts because you have to manipulate it to extract only the information you need.

For this purpose, I personalty use a free tool called “jq“. It is very powerful and has many advanced features. I will only show you some basic concepts that will allow you to simply deal with JSON data in your scripts.

How it works

From the creator’s page:

jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.

Basically, jq works as a filter on JSON data: it takes a JSON as first input and produces an output that takes into account the filters you pass to it. You can combined filters, modify data on the fly, … (have a look at the jq manual if you need a complete overview of all available options).

For the purpose of this post, the most important part to understand is that jq processing is based on “stage” (or “pipeline”). Each stage takes as input the result of the previous stage. Different stages are separated by a pipe (“|”).

It’s a bit difficult to understand at the moment, but it will be clearer with the following examples.

Context

For the rest of this article, I will work on the collection “pokemon” of my database “aro” (you can get the pokemon on github). A typical document looks like this:

{
        "_id" : ObjectId("58f56171ee9d4bd5e610d6b7"),
        "id" : 116,
        "num" : "116",
        "name" : "Horsea",
        "img" : "http://www.serebii.net/pokemongo/pokemon/116.png",
        "type" : [
                "Water"
        ],
        "height" : "0.41 m",
        "weight" : "8.0 kg",
        "candy" : "Horsea Candy",
        "candy_count" : 50,
        "egg" : "5 km",
        "spawn_chance" : 1.13,
        "avg_spawns" : 113,
        "spawn_time" : "02:53",
        "multipliers" : [
                2.23
        ],
        "weaknesses" : [
                "Electric",
                "Grass"
        ],
        "next_evolution" : [
                {
                        "num" : "117",
                        "name" : "Seadra"
                }
        ]
}

The collection contains 151 pokemons.

I will not repeat the MongoDB command I use to produce the output for readability purpose. It will be replaced by “mongo” but the full command behind will always be (except for one example, where I will mention it):

mongo --quiet --eval "db.getSiblingDB('arotest').pokemon.find({},{_id:0}).forEach(printjson)"

(Yes, the lazy way to deal with “ObjectID”…)

Now, let’s start playing with jq!

Examples of data manipulation

Format JSON

That is the most basic usage of jq. If you do not apply filters, it allows you to format and colorized bad formatted JSON.

mongo |jq '.'

Extract JSON from an array of JSON objects

Only for this example, I will use the “toArray()” function from MongoDB to print JSON data.

If you need to extract JSON objects from an array of JSON objects as produced by the “toArray()” function, you can use:

mongo --quiet --eval "db.getSiblingDB('arotest').pokemon.find({},{_id:0}).toArray()"|jq '.[]'

With this code, we tell jq to output the content of the array passed as input. If you plan to use “toArray()” to work with MongoDB shell and JSON, you will need to add “.[]” as the first stage of each jq commands (and for each of my examples).

Extract specific field

If you want to extract the “name” field from all documents returned by MongoDB:

mongo||jq '.name'

Output (truncated):

"Electrode"
"Mr. Mime"
"Horsea"

Note: in your shell scripts, it may be easier to deal with string without double quotes. To achieve that, simply add “-r” option to jq:

mongo|jq -r '.name'

Output:

Electrode
Mr. Mime
Horsea

Filter the data

Simple condition

Here we just want to get all the information for the 116th Pokemon:

mongo|jq 'select(.num=="116")'

Note the double quotation marks around “116”. The “select” condition must match the data type of your JSON input. In my “pokemon” collection, the “num” field is a string. I could have used the “id” field instead (which is a numeric data type) and replaced the select command with:

mongo|jq 'select(.id==116)'

You can use all usual operators: =, !=, <, >, …

Many conditions

We can also filter on multiple values:

mongo|jq 'select(.num == ("116", "117"))'

Or use boolean operators in order to add multiple conditions:

mongo|jq -r 'select((.candy=="Magnemite Candy") and .egg=="Not in Eggs")'

Filter on the contents of an array embedded in a JSON object

My documents also contain arrays like:

"type" : [
                "Electric"
        ],

With jq, you can easily apply filters on the content of an array embedded in your JSON object! Here, I want to retrieve all “Eletric” Pokemons.

mongo|jq -r 'select(.type[]?=="Electric")'

Note the question mark after the square brackets. This is very useful if you are unsure whether each JSON object contains the field on which you are filtering. If you do not set the question mark and the field does not exist, you will get the following error:

jq: error: Cannot iterate over null

Filter and projection

This time, we want only the name and the weight of the 116th Pokemon not the entire document:

mongo|jq 'select(.num=="116")|.name,.height'

We added an additional stage “.name,.height” that will apply to the data returned by the first stage “select”.

A more complex example: we want the Pokemon number for the next evolution of Pokemon 116 (remember: the next evolution number is embedded in an array in the document):

mongo|jq 'select(.num == "116")|.next_evolution[]|.num'

I think these examples are sufficient to cover the majority of situations that a DBA may encounter in its shell scripts. But, there is one more thing to know in order to be fully ready.

How to pass variables to jq

You can not put shell variables directly into your jq code. You must pass them as an argument to jq. It’s really simple!

Here, I want to retrieve information about the Pokemon 116 but I do not want to hardcode it in the jq code:

mongo|jq --arg pokemon "116" 'select(.num == $pokemon)'

We can also pass multiple arguments:

mongo|jq --arg type "Electric" --arg weaknesses "Fire" 'select((.type[]?==$type) and .weaknesses[]?==$weaknesses)|{id:.id,name:.name}'

Small bonus: watch the last “stage”. With this syntax, the output will be valid JSON:

{
  "id": 82,
  "name": "Magneton"
}
{
  "id": 81,
  "name": "Magnemite"
}

I hope this post has been useful. Stay tuned for more DBA stuff!

Leave a Reply

Your email address will not be published. Required fields are marked *