A Blog

Where I write anything about everything every once in a while

18 May 2015

Nested Documents in ElasticSearch

ElasticSearch is an incredibly powerful tool, going well beyond just full text search. Flat JSON documents can take you a long way, but sometimes you need more. Nested documents in ElasticSearch let you model more complex data in the index. By default, ES will flatten child objects so that

{
  "id": 1,
  "children": [
    { "name": "Ben", "age": 10 },
    { "name": "Jenny", "age": 12 }
  ]
}

would become

{
  "id": 1,
  "children.name": [ "Ben", "Jenny" ],
  "children.age": [ 10, 12 ]
}

The association between each name and age has been lost: nothing in the flattened form records that Ben is 10 and Jenny is 12. In many cases this is not a problem. When it is, you need to move to nested documents. The ES documentation has a great explanation of nested documents and how to define them.
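To make the lost association concrete, here is a minimal Python sketch. It is not ES code: the `flatten` and `matches_all` helpers are hypothetical and only imitate the flattening shown above. It shows that a search for a child named Ben who is 12 matches the flattened document even though no such child exists.

```python
def flatten(doc, prefix=""):
    """Imitate the flattening above: every field becomes a list of values,
    and arrays of objects are merged field by field.
    (Simplified: single nested objects are not handled.)"""
    flat = {}
    for key, value in doc.items():
        path = prefix + key
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            for child in value:
                for sub_key, sub_values in flatten(child, path + ".").items():
                    flat.setdefault(sub_key, []).extend(sub_values)
        else:
            flat.setdefault(path, []).append(value)
    return flat


def matches_all(flat_doc, criteria):
    """True when every field holds the wanted value somewhere in its list."""
    return all(wanted in flat_doc.get(field, []) for field, wanted in criteria.items())


doc = {
    "id": 1,
    "children": [
        {"name": "Ben", "age": 10},
        {"name": "Jenny", "age": 12},
    ],
}

flat = flatten(doc)

# Ben is 10, not 12, yet the flattened document still matches both criteria.
false_positive = matches_all(flat, {"children.name": "Ben", "children.age": 12})
print(flat)
print(false_positive)
```

The match succeeds because each criterion is checked against a separate list, with no memory of which values came from the same child.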

Setting up

We have documents that each hold an array of features. The mapping looks like this:

"my_type": {
  "properties": {
    "features": {
      "type": "nested",
      "properties": {
        "type": { "type": "long" },
        "id": { "type": "long" },
        "score": { "type": "double" }
      }
    }
  }
}

With this mapping in place, ES will not flatten our child documents. It’s important to realize that each nested document is its own document. If you add a single document with 3 features, you have actually added 4 documents to the index. ES handles this under the covers.
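The arithmetic can be sketched in a few lines of Python. The `index_doc_count` helper and the sample feature values are hypothetical; the sketch only restates the counting rule described above and does not talk to ES.

```python
def index_doc_count(doc, nested_paths):
    """One document for the parent, plus one per object under each nested path."""
    return 1 + sum(len(doc.get(path, [])) for path in nested_paths)


# Sample values are made up for illustration.
doc = {
    "features": [
        {"type": 1, "id": 1234, "score": 0.5},
        {"type": 1, "id": 9876, "score": 0.7},
        {"type": 2, "id": 4321, "score": 0.9},
    ]
}

total = index_doc_count(doc, ["features"])
print(total)
```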

Filtering

The ES documentation is very helpful with the syntax, but the concept of the nested document is what has given me a headache or two. Consider a filter that selects documents having a feature with the ID 1234. The syntax for nested filters is covered in the ES reference documentation.

"filter" : {
  "and" : {
    "filters" : [ {
      "nested": {
        "path": "features",
        "filter": {
          "bool": {
            "must": { "term": { "features.id": 1234 } }
          }
        }
      }
    } ]
  }
}

This will work as expected. What if we want to filter based on feature 1234 and feature 9876?

"filter" : {
  "and" : {
    "filters" : [ {
      "nested": {
        "path": "features",
        "filter": {
          "bool": {
            "must": [
              { "term": { "features.id": 1234 } },
              { "term": { "features.id": 9876 } }
            ]
          }
        }
      }
    } ]
  }
}

This will return no results. In fact, it can never return any results at all. This is where nested documents get tricky. It helped me to imagine that we have navigated down to an individual nested document. That document has exactly one ID, but we are asking for a single nested document whose ID is both 1234 and 9876, which is impossible.

If we want this to work as expected, we need one nested filter per feature. While uglier, it does work.

"filter" : {
  "and" : {
    "filters" : [ {
      "nested": {
        "path": "features",
        "filter": {
          "bool": {
            "must": { "term": { "features.id": 1234 } }
          }
        }
      }
    }, {
      "nested": {
        "path": "features",
        "filter": {
          "bool": {
            "must": { "term": { "features.id": 9876 } }
          }
        }
      }
    } ]
  }
}
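The difference between one nested filter with two conditions and two separate nested filters can be modelled in plain Python. This is a sketch of the semantics only, not of how ES executes them, and `nested_filter` is a hypothetical helper: a parent matches when at least one nested document satisfies the whole condition on its own.

```python
def nested_filter(parent, path, predicate):
    """A parent matches when some single nested document under `path`
    satisfies the entire predicate by itself."""
    return any(predicate(child) for child in parent.get(path, []))


parent = {"features": [{"id": 1234}, {"id": 9876}]}

# One nested filter, both terms under "must": each nested document is asked
# to have both IDs at once, which none can.
single = nested_filter(
    parent, "features", lambda f: f["id"] == 1234 and f["id"] == 9876
)

# Two nested filters combined at the parent level: each one only needs
# some nested document with its own ID.
double = (
    nested_filter(parent, "features", lambda f: f["id"] == 1234)
    and nested_filter(parent, "features", lambda f: f["id"] == 9876)
)

print(single, double)
```

The "and" in the second form is applied to parents, after each nested filter has already answered yes or no for the parent as a whole.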

Remember, when you “nest”, you are navigating into a new document and are no longer matching against the parent document. There may be a shortcut to the above. It worked for me, but I could not locate documentation explaining why, so verify the results before relying on it.

"filter" : {
  "and" : {
    "filters" : [ {
      "nested": {
        "path": "features",
        "filter": {
          "bool": {
            "must": [
              { "term": { "features.id": [ 1234, 9876 ] } }
            ]
          }
        }
      }
    } ]
  }
}
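One thing worth checking with this shortcut is whether it means "both" or "either". The sketch below assumes the list form behaves like ES's `terms` filter, which matches a field holding any of the listed values; that assumption is mine and may not hold for a `term` filter in every ES version. Under it, a parent with only one of the two features would still match. The `nested_filter` helper is hypothetical and models semantics only.

```python
def nested_filter(parent, path, predicate):
    """A parent matches when some single nested document satisfies the predicate."""
    return any(predicate(child) for child in parent.get(path, []))


# A parent that has feature 1234 but not 9876.
only_one = {"features": [{"id": 1234}]}

# Assumed "any of the listed values" semantics for the list form.
any_of = nested_filter(only_one, "features", lambda f: f["id"] in (1234, 9876))

# The two-nested-filter form requires both features to be present.
both = (
    nested_filter(only_one, "features", lambda f: f["id"] == 1234)
    and nested_filter(only_one, "features", lambda f: f["id"] == 9876)
)

print(any_of, both)
```

If the two forms disagree on a document like this in your index, the shortcut is not equivalent to the longer filter.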

Aggregations

Aggregations are the replacement for facets, giving you more flexibility in grouping your data for analysis. Of course, you can do aggregations on nested documents using the usual syntax.

As with filtering and querying, though, you must be careful with nested documents or you will be left wondering why you have no results. Let’s imagine we want to cross features of type 1 with features of type 2 and get the counts. Your query might look like this (written as pseudocode for readability):

aggregations
  nested "features"
  aggregations
    filter type 1
    aggregate features.id
    aggregations
      filter type 2
      aggregate features.id

Just like with filtering, we have navigated to a single nested document, and a single nested document cannot be both type 1 and type 2, so the inner aggregation finds nothing. Thankfully, ES has given us an escape hatch. The reverse nested aggregation navigates back up to the parent document after an aggregation. From there we can descend into the nested features again and aggregate like so:

aggregations
  nested "features"
  aggregations
    filter type 1
    aggregate features.id
    aggregations
      reverse nested
      aggregations
        nested "features"
        aggregations
          filter type 2
          aggregate features.id
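As a rough guide to what the pseudocode above might expand to, here is a Python sketch that builds a request body using ES's `nested`, `filter`, `terms` and `reverse_nested` aggregations. The aggregation names (`features`, `type_1`, `type_2`, `by_id`, `back_to_parent`) and the two helper functions are my own choices for illustration; only the field names come from the mapping earlier in the post.

```python
import json


def features_of_type(type_id, agg_name, sub_aggs=None):
    """Filter nested features by type, then bucket them by features.id."""
    by_id = {"terms": {"field": "features.id"}}
    if sub_aggs:
        by_id["aggs"] = sub_aggs
    return {
        agg_name: {
            "filter": {"term": {"features.type": type_id}},
            "aggs": {"by_id": by_id},
        }
    }


def in_features(sub_aggs):
    """Step down into the nested features documents."""
    return {"features": {"nested": {"path": "features"}, "aggs": sub_aggs}}


body = {
    "aggs": in_features(
        features_of_type(
            1,
            "type_1",
            sub_aggs={
                # Climb back to the parent, then descend into features again.
                "back_to_parent": {
                    "reverse_nested": {},
                    "aggs": in_features(features_of_type(2, "type_2")),
                }
            },
        )
    )
}

print(json.dumps(body, indent=2))
```

Each bucket of type 1 feature IDs then carries its own breakdown of type 2 feature IDs found on the same parent documents.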

Conclusion

Nested documents are a very powerful concept in ES, but when you are faced with “no results” when you are expecting them, be certain you are treating them correctly in your queries.
