Elasticsearch aggregations with subaggregation, filter and nested object

GET _search
{
  "size": 0,
  "aggs": {
    "companies": {
      "filter": {
        "nested": {
          "path": "companies",
          "query": { "term": { "companies.identifier": "5481738821" } }
        }
      },
      "aggs": {
        "identifiers": {
          "nested": { "path": "companies" },
          "aggs": {
            "names": {
              "terms": { "field": "companies.identifier" },
              "aggs": {
                "namesbyidentifier": {
                  "terms": { "field": "companies.denomination" }
                }
              }
            }
          }
        }
      }
    }
  }
}
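For this aggregation to work, the companies field has to be mapped as a nested object. A minimal sketch of such a mapping (the index and type names here are assumptions; the field names are taken from the query above):

curl -XPUT 'localhost:9200/myindex/doc/_mapping' -d '
{
  "doc": {
    "properties": {
      "companies": {
        "type": "nested",
        "properties": {
          "identifier": { "type": "string", "index": "not_analyzed" },
          "denomination": { "type": "string", "index": "not_analyzed" }
        }
      }
    }
  }
}'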

Other resource:

http://blog.qbox.io/elasticsearch-aggregations


Elasticsearch highlighting — something to keep in mind

Elasticsearch can highlight the search term within the full text of the results.

But for me, this has gone wrong a couple of times.

I had queries that ran fine and returned the correct results, but the highlighting was completely messed up: either no highlighting was returned at all, or every single word of the text was highlighted instead of only the search term.

The reason for this was that, by mistake, my query referenced a non-existing field (e.g. pubdate instead of pub_date). So the query was running fine without errors, but the highlighting was not correct!
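A minimal sketch of such a request (the articles index and the field names are made up for illustration): the misspelled pubdate field is silently accepted, so the search succeeds without errors, while the highlighting comes out wrong.

curl -XGET 'localhost:9200/articles/_search' -d '
{
  "query": {
    "query_string": {
      "query": "summary:energy AND pubdate:2014"
    }
  },
  "highlight": {
    "fields": { "summary": {} }
  }
}'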

Elasticsearch language analyzers

To get an idea of how words containing accents are analyzed by different language analyzers, here are the tokens produced for two example words:

Original word | English   | French   | Dutch
fosséprez     | fosséprez | foseprez | fosseprez
gâteaux       | gâteaux   | gateau   | gâteaux

Remark: the ‘missing’ ‘s’ in the French analyzer’s result for ‘fosséprez’ is not a typo; the result really contained only one ‘s’:

curl -XGET 'localhost:9200/lang/_analyze?analyzer=french' -d 'fosséprez'
{"tokens":[{"token":"foseprez","start_offset":0,"end_offset":9,"type":"<ALPHANUM>","position":1}]}
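The other columns of the table can be reproduced the same way, by pointing the _analyze call at the english and dutch analyzers:

curl -XGET 'localhost:9200/lang/_analyze?analyzer=english' -d 'fosséprez'
curl -XGET 'localhost:9200/lang/_analyze?analyzer=dutch' -d 'gâteaux'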

Things to remember in elasticsearch

1. Custom analyzers

You cannot use a custom analyzer until it is referenced by an index. You need to create an index whose settings or mapping use the analyzer, and then use that index in the _analyze call. There is no need to index any documents into that index.

curl -XGET 'localhost:9200/SOMEINDEX/_analyze?analyzer=angram'
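A minimal sketch of creating an index that defines the angram analyzer (only the name is taken from the call above; the definition itself, built on an edge_ngram filter, is an assumption):

curl -XPUT 'localhost:9200/SOMEINDEX' -d '
{
  "settings": {
    "analysis": {
      "filter": {
        "angram_filter": { "type": "edge_ngram", "min_gram": 2, "max_gram": 10 }
      },
      "analyzer": {
        "angram": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "angram_filter"]
        }
      }
    }
  }
}'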

2. Analyzing

You can check how a search term will be analyzed using the _analyze option.

e.g.

sandy@hers:~/tmp$ curl -XGET 'localhost:9200/ocr/_analyze?analyzer=test' -d 'Ilse'
{"tokens":[{"token":"llse","start_offset":0,"end_offset":4,"type":"<ALPHANUM>","position":1}]}

But keep in mind:

“The term query doesn’t analyze. text/query_string/field queries do analyze.”
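A sketch of what this means in practice, assuming a color field that goes through the standard analyzer (so the index contains the lowercased token green):

# term looks up the literal token "Green", which is not in the index, so no hits:
curl -XGET 'localhost:9200/test/doc/_search' -d '
{"query" : { "term" : { "color" : "Green" } } }'

# query_string analyzes "Green" down to "green" first, so this does match:
curl -XGET 'localhost:9200/test/doc/_search' -d '
{"query" : { "query_string" : { "query" : "color:Green" } } }'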

Basic elasticsearch SEARCH queries

Below is an overview of some elasticsearch SEARCH queries:

MATCH_ALL

curl -XGET 'localhost:9200/test/doc/_search' -d'
{"query" : { "match_all" : {} } }
'

TERM QUERY


{"query": { "term" : {"pub_id":"10795726"} } }

QUERY STRING QUERY

{ "query" : {
"query_string" : {
 "query" : "green OR yellow"
 }
 }
}

EXISTS FILTER

{ "query": {
"constant_score" : {
"filter" : { "exists" : { "field" : "notaries" } }
}
}
}

MISSING FILTER

{ "query": {
 "constant_score" : {
 "filter" : { "missing" : { "field" : "notaries" } }
 }
 }
}

Elasticsearch mapping changes

You could consider elasticsearch to be schema-less. And when just exploring elasticsearch, this is great! You can index documents containing different fields without having to worry about those fields. If a field does not exist yet in elasticsearch, it is automatically added to the ‘mapping’, which is what a schema is called in es.

But when you start doing more advanced things, you have to be really careful with this ‘automatic mapping’. The default settings that elasticsearch defines for a field might not always be exactly what you need. And changing, for example, a field from type string to type date requires you to reindex all documents. Unless you just create a new field with a slightly different name. I guess this is the easiest way as far as elasticsearch is concerned, and as long as not a lot of coding has been done for that field, refactoring your code might not take that much time. Of course, in case all your documents contain/need a value for this field, they all still need to be reindexed…
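As a quick illustration of this automatic mapping (using the clothes index that appears below; the document itself is made up): after indexing a document with a brand new field, that field shows up in the mapping with elasticsearch’s default settings.

curl -XPUT 'localhost:9200/clothes/doc/1' -d '{"colors": "green"}'
curl -XGET 'localhost:9200/clothes/doc/_mapping?pretty=true'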

But being sometimes a bit of a perfectionist, I don’t like renaming a field to something that I consider not the right name for it.

So there I was, having a field called “colors” as being automatically mapped as a string:

"colors": {
  "type": "string"
}

But now I wanted to make a facet of it and a multi-field type would be more appropriate in my case.

A while ago, I had read the article “changing mapping with zero downtime” on the elasticsearch website. So I was already thinking of reindexing all the documents using an alias.

But I have learned that elasticsearch is continuously improving, and that I should reread the documentation before doing something I haven’t done before.

Especially because I was not so keen on having to reindex all documents again. Although it would have been good practice, in case it is ever needed urgently somewhere in the future…
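For reference, a sketch of the alias approach from that article (the index names here are made up): clients search through an alias instead of a concrete index, so switching to a reindexed copy is one atomic call.

# 1. create a new index (clothes_v2) with the corrected mapping
# 2. copy all documents from clothes_v1 into clothes_v2 with your own
#    scan/scroll + bulk code (there is no built-in reindex in this version)
# 3. switch the alias atomically, so searches on "clothes" now hit clothes_v2:
curl -XPOST 'localhost:9200/_aliases' -d '
{
  "actions": [
    { "remove": { "index": "clothes_v1", "alias": "clothes" } },
    { "add": { "index": "clothes_v2", "alias": "clothes" } }
  ]
}'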

Anyway, I considered myself really lucky when I read the following in the put mapping API section of the es documentation:

“core type mapping can be upgraded to multi_field type”

So instead of spending time on reindexing documents, I just ran the following code and my problem was solved.

curl -XPUT 'http://localhost:9200/clothes/doc/_mapping' -d '
{
  "doc" : {
    "properties" : {
      "colors": {
        "type": "multi_field",
        "fields": {
          "colors": {
            "type": "string"
          },
          "untouched": {
            "type": "string",
            "index": "not_analyzed"
          }
        }
      }
    }
  }
}'

Great!
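With the extra not_analyzed field in place, the facet can then simply run on colors.untouched. A minimal sketch using a terms facet:

curl -XGET 'localhost:9200/clothes/doc/_search' -d '
{
  "query": { "match_all": {} },
  "facets": {
    "colors": { "terms": { "field": "colors.untouched" } }
  }
}'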

By the way: to view your mapping, you can use the _mapping command:

e.g. curl -XGET 'http://localhost:9200/clothes/doc/_mapping?pretty=true'

and I suggest you use the Chrome Sense plugin for this.