Understanding Elasticsearch II

In Part I we covered the basics of Elasticsearch's data structures and took a quick look at the search API. In this part we'll dive deeper into the search API and the Query DSL.

Search API

To perform a search in Elasticsearch, make a GET or POST request to the /_search endpoint. You can search one, many, or all indices, and one, many, or all types, like this.

GET /index_2017*/type1,type2/_search
{}

To paginate results, use the from and size parameters: from is the number of hits to skip and size is the number of hits to return.

GET /_search
{
  "from": 30,
  "size": 60
}
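
Most applications think in page numbers rather than offsets. The following is a minimal Python sketch of that conversion; the helper name page_params and the 1-based page numbering are assumptions made for illustration, not part of Elasticsearch.

```python
def page_params(page: int, per_page: int) -> dict:
    """Translate a 1-based page number into a from/size pair (hypothetical helper)."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    # from counts the hits to skip, so page 1 starts at offset 0.
    return {"from": (page - 1) * per_page, "size": per_page}


print(page_params(1, 10))  # first page
print(page_params(4, 10))  # fourth page
```

The returned dict can be merged into the body of a /_search request.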

Query DSL

The Query DSL is a flexible, expressive search language that Elasticsearch uses to expose most of the power of Lucene through a simple JSON interface. It is what you should be using to write your queries in production. It makes your queries more flexible, more precise, easier to read and easier to debug. To use the query DSL, pass a query in the query parameter like this.

GET /_search
{
  "query": QUERY_NAME
}

An empty {} search body is equivalent to a match_all query, which matches all documents.

Structure of a query clause

A query clause typically has the following structure.

{
  QUERY_NAME: {
    ARGUMENT: VALUE,
    ...
  }
}

or, to reference one particular field:

{
  QUERY_NAME: {
    FIELD_NAME: {
      ARGUMENT: VALUE,
      ...
    }
  }
}

For example, to match anime whose title contains bleach:

GET /animes_index/_search
{
  "query": {
    "match": {
      "title": "bleach"
    }
  }
}

Query clauses are simple building blocks that can be combined with each other to create complex queries. Clauses can be:

  • leaf clauses, such as match, which compare a field to a query string
  • compound clauses, which combine other query clauses. For instance, a bool clause allows you to combine other clauses that either must match, must_not match, or should match if possible.

For example, to search for anime that contain bleach in the title, are not in the romance genre, and ideally were published between 2000 and 2010:

{
  "bool": {
    "must": { "match": { "title": "bleach" } },
    "must_not": { "match": { "genres": "romance" } },
    "should": { "range": { "year": { "gte": 2000, "lte": 2010 } } }
  }
}

It is important to note that a compound clause can combine any other query clauses, including other compound clauses. This means that compound clauses can be nested within each other, allowing the expression of very complex logic.
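
To make the nesting concrete, here is a small Python sketch that builds one bool clause inside another and prints the resulting JSON. The helper bool_clause and the second title, naruto, are made up for this example; only the must, must_not and should keys come from the text above.

```python
import json


def bool_clause(must=None, must_not=None, should=None) -> dict:
    """Build a bool clause, leaving out any section that was not supplied."""
    sections = {"must": must, "must_not": must_not, "should": should}
    return {"bool": {name: clauses for name, clauses in sections.items() if clauses}}


# Inner compound clause: groups two alternative title matches.
either_title = bool_clause(should=[
    {"match": {"title": "bleach"}},
    {"match": {"title": "naruto"}},  # hypothetical second title
])

# Outer compound clause: nests the inner one under must.
nested = bool_clause(
    must=[either_title],
    must_not=[{"match": {"genres": "romance"}}],
    should=[{"range": {"year": {"gte": 2000, "lte": 2010}}}],
)

print(json.dumps({"query": nested}, indent=2))
```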

Types of Query

Although we refer to the Query DSL, in reality there are two DSLs: the Query DSL and the Filter DSL. Query clauses and filter clauses are similar in nature, but have slightly different purposes.

A filter asks a yes-or-no question of every document, for example:

  • is the anime's status completed?
  • do its genres contain romance?

A query, on the other hand, evaluates how well each document matches the query criteria, for example: find the best match for the phrase the sword. The matches could be magical swords, my sword, or perhaps even excalibur, depending on the analyzer you used when indexing your documents.

The concepts of relevance and _score sit well with a full-text search engine like Elasticsearch, and you'll probably find yourself spending a lot of time tweaking _score to get the best results.
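
The difference between the two can be illustrated with a toy Python sketch. Everything in it is an assumption made for illustration: the sample documents, the status field, and above all toy_score, which only counts overlapping words and is not how Elasticsearch actually computes _score.

```python
animes = [
    {"title": "The Sword", "status": "completed"},
    {"title": "Magical Swords", "status": "ongoing"},
    {"title": "My Sword and the Sea", "status": "completed"},
]


def is_completed(doc: dict) -> bool:
    """Filter-style check: a document either passes or it does not."""
    return doc["status"] == "completed"


def toy_score(doc: dict, phrase: str) -> float:
    """Query-style check: fraction of title words that appear in the phrase."""
    title_words = doc["title"].lower().split()
    wanted = set(phrase.lower().split())
    return sum(word in wanted for word in title_words) / len(title_words)


# A filter only partitions the documents into pass and fail.
passed = [doc["title"] for doc in animes if is_completed(doc)]

# A query orders the documents by how well they match.
ranked = sorted(animes, key=lambda doc: toy_score(doc, "the sword"), reverse=True)

print(passed)
print([(doc["title"], toy_score(doc, "the sword")) for doc in ranked])
```

Because this toy scorer does no stemming, Magical Swords gets no credit for the sword; whether a real index would match it depends on the analyzer, as noted above.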

Common filters & queries

While Elasticsearch comes with many different queries and filters, there are just a few which you will use frequently. Below is a list of the most commonly used queries and filters.

term filter

The term filter is used to filter by exact values, be they numbers, dates, booleans, or not_analyzed exact value string fields.

{
  "term": { "title": "Sword Art Online Movie: Ordinal Scale" }
}
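
The not_analyzed requirement matters because an analyzed field is indexed as individual tokens, not as the original string. The Python sketch below imitates this with a deliberately crude stand-in, toy_analyze (lowercase, then split on anything that is not a letter or digit); real analyzers are configurable and do more than this.

```python
import re


def toy_analyze(text: str) -> list:
    """Rough stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


title = "Sword Art Online Movie: Ordinal Scale"

# not_analyzed field: the whole string is kept as a single term.
not_analyzed_terms = [title]
# analyzed field: the string is broken into individual tokens.
analyzed_terms = toy_analyze(title)

print(analyzed_terms)
print(title in not_analyzed_terms)  # is the exact value among the stored terms?
print(title in analyzed_terms)      # is the exact value among the tokens?
```

Under these assumptions the full title is found only in the not_analyzed case, which is why a term filter with the whole title would miss on an analyzed field.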

terms filter

Like the term filter, but it accepts multiple values; a document matches if the field contains any of them.

{
  "terms": { "genres": ["action", "shounen", "fantasy"] }
}

range filter

The range filter allows you to find numbers or dates that fall into the specified range. The bounds are given with gt, gte, lt and lte.

{
  "range": { "published_at": { "gte": 2000, "lte": 2017 } }
}

exists & missing filter

The exists and missing filters are used to find documents where the specified field either has one or more values (exists) or doesn't have any values (missing). They are similar in nature to IS NULL (missing) and IS NOT NULL (exists) in SQL. These filters are frequently used to apply a condition only if a field is present, and to apply a different condition if it is missing. For example:

{
  "exists": { "field": "title" }
}
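
One way to express "apply one condition if the field is present and another if it is missing" is to pair each filter with its condition inside a bool. The Python sketch below builds such a body; the helper when_present_else and the status field are hypothetical, and the sketch assumes an Elasticsearch version that still has the missing filter (newer releases dropped it in favor of exists inside must_not).

```python
import json


def when_present_else(field: str, if_present: dict, if_missing: dict) -> dict:
    """Apply one filter to documents that have `field`, another to those that lack it."""
    return {
        "bool": {
            "should": [
                # Branch 1: the field exists AND the first condition holds.
                {"bool": {"must": [{"exists": {"field": field}}, if_present]}},
                # Branch 2: the field is missing AND the fallback condition holds.
                {"bool": {"must": [{"missing": {"field": field}}, if_missing]}},
            ]
        }
    }


clause = when_present_else(
    "published_at",
    if_present={"range": {"published_at": {"gte": 2000, "lte": 2017}}},
    if_missing={"term": {"status": "completed"}},  # hypothetical field
)

print(json.dumps(clause, indent=2))
```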

bool filter & query

The bool filter combines multiple filter clauses using boolean logic. It accepts three parameters: must, must_not and should, and each of them can take a single clause or an array of clauses. The same structure works as a bool query, where matching should clauses raise a document's _score. For instance, to search for all anime of any genre except action that were published in the past 3 years, preferring titles that contain high school, we could do

{
  "bool": {
    "must": {
      "range": { "published": { "gte": 2015, "lte": 2017 } }
    },
    "must_not": {
      "term": { "genres": "action" }
    },
    "should": {
      "match": { "title": "high school" }
    }
  }
}

match_all query

The match_all query simply matches all documents. It is the default query which is used if no query has been specified and is frequently used in combination with a filter.

{
  "match_all": {}
}

match query

The match query should be your go-to query whenever you want to search a full-text or exact-value field. If you run it against a full-text field, it analyzes the query string with the correct analyzer for that field before searching; if the field is not a full-text field, it performs an exact-value search.

{
  "match": { "title": "7-nin no Majo" }
}

multi_match query

The same as match, but it allows you to run the same query on multiple fields.

{
  "multi_match": {
    "query": "Yuri Yuri",
    "fields": ["title", "plot", "genres"]
  }
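}

Combine filter & query

To combine a query and a filter, wrap both in another query called filtered, as in the following example. Note that filtered was deprecated in Elasticsearch 2.0 and removed in 5.0; on newer versions, put the filter in the filter clause of a bool query instead.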

{
  "query": {
    "filtered": {
      "query": { ... },
      "filter": { ... }
    }
  }
}
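
As a minimal sketch of filling in those two slots, the Python helper below wraps any query and filter in a filtered body. The helper name filtered_search and the sample clauses are assumptions chosen for illustration; the body it produces only applies to Elasticsearch versions that still support filtered.

```python
import json


def filtered_search(query: dict, filter_clause: dict) -> dict:
    """Wrap a query and a filter in a filtered query (hypothetical helper)."""
    return {"query": {"filtered": {"query": query, "filter": filter_clause}}}


body = filtered_search(
    query={"match": {"title": "bleach"}},          # scored: how well does the title match?
    filter_clause={"term": {"genres": "action"}},  # yes/no: is it an action anime?
)

print(json.dumps(body, indent=2))
```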

Conclusion

Now we have covered enough information to start running simple queries effectively. But, search and the query DSL are big subjects and the more time you invest in understanding how they work, the better your search results will be. We will dive much deeper into how results are sorted by relevance and how you can control & tweak the sorting process in Part III.


All rights reserved

Viblo