Elasticsearch Learning (4): Mappings

“Today we are going to dive into one of the most important settings in Elasticsearch: mappings. Mappings determine how ES interprets our JSON documents, how it analyzes and indexes their fields, and how it searches them when we send a request,” Tony said during the technology sharing session.

Mapping and Analysis

“As we all know, common databases have different data types, and ES has its own set of common data types too,” Tony said.

Text: string for full text search
Keyword: string for exact match
Whole number: byte, short, integer, long
Floating-point: float, double
Boolean: boolean
Date: date

"Text type and Keyword type seems different, can you explain what’s the difference?", someone asked.

Tony, “Good question. But before we explain their differences, we need to understand there exists two large categories of data types: exact values and full text”

Exact Values vs Full Text

"Data in Elasticsearch can be broadly divided into two types: exact values and full text.

"Exact values are exactly what they look like. Examples are a date or a user ID, but can also include exact strings such as a username or an email address. The exact value Foo is not the same as the exact value foo. The exact value 2014 is not the same as the exact value 2014-09-15.

"Full text, on the other hand, refers to textual data—usually written in some human language — like the text of a tweet or the body of an email, which often means that we seldom want to match the whole full-text field exactly and we should split it to words/chars.

"As we can expect that each of the core data types—strings, numbers, booleans, and dates—might be indexed slightly differently. And this is true: there are slight differences. To facilitate queries on full-text fields, Elasticsearch first analyzes the text, and then uses the results to build an inverted index. On the other hand, exact values will be indexed as it is.

“So, back to the question of what’s the differences between text and keyword, the differences lays whether it will be analyzed and how it is store in inverted index.” Tony said.
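To make the contrast concrete, here is a minimal Python sketch of the idea, not how Elasticsearch is actually implemented. The `analyze` function is a crude stand-in for a real analyzer (it only lowercases and splits on non-alphanumeric characters), and `build_index` and the sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def analyze(text):
    """Crude stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def build_index(docs, analyzed):
    """Build an inverted index: term -> set of ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, value in docs.items():
        # text-like field: one entry per token; keyword-like field: the whole value
        terms = analyze(value) if analyzed else [value]
        for term in terms:
            index[term].add(doc_id)
    return dict(index)


docs = {1: "Quick brown Foo", 2: "foo bar"}
text_index = build_index(docs, analyzed=True)      # behaves like a text field
keyword_index = build_index(docs, analyzed=False)  # behaves like a keyword field

print(sorted(text_index["foo"]))         # [1, 2]
print(sorted(keyword_index["foo bar"]))  # [2]
print("foo" in keyword_index)            # False
```

In the analyzed index, a search for `foo` finds both documents regardless of case. In the exact-value index, only the complete original string matches.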

Indexing

"We have talked that ES use mapping to determine whether to analyze our document fields. But before that, in order to be able to treat date fields as dates, numeric fields as numbers, and string fields as full-text or exact-value strings, Elasticsearch needs to know what type of data each field contains. And this is actually the first functionality of mapping – interpret our JSON string to different types.

“When we index a document that contains a new field or a entire fresh types of document —one previously not seen—Elasticsearch will use dynamic mapping to try to guess the field data type from the basic datatypes available in JSON. But notice, dynamic mapping may hide some bugs of our programs and will cost much performance of indexing. If our document fields is fixed and need no change, we can disable dynamic mapping by:” Tony said.

# "index.mapper.dynamic": false disables automatic type creation
# "dynamic": "strict" rejects documents that contain unmapped fields
PUT /test_idx
{
  "settings": {
    "index.mapper.dynamic": false
  },
  "mappings": {
    "test_type": {
      "dynamic": "strict",
      "properties": {
        "field1": {
          "type": "text"
        }
      }
    }
  }
}
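The guessing step and the strict check can be sketched in a few lines of Python. This is a simplified illustration and not Elasticsearch code: `guess_type` and `check_strict` are hypothetical helpers, the date detection only recognizes the `yyyy-MM-dd` form, and arrays and nulls are ignored.

```python
import re

# Simplifying assumption: only plain yyyy-MM-dd strings count as dates here.
ISO_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def guess_type(value):
    """Guess a field type from a decoded JSON value, roughly as dynamic mapping does."""
    if isinstance(value, bool):  # must come before int: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "date" if ISO_DATE.match(value) else "text"
    if isinstance(value, dict):
        return "object"
    raise ValueError("unsupported value: %r" % (value,))


def check_strict(doc, properties):
    """Mimic "dynamic": "strict": refuse documents with fields missing from the mapping."""
    unknown = sorted(set(doc) - set(properties))
    if unknown:
        raise ValueError("unmapped fields not allowed: " + ", ".join(unknown))


doc = {"field1": "hello", "count": 3, "ratio": 0.5, "active": True, "created": "2014-09-15"}
guessed = {name: guess_type(value) for name, value in doc.items()}
print(guessed)

check_strict({"field1": "hello"}, {"field1": {"type": "text"}})  # passes silently
```

A typo in a field name shows why guessing can hide bugs: with dynamic mapping it quietly becomes a new field, while the strict check raises an error.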

Completion

“Now that we understand the basic functionality of mappings, let’s look at how to handle a special case: auto completion. Auto completion (i.e. search as you type) is a handy feature that assists user input. To get completion from Elasticsearch, we can use the completion suggester,” Tony said.

“I have heard that ES also has prefix search. What is the difference between prefix search and the completion suggester?” someone asked.

“Thanks for your question. The difference between the completion suggester and prefix search is speed: the completion suggester builds its lookup structure ahead of time, at index time, so it occupies more space but is fast enough to respond instantly; prefix search is not fast enough for this use case,” Tony added.
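The space-for-speed trade-off can be illustrated with a toy Python sketch. It is only an analogy: the names and helper functions below are made up, and a plain dictionary of prefixes stands in for whatever structure the real completion suggester builds.

```python
from collections import defaultdict


def prefix_scan(names, prefix):
    """Query-time work: test every name against the prefix."""
    return sorted(name for name in names if name.startswith(prefix))


def build_prefix_table(names):
    """Index-time work: store every prefix of every name up front."""
    table = defaultdict(list)
    for name in sorted(names):
        for end in range(1, len(name) + 1):
            table[name[:end]].append(name)
    return dict(table)


names = ["tony", "tom", "tina", "bob"]
table = build_prefix_table(names)

print(prefix_scan(names, "to"))  # ['tom', 'tony']
print(table.get("to", []))       # ['tom', 'tony']
print(len(table), "prefixes stored for", len(names), "names")
```

Both approaches return the same answer, but the table answers with a single lookup at the cost of storing many more entries than there are names.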

Mapping

"To use this feature, we have to add another field in mapping, which will pre-process our target field for fast completion.

"Saying we want to auto complete user name when user searching, we can put user mapping like this:

PUT user
{
    "mappings": {
        "user" : {
            "properties" : {
                "name-suggest" : {
                    "type" : "completion"
                },
                "name" : {
                    "type": "keyword"
                }
            }
        }
    }
}

Indexing Document

"In Elasticsearch, we have to index the suggestion by ourselves.

PUT user/user/1?refresh
{
    "name-suggest" : {
        "input": [ "tony", "tom" ],
        "weight" : 34
    }
}

“It looks somewhat strange and tedious. It seems this should be done automatically by binding the suggestion field to our target field and keeping the two in sync whenever we add or remove a value, which would have the following advantages:”

convenient
no need to sync suggestion and target

“And I am not sure why ES designed completion like this,” Tony added.
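Until then, one way to keep the two fields in sync is to build both from the same value in application code. The sketch below assumes the `user` mapping shown earlier; `build_user_doc` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json


def build_user_doc(name, weight=1):
    """Build a document body whose suggestion input always mirrors the name field."""
    return {
        "name": name,
        "name-suggest": {"input": [name], "weight": weight},
    }


body = build_user_doc("tony", weight=34)
print(json.dumps(body, indent=4))
```

The resulting body can be sent with the same `PUT user/user/1?refresh` request as above, so the name and its suggestion never drift apart.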

Query

POST user/_search?pretty
{
    "suggest": {
        "user-suggest" : {
            "prefix" : "to",
            "completion" : {
                "field" : "name-suggest"
            }
        }
    }
}

“The query works like a normal search, except that we use a suggest section. The weight of each suggestion becomes the _score of the match.”
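Reading the suggestions back out of the response might look like the Python sketch below. The `sample` response is hand-written for illustration and trimmed to the fields used here. It is not real output, and the exact response shape is an assumption that should be checked against your Elasticsearch version.

```python
def extract_suggestions(response, suggester):
    """Return (text, score) pairs for one named suggester, highest score first."""
    results = []
    for entry in response.get("suggest", {}).get(suggester, []):
        for option in entry.get("options", []):
            results.append((option["text"], option["_score"]))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


# Hand-written sample, assumed shape: one entry per suggested prefix, each with options.
sample = {
    "suggest": {
        "user-suggest": [
            {
                "text": "to",
                "options": [
                    {"text": "tom", "_score": 10.0},
                    {"text": "tony", "_score": 34.0},
                ],
            }
        ]
    }
}

print(extract_suggestions(sample, "user-suggest"))  # [('tony', 34.0), ('tom', 10.0)]
```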

Fuzzy

“The completion query also supports fuzzy matching, i.e. it tolerates some typos in the query, which is a very handy feature:”

POST user/_search?pretty
{
    "suggest": {
        "user-suggest" : {
            "prefix" : "to",
            "completion" : {
                "field" : "suggest"
                "fuzzy" : {
                    "fuzziness" : 2
                }
            }
        }
    }
}
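What a fuzziness of 1 or 2 means can be sketched with plain edit distance in Python. This is a simplified model under stated assumptions: it uses basic Levenshtein distance and ignores the other fuzzy options the real suggester offers, and both functions are illustrative helpers.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum number of single-character edits to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute or keep
        previous = current
    return previous[-1]


def fuzzy_prefix_match(typed, candidate, max_edits):
    """True if some prefix of candidate is within max_edits of the typed text."""
    return any(edit_distance(typed, candidate[:end]) <= max_edits
               for end in range(len(candidate) + 1))


print(fuzzy_prefix_match("tpn", "tony", 1))  # True: "tpn" is one edit from "ton"
print(fuzzy_prefix_match("tpn", "tony", 0))  # False: no exact prefix matches
print(fuzzy_prefix_match("xyz", "tony", 1))  # False
```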

"For more fuzziness configuration, here is what you want.

“Thanks for coming, that’s all” Tony said.
