Elasticsearch
Vocabulary
Index (noun)
An index is like a database in a traditional relational database system. It is the place to store related documents. The plural of index is indices or indexes.
Index (verb)
To index a document is to store it in an index (noun) so that it can be retrieved and queried. It is much like the INSERT keyword in SQL except that, if the document already exists, the new document replaces the old one.
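A minimal sketch of the replace-on-index behaviour, using a plain Python dict as a stand-in for an index. The function name index_document and the "created"/"updated" return values are assumptions made for illustration; this is not the Elasticsearch client API.

```python
# Toy in-memory stand-in for an index: document ID -> document body.
# This only illustrates the semantics; it does not talk to Elasticsearch.
index = {}

def index_document(doc_id, body):
    """Store a document; an existing document with the same ID is replaced."""
    replaced = doc_id in index
    index[doc_id] = body
    return "updated" if replaced else "created"

first = index_document("1", {"title": "Hello"})
second = index_document("1", {"title": "Hello again"})  # same ID: replaces the old document
```

Unlike a SQL INSERT on an existing primary key, the second call does not fail; the old document is simply overwritten.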
Inverted index
Relational databases add an index, such as a B-tree index, to specific columns in order to improve the speed of data retrieval. Elasticsearch and Lucene use a structure called an inverted index for exactly the same purpose.
By default, every field in a document is indexed (has an inverted index) and thus is searchable. A field without an inverted index is not searchable.
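To make the idea concrete, here is a toy inverted index built in plain Python. It is a sketch under simplifying assumptions (lowercasing and whitespace splitting instead of a real analyzer, made-up sample documents) and says nothing about how Lucene lays the structure out on disk.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs whose text contains it."""
    inverted = defaultdict(set)
    for doc_id, text in docs.items():
        # Naive analysis: lowercase and split on whitespace.
        for term in text.lower().split():
            inverted[term].add(doc_id)
    return inverted

# Hypothetical sample documents.
docs = {
    1: "The quick brown fox",
    2: "Quick brown dogs",
    3: "The lazy dog",
}
inverted = build_inverted_index(docs)
matches = sorted(inverted["quick"])  # IDs of documents containing "quick"
```

A search for a term becomes a single dictionary lookup instead of a scan over every document, which is the speed-up the comparison with a B-tree index refers to.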
Mapping
A mapping defines the fields within a type, the datatype for each field, and how the field should be handled by Elasticsearch. A mapping is also used to configure metadata associated with the type.
The mapping, like a database schema, describes the fields or properties that documents of that type may have, the datatype of each field—such as string, integer, or date—and how those fields should be indexed and stored by Lucene.
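As an illustration, a mapping can be written as a JSON request body. The sketch below builds one as a Python dict. The field names are hypothetical, and the layout assumes a recent Elasticsearch version where mappings are no longer nested under a type name and full-text strings use the text datatype; older versions differ.

```python
import json

# Hypothetical mapping body; field names are made up for illustration.
mapping_body = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},         # full-text field
            "views": {"type": "integer"},
            "published": {"type": "date"},
            # "index": False means no inverted index, so the field is not searchable.
            "internal_note": {"type": "text", "index": False},
        }
    }
}

print(json.dumps(mapping_body, indent=2))
```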
Score
The relevance score of each document is a floating-point number returned in the search results as _score.
Relevance
A measure of how similar the contents of a full-text field are to a full-text query string. A similarity algorithm calculates it.
Similarity algorithm
term frequency/inverse document frequency, or TF/IDF
takes into account
Term frequency
How often does the term appear in the field? The more often, the more relevant. A field containing five mentions of the same term is more likely to be relevant than a field containing just one mention.
Field-length norm
How long is the field? The longer it is, the less likely it is that words in the field will be relevant. A term appearing in a short title field carries more weight than the same term appearing in a long content field.
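The factors can be sketched numerically. The formulas below are the classic Lucene TF/IDF approximations (square root of the term count, a logarithmic inverse document frequency, and one over the square root of the field length). The function names and sample numbers are assumptions, and the plain product at the end is a simplification, not the full scoring formula of any particular Elasticsearch version.

```python
import math

def term_frequency(freq):
    """More occurrences of the term in the field -> higher weight."""
    return math.sqrt(freq)

def inverse_document_frequency(num_docs, doc_freq):
    """Terms that appear in many documents carry less weight."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def field_length_norm(num_terms):
    """Longer fields -> lower weight per term."""
    return 1.0 / math.sqrt(num_terms)

def toy_weight(freq, num_docs, doc_freq, num_terms):
    # Simplified combination of the three factors, for illustration only.
    return (term_frequency(freq)
            * inverse_document_frequency(num_docs, doc_freq)
            * field_length_norm(num_terms))

# Same term, once in a 4-term title and once in a 400-term body (made-up numbers).
short_title = toy_weight(freq=1, num_docs=1000, doc_freq=9, num_terms=4)
long_body = toy_weight(freq=1, num_docs=1000, doc_freq=9, num_terms=400)
```

With these inputs the title occurrence gets the larger weight, matching the field-length norm rule above.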
Doc values
Doc values essentially store all the values for a single field together in a single column of data.
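A rough picture of the column layout, assuming made-up documents and a hypothetical helper build_doc_values. Real doc values are an on-disk, compressed structure; this only shows why a per-field column is convenient for operations such as sorting or summing.

```python
# Hypothetical documents stored row by row.
docs = [
    {"_id": 1, "price": 30, "color": "red"},
    {"_id": 2, "price": 10, "color": "blue"},
    {"_id": 3, "price": 20, "color": "red"},
]

def build_doc_values(docs, field):
    """Collect the values of one field into a single column, in document order."""
    return [doc[field] for doc in docs]

price_column = build_doc_values(docs, "price")

# Sorting and aggregating only need the column, not the full documents.
order = sorted(range(len(docs)), key=lambda i: price_column[i])
sorted_ids = [docs[i]["_id"] for i in order]
total_price = sum(price_column)
```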
Index alias
An index alias is like a shortcut or symbolic link that can point to one or more indices.
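The symbolic-link analogy can be sketched with a small lookup table. All index and alias names below are hypothetical, and resolve and repoint are illustrative helpers, not Elasticsearch API calls.

```python
# Toy alias table: alias name -> list of concrete index names.
aliases = {
    "logs_current": ["logs_2024_05"],
    "logs_all": ["logs_2024_04", "logs_2024_05"],
}

def resolve(name):
    """Return the concrete indices a name refers to (the name itself if it is not an alias)."""
    return list(aliases.get(name, [name]))

def repoint(alias, new_indices):
    """Switch an alias to different indices; callers keep using the alias name."""
    aliases[alias] = list(new_indices)

before = resolve("logs_current")
repoint("logs_current", ["logs_2024_06"])
after = resolve("logs_current")
```

The point of the indirection is that clients querying logs_current do not need to change when the underlying index does.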
Refresh
By default, every shard is refreshed automatically once every second.
This is why we say that Elasticsearch has near real-time search: document changes are not visible to search immediately, but will become visible within 1 second.
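The visibility delay can be simulated with a toy shard that keeps newly indexed documents in a buffer until refresh is called. The class ToyShard and its methods are hypothetical; the real refresh writes a new Lucene segment and runs on a timer rather than by hand.

```python
class ToyShard:
    """Illustrative shard: documents become searchable only after a refresh."""

    def __init__(self):
        self.buffer = []      # indexed, but not yet visible to search
        self.searchable = []  # visible to search

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        # Make everything in the buffer visible and empty the buffer.
        self.searchable.extend(self.buffer)
        self.buffer.clear()

    def search(self, term):
        return [doc for doc in self.searchable if term in doc]

shard = ToyShard()
shard.index("hello world")
before = shard.search("hello")  # refresh has not happened yet
shard.refresh()                 # by default this would happen about once per second
after = shard.search("hello")
```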
Translog
By default, the translog is fsynced to disk every 5 seconds and after every write request completes (index, delete, update, or bulk).
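These defaults correspond to index-level settings. The sketch below writes them out as a settings body, assuming the setting names index.translog.durability and index.translog.sync_interval, which are not named in these notes; check the documentation for the version in use before relying on them.

```python
import json

# Assumed setting names; the values mirror the defaults described above.
translog_settings = {
    "index.translog.durability": "request",  # fsync after every write request
    "index.translog.sync_interval": "5s",    # periodic fsync interval
}

print(json.dumps(translog_settings, indent=2))
```

Setting durability to "async" instead would leave only the periodic fsync, trading some durability for write throughput.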
Limitations
All types in an index share the same mapping: each index contains a single, flat schema for all fields, so a field with the same name in two types must have the same datatype.
Queries
Important queries
match_phrase
Keeps only documents that contain all of the search terms, in the same positions relative to each other as in the query.
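A simplified sketch of the phrase check: every query term must be present, and consecutively in the same order. The helper names are hypothetical, and the whitespace tokenizer stands in for a real analyzer. The actual match_phrase query works on term positions stored in the inverted index rather than scanning the text.

```python
def tokenize(text):
    """Naive analysis: lowercase and split on whitespace."""
    return text.lower().split()

def matches_phrase(field_text, phrase):
    """True if the phrase terms appear consecutively, in order, in the field."""
    tokens = tokenize(field_text)
    wanted = tokenize(phrase)
    if not wanted:
        return False
    n = len(wanted)
    # Slide a window of the phrase length over the field's tokens.
    return any(tokens[i:i + n] == wanted for i in range(len(tokens) - n + 1))

in_order = matches_phrase("The quick brown fox", "quick brown")
wrong_order = matches_phrase("brown and quick", "quick brown")
with_gap = matches_phrase("The quick brown fox", "quick fox")
```

The last two cases contain every search term but fail the position requirement, so a phrase match rejects them.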
Clauses
should
If these clauses match, they increase the _score; otherwise, they have no effect. They are simply used to refine the relevance score for each document.
filter
Clauses that must match, but are run in non-scoring, filtering mode.
These clauses do not contribute to the score; they simply include or exclude documents based on their criteria.
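Both clause types can sit side by side in one bool query. The request body below is a sketch with hypothetical field names and values: the filter clause decides which documents are returned, and the should clause only influences their _score.

```python
import json

# Hypothetical fields: "status" as an exact-value field, "title" as full text.
query_body = {
    "query": {
        "bool": {
            # Must match, but runs in non-scoring mode.
            "filter": [{"term": {"status": "published"}}],
            # Optional: raises the _score of documents that match.
            "should": [{"match": {"title": "elasticsearch"}}],
        }
    }
}

print(json.dumps(query_body, indent=2))
```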
Questions
durability/consistency, quorum
Properties
Elasticsearch has near real-time search: document changes are not visible to search immediately, but will become visible within 1 second