DIST DOC STORE & SEARCHING
INDEX DATA is stored in MULTIPLE shards
How ES decides WHICH SHARD stores a new doc
shard = hash(routing) % number_of_primary_shards
routing defaults to the document's _id, but a custom routing value can be supplied
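The routing formula above can be sketched in a few lines. This is only an illustration: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash chosen because it is deterministic across runs, not the hash function ES uses internally.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Map a routing value (by default the document ID) to a shard number."""
    # Stand-in hash for illustration only; NOT the hash Elasticsearch uses.
    hashed = zlib.crc32(routing.encode("utf-8"))
    return hashed % number_of_primary_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "user-42"]:
        print(doc_id, "->", pick_shard(doc_id, 5))
```

Because number_of_primary_shards sits in the modulus, changing it would send the same routing value to a different shard, which is why the number of primary shards is fixed when the index is created.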
NODE
contains SHARDS & REPLICA
SH&REP interaction
when CREATING, INDEXING or DELETING, the change is applied to the PRIMARY SHARD first, then copied to its REPLICAS
when UPDATING, the document is REINDEXED in the PRIMARY SHARD first, then the change is forwarded to its REPLICAS
when RETRIEVING, a document can be served from the PRIMARY SHARD or from any of its REPLICAS
a REPLICA of a shard is NEVER placed on the same node as its PRIMARY
every node knows which shard holds any document in the cluster, so any node can accept a request and forward it
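The primary-then-replica flow described above can be pictured with a toy in-memory model. Everything here (`ShardCopy`, `ReplicationGroup`, the round-robin read) is a hypothetical simplification for illustration, not ES code; a real cluster also handles failures, versioning and acknowledgements.

```python
class ShardCopy:
    """One copy of a shard (primary or replica), holding docs by ID."""

    def __init__(self, name):
        self.name = name
        self.docs = {}


class ReplicationGroup:
    """A primary shard plus its replicas (toy model)."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = replicas
        self._next_copy = 0

    def write(self, doc_id, body):
        # Writes are applied to the primary first...
        self.primary.docs[doc_id] = body
        # ...and only then forwarded to every replica.
        for replica in self.replicas:
            replica.docs[doc_id] = body

    def delete(self, doc_id):
        # Deletes follow the same primary-first order.
        self.primary.docs.pop(doc_id, None)
        for replica in self.replicas:
            replica.docs.pop(doc_id, None)

    def read(self, doc_id):
        # Reads can be served by any copy; this sketch rotates through them.
        copies = [self.primary] + self.replicas
        chosen = copies[self._next_copy % len(copies)]
        self._next_copy += 1
        return chosen.name, chosen.docs.get(doc_id)


if __name__ == "__main__":
    group = ReplicationGroup(ShardCopy("P0"), [ShardCopy("R0-a"), ShardCopy("R0-b")])
    group.write("1", {"title": "hello"})
    for _ in range(3):
        print(group.read("1"))
```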
SEARCHING
Recall
Document Content is INDEXED in ES
By default, EVERY FIELD is INDEXED so it is SEARCHABLE
Documents are returned in decreasing RELEVANCE score order
RESPONSE
hits
took
_shards
"hits" array
max_score for the response && score for each document
timed_out (whether the search timed out)
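To make the RESPONSE parts concrete, here is a sketch of a search response as a Python dict, plus a small reader. The values, the index name "blog" and the `summarize` helper are made up for illustration; the timeout flag appears in the JSON as `timed_out`, and real responses carry additional fields not shown here.

```python
# Illustrative response shape; all values are invented.
response = {
    "took": 4,                      # time spent on the search, in milliseconds
    "timed_out": False,             # True if the search timed out
    "_shards": {"total": 5, "successful": 5, "failed": 0},
    "hits": {
        "max_score": 1.3,           # best score among the returned documents
        "hits": [                   # the "hits" array, best score first
            {"_index": "blog", "_id": "7", "_score": 1.3,
             "_source": {"title": "Distributed search"}},
            {"_index": "blog", "_id": "2", "_score": 0.6,
             "_source": {"title": "Shards and replicas"}},
        ],
    },
}


def summarize(resp):
    """Return (partial, [(doc_id, score), ...]) from a search response."""
    # Results may be incomplete if the search timed out or a shard failed.
    partial = resp["timed_out"] or resp["_shards"]["failed"] > 0
    ranked = [(hit["_id"], hit["_score"]) for hit in resp["hits"]["hits"]]
    return partial, ranked


if __name__ == "__main__":
    print(summarize(response))
```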
PAGINATION
Returned results are SORTED.
when searching for the TOP K results, each SHARD first returns its own TOP K results. These are sent to a COORDINATING NODE, which sorts all (K * number_of_SHARDS) results to select the overall TOP K
So DEEP PAGINATION is expensive: to return a page far down the list, K grows to (from + size), and the coordinating node has to sort HUGE numbers of results
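The per-shard TOP K step and the merge on the coordinating node can be sketched as follows. The shard contents, `shard_top` and `search_page` are hypothetical names and data for illustration; scores are plain floats rather than real relevance scores.

```python
import heapq


def shard_top(hits, k):
    """Each shard returns its own k best (doc_id, score) pairs."""
    return heapq.nlargest(k, hits, key=lambda hit: hit[1])


def search_page(shards, from_, size):
    """Coordinating-node view: gather per-shard results, sort, cut one page."""
    k = from_ + size  # every shard must supply enough hits to cover the page
    candidates = []
    for hits in shards:
        candidates.extend(shard_top(hits, k))
    ranked = sorted(candidates, key=lambda hit: hit[1], reverse=True)
    # Also return how many candidates had to be sorted for this page.
    return ranked[from_:from_ + size], len(candidates)


shards = [
    [("a", 0.9), ("b", 0.2), ("c", 0.5)],
    [("d", 0.8), ("e", 0.7)],
    [("f", 0.1), ("g", 0.6), ("h", 0.4)],
]

if __name__ == "__main__":
    print(search_page(shards, 0, 2))  # first page
    print(search_page(shards, 2, 2))  # second page
```

The cost grows with the page offset: with 5 shards, asking for 10 results starting at offset 10000 means each shard returns up to 10010 hits and the coordinating node sorts up to 50050 of them to keep 10.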
Searching Types
Structured Search
Searching on numbers, dates, time
so the answer is either YES or NO: a document either matches or it does not
NO relevance or similarity here
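A structured search behaves like a plain yes/no filter. The sketch below shows that idea in ordinary Python over invented documents; `matches` is a hypothetical helper and this is not ES query syntax.

```python
from datetime import date

# Invented sample documents with a number and a date field.
docs = [
    {"id": 1, "price": 20, "published": date(2014, 1, 5)},
    {"id": 2, "price": 35, "published": date(2014, 3, 9)},
    {"id": 3, "price": 50, "published": date(2015, 6, 1)},
]


def matches(doc, min_price, max_price, published_after):
    """Yes/no test: no score is computed, a document is either in or out."""
    return (min_price <= doc["price"] <= max_price
            and doc["published"] > published_after)


if __name__ == "__main__":
    selected = [d["id"] for d in docs if matches(d, 30, 60, date(2014, 1, 31))]
    print(selected)
```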
Full-Text Search
Finding RELEVANT documents
Return RELEVANCE score here
INVERTED INDEX && ANALYZERS matter here
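A toy version shows why the INVERTED INDEX and the ANALYZER matter for full-text search. The `analyze` function (lowercase, split on non-alphanumerics) and the scoring (count of distinct matching query terms) are deliberate simplifications assumed for illustration; ES analyzers and relevance scoring are considerably more involved.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy analyzer: lowercase the text and split it into terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank docs by how many distinct query terms they contain (crude score)."""
    scores = defaultdict(int)
    # The query goes through the same analyzer as the documents did.
    for term in set(analyze(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


docs = {1: "The quick brown fox", 2: "Quick brown dogs", 3: "Lazy fox"}
index = build_inverted_index(docs)

if __name__ == "__main__":
    print(search(index, "Quick FOX"))
```

Because both documents and queries pass through the same analyzer, "Quick FOX" matches "quick" and "fox" regardless of case.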