Efficiency in Elasticsearch
How much data can a node hold?
It depends!
Elastic Stack
Hot, Warm, Cold Architecture
Hot
Indexing Load: High
Query Load: High
CPU: High
Storage: Very Fast (SSDs)
Data Volume: Low
Warm
Indexing Load: None
Query Load: Medium
CPU: High/Medium
Storage: Fast (HDDs/SSDs)
Data Volume: Medium
Cold
Indexing Load: None
Query Load: Low
CPU: Medium/Low
Storage: Medium/Slow (HDDs)
Data Volume: High
Example
1 Master Node
For high availability, 3 master-eligible nodes would be needed
1 Coordinator Node
2 Cold Data Nodes
Elasticsearch 6.3
Data simulated from Nginx access logs (Rally eventdata track):
https://github.com/elastic/rally-eventdata-track
~285GB of snapshots
5 primary shards / 1 replica each
AWS EC2 d2.4xlarge data nodes
Results
Very large latency values: ~10s for search
39.8TB; 4 nodes; 360 shards
JVM heap per node: ~30GB, ~75% used
Lucene index memory per node: ~20GB
Optimize High Density Data
Warm and Cold nodes
What limits storage density?
Query Latency Requirements
Random reads disk performance
Low page cache hit ratio
Comment: the page cache will not help much, since queries usually touch very different data
Data & query patterns
Comment: e.g. the queried time window can strongly influence query time
Network and Recovery Performance
Cluster Size
Recovery Window Length
Comment: We can lose data if another copy fails while recovery is still running. Nodes holding a lot of data are problematic here, since replicating their data takes a long time.
Heap usage (most important one)
Circuit Breakers
Comment: Circuit breakers are designed to handle situations where processing a request would need more memory than is available
Static Lucene Data
Transient Data
Comment: Elasticsearch builds the transient data structures of each shard of a frozen index each time that shard is searched, and discards these data structures as soon as the search is complete
Use Large Shards
Each shard has overhead
Ideal size depends on the use case
Each shard can hold at most ~2 billion docs
Practical guideline: 50GB-200GB
Minimize Heap Usage and Size on Disk
ALWAYS optimize mappings
Comment: Logstash's default mappings are geared towards flexibility, not density
Avoid nested and parent-child mappings
Watch out for high cardinality keyword fields
Comment: Avoid storing high cardinality keywords, like unique IDs, text with timestamps, etc.
Force-merge down to a single segment (if you do only one optimization, this is the most important one)
I/O intensive
Comment: the larger the shard, the longer it takes
Only run on read-only indexes
Comment: for example, when an index moves from a Hot to a Warm node
Reduce Heap Usage
Select use-case appropriate identifier
Document IDs have different profiles
Comment: Auto-generated IDs are not always ideal
https://www.elastic.co/blog/efficient-duplicate-prevention-for-event-based-data-in-elasticsearch
Heap space vs. disk usage trade-off
Use Case dependent
Lucene Data Structures and Storage Density
Resource Usage
Types
Disk Usage
Off-heap memory
On-heap memory
Stats can be viewed (per index) at:
http://my.elasticsearch.cluster/_cluster/stats
indices/segments and indices/fielddata
Inverted Index
Terms dictionary
Read into heap memory
Posting Lists
Read from disk into byte buffers
Doc Values
Per-document values
Stored on disk, with compression
Numeric values are stored in byte buffers
Keyword values use a dual structure
term dictionary (on heap)
per document ordinals (in byte buffers)
Segmented Structure
Lucene indexes are written as multiple small indexes called segments
Each segment will have its own set of data structures
Most structures can be used per-segment
Keyword doc values need to re-map per-segment ordinals to a global structure (on heap): field data / global ordinals
Field data / global ordinals can consume a lot of heap space
Global ordinals are unnecessary if you only have one segment
Summary - Main consumers of Heap
Term dictionaries of high cardinality fields
Global ordinals - can be avoided if you force-merge down to a single segment
Types of data
Inverted index
Index storing a mapping from content, such as words or numbers, to its locations in a document or set of documents
✓ Most disk space is used here
✓ Most heap is used here
Doc Values
Per document information used for sorting and aggregation
✓ Most heap is used here
Points
Range querying
Stored Fields
Original document
✓ Most disk space is used here
Term vectors
Per document inverted index
Uses a lot of space
Disabled by default
Q&A
How much data should you put on the Hot Node?
Warm nodes should be able to cope with queries as well, without being much slower than Hot nodes. If you have a lot of data and queries hitting it, you should have big Warm nodes.
Shard size?
With small shards you will suffer from per-shard overhead
The larger you can make them, the better
Usually 30-60GB, but it varies a lot per use case