system design
Unique ID Generator In Distributed Systems
Multi-master replication
UUID
Ticket Server
Twitter snowflake approach ✓
Layout: sign bit (1 bit, always 0) + timestamp (41 bits) + datacenter_id (5 bits) + machine_id (5 bits) + sequence_number (12 bits) = 64 bits
Sign bit: 1 bit. Always 0; reserved for future use.
Timestamp: 41 bits. Milliseconds since the Unix epoch or a custom epoch.
Datacenter ID: 5 bits, which gives us 2 ^ 5 = 32 datacenters.
Machine ID: 5 bits, which gives us 2 ^ 5 = 32 machines per datacenter.
Sequence number: 12 bits. For every ID generated on that machine/process within the same millisecond, the sequence number is incremented by 1. It is reset to 0 every millisecond.
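A minimal Python sketch of this scheme, assuming the 1+41+5+5+12 bit layout above and a hypothetical custom epoch of 2020-01-01 UTC; names such as SnowflakeGenerator and CUSTOM_EPOCH_MS are illustrative, not Twitter's implementation.

import threading
import time

CUSTOM_EPOCH_MS = 1577836800000      # assumed custom epoch: 2020-01-01 UTC
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

class SnowflakeGenerator:
    def __init__(self, datacenter_id: int, machine_id: int):
        assert 0 <= datacenter_id < (1 << DATACENTER_BITS)
        assert 0 <= machine_id < (1 << MACHINE_BITS)
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.last_timestamp = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def _now_ms(self) -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self.lock:
            timestamp = self._now_ms()
            if timestamp == self.last_timestamp:
                # Same millisecond: bump the sequence; on overflow wait for the next millisecond.
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    while timestamp <= self.last_timestamp:
                        timestamp = self._now_ms()
            else:
                # New millisecond: reset the sequence to 0.
                self.sequence = 0
            self.last_timestamp = timestamp
            # The sign bit stays 0 because the timestamp field occupies only 41 bits.
            return (
                ((timestamp - CUSTOM_EPOCH_MS) << (DATACENTER_BITS + MACHINE_BITS + SEQUENCE_BITS))
                | (self.datacenter_id << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )

gen = SnowflakeGenerator(datacenter_id=1, machine_id=7)
print(gen.next_id())

A single generator instance can produce up to 4096 IDs per millisecond before it has to wait for the next millisecond.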
Summary
Distributed key-value store
Ability to store big data
Consistent hashing
: to spread load across servers
High availability reads
Data replication
: replicate to N unique servers; preferably servers that do not share a physical node and that sit in distinct data centers
Multi-datacenter setup
Highly available writes
Vector clocks
: Versioning and conflict resolution
Dataset partition
Consistent Hashing
Incremental scalability
Consistent Hashing
Heterogeneity
Consistent Hashing
Tunable consistency
Quorum consensus
Handling temporary failures
Sloppy quorum and hinted handoff
Handling permanent failures
Merkle tree
Handling data center outage
Cross-datacenter replication
Techniques
Data partition
Consistent hashing
Distribute data across multiple servers evenly.
Minimize data movement when nodes are added or removed.
Advantages
Automatic scaling
: servers could be added and removed automatically depending on the load.
Heterogeneity
: the number of virtual nodes for a server is proportional to the server capacity. For example, servers with higher capacity are assigned more virtual nodes.
Nodes are distributed on a ring using consistent hashing.
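A minimal sketch of such a ring, assuming MD5 as the hash function and a weight parameter as a stand-in for server capacity (more weight means more virtual nodes); the class and parameter names are illustrative.

import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self):
        self._ring = []            # sorted list of (position, server) pairs

    def add_server(self, server: str, weight: int = 100):
        # One virtual node per weight unit, spread around the ring.
        for i in range(weight):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove_server(self, server: str):
        # Only this server's virtual nodes disappear; other keys stay put.
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get_server(self, key: str) -> str:
        # Walk clockwise from the key's position to the first virtual node.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing()
ring.add_server("s0")
ring.add_server("s1", weight=200)  # higher-capacity server gets more virtual nodes
print(ring.get_server("user:42"))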
Data replication
replicate each key to N servers: walk clockwise from the key's position on the ring and choose the first N unique servers, preferably servers that do not share a physical node and, ideally, that sit in distinct data centers
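A small sketch of this replica-placement rule, under the assumption that the ring is a sorted list of (position, server) pairs on a 0-359 circle; the hard-coded ring and server names are hypothetical.

import bisect
import hashlib

def position(key: str) -> int:
    # Map a key to a position on a 0..359 ring (assumed range for the example).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 360

def replicas(ring, key, n):
    # Walk clockwise from the key's position and pick the first n *unique*
    # servers, skipping extra virtual nodes of servers already chosen.
    ring = sorted(ring)
    start = bisect.bisect(ring, (position(key), ""))
    chosen = []
    for i in range(len(ring)):
        server = ring[(start + i) % len(ring)][1]
        if server not in chosen:
            chosen.append(server)
        if len(chosen) == n:
            break
    return chosen

ring = [(10, "s0"), (80, "s1"), (150, "s0"), (220, "s2"), (300, "s1")]
print(replicas(ring, "user:42", n=2))   # two distinct physical servers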
Consistency
Quorum consensus
: guarantee consistency for both read and write operations
If W + R > N, strong consistency is guaranteed because there must be at least one overlapping node that has the latest data.
N = number of replicas; W = write quorum size (a write must be acknowledged by at least W replicas); R = read quorum size (a read must wait for responses from at least R replicas)
The configuration of W, R, and N is a typical tradeoff between latency and consistency.
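A tiny worked example of the W + R > N rule; the configurations in the comments are common illustrations for N = 3, not prescriptions.

def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    # At least one replica sits in both the write set and the read set,
    # so every read sees the latest acknowledged write.
    return w + r > n

print(is_strongly_consistent(3, 2, 2))   # True : read and write sets must overlap
print(is_strongly_consistent(3, 1, 1))   # False: fast, but only eventually consistent
print(is_strongly_consistent(3, 3, 1))   # True : fast reads, slow writes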
Eventual consistency
: this is a specific form of weak consistency. Given enough time, all updates are propagated, and all replicas are consistent
Consistency models
Strong consistency
: any read operation returns a value corresponding to the most recent successful write. A client never sees out-of-date data. This approach is not ideal for highly available systems because it could block new operations.
Weak consistency
: subsequent read operations may not see the most recent value.
Eventual consistency
: this is a specific form of weak consistency. Given enough time, all updates are propagated and all replicas become consistent. It allows inconsistent values to enter the system and forces the client to read the values and reconcile them; see Inconsistency resolution below for how reconciliation works with versioning.
Inconsistency resolution
Vector clock
Quorum consensus is used in systems where consistency is more important than availability (CAP theorem), for both writes and reads.
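A minimal sketch of how a vector clock could support the versioning and conflict detection mentioned above; the dict-of-counters representation and the server names are assumptions for illustration.

def increment(clock: dict, server: str) -> dict:
    # Return a new version with this server's counter bumped by one.
    new = dict(clock)
    new[server] = new.get(server, 0) + 1
    return new

def descends(a: dict, b: dict) -> bool:
    # True if version `a` is equal to or newer than `b` on every server.
    return all(a.get(s, 0) >= c for s, c in b.items())

def conflict(a: dict, b: dict) -> bool:
    # Neither version descends from the other: concurrent writes, needs reconciliation.
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "s0")      # {s0: 1}
v2 = increment(v1, "s1")      # {s0: 1, s1: 1}  (descends from v1)
v3 = increment(v1, "s2")      # {s0: 1, s2: 1}  (also descends from v1)
print(conflict(v2, v1))       # False: v2 is strictly newer
print(conflict(v2, v3))       # True : concurrent siblings, client must reconcile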
Handling failures
Failure detection
Gossip protocol
: decentralized failure detection
Each node maintains a node membership list
Each node periodically increments its heartbeat counter.
Each node periodically sends heartbeats to a set of random nodes, which in turn propagate to another set of nodes.
If a member's heartbeat counter has not increased for more than a predefined number of periods, that member is considered offline.
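A minimal sketch of the heartbeat bookkeeping on one node; gossip dissemination is reduced to a merge step, and FAIL_AFTER_SECONDS is an assumed threshold.

import time

FAIL_AFTER_SECONDS = 10.0

class MembershipList:
    def __init__(self):
        self.table = {}   # node_id -> (heartbeat_counter, last_updated_local_time)

    def beat(self, node_id: str):
        # Called periodically by the local node for itself.
        counter, _ = self.table.get(node_id, (0, 0.0))
        self.table[node_id] = (counter + 1, time.time())

    def merge(self, remote_table: dict):
        # Called when a gossip message arrives from a random peer:
        # keep whichever heartbeat counter is higher for each member.
        for node_id, (counter, _) in remote_table.items():
            local_counter, _ = self.table.get(node_id, (-1, 0.0))
            if counter > local_counter:
                self.table[node_id] = (counter, time.time())

    def offline_members(self):
        # A member whose counter has not advanced for too long is considered offline.
        now = time.time()
        return [n for n, (_, updated) in self.table.items()
                if now - updated > FAIL_AFTER_SECONDS]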
Handling temporary failures
Sloppy quorum
: the system chooses the first W healthy servers for writes and the first R healthy servers for reads on the hash ring; offline servers are ignored. When a down server comes back online, changes are pushed back to it to restore consistency (hinted handoff).
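A small sketch of picking the first W healthy servers on the ring; the healthy check is an assumed membership lookup (for example, backed by the gossip table above), and the server list is hypothetical.

def sloppy_quorum(ring_servers, start_index, w, healthy):
    # Walk clockwise and take the first `w` servers that are up, ignoring offline ones.
    chosen = []
    for i in range(len(ring_servers)):
        server = ring_servers[(start_index + i) % len(ring_servers)]
        if healthy(server):
            chosen.append(server)
        if len(chosen) == w:
            break
    return chosen

servers = ["s0", "s1", "s2", "s3", "s4"]
down = {"s1"}
print(sloppy_quorum(servers, 0, w=3, healthy=lambda s: s not in down))
# ['s0', 's2', 's3'] - s1 is skipped and will receive a hinted handoff later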
Handling permanent failures
We implement an anti-entropy protocol to keep replicas in sync. Anti-entropy involves comparing each piece of data on replicas and updating each replica to the newest version.
A Merkle tree is used for inconsistency detection and to minimize the amount of data transferred. Comparison starts with the root hashes: if they match, both servers have the same data; if they disagree, the left child hashes are compared, followed by the right child hashes. Traversing the tree this way finds the buckets that are out of sync, so only those buckets need to be synchronized.
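A minimal Merkle-tree sketch over key-range buckets, with hypothetical bucket contents; it compares root hashes first and recurses into children only where they disagree.

import hashlib

def sha(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

def build(bucket_hashes, lo, hi):
    # Build a tree node covering buckets [lo, hi); leaves hold one bucket each.
    if hi - lo == 1:
        return {"hash": bucket_hashes[lo], "bucket": lo}
    mid = (lo + hi) // 2
    left, right = build(bucket_hashes, lo, mid), build(bucket_hashes, mid, hi)
    return {"hash": sha(left["hash"] + right["hash"]), "left": left, "right": right}

def out_of_sync(a, b):
    # Compare hashes top-down; recurse into left then right only where they disagree.
    if a["hash"] == b["hash"]:
        return []                      # identical subtrees: nothing to sync
    if "bucket" in a:
        return [a["bucket"]]           # mismatched leaf: sync this bucket only
    return out_of_sync(a["left"], b["left"]) + out_of_sync(a["right"], b["right"])

replica1 = [sha("bucket0-v1"), sha("bucket1-v1"), sha("bucket2-v1"), sha("bucket3-v1")]
replica2 = [sha("bucket0-v1"), sha("bucket1-v2"), sha("bucket2-v1"), sha("bucket3-v1")]
t1 = build(replica1, 0, len(replica1))
t2 = build(replica2, 0, len(replica2))
print(out_of_sync(t1, t2))             # [1]: only bucket 1 must be synchronized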
Handling data center outage
A data center outage could happen due to a power outage, so it is important to replicate data across multiple data centers. Even if one data center goes completely offline, users can still access data through the other data centers.
Write path
SSTable
Data is saved in the memory cache.
When the memory cache is full or reaches a predefined threshold, data is flushed to an SSTable on disk.
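A minimal sketch of this write path, with a deliberately tiny flush threshold and no commit log or compaction; the names are illustrative.

MEMTABLE_LIMIT = 4                      # flush after this many entries (tiny for the demo)

class Store:
    def __init__(self):
        self.memtable = {}              # in-memory cache of recent writes
        self.sstables = []              # list of flushed, sorted segments (newest last)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.flush()

    def flush(self):
        # Sort by key and write out an immutable segment, then clear the memtable.
        segment = sorted(self.memtable.items())
        self.sstables.append(segment)
        self.memtable = {}

store = Store()
for i in range(5):
    store.put(f"key{i}", f"value{i}")
print(len(store.sstables), store.memtable)   # 1 flushed SSTable, 1 entry still in memory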
Read path
Bloom filter
If the data is not in the memory cache, a Bloom filter is used to efficiently find out which SSTables might contain the key.
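A minimal Bloom-filter sketch for this read path; the filter size, hash count, and SSTable contents are assumptions for illustration (false positives are possible, false negatives are not).

import hashlib

M = 1024            # bits in the filter (assumed size)
K = 3               # number of hash functions (assumed)

def _positions(key: str):
    # Derive K bit positions for a key from salted SHA-256 digests.
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
        yield int(digest, 16) % M

class BloomFilter:
    def __init__(self):
        self.bits = [False] * M

    def add(self, key: str):
        for p in _positions(key):
            self.bits[p] = True

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in _positions(key))

# Read path sketch: skip SSTables whose Bloom filter rules the key out.
sstables = []
for keys in (["a", "b"], ["c", "d"]):
    bf = BloomFilter()
    for k in keys:
        bf.add(k)
    sstables.append((bf, dict.fromkeys(keys, "value")))

key = "c"
for bf, table in sstables:
    if bf.might_contain(key):           # cheap membership check per SSTable
        print(table.get(key))           # read the actual SSTable only on a possible hit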
Design A URL Shortener