system design
Unique ID Generator In Distributed Systems
Multi-master replication
UUID
Ticket Server
Twitter snowflake approach ✓
Layout: sign bit (1 bit, always 0) + timestamp (41 bits) + datacenter_id (5 bits) + machine_id (5 bits) + sequence_number (12 bits) = 64 bits
Sign bit: 1 bit. Always 0; reserved for future use.
Timestamp: 41 bits. Milliseconds since the Unix epoch or a custom epoch.
Datacenter ID: 5 bits, which gives us 2 ^ 5 = 32 datacenters.
Machine ID: 5 bits, which gives us 2 ^ 5 = 32 machines per datacenter.
Sequence number: 12 bits. For every ID generated on that machine/process within the same millisecond, the sequence number is incremented by 1. It is reset to 0 every millisecond.
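A minimal Python sketch of this scheme, assuming the 1+41+5+5+12 bit layout above and a hypothetical custom epoch of 2020-01-01 UTC; names such as SnowflakeGenerator and CUSTOM_EPOCH_MS are illustrative, not Twitter's implementation.

import threading
import time

CUSTOM_EPOCH_MS = 1577836800000      # assumed custom epoch: 2020-01-01 UTC
DATACENTER_BITS = 5
MACHINE_BITS = 5
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1

class SnowflakeGenerator:
    def __init__(self, datacenter_id: int, machine_id: int):
        assert 0 <= datacenter_id < (1 << DATACENTER_BITS)
        assert 0 <= machine_id < (1 << MACHINE_BITS)
        self.datacenter_id = datacenter_id
        self.machine_id = machine_id
        self.last_timestamp = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def _now_ms(self) -> int:
        return int(time.time() * 1000)

    def next_id(self) -> int:
        with self.lock:
            timestamp = self._now_ms()
            if timestamp == self.last_timestamp:
                # Same millisecond: bump the sequence; on overflow wait for the next millisecond.
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    while timestamp <= self.last_timestamp:
                        timestamp = self._now_ms()
            else:
                # New millisecond: reset the sequence to 0.
                self.sequence = 0
            self.last_timestamp = timestamp
            # The sign bit stays 0 because the timestamp field occupies only 41 bits.
            return (
                ((timestamp - CUSTOM_EPOCH_MS) << (DATACENTER_BITS + MACHINE_BITS + SEQUENCE_BITS))
                | (self.datacenter_id << (MACHINE_BITS + SEQUENCE_BITS))
                | (self.machine_id << SEQUENCE_BITS)
                | self.sequence
            )

gen = SnowflakeGenerator(datacenter_id=1, machine_id=7)
print(gen.next_id())

A single generator instance can produce up to 4096 IDs per millisecond before it has to wait for the next millisecond.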
Summary
Distributed key-value store
Ability to store big data
Consistent hashing
: to spread load across servers
High availability reads
Data replication
: replicate to N unique servers; preferably servers that do not share a physical node and that sit in distinct data centers
Multi-datacenter setup
Highly available writes
Vector clocks
: Versioning and conflict resolution
Dataset partition
Consistent Hashing
Incremental scalability
Consistent Hashing
Heterogeneity
Consistent Hashing
Tunable consistency
Quorum consensus
Handling temporary failures
Sloppy quorum and hinted handoff
Handling permanent failures
Merkle tree
Handling data center outage
Cross-datacenter replication
Techniques
Data partition
Consistent hashing
Distribute data across multiple servers evenly.
Minimize data movement when nodes are added or removed.
Advantages
Automatic scaling
: servers could be added and removed automatically depending on the load.
Heterogeneity
: the number of virtual nodes for a server is proportional to the server capacity. For example, servers with higher capacity are assigned more virtual nodes.
Nodes are distributed on a ring using consistent hashing.
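A minimal sketch of such a ring, assuming MD5 as the hash function and a weight parameter as a stand-in for server capacity (more weight means more virtual nodes); the class and parameter names are illustrative.

import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self):
        self._ring = []            # sorted list of (position, server) pairs

    def add_server(self, server: str, weight: int = 100):
        # One virtual node per weight unit, spread around the ring.
        for i in range(weight):
            bisect.insort(self._ring, (_hash(f"{server}#{i}"), server))

    def remove_server(self, server: str):
        # Only this server's virtual nodes disappear; other keys stay put.
        self._ring = [(h, s) for h, s in self._ring if s != server]

    def get_server(self, key: str) -> str:
        # Walk clockwise from the key's position to the first virtual node.
        idx = bisect.bisect(self._ring, (_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]

ring = HashRing()
ring.add_server("s0")
ring.add_server("s1", weight=200)  # higher-capacity server gets more virtual nodes
print(ring.get_server("user:42"))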
Data replication
replicate each key to N servers: walk clockwise from the key's position on the ring and choose the first N unique servers, preferably servers that do not share a physical node and, ideally, that sit in distinct data centers
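A small sketch of this replica-placement rule, under the assumption that the ring is a sorted list of (position, server) pairs on a 0-359 circle; the hard-coded ring and server names are hypothetical.

import bisect
import hashlib

def position(key: str) -> int:
    # Map a key to a position on a 0..359 ring (assumed range for the example).
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % 360

def replicas(ring, key, n):
    # Walk clockwise from the key's position and pick the first n *unique*
    # servers, skipping extra virtual nodes of servers already chosen.
    ring = sorted(ring)
    start = bisect.bisect(ring, (position(key), ""))
    chosen = []
    for i in range(len(ring)):
        server = ring[(start + i) % len(ring)][1]
        if server not in chosen:
            chosen.append(server)
        if len(chosen) == n:
            break
    return chosen

ring = [(10, "s0"), (80, "s1"), (150, "s0"), (220, "s2"), (300, "s1")]
print(replicas(ring, "user:42", n=2))   # two distinct physical servers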
Consistency
Quorum consensus
: guarantee consistency for both read and write operations
If W + R > N, strong consistency is guaranteed because there must be at least one overlapping node that has the latest data.
N = number of replicas; W = write quorum size (a write must be acknowledged by at least W replicas); R = read quorum size (a read must wait for responses from at least R replicas)
The configuration of W, R, and N is a typical tradeoff between latency and consistency.
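A tiny worked example of the W + R > N rule; the configurations in the comments are common illustrations for N = 3, not prescriptions.

def is_strongly_consistent(n: int, w: int, r: int) -> bool:
    # At least one replica sits in both the write set and the read set,
    # so every read sees the latest acknowledged write.
    return w + r > n

print(is_strongly_consistent(3, 2, 2))   # True : read and write sets must overlap
print(is_strongly_consistent(3, 1, 1))   # False: fast, but only eventually consistent
print(is_strongly_consistent(3, 3, 1))   # True : fast reads, slow writes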
Eventual consistency
: this is a specific form of weak consistency. Given enough time, all updates are propagated, and all replicas are consistent
Consistency models
Strong consistency
: any read operation returns a value corresponding to the most recent successful write. A client never sees out-of-date data. This approach is not ideal for highly available systems because it could block new operations.
Weak consistency
: subsequent read operations may not see the most recent value.
Eventual consistency
: this is a specific form of weak consistency. Given enough time, all updates are propagated and all replicas become consistent. It allows inconsistent values to enter the system and forces the client to read the values and reconcile them; see Inconsistency resolution below for how reconciliation works with versioning.
Inconsistency resolution
Vector clock
Quorum consensus is used in systems where consistency is more important than availability (CAP theorem), for both writes and reads.
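A minimal sketch of how a vector clock could support the versioning and conflict detection mentioned above; the dict-of-counters representation and the server names are assumptions for illustration.

def increment(clock: dict, server: str) -> dict:
    # Return a new version with this server's counter bumped by one.
    new = dict(clock)
    new[server] = new.get(server, 0) + 1
    return new

def descends(a: dict, b: dict) -> bool:
    # True if version `a` is equal to or newer than `b` on every server.
    return all(a.get(s, 0) >= c for s, c in b.items())

def conflict(a: dict, b: dict) -> bool:
    # Neither version descends from the other: concurrent writes, needs reconciliation.
    return not descends(a, b) and not descends(b, a)

v1 = increment({}, "s0")      # {s0: 1}
v2 = increment(v1, "s1")      # {s0: 1, s1: 1}  (descends from v1)
v3 = increment(v1, "s2")      # {s0: 1, s2: 1}  (also descends from v1)
print(conflict(v2, v1))       # False: v2 is strictly newer
print(conflict(v2, v3))       # True : concurrent siblings, client must reconcile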
Handling failures
Failure detection
Gossip protocol
: decentralized failure detection
Each node maintains a node membership list
Each node periodically increments its heartbeat counter.
Each node periodically sends heartbeats to a set of random nodes, which in turn propagate to another set of nodes.
If a member's heartbeat counter has not increased for more than a predefined number of periods, that member is considered offline.
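A minimal sketch of the heartbeat bookkeeping on one node; gossip dissemination is reduced to a merge step, and FAIL_AFTER_SECONDS is an assumed threshold.

import time

FAIL_AFTER_SECONDS = 10.0

class MembershipList:
    def __init__(self):
        self.table = {}   # node_id -> (heartbeat_counter, last_updated_local_time)

    def beat(self, node_id: str):
        # Called periodically by the local node for itself.
        counter, _ = self.table.get(node_id, (0, 0.0))
        self.table[node_id] = (counter + 1, time.time())

    def merge(self, remote_table: dict):
        # Called when a gossip message arrives from a random peer:
        # keep whichever heartbeat counter is higher for each member.
        for node_id, (counter, _) in remote_table.items():
            local_counter, _ = self.table.get(node_id, (-1, 0.0))
            if counter > local_counter:
                self.table[node_id] = (counter, time.time())

    def offline_members(self):
        # A member whose counter has not advanced for too long is considered offline.
        now = time.time()
        return [n for n, (_, updated) in self.table.items()
                if now - updated > FAIL_AFTER_SECONDS]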
Handling temporary failures
Sloppy quorum
: the system chooses the first W healthy servers for writes and the first R healthy servers for reads on the hash ring; offline servers are ignored. When a down server comes back online, changes are pushed back to it to restore consistency (hinted handoff).
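A small sketch of picking the first W healthy servers on the ring; the healthy check is an assumed membership lookup (for example, backed by the gossip table above), and the server list is hypothetical.

def sloppy_quorum(ring_servers, start_index, w, healthy):
    # Walk clockwise and take the first `w` servers that are up, ignoring offline ones.
    chosen = []
    for i in range(len(ring_servers)):
        server = ring_servers[(start_index + i) % len(ring_servers)]
        if healthy(server):
            chosen.append(server)
        if len(chosen) == w:
            break
    return chosen

servers = ["s0", "s1", "s2", "s3", "s4"]
down = {"s1"}
print(sloppy_quorum(servers, 0, w=3, healthy=lambda s: s not in down))
# ['s0', 's2', 's3'] - s1 is skipped and will receive a hinted handoff later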
Handling permanent failures
We implement an anti-entropy protocol to keep replicas in sync. Anti-entropy involves comparing each piece of data on replicas and updating each replica to the newest version.
A Merkle tree is used for inconsistency detection and to minimize the amount of data transferred. Comparison starts with the root hashes: if they match, both servers have the same data; if they disagree, the left child hashes are compared, followed by the right child hashes. Traversing the tree this way finds the buckets that are out of sync, so only those buckets need to be synchronized.
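A minimal Merkle-tree sketch over key-range buckets, with hypothetical bucket contents; it compares root hashes first and recurses into children only where they disagree.

import hashlib

def sha(data: str) -> str:
    return hashlib.sha256(data.encode()).hexdigest()

def build(bucket_hashes, lo, hi):
    # Build a tree node covering buckets [lo, hi); leaves hold one bucket each.
    if hi - lo == 1:
        return {"hash": bucket_hashes[lo], "bucket": lo}
    mid = (lo + hi) // 2
    left, right = build(bucket_hashes, lo, mid), build(bucket_hashes, mid, hi)
    return {"hash": sha(left["hash"] + right["hash"]), "left": left, "right": right}

def out_of_sync(a, b):
    # Compare hashes top-down; recurse into left then right only where they disagree.
    if a["hash"] == b["hash"]:
        return []                      # identical subtrees: nothing to sync
    if "bucket" in a:
        return [a["bucket"]]           # mismatched leaf: sync this bucket only
    return out_of_sync(a["left"], b["left"]) + out_of_sync(a["right"], b["right"])

replica1 = [sha("bucket0-v1"), sha("bucket1-v1"), sha("bucket2-v1"), sha("bucket3-v1")]
replica2 = [sha("bucket0-v1"), sha("bucket1-v2"), sha("bucket2-v1"), sha("bucket3-v1")]
t1 = build(replica1, 0, len(replica1))
t2 = build(replica2, 0, len(replica2))
print(out_of_sync(t1, t2))             # [1]: only bucket 1 must be synchronized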
Handling data center outage
A data center outage could happen due to a power outage, so it is important to replicate data across multiple data centers. Even if one data center goes completely offline, users can still access data through the other data centers.
Write path
SSTable
Data is saved in the memory cache.
When the memory cache is full or reaches a predefined threshold, data is flushed to an SSTable on disk.
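A minimal sketch of this write path, with a deliberately tiny flush threshold and no commit log or compaction; the names are illustrative.

MEMTABLE_LIMIT = 4                      # flush after this many entries (tiny for the demo)

class Store:
    def __init__(self):
        self.memtable = {}              # in-memory cache of recent writes
        self.sstables = []              # list of flushed, sorted segments (newest last)

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= MEMTABLE_LIMIT:
            self.flush()

    def flush(self):
        # Sort by key and write out an immutable segment, then clear the memtable.
        segment = sorted(self.memtable.items())
        self.sstables.append(segment)
        self.memtable = {}

store = Store()
for i in range(5):
    store.put(f"key{i}", f"value{i}")
print(len(store.sstables), store.memtable)   # 1 flushed SSTable, 1 entry still in memory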
Read path
Bloom filter
If the data is not in the memory cache, a Bloom filter is used to efficiently find out which SSTables might contain the key.
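A minimal Bloom-filter sketch for this read path; the filter size, hash count, and SSTable contents are assumptions for illustration (false positives are possible, false negatives are not).

import hashlib

M = 1024            # bits in the filter (assumed size)
K = 3               # number of hash functions (assumed)

def _positions(key: str):
    # Derive K bit positions for a key from salted SHA-256 digests.
    for i in range(K):
        digest = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
        yield int(digest, 16) % M

class BloomFilter:
    def __init__(self):
        self.bits = [False] * M

    def add(self, key: str):
        for p in _positions(key):
            self.bits[p] = True

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in _positions(key))

# Read path sketch: skip SSTables whose Bloom filter rules the key out.
sstables = []
for keys in (["a", "b"], ["c", "d"]):
    bf = BloomFilter()
    for k in keys:
        bf.add(k)
    sstables.append((bf, dict.fromkeys(keys, "value")))

key = "c"
for bf, table in sstables:
    if bf.might_contain(key):           # cheap membership check per SSTable
        print(table.get(key))           # read the actual SSTable only on a possible hit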
Design A URL Shortener