Please enable JavaScript.
Coggle requires JavaScript to display documents.
Design a key value store - Coggle Diagram
Design a key value store
Single server key value store
Hashable
Store frequently used data in memory and the rest in disk
Data compression
Distributed server key value store
CAP Theorem
3 Properties
Consistency
All of clients will see the same data at the same time no matter with node they are connecting
Availability
All of requests will receive the response even there are some node are down
Partition Tolerance
The system continues to operate despite the network partitions
3 kinds
CA
Support Availability and Consistency and sacrificing Partition Tolerance
CP
Supports Consistency and Partition Tolerance and sacrificing Availability
AP
Supports Availability and Partition Tolerance and sacrificing Consistency
Ideal Situation
No network issue between nodes, when new data is written to a node, it will be synchronized to the other nodes
Real world distributed system
When there is issue between nodes
AP
It will continue to return data even the data is out of date
CP
It will return error util the data is synchonized and update to date
System Components
Data partition
Consistent Hashing Ring
Data availability
The data need to be copied to N next servers (the replicas) when moving clockwise on the hash ring
The replica should be on the different data center
Data consistency
N: the number of replicas
W: the number of replicas need to acknowledge to consider the writing is successful
R: the number response of the replicas to consider the reading is successful
W = 1: means fast write
R = 1 means fast read
W + R > N means strong consistency
W + R <= N means weak consistency
Consitancy models
Strong consistency
All of read will return up to date data
Weak consistency
The read may return out of date data
Eventual consistency
A kind of weak consistency but give enough time to process propagating to the replicas
Inconsistency resolution: versioning
The data will have vector clock [server, version].
Exp: D4([Sx, 2], [Sz, 1]) means this data has version 2 for Sx
and version 1 for Sz
D3([Sx, 2], [Sy,1]) and D2([Sx,2]) no conflicts
D3([Sx, 2], [Sy,1]) and D4([Sx,2], [Sz,1]) conflicts
Issues: The vector clocks grow very fast so need to remove the old vectors
Handling failures
Detection
Each node maintains a node membership list, which contains member IDs and heartbeat counters
If the heartbeat has not increased for more than predefined periods, the member is considered as offline.
When a node is detected offline, the current node will ask the random others to confirm and mark the node is down if all agree
Temporary failures
The system chooses the first W healthy servers for writes and first R healthy servers for reads on the hash ring.
Permanent failures
We need to update latest version to the replicas
A Merkle tree is used for inconsistency detection and minimizing the amount of data transferred
Data center outages
The replicas should be on difference data center to avoid the system down when the data center outages
Write path
The write request is persisted on a commit log file
Data is saved in the memory cache
When the memory cache is full data is flushed to disk
Read path
The system first checks if data is in memory
If data is not in memory, the system checks the disk