System design
steps
- Understand the problem, establish design scope
- High-level design
- Design deep dive
- Wrap up
features we need to build
clarify non-functional requirements
who is the user
clarify requirements
consistency
scalability
security
performance
freshness
availability
Top-down approach
High-level design diagram
Data model and schema
APIs - the contract between user and backend
Review the whole high-level design
follow RESTful conventions (see the sketch below)
check requests and responses
Load balancer -> services -> DB
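A minimal sketch of what a RESTful contract can look like, using Flask for illustration (the endpoints and parameter names here are hypothetical, not from the notes):

```python
# Minimal RESTful sketch: resource nouns in the path, HTTP verbs for
# actions, JSON bodies in and out (hypothetical user-feed API).
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/users/<int:user_id>/feed", methods=["GET"])
def get_user_feed(user_id):
    limit = int(request.args.get("limit", 20))  # pagination via query params
    return jsonify({"userId": user_id, "posts": [], "limit": limit})

@app.route("/users/<int:user_id>/posts", methods=["POST"])
def create_post(user_id):
    body = request.get_json()
    return jsonify({"postId": 1, "text": body.get("text")}), 201  # 201 Created
```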
data access patterns
read/write ratio
find bottlenecks
come up with at least 2 solutions
discuss the trade-offs of the solutions
articulate the problem
pick a solution and discuss it with the interviewer
repeat for every problem
summarize the design
Resilient architecture
aspects
Disaster recovery
availability
depends on
BCP
disaster type
geographical impact
business continuity plan - the set of actions to take when a disaster strikes
impacted by
network issues
load spikes
component failure
Scalability
user <-> server - latency, CPU
Vertical scaling
Horizontal scaling
increase the processing power of a single server
host the app on more servers that run in parallel
pros
cons
data consistency
fast inter-process communication
single point of failure
economic (hardware) limits of scaling
pros
cons
more reliable
data inconsistency
difficult to manage
RPC communication (over the network) - slower
scales linearly
High availability = 99.999% ≈ 5 min of downtime per year
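A quick check of that number (straightforward arithmetic as a snippet):

```python
# Downtime allowed per year at "five nines" availability.
minutes_per_year = 365.25 * 24 * 60          # ~525,960 minutes
downtime = minutes_per_year * (1 - 0.99999)  # allowed downtime fraction
print(f"{downtime:.2f} min/year")            # ~5.26 min/year
```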
Availability patterns
failover
replication
db & servers
relational DBs
how we shift from a failing component to another one
active-passive pattern
there are 2 servers: an active one (receives requests) and a passive one (sleeps and listens to the active's heartbeats); when the active one fails, the passive one starts working - this increases latency
active-active pattern
there is a load balancer and both servers handle traffic; no downtime - if one fails, the other keeps working
patterns
primary & replica
primary & primary
split read and write requests - write to one primary DB and read from replicas
pros
cons
fast, reads don't affect the primary's performance
read and write requests are split
problem with the primary - we can scale it only vertically
the primary is a single point of failure
each service has its own primary DB for read/write (the primaries are replicas of each other); services scale horizontally
cons
pros
scale write requests by adding new nodes - horizontal scaling
if one fails, another is still alive
asynchronous replication
some transactions can be lost
backups for the same data
CDN - Content Delivery Network
a service offered by Selectel, Akamai, Amazon S3, Cloudflare
a geographically distributed network infrastructure that provides fast delivery of web content to website users
servers are placed geographically so as to minimize latency and give fast access to the data
origin
the main server that stores the source data served through the CDN
PoP - point of presence
a caching server that is part of the CDN
dynamic content - generated on the server at request time
static content - stored on the server unchanged (binary files, video, JS, CSS)
there are streaming CDNs that pass data to the caching CDN nodes (PoPs)
content is placed there not for storage but for caching
two technologies are used for fetching content
GeoDNS
AnyCast
the request is forwarded to the needed server based on its IP address
routing happens to servers within the region
a CDN helps reduce the load on the origin server, which helps smooth peak loads
ISP - internet service provider
ISPs have boxes with a cache - Open Connect boxes - which are local
Netflix extends the concept of the cache and puts it at the ISP
a user makes a request to Netflix for some film -> ISP -> the Open Connect box serves it from its cache without calling Netflix
a load balancer is needed
Message/task queue - monitors heartbeats; when one of the servers doesn't respond, redirect its requests to another server using CONSISTENT HASHING (see the sketch after this list)
RabbitMQ
ZeroMQ
JMS
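A minimal consistent-hashing sketch (class names and the virtual-node count are assumptions): each server is hashed onto a ring, a key is routed to the first server clockwise from its hash, so when a server dies only its keys move.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes (illustrative)."""

    def __init__(self, servers, vnodes=100):
        self.ring = []  # sorted list of (hash, server)
        for server in servers:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{server}#{i}"), server))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_server(self, key):
        # First ring position clockwise from the key's hash (with wrap-around).
        idx = bisect.bisect(self.ring, (self._hash(key),)) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["server-1", "server-2", "server-3"])
print(ring.get_server("user:42"))  # the same key always maps to the same server
```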
Monolith architecture
can run on different servers and have a few DBs
cons
more context required - a lot of logic and processes for a newcomer
complicated deployments
too much responsibility on each server - SINGLE POINT OF FAILURE
Microservices
one microservice covers one business need, connected to its own DB
pros
cons
STACKOVERFLOW
AMAZON
DB
Horizontal partitioning - by id
vertical partitioning - by columns
sharding - partitioning - divide the data; one piece per server (by location, id, ...) (see the sketch below)
consistency
availability
sharding
problems
joins across chunks - getting data from two separate shards is very expensive
Memcached (not only consistent hashing; sharding can be applied on business logic) can't have a flexible number of shards - it has a FIXED NUMBER OF SHARDS; to fix this, use HIERARCHICAL SHARDING (split one shard into smaller pieces) - it gives flexibility
advantages
good performance - data is fetched within the scope of one shard
use indexes - high performance
have a master and slaves: slaves for reading, the master for writing. When the master fails, a new master can be chosen from the slaves
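A hedged sketch of id-based routing to a fixed number of shards (the modulo scheme and shard count are assumptions; the fixed count is exactly what hierarchical sharding or consistent hashing works around):

```python
# Illustrative id-based shard routing with a FIXED number of shards.
NUM_SHARDS = 4
shards = {i: f"db-shard-{i}" for i in range(NUM_SHARDS)}

def shard_for(user_id: int) -> str:
    # All rows for one user land on one shard, so single-user queries
    # stay within a shard; cross-shard joins remain expensive.
    return shards[user_id % NUM_SHARDS]

print(shard_for(42))    # db-shard-2
print(shard_for(1001))  # db-shard-1
```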
NETFLIX
video
different formats - a codec is the way you compress video
different resolutions
Netflix breaks video into shots (~4 s) -> collects them into scenes
Amazon S3 - Netflix uses it to store video content - static data
fast access
different content for different zones
handle 90% of requests
most popular content
can update content of the boxes
Tinder architecture
Main points
recommend matches - number of active users
note: matches ≈ 10^-3 of active users
store profiles (images - 5 per user)
direct messaging
how to store images
file
blob - binary large object
a DB gives us
transactions - ACID - unnecessary for images
indexes for search - useless for images - an image is stored as 0s and 1s
mutability - unnecessary for images
access control - useless - you can get the same security mechanism for files as for a DB
files give us
faster - images are saved separately, quick access
CDN
cheaper
save the path to the image in a table, but the image itself is stored in a DFS - distributed file system
user -> request (update name; needs an authentication step: name, token) -> gateway service -> profile service -> DB -> ... -> response -> user
user -> request (update images) -> gateway service -> image service -> DB (stores user id, image id, name, path in the DFS) and the DFS
user1 -> request (messageTo(user2), XMPP or HTTP) -> gateway -> ... -> XMPP (a peer-to-peer protocol - the server pushes to the user when a message comes, rather than the user polling the server)
connection - TCP or WebSocket
every user has their own connection id
user -> gateway service -> session service -> DB - stores user id and connection id
Matcher service - DB keeps user-id-to-user-id pairs
Gateway -> Profile service -> DB (gender, age, location) - better to use a distributed DB (Cassandra / Amazon Dynamo) or sharding in a relational DB (horizontal partitioning); we use sharding to get better matching by location - we pull one small chunk for our location and filter it by age and gender
partitioning requires a master-slave architecture with copies
sharding is more complicated than Cassandra, which gives all those features in one place
matcher service -> DB which stores matched people and updates their location every ~2 hours
Distributed caching
cache pros
decrease network calls
avoid recalculations
reduce DB load
to speed up responses
save most relevant info, predicted data
cache policy - adding and evicting (deleting) data to/from the cache
LRU
Least recently used
new entries go on top, eviction happens from the bottom
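A minimal LRU sketch using Python's OrderedDict (the capacity and names are illustrative):

```python
from collections import OrderedDict

class LRUCache:
    """LRU cache: reads and writes move a key to the 'top';
    over capacity, the key at the 'bottom' is evicted."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key not in self.data:
            return None  # the "extra call" to the DB would happen here
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

cache = LRUCache(2)
cache.put("a", 1); cache.put("b", 2); cache.get("a"); cache.put("c", 3)
print(list(cache.data))  # ['a', 'c'] - 'b' was evicted
```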
Sliding window cache policy
cache cons
thrashing - data keeps entering and leaving the cache without ever being read
data inconsistency
extra call (if the data is not in the cache)
where to save
close to the DB
close to the servers, even in the server's memory
benefits
faster
separate
global cache - e.g. REDIS
accurate, more resilient to crashes
scale the cache independently without scaling the services
all servers hit the global cache; if the data is absent - hit the DB
Updating data
write-through
write-back
write-through: check if the data exists in the cache; if yes, update the cache -> then update the DB
write-back: update the cache first -> the DB is updated later, asynchronously
in the case of an IN-MEMORY CACHE
there might be inconsistency; better to use write-back or a hybrid of both
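A hedged sketch contrasting the two policies (the cache/DB objects are stand-ins, not a real client library):

```python
cache, db, dirty = {}, {}, set()  # stand-ins for a real cache and DB

def write_through(key, value):
    # Write-through: cache and DB are updated together, synchronously.
    cache[key] = value
    db[key] = value  # slower writes, but cache and DB stay consistent

def write_back(key, value):
    # Write-back: only the cache is updated now; the DB catches up later.
    cache[key] = value
    dirty.add(key)  # fast writes; risk of loss if the cache node crashes

def flush():
    # Periodic background flush of dirty keys to the DB.
    for key in list(dirty):
        db[key] = cache[key]
        dirty.discard(key)
```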
features
sent, delivered, read receipts
online/last seen
image sharing
chats are permanent or temporary
group messaging
same as in Tinder
one-to-one chat
common approach - SEND MESSAGE
user A
gateways - security checks; different users can be connected to different gateways
sessions microservice - prevents a single point of failure
DB - stores which user is connected to which box
session service - uses TCP - WEBSOCKETS (WSS)
appropriate gateway
user B
HTTP - the client sends a request to the server and gets a response, but it doesn't let the server send a request to the user
allows peer-to-peer (two-way) communication
message status updates
when the session service sends the message to userB's gateway, it sends a notification to userA's gateway that the message was sent
when userB gets the message, a notification goes to the session server that the message was delivered -> userA
the same for the read notification
the message is stored in the DB until it is delivered to userB
update user status (online/last seen)
online = last activity ~20 sec ago (see the sketch below)
the user sends a request to the gateway (requests can be made by the user or by the app - need a flag so we track only user-initiated requests)
Last Seen service
DB - stores user id and the timestamp of the last activity (as a pair)
userB sends a request for userA's current status
gateway
lastseen service
DB
... userB
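A minimal sketch of the last-seen logic (the 20-second window is from the notes; names are illustrative):

```python
import time

last_activity = {}      # user_id -> unix timestamp of last user-initiated request
ONLINE_WINDOW_SEC = 20  # "online" = active within the last ~20 seconds

def record_activity(user_id: str) -> None:
    # Called by the gateway only for user-initiated requests (not app polls).
    last_activity[user_id] = time.time()

def status(user_id: str) -> str:
    ts = last_activity.get(user_id)
    if ts is None:
        return "never seen"
    if time.time() - ts <= ONLINE_WINDOW_SEC:
        return "online"
    return f"last seen {int(time.time() - ts)} seconds ago"

record_activity("userA")
print(status("userA"))  # online
```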
group messaging
a user sends a message to groupA
gateway - after the gateway, before the session service, we can add PARSER and UNPARSER services, which decrease the load on the gateway
Session server
- group server
DB - stores the user ids that belong to a groupId (one-to-many) - using consistent hashing
send it to the session server
DB - find the gateway number for every user
to make group chats fast, limit the number of users in a group to ~200
send message to users
To avoid losing messages, it's good to use a MESSAGE QUEUE
NEW YEAR
deprioritize messaging - deliver only necessary messages, exclude unnecessary features
API - APPLICATION PROGRAMMING INTERFACE
a contract that describes what you will do, but not how
an external function that is called by the user
important things
don't use unnecessary parameters
return only what is needed
naming of the functions
custom errors
no side effects - do only what is needed
avoid doing everything in one function
atomicity
how to deal with a big response
pagination
fragment the API
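A hedged sketch of offset pagination for a big response (function and parameter names are hypothetical):

```python
# Offset-based pagination: the client pulls one page at a time
# instead of the server returning one huge response.
POSTS = [f"post-{i}" for i in range(1000)]  # stand-in for a DB query

def get_posts_page(offset: int = 0, limit: int = 20) -> dict:
    page = POSTS[offset:offset + limit]
    next_offset = offset + limit if offset + limit < len(POSTS) else None
    return {"items": page, "nextOffset": next_offset}

first = get_posts_page()
second = get_posts_page(offset=first["nextOffset"])
print(first["items"][0], second["items"][0])  # post-0 post-20
```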
YOUTUBE - ESTIMATED DAILY STORAGE REQUIREMENT
N of users - 1B
ratio of uploaders:users - 1:1000
N of uploaders - 1B / 1000 = 1M
average length of a video - 10 min
total uploaded video - 1M * 10 min = 10M min per day
total video size = 3 MB/min * 10M min = 30 TB per day
2-hour video = 4 GB
optimized codec and compression - save 90%
2-hour video optimized size = 4 GB / 10 = 0.4 GB
1 min average size = 0.4 GB / 120 ≈ 3 MB
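The same back-of-the-envelope arithmetic as a runnable check (the notes round 3.3 MB/min down to 3 MB/min, hence 30 TB):

```python
# Daily storage estimate for a YouTube-like service (numbers from the notes).
users = 1_000_000_000
uploaders = users // 1000                # 1:1000 ratio -> 1M uploaders
minutes_uploaded = uploaders * 10        # 10 min average video -> 10M min/day

gb_per_2h_raw = 4                        # 2-hour raw video = 4 GB
gb_per_2h_opt = gb_per_2h_raw / 10       # codec/compression saves 90%
mb_per_min = gb_per_2h_opt * 1000 / 120  # ~3.3 MB per minute

daily_tb = minutes_uploaded * mb_per_min / 1_000_000
print(f"{mb_per_min:.1f} MB/min, ~{daily_tb:.0f} TB per day")  # ~33 TB
```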
Event Driven services
use the Publisher-Subscriber model - use a MESSAGE BROKER (sends messages asynchronously between services) - e.g. Kafka
to avoid
failure latency
inconsistent data
strong coupling
when we have multiple services that communicate and wait for responses all the way from the last one back to the first
advantages
simplifies interactions - easy to understand
some transaction capability - a guarantee that if you send a message once and get an OK response, it will 100% be processed in the future
decoupling
easily scalable
disadvantages
idempotency still required
not a good idea for banking accounts / financial apps
poor consistency, there is no atomicity
DB as a message queue (between servers) - an ANTIPATTERN when the system has a lot of users
problems
if services talk a lot among themselves - need to clean up data
large system -
polling interval
frequent polling - high load on the DB
long intervals - inefficient, bad user experience
complicated process
1 DB can't handle a large load -> need to add a 2nd... DB - problem: communication between servers that belong to different DBs requires difficult logic
solution -
MESSAGE QUEUES
suitable for
small app
mostly read operations between servers - optimize the DB for reading
using something new comes at an additional cost - decide whether it's better to keep the DB or not
Amazon S3
Reliable
easy to use
Cheap
SINGLE POINT OF FAILURE
When app is not resilient
prevent
add additional nodes
as a backup - when the first fails, the second starts working
(good for DB)
master-slave architecture
multiple regions
EVENT DRIVEN SYSTEMS
when some event occurs -> another service performs some action
Git
React
Node.js
Game servers
all services store events in their DB
server -> event -> event bus -> server ...
advantages
availability
easy roll-back
problems
consistency
replacements
Transactions
stores intent
n/a gateways
less control
hidden flow
migration
all services publish events
REQUEST-RESPONSE ARCHITECTURE
services ask for data, for ...
NoSQL DB
SQL DB
we have foreign-key mappings and relationships between objects
stores data as key-value (a blob of data - an object in JSON format)
insert all values for one id at the same moment
select all info about an object with one request
uses joins for data selection (an expensive operation)
cons
expensive join operation
pros
cheap
easy insert and read
flexible
the value is just JSON, the schema doesn't matter
objects have a strict schema
the schema is easily changeable
built for scale
built for computing metrics / getting intelligence from data / aggregations
cons
consistency
not read optimized
not good for lots of updates
pros
Consistency
ACID
ACID is not guaranteed
relations are not implicit
joins are hard (all manual)
relations between objects
simple joins
CASSANDRA (FB)
id = key
a Cassandra cluster consists of, say, 5 nodes; keys are distributed by a hash function
gives us load balancing (data replicas are saved on 2 or more nodes)
redundancy/replicas
QUORUM
if quorum = 2: a request goes to node5 -> the data is saved on nodes 5, 1, 2 -> node5 crashes -> search for the data on nodes 1 and 2. If neither has the data -> error; if they have it -> send the response
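A hedged sketch of a quorum read (the replica layout and names are illustrative, not Cassandra's real API):

```python
# Quorum-read sketch: a value counts as read only if at least
# `quorum` replicas can serve it.
replicas = {
    "node1": {"user:42": "Alice"},
    "node2": {"user:42": "Alice"},
    "node5": {},  # crashed / lost its copy
}

def quorum_read(key, quorum=2):
    answers = [node[key] for node in replicas.values() if key in node]
    if len(answers) >= quorum:
        return answers[0]
    raise LookupError(f"quorum of {quorum} not reached for {key}")

print(quorum_read("user:42"))  # Alice - node1 and node2 agree
```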
Requests to CASSANDRA
save data
data format - key-value pair
data is stored in memory and appended to a log file
periodically this data is dumped into an SSTable (sorted string table) - the keys are sorted
SSTables are immutable
update
an update creates a copy of the record with a timestamp - when reading, you need to take the record with the latest timestamp
to deal with that
COMPACTION
squashes SSTables together to save space
remove
set a tombstone flag (ts)
which means the record is dead
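A toy sketch of this write path (memtable flushed to immutable sorted tables, timestamped updates, tombstones; all names and the counter-as-timestamp are illustrative):

```python
import itertools

clock = itertools.count()   # monotonically increasing "timestamp"
memtable = {}               # in-memory writes: key -> (timestamp, value)
sstables = []               # immutable, sorted tables flushed from memory
FLUSH_AT = 3                # flush threshold (tiny, for illustration)

def put(key, value):
    memtable[key] = (next(clock), value)  # an update is just a newer version
    if len(memtable) >= FLUSH_AT:
        sstables.append(dict(sorted(memtable.items())))  # immutable SSTable
        memtable.clear()

def delete(key):
    put(key, None)  # tombstone: the record is marked dead, not erased

def get(key):
    versions = [t[key] for t in [memtable] + sstables if key in t]
    if not versions:
        return None
    ts, value = max(versions)  # take the copy with the latest timestamp
    return value               # None here means a tombstone

put("a", 1); put("a", 2); delete("b"); put("c", 3)
print(get("a"), get("b"))  # 2 None
```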
Health service
checks heartbeats of other services - heartbeats work two ways
every ~5 sec, ask the server whether it is alive
if NO
1st NO - mark as suspicious
2nd NO - mark as dead
-> Load balancer
ask someone to restart the server
redirect requests
every ~5 sec the server sends a message - "I am alive" (see the sketch below)
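A minimal heartbeat-checking sketch (the thresholds follow the notes; everything else is an assumption):

```python
import time

HEARTBEAT_INTERVAL = 5  # servers report "I am alive" every ~5 seconds
last_heartbeat = {}     # server -> time of last "I am alive" message
misses = {}             # server -> consecutive missed heartbeats

def on_heartbeat(server):
    last_heartbeat[server] = time.time()
    misses[server] = 0

def check(server):
    # Called by the health service every ~5 seconds per server.
    if time.time() - last_heartbeat.get(server, 0) > HEARTBEAT_INTERVAL:
        misses[server] = misses.get(server, 0) + 1
    if misses.get(server, 0) >= 2:
        return "dead"        # tell the load balancer to redirect requests
    if misses.get(server, 0) == 1:
        return "suspicious"  # first missed heartbeat
    return "alive"
```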
Master-slave architecture
make copy of DB
ways of pulling data from the master to the copy (slave)
synchronous
asynchronous
any DB that takes write operations is a MASTER
SLAVE - for read
MASTER - for write
Master-master architecture
problem
the server responsible for DB communication breaks, and both DBs think each of them is the only master
SPLIT BRAIN
Solution
add a 3rd DB - when the connection between A and B is broken: a write operation goes to A -> A pushes the update to B -> can't connect -> pushes the update to C -> C is updated. A write operation goes to B -> B pushes the update to A -> can't connect -> pushes the update to C -> B compares C's previous state with its own -> they don't match -> B rolls back its transaction -> updates its state to C's state -> writes the new data, and only then does C take the update from B ...
Distributed consensus
the way in which a few DBs agree on a particular value
2PC
3PC
MVCC
SAGA
Features
Store/get images
Like and comment on an image
Follow someone
Publish a news feed
Similar to Tinder
can you like a comment
can you leave a comment on a comment (recursively)
+
-
Post table
id
text
timestamp
user id
image
Comment table
id
text
timestamp
user id
Like table
id
timestamp
user id
activity id (id of a post or comment)
type (comment or post)
active or deleted
Activity table
id
likes
Follow table
follower id
followee id
timestamp
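A hedged sqlite3 sketch of these tables (column names and types are assumptions reconstructed from the notes):

```python
import sqlite3

# Illustrative schema for the tables above (names/types are assumptions).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE post    (id INTEGER PRIMARY KEY, user_id INTEGER,
                      text TEXT, image_path TEXT, created_at INTEGER);
CREATE TABLE comment (id INTEGER PRIMARY KEY, user_id INTEGER,
                      text TEXT, created_at INTEGER);
CREATE TABLE likes   (id INTEGER PRIMARY KEY, user_id INTEGER,
                      activity_id INTEGER,   -- id of a post or a comment
                      activity_type TEXT,    -- 'post' | 'comment'
                      active INTEGER,        -- active or deleted
                      created_at INTEGER);
CREATE TABLE follow  (follower_id INTEGER, followee_id INTEGER,
                      created_at INTEGER);
""")
```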
user - getUserFeed(id)
using HTTP
gateway (reverse proxy)
internal protocol
UserFeedService - needs multiple servers to make the system resilient
Load balancer (uses hashing of the id to figure out which service is needed)
PostsService (multiple)
FollowService (multiple)
getUsersFollowedByUserId(id)
getPostsByUser(id)
limit posts to 20
A better way - PRECOMPUTE the feed (see the sketch below)
when a user from the list of followees adds a new post -> PostService -> UserFeedService -> update the feed for the user, keeping the limit of 20
a place to save the feed for every user - in the DB or a cache
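A minimal fan-out-on-write sketch of the precomputed feed (the structure names are illustrative):

```python
from collections import defaultdict, deque

FEED_LIMIT = 20
followers = defaultdict(set)                           # author -> followers
feeds = defaultdict(lambda: deque(maxlen=FEED_LIMIT))  # user -> recent posts

def follow(follower, author):
    followers[author].add(follower)

def publish(author, post):
    # Fan-out on write: push the post into each follower's precomputed feed.
    for user in followers[author]:
        feeds[user].appendleft(post)  # deque(maxlen=20) drops old posts

def get_user_feed(user):
    return list(feeds[user])  # cheap read - no joins at request time

follow("bob", "alice")
publish("alice", "post-1")
print(get_user_feed("bob"))  # ['post-1']
```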
Posts of a celebrity
push updates in batch
pull updates by users
hybrid
push approach
pull approach
Problems with failure
cascading failure problem
ways to avoid it
a request queue per server - limits the number of requests to the server (see the sketch after this list)
set a capacity for every server - requests per second
if the queue gets more requests than its capacity -> the new request fails, but the others stay in the queue
the user whose request failed gets an error message
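A sketch of the bounded request queue described above (limits and names are illustrative):

```python
from collections import deque

CAPACITY = 100  # max queued requests per server
queue = deque()

def enqueue(request):
    # Reject the new request when the queue is full; queued ones survive.
    if len(queue) >= CAPACITY:
        return {"status": 503, "error": "server busy, try again later"}
    queue.append(request)
    return {"status": 202, "queued": len(queue)}

def handle(request):
    pass  # stand-in for real request processing

def worker():
    # The server drains the queue at its own pace.
    while queue:
        handle(queue.popleft())
```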
errors
temporary
permanent
when 1 server crashes -> its load is distributed among the remaining servers -> a 2nd server crashes -> ...
viral / Black Friday event (predictable load increase)
solutions
autoscale
rate limiting
rescale
job scheduling
send email to 1M users
break the job into smaller pieces - chunks of users
celebrity (popular) post
batch processing order
add jitter
don't show all the metadata - just show an approximation (approximate statistics) - the general tendency - it decreases the number of requests to the DB
more solutions
caching (key-value)
auto scale
batch processing
pre-scaling
approximate statistics
rate limiting
gradual deployments
coupling
decrease the load on the authentication service -> cache the user's token and name per session
Food delivery
Location based DB
concept of zipcodes
measurable distance
proximity - find people within a radius of ? km
use a quadtree or an R-tree
uniform assignment
scalable granularity - break regions into smaller regions (see the sketch below)
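A hedged grid-bucketing sketch of the proximity idea (a real quadtree subdivides adaptively; the cell size here is an assumption):

```python
from collections import defaultdict

CELL_DEG = 0.1           # ~11 km per cell side at the equator (illustrative)
grid = defaultdict(set)  # (cell_x, cell_y) -> user ids

def cell(lat, lon):
    return (int(lat // CELL_DEG), int(lon // CELL_DEG))

def add_user(user_id, lat, lon):
    grid[cell(lat, lon)].add(user_id)

def nearby(lat, lon):
    # Check the user's cell and its 8 neighbors instead of scanning everyone.
    cx, cy = cell(lat, lon)
    found = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            found |= grid[(cx + dx, cy + dy)]
    return found

add_user("driver-7", 50.45, 30.52)  # hypothetical coordinates
print(nearby(50.46, 30.53))         # {'driver-7'}
```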
Virtualisation and containers
an application needs
memory
processor
i/o
disk
TikTok
Features
Storage for video
distribution
ingestion of videos(upload)
need to ask about
fault tolerance
high availability
consistency
performance of upload - low SLAs
eventual consistency is enough
+
low latency between upload and user visibility; low latency when streaming video to viewers
10M viewers and 100K people who upload videos
user app
user data, user metadata, user video
tiktok service
DB
user profile
user metadata
video
key-value
Cassandra, Redis
Amazon S3
MySQL
put all requests from the user into a queue
gives us
availability
CDN
reliability
file storage
faster read operations than MySQL
flexible schema
need to save different formats and resolutions of video
storage - 200K videos per day
200K * 1 MB = 200 GB for one format - for 4 formats we need ~600 GB, since some videos would be smaller
with different resolutions ~1.2 TB per day
in parallel
- split the video into chunks -> convert each chunk into different formats (if the video is short - less than 10 sec - we don't split it)
convert those formats into different resolutions
after that, all the chunks are combined together (the result is 8-16 copies, depending on the number of resolutions)
save the video in its different formats and resolutions to S3 in different regions (CDN) (save replicas in different zones)
Remote code execution engine
takes code from different users across the world to a remote server with some problem statement; it has test cases that produce a verdict (accepted or not), and the results are recorded
it's a system that can evaluate your code
problems
lots of requests
rate limiting
use a message queue, create events
bad user experience
the user gets a response that their file was uploaded and is waiting for processing
the file is executed later, when the server has capacity - asynchronous processing of events
check output file and write results to the DB
security
solutions -
virtualization - create a virtual system in which the code can run
containers - created for every request
CAP theorem
CONSISTENCY(DATA)
PARTITION TOLERANCE
AVAILABILITY
ACID
CONSISTENCY
ISOLATION
ATOMICITY
DURABILITY
2-phase commit
have a set of servers - 1 is the leader, the others are followers - the leader makes updates, the others only read
the leader sends a prepare request to the other servers - they send an ACK ("got it") to the leader -> the leader then sends the commit command -> the others ACK again (see the sketch below)
1st phase
prepare
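A toy sketch of the two phases (class and method names are illustrative; real 2PC also logs decisions and handles timeouts):

```python
class Participant:
    def prepare(self, txn):  # phase 1: vote on whether the txn can commit
        self.staged = txn
        return "ACK"

    def commit(self):        # phase 2: make the staged change durable
        self.data = self.staged
        return "ACK"

def two_phase_commit(leader_txn, participants):
    # Phase 1: prepare - every participant must ACK before anything commits.
    if not all(p.prepare(leader_txn) == "ACK" for p in participants):
        return "ABORT"  # any missing ACK aborts the whole transaction
    # Phase 2: commit everywhere.
    for p in participants:
        p.commit()
    return "COMMITTED"

print(two_phase_commit({"x": 1}, [Participant(), Participant()]))  # COMMITTED
```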
eventual consistency
cons
expensive
low performance
need to put locks on read and write operations for the rows involved in transactions
Managing system health
Metrics
number of requests
error rate
request queue size
number of processes
send all metrics to some engine that analyzes them - when some incident happens, the engine sends a message
implement a Service Mesh - sidecar
takes metrics from all services and puts them into the engine
need to build an isolation tree - a decision tree for when a risky situation appears
Streaming events(video)
protocol - RTMP
TCP
reliable (guaranteed delivery)
comparatively slower
sequencing of data (packets arrive in order)
UDP
comparatively faster
unreliable (no guaranteed delivery)
no sequenced delivery
Real-Time Messaging Protocol
used by Facebook, YouTube Live
Transformation service
converts to different resolutions and formats
needs a Job Assigner (message queue) that creates the jobs to convert all the videos
assign jobs to different workers
when a job is done - send a message with the results to another message queue
and save the new format of the video in the DFS
subscribers will take the video from the queue
send a notification to the Transformation service that the job is done
workers pull another task from the first queue (see the sketch below)
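A minimal two-queue worker sketch of this pipeline (queue and function names are assumed):

```python
import queue

jobs = queue.Queue()     # filled by the Job Assigner
results = queue.Queue()  # consumed by subscribers / the Transformation service

def transcode(video, fmt):
    return f"{video}.{fmt}"  # stand-in for real transcoding + DFS save

def worker():
    while not jobs.empty():
        video, fmt = jobs.get()
        results.put(transcode(video, fmt))  # notify via the second queue
        jobs.task_done()                    # then pull the next task

for fmt in ("480p", "720p", "1080p"):
    jobs.put(("cat-video", fmt))
worker()
print(results.get())  # cat-video.480p
```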
SINGLE CODE REPOSITORY
CODE BROKEN INTO PIECES (SERVICES)
easier to scale
easier for a new team member
separation of concerns
coding is easier
working in parallel
easier deployment
when to move to microservices
when the team has scaled: big size, different positions
contracts are in one place
less complex system
fewer duplicates - code in the same box
good for a small team
faster
breaking changes
need skilled architects
not easy to design
STEPS OF MIGRATION FROM A MONOLITH TO MICROSERVICES
Steps
Integrate DBs
redirect users
Build services
EASIER WAY
when you create new features
split them into new services with their own DBs
- define contracts for each microservice
- need routers - a load balancer - to help services talk to each other
- simplify deployment (automated deployment)
CI/CD pipeline tools, e.g. Jenkins
service dashboards
containers
- communication between services
- logging
message queue
batch messaging
request-response architecture
NEED TO TAKE CARE OF
EVERY COMPONENT HAS A SINGLE SOURCE OF TRUTH
when you have a dedicated datastore for every microservice
condense business responsibilities into a single place
initial infrastructure cost is high
number of people
company
big company
startup
1-2 people per service
2 people per service
4 people per service
System design is the process of defining the architecture, interfaces, and data for a system that satisfies specific requirements.