System design

steps

  1. Understand the problem, establish design scope
  1. High-level design
  1. Design deep dive
  1. Wrap up

features we need to build

clarify non-functional requirements

who is the user

clarify requirements

consistency

scalability

security

performance

freshness

availability

Top-down approach

High-level design diagram

Data model and schema

APIs - contract between user and backend

Review all high level design

follow RESTful conventions

check requests and responses

Load balancer -> services -> DB

data access patterns

read/write ratio

find bottlenecks

come up with at least 2 solutions

discuss the trade-offs of the solutions

articulate the problem

pick a solution and discuss it with the interviewer

repeat for all problems

summarize the design

Resilient architecture

aspects

Disaster recovery

availability

depends on

BCP

disaster type

geographical impact

business continuity plan - the set of actions to take when a disaster occurs

impact by

network issues

load spikes

component failures


Scalability

user <-> server - latency, CPU

Vertical scaling

Horizontal scaling

increase the processing power of a single server

host the app on more servers that run in parallel

pros

data consistency

fast inter-process communication

cons

single point of failure

economic (hardware) limits of scaling

pros

more reliable

scales linearly

cons

data inconsistency

difficult to manage

RPC communication (over the network) - slower

High availability = 99.999% ≈ 5 min of downtime per year
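As a quick sanity check, the downtime budget for a given availability target can be computed directly (a minimal sketch; assumes a 365-day year):

```python
# Downtime budget per availability target (back-of-envelope, 365-day year)
minutes_per_year = 365 * 24 * 60

for target in (0.99, 0.999, 0.9999, 0.99999):
    downtime_min = minutes_per_year * (1 - target)
    print(f"{target:.3%} available -> ~{downtime_min:.1f} min of downtime per year")
# 99.999% ("five nines") comes out to ~5.3 min per year
```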

Availability patterns

failover

replication

db & servers

relational DBs

how we shift from a failing component to the others

active-passive pattern

there are 2 servers: an active one (receives requests) and a passive one (sleeps and listens to the active server's heartbeats); when the active server fails, the passive one starts working - this increases latency

active-active pattern

there is a load balancer and both servers handle traffic, so there is no downtime - if one fails, the other keeps working (see the failover sketch below)
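A minimal sketch of the active-passive idea (the node names and the timeout are illustrative, not from any particular system): the passive node watches heartbeats and is promoted when the active one goes silent.

```python
import time

class ActivePassivePair:
    """Active-passive failover: the passive node listens to the active node's
    heartbeats and takes over when they stop (at the cost of extra latency)."""
    def __init__(self, timeout_s: float = 5.0):
        self.timeout_s = timeout_s
        self.active = "primary"
        self.last_heartbeat = time.time()

    def heartbeat_from_active(self):
        self.last_heartbeat = time.time()

    def route(self, request):
        # Promote the standby if the active node looks dead
        if time.time() - self.last_heartbeat > self.timeout_s:
            self.active = "standby"
        return f"{self.active} handles {request}"
```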

patterns

primary & replica

primary & primary

split read and write requests - write to one primary DB and read from the replicas

pros

fast - reads don't affect write performance

splits read and write requests

cons

the primary can be scaled only vertically

the primary is a single point of failure

each server has its own primary DB for read/write (the primaries are replicas of each other), and services scale horizontally

pros

scale write requests by adding new nodes - horizontal scaling

if one fails, the other is still alive

backups of the same data

cons

asynchronous replication

some transactions can be lost

CDN- Content Delivery Networks

a service offered by Selectel, Akamai, Amazon S3, Cloudflare

a geographically distributed network infrastructure that delivers web content to website users quickly

servers are placed geographically so as to minimize latency and give fast access to the data

origin

the main server that stores the source data distributed through the CDN

PoP - point of presence

a caching server that is part of the CDN

dynamic content - generated on the server at request time

static content - stored on the server unchanged (binary files, video, JS, CSS)

there are streaming CDNs that pass data to the caching CDNs (PoPs)

content is placed there for caching, not for storage

2 technologies are used to route content requests

GeoDNS

AnyCast

the request is forwarded to the right server based on the IP address

routing goes to the provider's own servers within the region

reduces the load on the origin server, which helps smooth peak loads

this is the load a CDN helps reduce

ISP - internet service provider

they have boxes with cached content - Open Connect boxes - which are local

Netflix extends the concept of caching and places it at the ISPs

the user requests some film from Netflix -> ISP -> the Open Connect box serves it from cache without calling Netflix

a load balancer is needed

Message/task queue - monitors heartbeats, and when one of the servers doesn't respond, redirects its requests to another server using CONSISTENT HASHING (see the sketch after the examples below)

RabbitMQ

ZeroMQ

JMS
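A minimal consistent-hashing sketch (the server names are hypothetical, and MD5 is just a convenient stand-in for the ring hash): when a server dies and is removed from the ring, only the keys that mapped to it move to other servers.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes."""
    def __init__(self, nodes, vnodes=100):
        self._ring = sorted((_hash(f"{n}#{i}"), n)
                            for n in nodes for i in range(vnodes))
        self._keys = [h for h, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring position clockwise from the key's hash, wrapping around
        idx = bisect.bisect(self._keys, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["server-1", "server-2", "server-3"])
print(ring.node_for("request-42"))  # requests remap minimally if a server dies
```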

Monolith architecture

can run on different servers and have a few DBs

cons

more context required - a lot of logic and processes for a newcomer

complicated deployments

too much responsibility on each server - SINGLE POINT OF FAILURE

Microservices

one microservice covers one business need and is connected to its own DB

PROS

CONS

STACKOVERFLOW

GOOGLE

FACEBOOK

AMAZON

DB

Horizontal partitioning - by id

vertical partitioning - by columns

sharding - partitioning - divide the data; one piece per server (by location, id, ...)

consistency

availability

sharding

problems

joins across chunks - getting data from two separate shards is very expensive

memcached (consistent hashing isn't the only option - sharding can also be applied on business logic) can't have a flexible number of shards - it has a FIXED AMOUNT OF SHARDS; to fix this you can use HIERARCHICAL SHARDING (split one shard into smaller pieces), which gives flexibility

advantages

good performance - data is fetched within the scope of one shard

uses indexes - high performance

has a master and slaves: slaves for reading, the master for writing; when the master fails, a new master can be chosen from the slaves
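A toy illustration of routing by id with a fixed shard count, the limitation mentioned above (shard count and ids are made up):

```python
NUM_SHARDS = 8  # fixed number of shards; changing this forces a reshuffle

def shard_for(user_id: int) -> int:
    """Horizontal partitioning: pick the shard that owns this id."""
    return user_id % NUM_SHARDS

# A query that stays inside one shard is cheap; joining data for user 7
# and user 12 may span two shards (two databases), which is expensive.
print(shard_for(7), shard_for(12))  # -> 7 4
```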

NETFLIX


video

different formats - a codec is the way you compress video

different resolutions

Netflix breaks video into shots (~4 s) -> collects them into scenes

Amazon S3 - Netflix uses it to store video content - static data

fast access

different content for different zones

handle 90% of requests

most popular content

can update content of the boxes

Tinder architecture

Main points


recommend matches - depends on the number of active users

note: matches ≈ 10^-3 of active users

store profile (Images - 5 per user)

direct messaging

how to store images

file

blob - binary large object

DB gives us

transactions - ACID

unnecessary for images

indexes for search

useless for images - an image is stored as 0s and 1s

mutability

unnecessary for images

access control

useless - you can get the same security mechanisms for files as for a DB

Files give us

faster - images are saved separately, quick access

CDN

cheaper

save the path to the image in a table, but the image itself is stored in a DFS - distributed file system

user -> request (update name), which needs an authentication process (name, token) -> gateway service -> profile service -> DB -> ... -> response -> user

user -> request (update images) -> gateway service -> image service -> DB (stores user id, image id, name, path in the DFS) and the DFS

user1 -> request (messageTo(user2)) over XMPP or HTTP -> gateway -> ... -> XMPP (a peer-to-peer protocol - the server pushes to the user when a message comes, instead of the user polling the server)

connection - TCP or WebSocket

every user has their own connection id

user -> gateway service -> session service -> DB - stores user id and connection id

Matcher service - DB keeps user id to user id pairs

gateway -> profile service -> DB (gender, age, location) - better to use a distributed DB (Cassandra / Amazon Dynamo) or sharding in a relational DB (horizontal partitioning); we use sharding to get better matching by location - we pull one small chunk for our location and filter it by age and gender

partitioning requires a master-slave architecture with copies

sharding is more complicated than Cassandra, which gives all those features in one place

matcher service -> DB that stores which people match and updates their location every ~2 hours


Distributed caching

cache pros

decrease network calls

avoid recalculations

reduce DB load

to speed up responses

store the most relevant info, predicted data

cache policy - adding and evicting (deleting) data to/from the cache

LRU

Least recently used

new entries go on top, eviction removes from the bottom
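A minimal LRU sketch using Python's OrderedDict (illustrative, not a production cache):

```python
from collections import OrderedDict

class LRUCache:
    """Least-recently-used cache: new/used keys go to the 'top',
    eviction removes from the 'bottom'."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None  # cache miss -> caller falls back to the DB
        self._data.move_to_end(key)  # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used
```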

Sliding window cache policy

cache cons

thrashing - data constantly entering and leaving the cache without ever being read

data inconsistency

extra call (if the data is not in the cache)

where to save

close to the DB

close to the servers, even in the memory of the server

benefits


faster

separate

global cache - IT IS REDIS

accurate, more resilient to crashes

scale the cache independently without scaling the services

all servers hit the global cache; if the data is absent - hit the DB

Updating data

write through

write back

check if the data exists in the cache; if yes - update the cache -> update the DB

update the data in the DB -> check if it exists in the cache; if yes -> update the cache

in the case of an IN-MEMORY CACHE

there might be inconsistency; better to use write-back or a hybrid of both
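A dict-based sketch of the two update policies as they are commonly defined (write-through updates the cache and the DB together; write-back updates the cache and defers the DB write); the names and stores here are illustrative:

```python
cache, db, dirty = {}, {}, set()

def write_through(key, value):
    """Cache and DB are updated together: reads stay consistent, writes are slower."""
    cache[key] = value
    db[key] = value

def write_back(key, value):
    """Only the cache is updated now; the DB write is deferred (risk: a crash
    before flush() loses the update - the inconsistency noted above)."""
    cache[key] = value
    dirty.add(key)

def flush():
    """Push deferred writes from the cache to the DB in one batch."""
    for key in dirty:
        db[key] = cache[key]
    dirty.clear()
```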

WhatsApp

features

sent, delivered, read receipts

online/last seen

image sharing

chats are permanent or temporary

group messaging

same as in Tinder

one-to-one chat

common approach - SENDING A MESSAGE

user A

gateways - security checks; different users can be connected to different gateways

sessions microservices - prevent a single point of failure

DB - saves which user is connected to which box

session service - using TCP - WEBSOCKETS (WSS)

appropriate gateway

user B

HTTP - the client sends a request to the server and gets a response, but it doesn't let the service send a request to the user

allows peer-to-peer communication

message status update

when the session service sends the message to userB's gateway, it sends a notification to userA's gateway that the message was sent

when userB gets the message, a notification is sent to the session service that the message was received (delivered) -> userA

the same for read notifications

the message is stored in the DB until it is delivered to userB

update user status (online / last seen)

online = last activity ~20 sec ago

user sends a request to the gateway (requests can be made by the user or by the app - we need a flag so we track only user-initiated requests)

Last Seen service

DB - saves the user id and the timestamp of the last activity (as a pair)

userB sends a request about the current status of userA

gateway

last seen service

DB

... userB

group messaging

a user sends a message to groupA

gateway - after the gateway, before the session service, we can add PARSER and UNPARSER services, which decrease the load on the gateway

Session server

  1. group server

DB - stores the user ids that belong to a groupId (one-to-many) - using CONSISTENT HASHING

send it to the session server


DB - find the gateway number for every user

to make group chats fast, we need to limit the number of users in a group to ~200

send message to users

To avoid losing messages, it's good to use a MESSAGE QUEUE

NEW YEAR

deprioritize messaging - deliver only the necessary messages, exclude unnecessary features

API - APPLICATION PROGRAMMING INTERFACE

a contract that describes what you will do, but not how

an external function that is called by the user

important things

don't use additional parameters

return only what is needed

naming of the functions

custom errors

no side effects - do only what is needed

avoid doing everything in one function

atomicity

a big response - how to deal with it

pagination

fragment the API
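A hedged sketch of cursor-style pagination (the function and field names are made up for illustration): return one slice plus a cursor for the next call instead of the whole result set.

```python
def get_feed_page(posts: list, cursor: int = 0, limit: int = 20) -> dict:
    """Return one page of a big result set plus the cursor for the next call."""
    page = posts[cursor:cursor + limit]
    next_cursor = cursor + limit if cursor + limit < len(posts) else None
    return {"items": page, "next_cursor": next_cursor}

first = get_feed_page(list(range(55)))                         # items 0..19
second = get_feed_page(list(range(55)), first["next_cursor"])  # items 20..39
```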

YOUTUBE - ESTIMATED DAILY STORAGE REQUIREMENT

N of users - 1B

ratio of uploaders:users - 1:1000

N of uploaders - 1B / 1000 = 1M

average video length - 10 min

total uploaded video - 1M * 10 min = 10M min

2-hour video = 4 GB

optimized codec and compression save ~90%

2-hour video optimized size = 4 GB / 10 = 0.4 GB

1-min average size = 0.4 GB / 120 ≈ 3 MB

total video size = 3 MB * 10M min = 30 TB per day
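The same estimate as a quick script (the inputs are the assumptions above, so the result is approximate):

```python
users = 1_000_000_000                    # 1B users
uploaders = users // 1000                # 1:1000 ratio -> 1M uploaders
minutes_per_day = uploaders * 10         # 10-min average video -> 10M min/day

raw_2h_gb = 4                            # 2-hour raw video ~ 4 GB
optimized_gb = raw_2h_gb / 10            # codec + compression save ~90%
mb_per_min = optimized_gb * 1024 / 120   # ~3.4 MB per minute

daily_tb = minutes_per_day * mb_per_min / 1024 / 1024
print(f"~{daily_tb:.0f} TB of new video per day")  # ~33 TB, i.e. roughly 30 TB
```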

Event Driven services

use the Publisher-Subscriber model - use a MESSAGE BROKER (sends messages asynchronously between services) - e.g. Kafka

to avoid

failure latency

inconsistent data

strong coupling


used when we have multiple services that communicate and would otherwise wait for responses chained from the last one back to the first (see the broker sketch below)
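A tiny in-memory stand-in for a message broker (Kafka in these notes does far more; this only shows the decoupling idea, and all names are illustrative):

```python
from collections import defaultdict, deque

class Broker:
    """Publishers append events to a topic; subscribers consume them
    asynchronously, so services don't block on each other."""
    def __init__(self):
        self.topics = defaultdict(deque)
        self.subscribers = defaultdict(list)

    def publish(self, topic, event):
        self.topics[topic].append(event)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def deliver(self):
        for topic, q in self.topics.items():
            while q:
                event = q.popleft()
                for handler in self.subscribers[topic]:
                    handler(event)

broker = Broker()
broker.subscribe("video.uploaded", lambda e: print("transcode", e))
broker.publish("video.uploaded", {"id": 42})
broker.deliver()
```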

advantages

simplifies interactions - easy to understand

some transactional capability - a guarantee that when you send a message once and get an OK response, it will 100% be processed in the future

decoupling

easily scalable

disadvantages

idempotency still required

not a good idea for banking accounts or financial apps

poor consistency, no atomicity

TWITTER

DB as a message queue (between servers) - an ANTIPATTERN when the system has a lot of users

problems

if services talk a lot among themselves, the data needs constant cleanup

in a large system:

polling interval

frequent polling

high load on the DB

long intervals

inefficient - bad user experience

complicated process

1 DB cannot handle a large load -> need to add a 2nd... DB - problem: communication between servers that belong to different DBs requires difficult logic

solution:

MESSAGE QUEUES

suitable for

small apps

mostly read operations between servers - optimize the DB for reading

using something new comes with an additional cost - decide whether it's better to keep the DB or not

Amazon S3

Reliable

easy to use

Cheap

SINGLE POINT OF FAILURE

When app is not resilient

prevent

add additional nodes

as a backup - when the first fails, the 2nd starts working (good for DBs)


master-slave architecture

multiple regions

EVENT DRIVEN SYSTEMS

when some event occurs -> another service performs some action

Git

React

Node.js

Game servers

all services store events in their DB

server -> event -> event bus -> server ...

advantages

availability

easy roll-back

problems

consistency

replacements

Transactions

stores intent

n/a gateways

less control

hidden flow

migration

all services publish events

REQUEST-RESPONSE ARCHITECTURE

services ask for data, for ...

NoSQL DB

SQL DB

we have foreign key mappings and relationships between objects

stores data as key-value (a blob of data - an object in JSON format)

insert all values for one id at the same moment

select all info about an object in one request

uses joins for data selection (an expensive operation)

cons

expensive join operations

pros

cheap

easy insert and read

flexible

the value is just JSON; the schema doesn't matter

objects have a strict schema

the schema is easily changeable

built for scale

built for finding metrics / getting intelligent data / aggregations

cons

consistency

not read optimized

not good for lots of updates

pros

Consistency

ACID

ACID is not guaranteed

relations are not implicit

joins are hard (all manual)

relations between objects

simple joins

CASSANDRA (FB)

id = key

a Cassandra cluster consists of (say) 5 nodes; keys are distributed by a hash function

gives us load balancing (saves data replicas on 2 or more nodes)

redundancy/replicas

QUORUM

if quorum = 2: a request goes to node 5 -> the data is saved on nodes 5, 1, 2 -> node 5 crashes -> search for the data on nodes 1 and 2; if neither has the data -> error, if they do -> send the response
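A hedged sketch of that quorum read (the replica objects and their get() method are hypothetical; real Cassandra also reconciles answers by timestamp):

```python
def quorum_read(nodes, key, quorum=2):
    """Succeed once `quorum` replicas have answered; crashed replicas
    are skipped. `nodes` raise ConnectionError when they are down."""
    answers = []
    for node in nodes:
        try:
            answers.append(node.get(key))
        except ConnectionError:
            continue  # e.g. node 5 crashed; fall through to nodes 1 and 2
        if len(answers) >= quorum:
            return answers[-1]  # enough replicas answered -> send the response
    raise RuntimeError("quorum not reached -> error")
```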

Requests to CASSANDRA

save data

data format - key-value pair

data is stored in memory as a log file

periodically this data is dumped into an SSTable (sorted string table) - the keys are sorted

the SSTable is immutable

update

an update creates a copy of the record with a timestamp - to read, you need to take the data with the latest timestamp

to deal with that

COMPACTION

squashes SSTables together to save space

remove

set a tombstone flag (ts)

which means the record is dead

Health service

checks the heartbeats of other services - heartbeats work two ways

every ~5 sec, ask the server whether it is alive

if NO

1 NO - mark as suspicious

2 NO - mark as dead

-> Load balancer

ask someone to restart the server

redirect requests

every ~5 sec the server sends a message - "I am alive"
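A minimal sketch of the two-strike rule above (intervals and names are illustrative): one missed heartbeat marks a server suspicious, two mark it dead.

```python
import time

SUSPICIOUS_AFTER = 5.0   # one missed ~5 s heartbeat -> suspicious
DEAD_AFTER = 10.0        # two missed heartbeats -> dead

last_beat = {}           # server id -> timestamp of the last "I am alive"

def record_heartbeat(server_id):
    last_beat[server_id] = time.time()

def status(server_id):
    """Dead servers should be reported to the load balancer so it can
    redirect requests and ask someone to restart the server."""
    silence = time.time() - last_beat.get(server_id, 0.0)
    if silence < SUSPICIOUS_AFTER:
        return "alive"
    if silence < DEAD_AFTER:
        return "suspicious"
    return "dead"
```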

Master-slave architecture

make a copy of the DB

ways of pulling data from the master to the copy (slave)

synchronous

asynchronous

any DB that takes write operations is a MASTER

SLAVE - for read

MASTER - for write

Master-master architecture

problem

the server responsible for DB communication breaks, and both DBs think each is the only master

SPLIT BRAIN

Solution

add a 3rd DB - when the connection between A and B is broken: a write operation goes to A -> A pushes the update to B -> can't connect -> pushes the update to C -> C is updated; a write operation goes to B -> B pushes the update to A -> can't connect -> pushes the update to C -> C compares B's previous state with its own -> they don't match -> B rolls back its transaction -> updates its state to C's state -> writes the new data, and only then does C take the update from B ...

Distributed consensus

the way in which a few DBs agree on a particular value

2PC

3PC

MVCC

SAGA

Instagram

Features

Store/get images

Like and comment on images

Follow someone

Publish a news feed

Similar to Tinder

can you like a comment

can you leave a comment on a comment (recursively)

+

-

Post table

id

text

timestamp

user id

image

likes

Comment table

id

text

timestamp

user id

Like table

id

timestamp

user id

activity id (id of post or comment)

type (comment or post)

active or deleted

Activity table

id

Follow table

follower id

followee id

timestamp

user - getUserFeed(id)

using HTTP

gateway (reverse proxy)

internal protocol

UserFeedService - needs multiple servers to make the system resilient

Load balancer (uses hashing of the id to figure out which service is needed)

PostsService (multiple)

FollowService (multiple)

getUsersFollowedByUserId(id)

getPostsByUser(id)

limit posts to 20

A better way - PRECOMPUTE the feed

when a user from the followee list adds a new post -> PostService -> UserFeedService -> update the feed for that user, limited to 20 (see the fan-out sketch below)

a place to save the feed for every user - in the DB or a cache
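A hedged fan-out-on-write sketch (the in-memory dict stands in for the DB or cache, and `followers_of` stands in for the FollowService lookup):

```python
FEED_LIMIT = 20
feeds = {}  # user id -> precomputed feed (newest first)

def on_new_post(author_id, post_id, followers_of):
    """Push the new post into every follower's precomputed feed,
    trimmed to the 20 newest entries."""
    for follower in followers_of(author_id):
        feed = feeds.setdefault(follower, [])
        feed.insert(0, post_id)
        del feed[FEED_LIMIT:]  # keep only the newest 20
```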

Posts of celebrities

push updates in batches

updates pulled by users

hybrid

push approach

pull approach

Problems with failure

cascading failure problem

ways to avoid it

a request queue for servers - limits the number of requests to a server

set a capacity for every server - requests per second

if the queue gets more requests than its capacity -> the request fails, but the others stay in the queue

the user whose request failed gets an error message
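A minimal sketch of that bounded queue (capacity and response shape are illustrative): requests beyond the capacity fail fast with an error instead of overloading the server.

```python
from collections import deque

class BoundedRequestQueue:
    """Per-server request queue with a hard capacity."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.queue = deque()

    def enqueue(self, request):
        if len(self.queue) >= self.capacity:
            # Fail this request; everything already queued stays queued
            return {"ok": False, "error": "server busy, please retry"}
        self.queue.append(request)
        return {"ok": True}
```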

errors

temporary

permanent

when 1 server crashes -> its load is distributed over the remaining servers -> a 2nd server crashes -> ...

viral / Black Friday events (predictable load increases)

solutions

autoscale

rate limiting

rescale

job scheduling

send email to 1M users

break the job into smaller pieces - chunks of users

celebrity (popular) posts

batch processing order

add jitter

don't show all the metadata - just show an approximation (approximate statistics), the general tendency - this decreases the number of requests to the DB

more solutions

caching (key-value)

auto scale

batch processing

pre-scaling

approximate statistics

rate limiting

gradual deployments

coupling

decrease the load on the authentication service -> save the user token and name per session

Food delivery

Location-based DB

the concept of zip codes

measurable distance

proximity - find people within a radius of ? km

use a quadtree or an R-tree (see the sketch below)

uniform assignment

scalable granularity - break regions into smaller regions
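A toy quadtree sketch (flat x/y coordinates rather than real geo-coordinates, and all names are made up): a node splits into 4 quadrants once it holds too many points, which is exactly the "break regions into smaller regions" idea.

```python
import math

class QuadTree:
    """Toy quadtree for proximity search; not a production geo-index."""
    def __init__(self, x0, y0, x1, y1, cap=4):
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        self.cap, self.points, self.children = cap, [], None

    def insert(self, p):
        if self.children:
            self._child(p).insert(p)
        else:
            self.points.append(p)
            if len(self.points) > self.cap:
                self._split()

    def _split(self):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        self.children = [QuadTree(self.x0, self.y0, mx, my, self.cap),
                         QuadTree(mx, self.y0, self.x1, my, self.cap),
                         QuadTree(self.x0, my, mx, self.y1, self.cap),
                         QuadTree(mx, my, self.x1, self.y1, self.cap)]
        for p in self.points:
            self._child(p).insert(p)
        self.points = []

    def _child(self, p):
        mx, my = (self.x0 + self.x1) / 2, (self.y0 + self.y1) / 2
        return self.children[(p[0] >= mx) + 2 * (p[1] >= my)]

    def within(self, cx, cy, r):
        """All points within radius r of (cx, cy)."""
        # Prune nodes whose box cannot intersect the circle
        nx = min(max(cx, self.x0), self.x1)
        ny = min(max(cy, self.y0), self.y1)
        if math.hypot(nx - cx, ny - cy) > r:
            return []
        found = [p for p in self.points
                 if math.hypot(p[0] - cx, p[1] - cy) <= r]
        for c in self.children or []:
            found += c.within(cx, cy, r)
        return found

qt = QuadTree(0, 0, 100, 100)
for p in [(10, 10), (12, 11), (80, 80), (50, 50), (11, 12), (13, 13)]:
    qt.insert(p)
print(qt.within(11, 11, 5))  # neighbours of (11, 11) within 5 units
```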

Virtualisation and containers

an application needs

memory

processor

i/o

disk

TikTok

Features

Storage for video

distribution

ingestion of videos (upload)

need to ask about

fault tolerance

high availability

consistency

performance of upload - low SLAs

eventual consistency is enough

+

low latency between upload and user visibility; low latency when distributing the video stream

10M viewers and 100K people who upload videos

user app

user data, user metadata, user video

TikTok service

DB

user profile

user metadata

video

key-value

Cassandra, Redis

Amazon S3

MySQL

put all the requests from the user into a queue

gives us

availability

CDN

reliability

file storage

faster read operations than MySQL

flexible schema

need to save different formats and resolutions of video

storage - 200K videos per day
200K * 1 MB = 200 GB per day for one format - for 4 formats we need ~600 GB (some videos would be smaller)
with different resolutions ~1.2 TB per day


in parallel

  1. split the video into chunks -> convert each chunk into different formats (if the video is short - less than 10 sec - we don't split it)

convert those formats into different resolutions

after that, all the chunks are combined together (resulting in 8-16 copies, depending on the number of resolutions)

save the video in its different formats and resolutions to S3 in different regions (CDN) (save replicas in different zones)


Remote code execution engine

takes code from users across the world to a remote server; there is a problem statement with test cases that produce an answer (accepted or not), and the results are recorded

it's a system that can evaluate your code

problems

lots of requests

rate limiting

use a message queue, create events

bad user experience

the user gets a response that their file was uploaded and is waiting for processing

the file executes later, when the server has capacity - asynchronous processing of events

check output file and write results to the DB

security

solutions:

virtualization - create a virtual system in which the code can run

containers - create one for every request

CAP theorem

CONSISTENCY(DATA)

PARTITION TOLERANCE

AVAILABILITY

ACID

CONSISTENCY

ISOLATION

ATOMICITY

DURABILITY

2-phase commit

have a number of servers - 1 is the leader, the others are followers - the leader makes updates, the others only read

the leader sends a prepare request with the statements to the other servers - they send an ACK ("we got it") to the leader -> the leader sends the commit command -> the others ACK again (see the coordinator sketch below)

1st phase

prepare

eventual consistency

cons

expensive

low performance

need to put locks on read and write operations for the rows involved in transactions
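A sketch of 2PC from the leader's side (the follower objects and their prepare()/commit()/abort() methods are hypothetical stand-ins):

```python
def two_phase_commit(transaction, followers) -> bool:
    """Rows touched by the transaction stay locked for the whole protocol,
    which is where the expense and low performance come from."""
    # Phase 1: prepare - every follower must ACK the prepared statements
    if not all(f.prepare(transaction) for f in followers):
        for f in followers:
            f.abort(transaction)
        return False
    # Phase 2: commit - only after unanimous ACKs
    for f in followers:
        f.commit(transaction)
    return True
```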

Managing system health

Metrics

number of requests

error rate

request queue size


number of processes

send all the metrics to some engine that analyzes them - when some incident happens, the engine sends a message

implement a Service Mesh - a sidecar

takes metrics from all services and puts them into the engine

need to build an isolation tree - a decision tree for when a risky situation appears

Streaming events (video)

protocol - RTMP

TCP

reliable (guaranteed delivery)

comparatively slower

sequencing of data (packets arrive in order)

UDP

comparatively faster

unreliable (no guaranteed delivery)

no sequenced delivery

Real-Time Messaging Protocol

used by Facebook, YouTube Live

Transformation service

converting to different resolutions and formats

need a Job Assigner (message queue) that creates the jobs to convert all the videos

assign jobs to different workers

when a job is done, a message with the results is sent to another message queue

and the new format of the video is saved in the DFS

subscribers take the video from the queue

a notification is sent to the Transformation service that the job is done

workers pull the next task from the first queue (see the pipeline sketch below)
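A hedged sketch of that pipeline using Python's standard queues (the chunk names, formats, and worker count are illustrative; real transcoding and the DFS save are stubbed out):

```python
import queue
import threading

jobs, results = queue.Queue(), queue.Queue()

def worker():
    while True:
        chunk, fmt = jobs.get()
        results.put(f"{chunk}.{fmt}")  # stand-in for transcoding + DFS save
        jobs.task_done()               # lets the assigner hand out the next task

for _ in range(4):                     # a few parallel workers
    threading.Thread(target=worker, daemon=True).start()

for chunk in ("shot-1", "shot-2"):
    for fmt in ("480p", "720p", "1080p"):
        jobs.put((chunk, fmt))

jobs.join()                            # wait until every job reports done
while not results.empty():
    print(results.get())               # subscribers take results from the queue
```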

SINGLE CODE REPOSITORY

CODE BROKEN INTO PIECES (SERVICES)

easier to scale

easier for a new team member

separation of concerns

coding is easier

working in parallel

easier deployment

when to move to microservices

when the team is scaled up: big size, different positions

contracts are in one place

less complex system

fewer duplicates - code in the same box

good for small team

faster

breaking changes

need skilled architects

not easy to design

STEPS OF MIGRATION FROM A MONOLITH TO MICROSERVICES

Steps

Integrate DBs

redirect users

Build services

EASIER WAY

when creating new features

split them into new services with their own DBs

  1. define contracts for each microservice
  1. need routers - a load balancer - to help services talk to each other
  1. simplify deployment (automated deployment)

CI/CD pipeline tools, e.g. Jenkins

service dashboards

containers

  1. communication between services
  1. logging

message queue

batch messaging

request-response architecture

NEED TO TAKE CARE OF

EVERY COMPONENT HAS A SINGLE SOURCE OF TRUTH

when you have a dedicated datastore for every microservice

Condense business responsibilities into a single place

initial infrastructure cost is high

number of people

company

big company

startup

1-2 people per service

2 people per service

4 people per service

System design is the process of defining the architecture, interfaces, and data for a system that satisfies specific requirements.