LIL - Architecting Big Data Applications: Real-Time Application Engineering
Real-Time Big Data
What is real time?
response time
"latency"
time taken for data to move from the point of data collection to the point of insight-based action
Example
time for a login to be validated
Cost drivers
Hardware
clusters
Software
parallelism
Network
latency
Engineering
Real-Time Challenges
4V's
Volume
Velocity
Variety
Veracity
Differing throughput capabilities between pipeline stages
User experience
should not be affected by pipeline issues
Strategies for Real-Time Big Data Processing
Synchronous vs Asynchronous
Asynchronous
How?
Source places the request
Does not wait for a response
After processing, the pipeline pushes the response asynchronously to the source
Characteristics
Design for average load
Focus on average response time
Not as expensive
Synchronous
How?
Source places request
Waits for response
The pipeline must process the request and respond before the source can proceed
Characteristics
Design for spikes
Focus on maximum response time
Very expensive
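A minimal sketch contrasting the two interaction patterns, using only Python's standard library; the login-validation function and queue names are hypothetical illustrations, not part of the course:

import queue
import threading
import time

def validate_login(user):
    """Simulated pipeline work (hypothetical stand-in for real processing)."""
    time.sleep(0.1)
    return f"{user}: ok"

# Synchronous: the caller blocks until the pipeline responds,
# so capacity must be sized for peak (spike) load.
def handle_sync(user):
    return validate_login(user)          # caller waits here

# Asynchronous: the caller drops the request on a queue and returns;
# a background worker pushes the response back later, so capacity
# can be sized for average load.
requests_q, responses_q = queue.Queue(), queue.Queue()

def worker():
    while True:
        user = requests_q.get()
        responses_q.put(validate_login(user))
        requests_q.task_done()

threading.Thread(target=worker, daemon=True).start()

def handle_async(user):
    requests_q.put(user)                 # does not wait for the response

if __name__ == "__main__":
    print(handle_sync("alice"))          # blocks ~100 ms
    handle_async("bob")                  # returns immediately
    print(responses_q.get())             # response arrives asynchronously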
Parallel Processing
Build horizontally scalable systems that maximize parallel processing
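A small sketch of maximizing parallelism with a worker pool; the per-record function is a hypothetical placeholder, assuming records can be processed independently. Adding machines behind a dispatcher extends the same idea horizontally:

from concurrent.futures import ProcessPoolExecutor

def process_record(record):
    """Hypothetical per-record work; independent records parallelize cleanly."""
    return record * record

if __name__ == "__main__":
    records = range(1_000)
    # Fan the records out across all available cores.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(process_record, records, chunksize=50))
    print(sum(results))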
Buffering Queues
Use buffering queues between producers and consumers to adjust for differences in throughput
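A sketch of a bounded buffering queue absorbing the throughput mismatch between a fast producer and a slower consumer, assuming a single-process example built on Python's queue module:

import queue
import threading
import time

buffer = queue.Queue(maxsize=100)   # bounded buffer between the two stages

def producer():
    for i in range(500):            # fast producer
        buffer.put(i)               # blocks when the buffer is full (back-pressure)

def consumer():
    while True:
        item = buffer.get()
        time.sleep(0.01)            # slower consumer drains at its own rate
        buffer.task_done()

if __name__ == "__main__":
    p = threading.Thread(target=producer)
    p.start()
    threading.Thread(target=consumer, daemon=True).start()
    p.join()                        # producer has placed every item
    buffer.join()                   # every buffered item has been consumed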
Stateless Services
Service components in the architecture should not store state
They receive requests, process, respond, and forget
This makes them candidates for horizontal scaling behind a load balancer
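A sketch of a stateless service component: it receives, processes, responds, and forgets, so identical copies can be added behind a load balancer. Flask and the /score endpoint are assumptions for illustration only; any HTTP framework would do:

from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/score", methods=["POST"])
def score():
    # No session or in-process state survives the request,
    # which is what makes horizontal scaling safe.
    payload = request.get_json()
    result = len(payload.get("events", []))   # hypothetical computation
    return jsonify({"score": result})

if __name__ == "__main__":
    app.run(port=8080)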
Data Clusters for State
Store state in data clusters that have a shared-nothing architecture
They provide state storage while scaling for big data needs
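When state is required, it lives outside the service in a horizontally scalable store. A sketch using redis-py as a stand-in for a shared-nothing data cluster; the host, port, and key names are assumptions for illustration:

import redis

# A Redis (or Redis Cluster) deployment stands in for the shared-nothing data tier.
store = redis.Redis(host="localhost", port=6379, decode_responses=True)

def record_session(user_id: str, token: str) -> None:
    # The service itself stays stateless; session state lives in the data cluster.
    store.set(f"session:{user_id}", token, ex=900)   # expires after 15 minutes

def lookup_session(user_id: str):
    return store.get(f"session:{user_id}")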
Time-to-Live (TTL) Management
Data in real-time networks is valid only for a specified time
Drop data that is stale to reduce load
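A sketch of dropping stale data before it reaches the expensive part of the pipeline, assuming each event carries the timestamp at which it was collected; the 30-second window is an illustrative choice:

import time

TTL_SECONDS = 30   # assumed validity window for an event

def fresh_events(events, now=None):
    """Keep only events still within their time-to-live; stale events add
    load without adding insight, so they are dropped up front."""
    now = time.time() if now is None else now
    return [e for e in events if now - e["collected_at"] <= TTL_SECONDS]

if __name__ == "__main__":
    now = time.time()
    batch = [{"id": 1, "collected_at": now - 5},
             {"id": 2, "collected_at": now - 120}]   # stale
    print(fresh_events(batch, now))                  # only event 1 survives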