Chapter2: Basic2 - Coggle Diagram
Types of data platform
Batch processing platform
Definition: the computer works through batches of jobs non-stop and without manual intervention; independent jobs may run simultaneously, while dependent jobs run in sequential order. Large jobs are split into small parts, which makes debugging more efficient.
- Batch jobs run without user interaction, at a specific scheduled time or when needed.
- Monitoring: if one job is delayed, the next job cannot start, and the monitor raises an exception.
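The batch behaviour above can be sketched roughly as follows. This is a minimal illustration, not a real scheduler: `split_into_parts`, `run_batch`, and the `JobDelayed` exception are hypothetical names, and the delay check is simplified to a per-part time budget.

```python
import time


class JobDelayed(Exception):
    """Raised by the monitor when a part exceeds its time budget (hypothetical)."""


def split_into_parts(records, part_size):
    """Split one large job into small parts that are easier to debug."""
    return [records[i:i + part_size] for i in range(0, len(records), part_size)]


def run_batch(parts, process, max_seconds_per_part):
    """Run the parts in sequential order, with no user interaction.

    The monitor here is deliberately simple: it times each part after it
    finishes and raises if the part was too slow, so the next part never starts.
    """
    results = []
    for index, part in enumerate(parts):
        started = time.monotonic()
        results.append(process(part))
        elapsed = time.monotonic() - started
        if elapsed > max_seconds_per_part:
            raise JobDelayed(f"part {index} took {elapsed:.3f}s")
    return results


parts = split_into_parts(list(range(10)), part_size=4)
totals = run_batch(parts, process=sum, max_seconds_per_part=1.0)
print(parts)   # three small parts instead of one large job
print(totals)  # one result per part
```

In practice the run would be triggered by a scheduler at a fixed time rather than called by hand; that part is left out of the sketch.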
Stream processing
Definition: data is collected and processed in real time or near real time.
- Requirements: low-latency, high-throughput processing.
- Stateless streaming: each event is processed independently of the events before it.
- Stateful streaming: processing of the current event depends on state shared with previous events.
- Functionality: processes unbounded data by taking fixed-size chunks; data manipulation ranges from simple operations to aggregation.
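The difference between stateless and stateful streaming, and the idea of taking fixed chunks from unbounded data, can be sketched with plain Python generators. This is only an illustration under simple assumptions: the function names are made up for the example, and `itertools.count` stands in for an unbounded event source.

```python
from itertools import count, islice


def stateless_double(events):
    """Stateless: each output depends only on the current event."""
    for value in events:
        yield value * 2


def stateful_running_total(events):
    """Stateful: each output depends on state carried over from earlier events."""
    total = 0
    for value in events:
        total += value
        yield total


def fixed_windows(events, size):
    """Cut an unbounded stream into fixed-size chunks and aggregate each chunk."""
    window = []
    for value in events:
        window.append(value)
        if len(window) == size:
            yield sum(window)
            window = []
    # A trailing partial chunk is dropped in this sketch.


# count(1) never ends, so islice takes only the first few outputs.
doubled = list(islice(stateless_double(count(1)), 4))
running = list(islice(stateful_running_total(count(1)), 4))
window_sums = list(islice(fixed_windows(count(1), size=3), 2))
print(doubled, running, window_sums)
```

Because the generators pull one event at a time, nothing here needs the whole stream in memory, which is the property that lets the same pattern work on unbounded data.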
Lambda architecture
Definition: mixture of batch and stream processing
- 3 layers: batch, stream, and serving layers. The serving layer combines the results from the batch and stream layers to get general insights.
Use cases: combined processing of historical and streaming data (IoT anomaly detection, fraud detection, social media recommendation...)
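A toy sketch of the three layers, assuming events are `(user, amount)` pairs and the insight wanted is a total per user. The function names and sample data are invented for illustration; real systems would use separate batch and streaming engines for the first two layers.

```python
def batch_layer(historical_events):
    """Recompute a complete view from all historical data."""
    view = {}
    for user, amount in historical_events:
        view[user] = view.get(user, 0) + amount
    return view


def stream_layer(recent_events):
    """Build an incremental view from events the batch view has not seen yet."""
    view = {}
    for user, amount in recent_events:
        view[user] = view.get(user, 0) + amount
    return view


def serving_layer(batch_view, stream_view):
    """Combine both views so a query sees historical and recent data together."""
    merged = dict(batch_view)
    for user, amount in stream_view.items():
        merged[user] = merged.get(user, 0) + amount
    return merged


# Invented sample data.
historical = [("alice", 10), ("bob", 5), ("alice", 7)]
recent = [("bob", 3), ("carol", 1)]

batch_view = batch_layer(historical)
stream_view = stream_layer(recent)
merged_view = serving_layer(batch_view, stream_view)
print(merged_view)
```

Note that the same aggregation logic appears twice, once per layer; keeping the two copies consistent is a known cost of this design.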
Kappa architecture
Definition: uses stream processing only; there is no separate batch layer.
- 2 layers: streaming and serving layers.
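A minimal sketch of the two layers, assuming events are `(user, amount)` pairs. `ServingLayer` and `streaming_layer` are hypothetical names for this example. The point it tries to show is that historical and new events both go through the same streaming path, so there is only one copy of the processing logic.

```python
class ServingLayer:
    """Holds the current view and answers queries against it."""

    def __init__(self):
        self.view = {}

    def query(self, user):
        return self.view.get(user, 0)


def streaming_layer(events, serving):
    """Process events one at a time and update the serving view."""
    for user, amount in events:
        serving.view[user] = serving.view.get(user, 0) + amount


# Invented sample data: an ordered log of past events plus one new event.
event_log = [("alice", 10), ("bob", 5), ("alice", 7)]
serving = ServingLayer()

streaming_layer(event_log, serving)       # older events use the stream path
streaming_layer([("bob", 3)], serving)    # new events use the same path

print(serving.query("alice"), serving.query("bob"), serving.query("carol"))
```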