Kafka, Anotação 2020-08-29 153008 - Coggle Diagram
Kafka
Use Cases
Messaging
Activity Tracking
Metrics Gathering
Log Gathering
Stream processing (Streams API / Spark)
Integration with Big Data (Hadoop/Flink/Storm/Spark)
Consumers
Read data from topics
Auto recover on Broker failure
Automatically know which broker and partition to read from
Data is read in order within each partition
Groups
Each consumer in a group reads exclusively from its own partitions
(a partition is never shared by two consumers of the same group)
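The group rule above can be illustrated with a small assignment sketch. This is not Kafka's actual assignor (real clients ship configurable strategies); `assign_partitions` and the consumer names are hypothetical, and plain round-robin is assumed.

```python
from collections import defaultdict


def assign_partitions(partitions, consumers):
    """Hypothetical round-robin assignment: every partition gets exactly one owner."""
    assignment = defaultdict(list)
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return dict(assignment)


# Three partitions shared by two consumers of the same group.
print(assign_partitions([0, 1, 2], ["consumer-a", "consumer-b"]))
# → {'consumer-a': [0, 2], 'consumer-b': [1]}

# More consumers than partitions: the extra consumer gets nothing and stays idle.
print(assign_partitions([0, 1, 2], ["c1", "c2", "c3", "c4"]))
# → {'c1': [0], 'c2': [1], 'c3': [2]}
```

One consequence worth noting: adding consumers beyond the number of partitions does not increase parallelism.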
Offsets
Delivery semantics
At most once
At least once
Exactly once
Committed offsets are stored in the __consumer_offsets topic
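The difference between at most once and at least once comes down to when the offset is committed relative to processing. The following is a minimal simulation, not client code: `consume`, `commit_first`, and `crash_at` are hypothetical names, and the log is a plain Python list. Exactly once is not modeled here; in Kafka it relies on idempotent producers and transactions.

```python
def consume(log, commit_first, crash_at):
    """Simulate one consumer that crashes once while handling offset `crash_at`.

    commit_first=True  -> offset is committed before processing (at most once)
    commit_first=False -> offset is committed after processing (at least once)
    After the crash the consumer restarts from the last committed offset.
    """
    processed = []
    committed = 0      # next offset to read, as a committed offset would record it
    crashed = False
    while committed < len(log):
        offset = committed
        if commit_first:
            committed = offset + 1
            if offset == crash_at and not crashed:
                crashed = True   # crash before processing: the message is lost
                continue
            processed.append(log[offset])
        else:
            processed.append(log[offset])
            if offset == crash_at and not crashed:
                crashed = True   # crash before commit: the message is replayed
                continue
            committed = offset + 1
    return processed


log = ["a", "b", "c"]
print(consume(log, commit_first=True, crash_at=1))   # → ['a', 'c']
print(consume(log, commit_first=False, crash_at=1))  # → ['a', 'b', 'b', 'c']
```

At least once is usually the safer default, provided processing is idempotent so that the duplicate is harmless.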
Brokers
(Kafka Cluster)
Analogous to servers
Identified by ID (integer)
Holds
Partitions
Golden Rule
3+ Brokers
3+ Replicas for each Partition
Leader Broker
Only one per partition at any given time
The only broker that can receive and serve data for that partition
Zookeeper manages it automatically
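The leader rule can be sketched as a failover function: when the current leader dies, another replica of the same partition takes over. This is a simplified model, assuming the replica list is ordered by preference and that every live replica is eligible (a real cluster picks from the in-sync replicas only); `elect_leader` is a hypothetical name.

```python
def elect_leader(replicas, live_brokers):
    """Return the first live replica as leader, or None if the partition is offline.

    replicas:     broker IDs holding a copy of the partition, in preference order
    live_brokers: set of broker IDs currently alive
    """
    for broker_id in replicas:
        if broker_id in live_brokers:
            return broker_id
    return None


replicas = [1, 2, 3]                       # partition replicated on 3 brokers
print(elect_leader(replicas, {1, 2, 3}))   # → 1
print(elect_leader(replicas, {2, 3}))      # broker 1 failed → 2
print(elect_leader(replicas, set()))       # all replicas down → None
```

With 3+ brokers and 3+ replicas, the partition stays available even after two broker failures in this model.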
Producers
Write data to topics
Automatically know which broker and partition to write to
Auto recover on Broker failure
Acknowledgments
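acks=0 no ack (producer does not wait; possible data loss)
acks=1 only the leader acks (limited data loss)
acks=all leader and all in-sync replicas ack (no data loss)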
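The three settings trade latency for durability. A toy model of how many broker confirmations the producer waits for is sketched below; `required_acks` is a hypothetical helper, not part of any client library, and "all" is taken to mean the leader plus its in-sync followers.

```python
def required_acks(acks, in_sync_replicas):
    """Number of broker confirmations the producer waits for in this toy model.

    acks:             "0", "1" or "all", mirroring the producer setting
    in_sync_replicas: size of the in-sync replica set, leader included
    """
    if acks == "0":
        return 0                  # fire and forget
    if acks == "1":
        return 1                  # leader only
    if acks == "all":
        return in_sync_replicas   # leader plus every in-sync follower
    raise ValueError(f"unknown acks setting: {acks!r}")


for setting in ("0", "1", "all"):
    print(setting, required_acks(setting, in_sync_replicas=3))
```

The more confirmations the producer waits for, the higher the write latency and the lower the chance of losing a message when a broker fails.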
Message Keys
If not present, data is sent round-robin across partitions
If present, all messages with the same key go to the same partition
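Key-based routing can be sketched as "hash the key, take it modulo the partition count". The sketch below is an illustration only: `make_partitioner` is a hypothetical name and `zlib.crc32` is a stand-in hash (the Java client's default partitioner uses murmur2, and newer clients batch keyless messages instead of strict round-robin).

```python
import zlib
from itertools import count


def make_partitioner(num_partitions):
    """Build a toy partitioner: round-robin without a key, hash-based with one."""
    counter = count()

    def choose(key=None):
        if key is None:
            # No key: spread messages evenly across partitions.
            return next(counter) % num_partitions
        # Key present: a stable hash maps the same key to the same partition.
        return zlib.crc32(key.encode("utf-8")) % num_partitions

    return choose


choose = make_partitioner(3)
print([choose() for _ in range(4)])                  # → [0, 1, 2, 0]
print(choose("user-42") == choose("user-42"))        # → True
```

Because the partition depends on the partition count, changing the number of partitions of a topic changes where a given key lands, which breaks per-key ordering across the change.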
Why
Horizontal Scale
High performance / low latency / real time
Distributed, resilient architecture, fault tolerant
Proven! (LinkedIn, Uber, Netflix, Walmart)
Topics
made of
Partitions
Data is assigned randomly to a partition unless a key is provided
Data is kept for a limited time
Data is Immutable
Offset
(Incremental ID for each message)
Ordered (within a partition, not across partitions)
Identified by a name
Similar to a table without constraints
Particular Stream of data
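The partition properties listed above (immutable data, incremental offsets, limited retention) can be modeled as an append-only log. This is a minimal in-memory sketch, assuming time-based retention applied per record (a real broker expires whole log segments); `PartitionLog` and its methods are hypothetical names.

```python
import time


class PartitionLog:
    """Toy append-only partition: records are never modified, only appended or expired."""

    def __init__(self, retention_seconds):
        self.retention_seconds = retention_seconds
        self.next_offset = 0
        self.records = []  # list of (offset, timestamp, value)

    def append(self, value, now=None):
        """Append a record and return its offset; offsets only ever grow."""
        now = time.time() if now is None else now
        offset = self.next_offset
        self.records.append((offset, now, value))
        self.next_offset += 1
        return offset

    def expire(self, now=None):
        """Drop records older than the retention window; offsets are not reused."""
        now = time.time() if now is None else now
        cutoff = now - self.retention_seconds
        self.records = [r for r in self.records if r[1] >= cutoff]

    def read(self, from_offset):
        """Return (offset, value) pairs in offset order, starting at from_offset."""
        return [(o, v) for o, _, v in self.records if o >= from_offset]


log = PartitionLog(retention_seconds=10)
log.append("a", now=0)
log.append("b", now=5)
log.expire(now=12)              # "a" is older than 10 seconds and is dropped
print(log.read(0))              # → [(1, 'b')]
print(log.append("c", now=12))  # → 2
```

Note that expiring "a" does not renumber the remaining records: offset 1 still refers to "b", and the next write gets offset 2.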
Zookeeper
Manages brokers
Leader election