APACHE KAFKA - Coggle Diagram
APACHE KAFKA
PRODUCER
- Parameters for ProducerRecord: topic, value, key (optional), partition (optional)
- Response message: the topic, partition, and offset of the record within the partition. If the broker failed to write the message, it returns an error.
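The record and response fields above can be modeled as plain data classes. These are illustrative Python stand-ins for the Java client's ProducerRecord and RecordMetadata, not the real API:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-ins mirroring the fields described above; the real
# classes live in the Java client (org.apache.kafka.clients.producer).
@dataclass
class ProducerRecord:
    topic: str                       # required: destination topic
    value: bytes                     # required: message payload
    key: Optional[bytes] = None      # optional: drives the partition choice
    partition: Optional[int] = None  # optional: explicit partition id

@dataclass
class RecordMetadata:
    topic: str       # returned by the broker on a successful write
    partition: int
    offset: int

r = ProducerRecord(topic="orders", value=b"payload", key=b"user-1")
m = RecordMetadata(topic="orders", partition=0, offset=7)
```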
Producers write data to topics.
- Producers know which broker and partition to write to
- Writes go only to the broker hosting the partition's leader replica
- If the leader broker fails, one of the brokers in the ISR takes over as leader
Producers can choose how many acks of data writes they require
- acks=0: don't wait for any ack (possible data loss)
- acks=1: wait for the leader's ack (the default in clients before Kafka 3.0)
- acks=all: wait until all in-sync replicas acknowledge receipt (the default since Kafka 3.0)
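As a sketch, the acknowledgement level is set in the producer configuration; the broker address below is an assumed placeholder:

```properties
# Hypothetical producer.properties fragment; bootstrap address is an assumption
bootstrap.servers=localhost:9092
acks=all        # wait for every in-sync replica before treating the send as successful
retries=3
```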
CONSUMERS
Consumer Group: splits up the work among the consumers in that group
- more partitions than consumers -> one consumer reads from multiple partitions
- 2 groups with 1 consumer each and 2 partitions -> each consumer reads both partitions
- more consumers than partitions -> the extra consumers remain idle
- one consumer, 4 partitions -> that consumer reads from all 4 partitions
- consumers in the same group can't read the same partition
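The scenarios above can be sketched with a simplified round-robin assignment (not Kafka's actual RangeAssignor, just an illustration of how partitions spread over consumers):

```python
# Minimal sketch of spreading partitions over a group's consumers
# (simplified round-robin, not Kafka's real assignor logic).
def assign(partitions, consumers):
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

# 4 partitions, 2 consumers -> each consumer reads 2 partitions
assert assign([0, 1, 2, 3], ["c1", "c2"]) == {"c1": [0, 2], "c2": [1, 3]}
# 2 partitions, 3 consumers -> one consumer stays idle
assert assign([0, 1], ["c1", "c2", "c3"]) == {"c1": [0], "c2": [1], "c3": []}
```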
Group Coordinator - the broker that receives the group's heartbeats.
- every consumer group has a group coordinator.
- if a consumer stops sending heartbeats, the coordinator triggers a rebalance of the partitions in the group.
GROUP LEADER - responsible for assigning a subset of partitions to each consumer.
- When a consumer wants to join a consumer group, it sends a JoinGroup request to the group coordinator.
- The group leader receives the list of all consumers in the group from the group coordinator.
- The leader is the only client process that has the full list of consumers in the group and their assignments.
- After deciding on the partition assignment, the group leader sends the assignments to the group coordinator, which forwards each consumer its own assignment.
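A toy model of that join/assign/distribute flow (not the real JoinGroup/SyncGroup protocol; leader election and the assignment logic are heavily simplified):

```python
# Toy sketch: coordinator picks a leader, the leader computes the full
# assignment, and each member ends up seeing only its own slice.
def join_group(members, partitions):
    ordered = sorted(members)
    leader = ordered[0]                    # simplified leader election
    full = {m: [] for m in ordered}
    for i, p in enumerate(partitions):     # leader's simplified assignment
        full[ordered[i % len(ordered)]].append(p)
    return leader, full                    # coordinator distributes full[m] to m

leader, full = join_group(["c2", "c1"], [0, 1, 2])
assert leader == "c1"
assert full == {"c1": [0, 2], "c2": [1]}
```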
Offsets in Consumer groups
When a consumer reads messages from a partition, it lets Kafka know (commits) the offset of the last consumed message.
enable.auto.commit (default true): offsets are committed automatically every auto.commit.interval.ms (default 5000 ms).
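A sketch of what committing an offset means, using a hypothetical commit helper: the group stores, per (topic, partition), the position to resume from after a restart.

```python
# Sketch of offset commits: the group remembers, per (topic, partition),
# the offset of the next record to consume.
committed = {}  # (topic, partition) -> next offset to consume

def commit(topic, partition, last_consumed_offset):
    committed[(topic, partition)] = last_consumed_offset + 1

commit("orders", 0, 41)
# after a restart, the consumer resumes from the committed position
resume_at = committed[("orders", 0)]
assert resume_at == 42
```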
Partitioning strategies
(how do you decide which partition to write to, so that data stays balanced and no single node is overloaded?)
Own strategy (client side): you send the data to a specific partition (the partition id is specified explicitly);
Use messages with keys: all messages with the same key are placed in the same partition; Kafka derives the partition from a murmur2 hash of the key.
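A minimal sketch of key-based partitioning; Kafka's default partitioner uses murmur2, and crc32 stands in for it here purely for illustration:

```python
import zlib

# Illustrative key -> partition mapping; crc32 is a stand-in for the
# murmur2 hash Kafka's default partitioner actually uses.
def partition_for(key: bytes, num_partitions: int) -> int:
    return zlib.crc32(key) % num_partitions

# the same key always maps to the same partition
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
# the result is always a valid partition id
assert 0 <= partition_for(b"user-42", 6) < 6
```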
PARTITION
LEADER
One partition has 1 leader and multiple ISRs (in-sync replicas). Note that the leader is itself part of the ISR.
For each partition, the current leader and the current ISR are stored in ZooKeeper (in KRaft mode, this metadata lives in the controller quorum instead).
What if all replicas fail? Which leader will be selected?
With unclean.leader.election.enable=true, the first replica that comes back to life (not necessarily one from the ISR) becomes the leader, trading potential data loss for availability.
- Topics are split/saved on disk in partitions (the way to save data in a distributed system)
- A partition is a single log, which resides on a single disk on a single machine (it may be replicated)
- In each partition, each message gets a unique, incremental id called the offset
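The append-only log with incremental offsets can be sketched as follows (a hypothetical Partition class, not Kafka code):

```python
# A partition behaves like an append-only log: each appended message
# receives the next incremental offset.
class Partition:
    def __init__(self):
        self.log = []

    def append(self, message) -> int:
        self.log.append(message)
        return len(self.log) - 1   # the message's offset

p = Partition()
assert p.append("a") == 0   # first message -> offset 0
assert p.append("b") == 1   # offsets only ever grow
assert p.log[1] == "b"      # reading by offset
```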
DELETION
Log compaction provides an alternative to plain time/size-based retention: it keeps the most recent entry for each unique key, rather than only the most recent log entries.
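A sketch of what compaction keeps, using a hypothetical compact helper over (key, value) records:

```python
# Sketch of log compaction: keep only the latest value per key.
def compact(log):
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)   # later entries overwrite earlier ones
    return sorted(latest.values())      # survivors, in offset order

log = [("k1", "v1"), ("k2", "v2"), ("k1", "v3")]
# only the newest entry per key survives: k1's old value v1 is dropped
assert compact(log) == [(1, "v2"), (2, "v3")]
```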
TOPICS
- Similar to the table concept in a database.
- Each topic has a name, a replication factor, and a number of partitions.
Offsets: unique per topic + partition (each partition has its own set of offsets).
A message is uniquely identified by (topic, partition, offset).