Apache Kafka
A high-throughput distributed messaging system
Basic information
Speaker:
Chen-en Lu
Slides
Video recording
Contents
What is Kafka
Developed at LinkedIn
Messaging system
Queue
Pub/Sub
Written in Scala
Features
Scalability
Horizontal scaling
Topics are partitioned (sharded)
High Availability
Partitions can be replicated
High Throughput
Durability
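All data is persisted
Sequential reads and writes (like a log file)
Each consumer keeps its own message offset, much as a file descriptor tracks a read position
Log files can be rotated (similar to logrotate)
Expired messages can only be deleted (similar to logrotate)
Supports both batch loading and real-time consumption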
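The durability points above (append-only storage, consumer-held offsets, deletion of expired messages) can be illustrated with a toy in-memory log. This is a minimal sketch, not Kafka's storage code: the `Log` class, its method names, and the 60-second retention window are all made up for illustration, and timestamps are passed in by hand so the run is deterministic.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Log:
    """Hypothetical append-only log: append at the tail, read by offset, expire from the head."""
    retention_seconds: float
    base_offset: int = 0  # offset of the oldest message still retained
    entries: List[Tuple[float, bytes]] = field(default_factory=list)

    def append(self, payload: bytes, now: Optional[float] = None) -> int:
        """Append at the tail and return the offset assigned to the message."""
        ts = time.time() if now is None else now
        self.entries.append((ts, payload))
        return self.base_offset + len(self.entries) - 1

    def read(self, offset: int, max_messages: int = 10) -> List[Tuple[int, bytes]]:
        """Read sequentially, starting at the given offset (or the oldest retained one)."""
        start = max(offset, self.base_offset) - self.base_offset
        chunk = self.entries[start:start + max_messages]
        return [(self.base_offset + start + i, p) for i, (_, p) in enumerate(chunk)]

    def expire(self, now: float) -> int:
        """Delete messages older than the retention window; return how many were dropped."""
        dropped = 0
        while self.entries and now - self.entries[0][0] > self.retention_seconds:
            self.entries.pop(0)
            self.base_offset += 1
            dropped += 1
        return dropped


log = Log(retention_seconds=60)
for i, t in enumerate([0, 10, 100]):
    log.append(f"m{i}".encode(), now=t)

consumer_offset = 0  # the consumer, not the log, remembers this position
batch = log.read(consumer_offset, max_messages=2)
consumer_offset = batch[-1][0] + 1  # advance past the last message handled

log.expire(now=105)  # m0 and m1 are older than 60 seconds and are deleted
remaining = log.read(consumer_offset)
```

Offsets stay stable after expiry because the log only tracks a moving `base_offset`; the consumer resumes at offset 2 without noticing that older messages are gone.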
Position in the big data landscape
Designed like a
Message Queue
Implemented like a
Distributed log file
Basic concepts
Physical components
Producer: sends messages to a broker
Consumer: consumes messages from a broker
Broker
A node in the Kafka cluster
ZooKeeper: coordinates the Kafka cluster and consumer groups
Logical components
Topic
The named destination of partition?
Partition
A topic can have multiple partitions
The unit of parallelism
Message
Key/Value Pair
Message Offset
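The relationship between topic, partition, key/value message, and offset can be sketched as follows. Everything here is an assumption made for illustration: the `Topic` class is hypothetical, and `zlib.crc32` stands in for whatever hash a real client's partitioner uses. The only point is that messages with the same key land in the same partition and that offsets are counted per partition.

```python
import zlib
from typing import List, Optional, Tuple

Message = Tuple[Optional[bytes], bytes]  # (key, value)


class Topic:
    """Hypothetical topic: a name plus a fixed number of independent partitions."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[Message]] = [[] for _ in range(num_partitions)]
        self._round_robin = 0

    def partition_for(self, key: Optional[bytes]) -> int:
        """Keyed messages hash to a fixed partition; unkeyed ones are spread round-robin."""
        if key is None:
            p = self._round_robin % len(self.partitions)
            self._round_robin += 1
            return p
        return zlib.crc32(key) % len(self.partitions)  # stand-in hash, an assumption

    def send(self, key: Optional[bytes], value: bytes) -> Tuple[int, int]:
        """Append to the chosen partition and return (partition, offset)."""
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1


topic = Topic("page-views", num_partitions=3)
first = topic.send(b"user-42", b"/home")
second = topic.send(b"user-42", b"/cart")
```

Because both messages share the key `user-42`, they end up in one partition with consecutive offsets, which is what keeps per-key ordering intact.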
Scenarios
Queue (one partition, one consumer)
Pub/Sub (one partition, multiple consumers)
Multiple partitions
Consumer Group
A group of workers
Workers share offsets
Offsets are synced to ZooKeeper
Automatic load balancing
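A rough picture of how a consumer group balances load: every partition is owned by exactly one worker, and ownership is recomputed when membership changes. The `assign` function below is a hypothetical round-robin strategy written for this note, not Kafka's actual assignor, and `group_offsets` is a plain dict standing in for the shared offset store.

```python
from typing import Dict, List


def assign(partitions: List[int], workers: List[str]) -> Dict[str, List[int]]:
    """Spread partitions over workers round-robin; each partition gets exactly one owner."""
    if not workers:
        raise ValueError("a consumer group needs at least one worker")
    members = sorted(workers)
    assignment: Dict[str, List[int]] = {w: [] for w in members}
    for i, p in enumerate(sorted(partitions)):
        assignment[members[i % len(members)]].append(p)
    return assignment


partitions = [0, 1, 2, 3]
before = assign(partitions, ["worker-a", "worker-b"])
after = assign(partitions, ["worker-a", "worker-b", "worker-c"])  # worker-c joins

# Offsets are stored per (group, partition), not per worker,
# so a new owner resumes where the previous owner stopped.
group_offsets = {("billing", 2): 17}
new_owner = next(w for w, owned in after.items() if 2 in owned)
resume_at = group_offsets[("billing", 2)]
```

Storing the offset under the group rather than the worker is what makes rebalancing cheap: partition 2 moves to `worker-c`, which simply continues from offset 17.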
Log Compaction
Replication
The unit of replication is the partition
Each partition has one leader and zero or more followers
All reads and writes go through the leader
Message delivery
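The leader/follower arrangement can be sketched with a small model. This is a deliberate simplification under stated assumptions: the `Replica` and `Partition` classes are hypothetical, followers are brought up to date synchronously inside `append` (real followers fetch from the leader on their own schedule), and leader election is reduced to promoting the first follower.

```python
from typing import List


class Replica:
    """One copy of a partition's log, hosted on a broker."""

    def __init__(self, broker_id: int) -> None:
        self.broker_id = broker_id
        self.log: List[bytes] = []


class Partition:
    """Hypothetical replicated partition: one leader, zero or more followers."""

    def __init__(self, replicas: List[Replica]) -> None:
        self.leader = replicas[0]
        self.followers = replicas[1:]

    def append(self, payload: bytes) -> int:
        """Clients write to the leader only; followers copy whatever they are missing."""
        self.leader.log.append(payload)
        for follower in self.followers:
            follower.log.extend(self.leader.log[len(follower.log):])
        return len(self.leader.log) - 1

    def read(self, offset: int) -> bytes:
        """Reads are served by the leader as well."""
        return self.leader.log[offset]

    def fail_leader(self) -> None:
        """Promote the first follower when the leader is lost."""
        if not self.followers:
            raise RuntimeError("no follower left to promote")
        self.leader = self.followers.pop(0)


partition = Partition([Replica(1), Replica(2), Replica(3)])
partition.append(b"a")
partition.append(b"b")
partition.fail_leader()  # broker 1 goes away, broker 2 takes over
value_after_failover = partition.read(1)
```

With zero followers the partition still works but has no copy to fail over to, which is why `fail_leader` raises in that case.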
Broker to Consumer
At least once - Store the offset after handling the message
Exactly once - At least once + Idempotent operation
At most once - Store the offset before handling the message
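The three consumer-side guarantees differ only in the order of two steps, storing the offset and handling the message. The simulation below is a sketch written for this note: `run` is a hypothetical helper that crashes the consumer once, between the two steps, while it is working on the message at `crash_at`, and then restarts it from the last stored offset.

```python
from typing import List


def run(log: List[str], commit_first: bool, crash_at: int) -> List[str]:
    """Consume the whole log with one simulated crash; return the messages actually handled."""
    handled: List[str] = []
    committed = 0    # durable offset, survives the crash
    crashed = False
    steps = ["commit", "handle"] if commit_first else ["handle", "commit"]
    while committed < len(log):
        offset = committed  # a (re)started consumer resumes from the stored offset
        for position, step in enumerate(steps):
            if position == 1 and offset == crash_at and not crashed:
                crashed = True
                break  # the consumer dies between the two steps
            if step == "commit":
                committed = offset + 1
            else:
                handled.append(log[offset])
    return handled


messages = ["a", "b", "c"]
at_most_once = run(messages, commit_first=True, crash_at=1)    # "b" is lost
at_least_once = run(messages, commit_first=False, crash_at=1)  # "b" is handled twice

# Exactly once = at least once + an idempotent operation:
# applying "b" a second time leaves the result unchanged.
exactly_once = list(dict.fromkeys(at_least_once))
```

The deduplication on the last line stands in for any idempotent handler, for example an upsert keyed by message ID.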
Producer to Broker
At most once - Async send
At least once - Sync send (with retry count)
Exactly once! - Idempotent delivery (v0.11+)
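On the producer side, the difference between a retrying sync send and idempotent delivery shows up when an acknowledgement is lost. The sketch below models only that case and makes several assumptions: the `Broker` class and `send_sync` helper are hypothetical, the network drops acknowledgements but never requests, and deduplication is keyed on a (producer ID, sequence number) pair. The fire-and-forget async case is not modelled.

```python
from typing import List, Set, Tuple


class Broker:
    """Hypothetical broker that can lose acknowledgements and optionally dedupe retries."""

    def __init__(self, drop_acks: int, dedupe: bool) -> None:
        self.log: List[str] = []
        self.drop_acks = drop_acks  # how many acknowledgements get lost on the way back
        self.dedupe = dedupe
        self.seen: Set[Tuple[str, int]] = set()

    def receive(self, producer_id: str, seq: int, msg: str) -> bool:
        """Store the message; return True only if the ack reaches the producer."""
        if not (self.dedupe and (producer_id, seq) in self.seen):
            self.log.append(msg)
            self.seen.add((producer_id, seq))
        if self.drop_acks > 0:
            self.drop_acks -= 1
            return False
        return True


def send_sync(broker: Broker, producer_id: str, seq: int, msg: str, retries: int) -> bool:
    """Send and wait for the ack, retrying up to `retries` times when none arrives."""
    for _ in range(retries + 1):
        if broker.receive(producer_id, seq, msg):
            return True
    return False


plain = Broker(drop_acks=1, dedupe=False)
send_sync(plain, "p1", 0, "order-1", retries=3)       # at least once: duplicate written

idempotent = Broker(drop_acks=1, dedupe=True)
send_sync(idempotent, "p1", 0, "order-1", retries=3)  # retry is recognised and skipped
```

In both runs the producer sees a successful send; only the broker's log differs, which is why duplicates from retries are easy to miss.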
Scenarios
At most once: Messages may be lost but are never redelivered.
At least once: Messages are never lost but may be redelivered.
Exactly once: Each message is delivered once and only once. (This is what people actually want.)
Message ordering
Why is Kafka so fast?
Programming with Kafka
Use cases
Stream Processing
Stream processing frameworks
Storm
Samza
Spark Streaming
Distributed log collection
Logstash
Flume
Lambda architecture