Druid architecture

Real-time nodes

Ingest & query event streams

Events indexed by these nodes are immediately available for querying

use ZooKeeper to coordinate with the rest of the cluster

consume events from a message bus such as Kafka, which acts as a single endpoint for incoming real-time data

periodically hand off the immutable batches of events they have collected to the rest of the cluster

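A minimal sketch of this ingestion loop, assuming the standard kafka-clients consumer API; the topic name, group id, in-memory index, and hand-off threshold are illustrative stand-ins, not Druid's actual internals:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

public class RealtimeNodeSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("group.id", "realtime-node-1");          // hypothetical consumer group
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        List<String> inMemoryIndex = new ArrayList<>();    // stand-in for the real in-memory index

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));         // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    // index the event; it is queryable from this point on
                    inMemoryIndex.add(record.value());
                }
                // periodically persist the collected events as an immutable
                // batch and hand it off to the rest of the cluster
                if (inMemoryIndex.size() >= 100_000) {
                    handOffSegment(inMemoryIndex);         // hypothetical hand-off step
                    inMemoryIndex.clear();
                }
            }
        }
    }

    static void handOffSegment(List<String> events) {
        // in Druid this would build a segment and push it to deep storage
    }
}
```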
Historical nodes



load and serve the immutable blocks of data (segments) created by real-time nodes.

they are the main workers in a Druid cluster

shared-nothing architecture

no single point of contention

load/drop/serve immutable segments

announce their state and the data they serve to ZooKeeper (see the sketch after this list)


Historical nodes can support read consistency because they only deal with immutable data
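A hedged sketch of such an announcement using the plain org.apache.zookeeper client; the znode path layout and payload are assumptions, not Druid's actual scheme:

```java
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

import java.nio.charset.StandardCharsets;

public class HistoricalAnnouncer {
    public static void main(String[] args) throws Exception {
        // connect to the ZooKeeper ensemble the cluster coordinates through
        ZooKeeper zk = new ZooKeeper("localhost:2181", 30_000, event -> { });

        // ephemeral znode: it disappears automatically if this node dies,
        // so the cluster's view of served segments stays current
        // (hypothetical path; assumes the parent znodes already exist)
        String path = "/druid/servedSegments/historical-node-1";
        byte[] served = "wikipedia_2024-01-01_2024-01-02_v1".getBytes(StandardCharsets.UTF_8);

        try {
            zk.create(path, served, ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
        } catch (KeeperException.NodeExistsException e) {
            zk.setData(path, served, -1);   // refresh the announcement if it already exists
        }
    }
}
```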

Broker nodes


query routers to historical and real-time nodes

contain a cache with an LRU invalidation strategy (sketched after this list)


In the event of a total ZooKeeper outage, data is still queryable. If broker nodes are unable to communicate with ZooKeeper, they use their last known view of the cluster and continue to forward queries to real-time and historical nodes. Broker nodes make the assumption that the structure of the cluster is the same as it was before the outage.
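A minimal sketch of an LRU cache built on LinkedHashMap's access-order mode; the key and value types here are illustrative stand-ins for per-segment query results:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// a bounded result cache that evicts the least recently used entry
public class BrokerResultCache extends LinkedHashMap<String, byte[]> {
    private final int maxEntries;

    public BrokerResultCache(int maxEntries) {
        super(16, 0.75f, true);          // accessOrder = true gives LRU iteration order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
        return size() > maxEntries;      // drop the least recently used entry
    }
}
```

On a cache hit the broker can answer locally; only the misses are forwarded to historical and real-time nodes.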

Coordinator nodes


in charge of data management and distribution on historical nodes


tell historical nodes to load new data, drop outdated data, replicate data, and move data to load balance

The coordinator nodes load a set of rules from a rule table in the MySQL database.
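A simplified sketch of how such rules could be modeled and evaluated once loaded; the fields and first-match semantics here are stand-ins, not Druid's actual rule types:

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

public class RuleSketch {
    // simplified stand-in for a row of the rule table:
    // keep `replicants` copies of any segment younger than `period`
    record LoadRule(Duration period, int replicants) { }

    record Segment(String id, Instant end) { }

    static int desiredReplicants(Segment segment, List<LoadRule> rules, Instant now) {
        for (LoadRule rule : rules) {                   // rules checked in order;
            if (segment.end().isAfter(now.minus(rule.period()))) {
                return rule.replicants();               // first matching rule wins
            }
        }
        return 0;                                       // no rule matched: drop the segment
    }

    public static void main(String[] args) {
        List<LoadRule> rules = List.of(
                new LoadRule(Duration.ofDays(30), 2),   // last month: 2 copies
                new LoadRule(Duration.ofDays(365), 1)); // last year: 1 copy
        Segment old = new Segment("wikipedia_2020-01-01_2020-01-02_v1",
                Instant.parse("2020-01-02T00:00:00Z"));
        System.out.println(desiredReplicants(old, rules, Instant.now())); // 0 -> drop
    }
}
```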

Data storage

Segments
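A segment is identified by its data source, the time interval it covers, and a version string; a tiny sketch of that naming scheme (the exact formatting here is illustrative):

```java
import java.time.LocalDate;

public class SegmentId {
    // data source + covered interval + version identifies a segment
    static String of(String dataSource, LocalDate start, LocalDate end, String version) {
        return dataSource + "_" + start + "_" + end + "_" + version;
    }

    public static void main(String[] args) {
        // prints: wikipedia_2024-01-01_2024-01-02_v1
        System.out.println(of("wikipedia",
                LocalDate.of(2024, 1, 1), LocalDate.of(2024, 1, 2), "v1"));
    }
}
```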

Data Query

Boolean operations on Bitmaps
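Filters over dimension values can be answered by combining one bitmap per value with bitwise boolean operations; a self-contained sketch using java.util.BitSet as a stand-in for the compressed bitmaps Druid actually stores (the dimension values and row ids are made up):

```java
import java.util.BitSet;

public class BitmapFilterSketch {
    public static void main(String[] args) {
        // one bitmap per dimension value: bit i is set if row i has that value
        BitSet cityIsSf = new BitSet();
        cityIsSf.set(0); cityIsSf.set(2); cityIsSf.set(5);

        BitSet deviceIsMobile = new BitSet();
        deviceIsMobile.set(2); deviceIsMobile.set(3); deviceIsMobile.set(5);

        // WHERE city = 'sf' AND device = 'mobile'  ->  bitwise AND
        BitSet and = (BitSet) cityIsSf.clone();
        and.and(deviceIsMobile);
        System.out.println("AND matches rows: " + and);   // {2, 5}

        // WHERE city = 'sf' OR device = 'mobile'   ->  bitwise OR
        BitSet or = (BitSet) cityIsSf.clone();
        or.or(deviceIsMobile);
        System.out.println("OR matches rows: " + or);     // {0, 2, 3, 5}
    }
}
```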