Druid architecture
Real-time nodes
Ingest & query event streams
Events indexed via these nodes are immediately available for querying
use ZooKeeper to coordinate with the rest of the cluster
consume a stream of events from a message bus such as Kafka, which serves as a single endpoint for ingesting real-time data
periodically hand off immutable batches of events they have collected to the cluster
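How ingestion might look in code: a minimal sketch of a real-time node's consume loop using the standard Kafka consumer API. The broker address, topic name, group id, and indexEvent helper are illustrative assumptions, not Druid internals.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class RealtimeIngest {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "realtime-node");           // assumed consumer group
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events"));        // hypothetical topic name
            while (true) {
                // Each polled record is an event indexed in memory, which is
                // what makes it immediately available for querying.
                ConsumerRecords<String, String> records =
                        consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    indexEvent(record.value());           // placeholder for in-memory indexing
                }
            }
        }
    }

    static void indexEvent(String json) { /* build the in-memory index here */ }
}
```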
Historical nodes
load and serve the immutable blocks of data (segments) created by real-time nodes
they are the main workers in a Druid cluster
shared-nothing architecture
no single point of contention
load/drop/serve immutable segments
announce their state and the data they serve to ZooKeeper (see the sketch below)
Historical nodes can support read consistency because they only deal with immutable data
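The announce step can be pictured as creating an ephemeral znode: it vanishes automatically when the node's session ends, so the cluster's view of served data stays current. A minimal sketch with the plain ZooKeeper client; the connect string, path layout, and segment id are illustrative assumptions (and the parent znodes are assumed to exist).

```java
import java.nio.charset.StandardCharsets;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class AnnounceSegment {
    public static void main(String[] args) throws Exception {
        // Connect to the ZooKeeper ensemble (address is an assumption).
        ZooKeeper zk = new ZooKeeper("localhost:2181", 15_000, event -> { });

        // Ephemeral: the znode is deleted automatically when this node's
        // session ends, so a dead historical node stops advertising its data.
        zk.create("/druid/servedSegments/historical-host-1",   // hypothetical path layout
                "segment_2024-01-01_v1".getBytes(StandardCharsets.UTF_8), // hypothetical segment id
                ZooDefs.Ids.OPEN_ACL_UNSAFE,
                CreateMode.EPHEMERAL);
    }
}
```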
Broker nodes
query routers to historical and real-time nodes
contain a cache with an LRU invalidation strategy (see the sketch below)
In the event of a total ZooKeeper outage, data is still queryable. If broker nodes are unable to communicate with ZooKeeper, they use their last known view of the cluster and continue to forward queries to real-time and historical nodes. Broker nodes make the assumption that the structure of the cluster is the same as it was before the outage
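LRU eviction is easy to sketch in Java with LinkedHashMap's access-order mode; this is a generic illustration, not Druid's actual cache (the per-segment keying is an assumption).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: in access order, LinkedHashMap keeps the
// least-recently-used entry eldest, and we evict it past capacity.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true gives LRU iteration order
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict once over capacity
    }
}

// Usage sketch: cache results per segment, keyed by segment id.
// LruCache<String, byte[]> resultCache = new LruCache<>(10_000);
```

Because historical segments are immutable, a cached result for a segment never goes stale; only results over still-changing real-time data would need to bypass such a cache.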
Coordinator nodes
in charge of data management and distribution on historical nodes
tell historical nodes to load new data, drop outdated data, replicate data, and move data to balance load
The coordinator nodes load a set of rules from a rule table in the MySQL database.
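One way to picture rule evaluation: for each segment the coordinator walks the ordered rule list and the first matching rule decides how many replicas to keep (zero meaning drop). All types below (Segment, Rule, LoadByPeriod) are hypothetical sketches, not Druid's actual classes.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical shape of a segment and a rule.
record Segment(String id, Instant start, Instant end) { }

interface Rule {
    boolean appliesTo(Segment segment); // does this rule match the segment?
    int replicants();                   // copies to keep (0 = drop)
}

// Example rule: keep segments newer than `period` with `replicants` copies.
record LoadByPeriod(Duration period, int replicants) implements Rule {
    public boolean appliesTo(Segment s) {
        return s.end().isAfter(Instant.now().minus(period));
    }
}

class RuleEvaluator {
    // First matching rule wins; no match means drop (an assumption here).
    static int desiredReplicants(Segment segment, List<Rule> rules) {
        for (Rule rule : rules) {
            if (rule.appliesTo(segment)) {
                return rule.replicants();
            }
        }
        return 0;
    }
}
```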
Data storage
Segments
Data query
Boolean operations on bitmaps
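A filter over dimension values can be resolved by combining bitmap indexes of matching row ids with bitwise AND/OR. A sketch using the RoaringBitmap library; the row ids and column values are made up, and Druid's own bitmap implementation may differ.

```java
import org.roaringbitmap.RoaringBitmap;

public class BitmapFilter {
    public static void main(String[] args) {
        // Row ids where city = 'SF' (values invented for illustration).
        RoaringBitmap city = RoaringBitmap.bitmapOf(1, 3, 5, 8);
        // Row ids where device = 'mobile'.
        RoaringBitmap device = RoaringBitmap.bitmapOf(3, 4, 5, 9);

        // WHERE city = 'SF' AND device = 'mobile'  ->  {3,5}
        RoaringBitmap both = RoaringBitmap.and(city, device);
        // WHERE city = 'SF' OR device = 'mobile'   ->  {1,3,4,5,8,9}
        RoaringBitmap either = RoaringBitmap.or(city, device);

        System.out.println("AND: " + both);
        System.out.println("OR:  " + either);
    }
}
```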