Data Pipeline
Tasks
Introducing Kafka
Purpose
Streaming logs
Go to production
Fluentd to Kafka
https://github.com/fluent/fluent-plugin-kafka
TODO
Understanding Kafka infrastructure and limitations
Lab
Using Kafka with Docker
Tests
Data retention
Consumer rebalancing
Monitoring
Introducing Secor
Purpose
TODO
Data cleaning (?)
Database data
Event log (API log)
Metering log
Access log
Providing data with current infrastructure
Cost estimate
Data serialization
Data quality monitoring
Goal
Realtime
High data quality
High availability
Low coupling
Scalability
Pipeline
http://xyz.insightdataengineering.com/blog/pipeline_map.html
https://databricks.com/blog/2016/10/11/using-aws-lambda-with-databricks-for-etl-automation-and-ml-model-serving.html