Data Pipeline
Tasks
Providing data with current infrastructure
Introducing Kafka
Purpose
Streaming logs
Go to production
Fluentd to Kafka
https://github.com/fluent/fluent-plugin-kafka
TODO
Understanding Kafka infrastructure and limitations
Lab
Using Kafka with Docker
Tests
Data retention
Consumer rebalancing
Monitoring
Introducing Secor
Purpose
TODO
Data cleaning (!?)
Database data
Event log (API log)
Metering log
Access log
Cost estimate
Data serialization
Data quality monitoring
Goal
Realtime
High data quality
High availability
Low coupling
Scalability
Pipeline
http://xyz.insightdataengineering.com/blog/pipeline_map.html
https://databricks.com/blog/2016/10/11/using-aws-lambda-with-databricks-for-etl-automation-and-ml-model-serving.html