Data Pipeline
Considerations in Design
p158
Timeliness
Reliability
Scalability
Data Format
Transformations
Extract-Transform-Load
downstream consumers lose any information dropped during transformation
save time and storage space
Extract-Load-Transform
maximum flexibility of transformation for downstream
store maximum information as source data
spread cost of data transformation to downstreams
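The ETL and ELT trade-offs above can be illustrated with a minimal sketch. The record fields and the function names (`transform`, `etl`, `elt`) are hypothetical and are not taken from the source notes.

```python
# Hypothetical source records; the field names are illustrative only.
SOURCE = [
    {"user": "alice", "amount": "12.50", "note": "gift"},
    {"user": "bob", "amount": "3.00", "note": "refund"},
]


def transform(record):
    """Keep only the fields one consumer needs and parse the amount."""
    return {"user": record["user"], "amount": float(record["amount"])}


def etl(source):
    """Transform inside the pipeline, then load the reduced records."""
    return [transform(r) for r in source]


def elt(source):
    """Load the records unchanged; each consumer transforms them later."""
    return [dict(r) for r in source]


etl_store = etl(SOURCE)
elt_store = elt(SOURCE)

# ETL: smaller store, but "note" is gone for every downstream consumer.
# ELT: full source data is kept; the consumer pays the transformation cost.
downstream_view = [transform(r) for r in elt_store]
```

In this sketch the ETL store can never give back the `note` field, while the ELT store lets each consumer apply its own transformation at its own cost.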
Security
encrypting
authenticating
authorizing
Failure Handling
prevent faulty records as early as possible in the pipeline
recovering faulty records
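One possible way to sketch both failure-handling points is to validate records at the pipeline entry and quarantine the faulty ones for later recovery. The validation rules and the names `validate`, `ingest`, and `recover` are assumptions made for illustration.

```python
def validate(record):
    """Reject a record at the pipeline entry if it is malformed."""
    if not isinstance(record.get("id"), int):
        raise ValueError("missing or non-integer id")
    if "payload" not in record:
        raise ValueError("missing payload")
    return record


def ingest(records):
    """Split input into accepted records and quarantined faulty ones."""
    accepted, dead_letters = [], []
    for record in records:
        try:
            accepted.append(validate(record))
        except ValueError as err:
            # Keep the original record and the reason so it can be repaired.
            dead_letters.append({"record": record, "error": str(err)})
    return accepted, dead_letters


def recover(dead_letters, repair):
    """Apply a repair function to quarantined records and re-ingest them."""
    return ingest([repair(d["record"]) for d in dead_letters])
```

Keeping the faulty record together with its error message means nothing is silently dropped, and a later repair step can feed the records back through the same validation.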
Coupling and Agility
decouple the data source and data targets
Ad-hoc pipelines
Loss of Metadata
Kafka Connect
p164
scalable and reliable way to move data between Kafka and other datastores
runs as a cluster of worker processes that move data in parallel
REST API to manage connectors
run Connect on separate servers, not on the brokers
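As a rough sketch of managing connectors through the REST API, the snippet below builds (but does not send) a request that creates a connector by POSTing a name and config to `/connectors`. The worker address `http://localhost:8083`, the connector name, the file path, and the topic are assumptions, and it presumes the `FileStreamSource` example connector is available on the worker.

```python
import json
import urllib.request


def build_create_connector_request(base_url, name, config):
    """Build a POST /connectors request for a Kafka Connect worker."""
    body = json.dumps({"name": name, "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Assumed values: worker address, connector name, file path, and topic.
request = build_create_connector_request(
    "http://localhost:8083",
    "example-file-source",
    {
        "connector.class": "FileStreamSource",
        "file": "/tmp/example.txt",
        "topic": "example-topic",
        "tasks.max": "1",
    },
)

# To actually send it to a running worker:
# urllib.request.urlopen(request)
```

Because the worker is addressed over HTTP, this fits running Connect on its own servers: the client only needs the worker's URL, not access to the brokers.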