Apache Beam
Pipeline
PCollection (data set)
TextIO
PTransform (operations or steps)
ParDo (generic parallel processing)
Function
Formatting or type-converting each element in a data set
Filtering a data set
Extracting parts of each element in a data set
Performing computations on each element in a data set
Implement
MapElements
(one-to-one mapping; accepts a lambda function)
DoFn in-line
(Lightweight)
DoFn class (logic applied to each element)
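The difference between ParDo and MapElements can be sketched in plain Java. This is only a model of the semantics, not Beam's API: in Beam the per-element logic lives in a DoFn passed to ParDo, and the class and method names below (`ParDoSketch`, `parDo`, `mapElements`) are hypothetical stand-ins that operate on ordinary lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.function.Consumer;
import java.util.function.Function;

// Plain-Java model of per-element processing; it does not depend on Beam.
public class ParDoSketch {

    // Models ParDo: the function receives one element plus an output
    // callback, so it may emit zero, one, or many results per element.
    public static <I, O> List<O> parDo(List<I> input, BiConsumer<I, Consumer<O>> fn) {
        List<O> output = new ArrayList<>();
        for (I element : input) {
            fn.accept(element, output::add);
        }
        return output;
    }

    // Models MapElements: exactly one output per input element.
    public static <I, O> List<O> mapElements(List<I> input, Function<I, O> fn) {
        return parDo(input, (element, out) -> out.accept(fn.apply(element)));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("a,1", "", "b,2");

        // Filtering and extracting in one step: skip empty lines,
        // keep only the part before the comma.
        List<String> keys = parDo(lines, (line, out) -> {
            if (!line.isEmpty()) {
                out.accept(line.split(",")[0]);
            }
        });

        // One-to-one type conversion: each line becomes its length.
        List<Integer> lengths = mapElements(lines, String::length);

        System.out.println(keys);
        System.out.println(lengths);
    }
}
```

Because `parDo` can drop or multiply elements, it covers filtering, formatting, extraction, and computation; `mapElements` is the special case with exactly one output per input.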
GroupByKey (aggregation)
Output: pairs of a unique key and the collection of values for that key
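The output shape of GroupByKey can be illustrated with a plain-Java sketch. It is a model of the grouping semantics only, not Beam code; `GroupByKeySketch`, `kv`, and `groupByKey` are hypothetical names, and a sorted map is used purely to make the printed result stable.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java model of GroupByKey semantics; it does not depend on Beam.
public class GroupByKeySketch {

    // Hypothetical helper that builds one key/value pair.
    public static Map.Entry<String, Integer> kv(String key, Integer value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    // Collects every value that shares a key into one list per unique key.
    public static Map<String, List<Integer>> groupByKey(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> pairs =
            Arrays.asList(kv("a", 1), kv("b", 2), kv("a", 3));
        // Key "a" ends up with two values, key "b" with one.
        System.out.println(groupByKey(pairs));
    }
}
```

The sketch keeps values in input order only because it runs sequentially; a distributed runner should not be assumed to preserve any particular order within a group.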
CoGroupByKey (joins multiple data sets by key)
Combine (merges a collection of values into a single value)
CombineFn
Create Accumulator
Add Input
Merge Accumulators
Extract Output
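The four CombineFn steps above can be sketched as a plain-Java model that computes an average. The method names mirror the four steps, but the class itself (`AverageFnSketch`), its `Accum` type, and the `averageInTwoBundles` helper are hypothetical and do not depend on Beam; a real implementation would extend Beam's `Combine.CombineFn` instead of using static methods.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java model of the CombineFn lifecycle; it does not depend on Beam.
public class AverageFnSketch {

    // Accumulator: the partial state built up for one bundle of elements.
    public static class Accum {
        long sum = 0;
        long count = 0;
    }

    // Step 1: create an empty accumulator.
    public static Accum createAccumulator() {
        return new Accum();
    }

    // Step 2: fold one input element into an accumulator.
    public static Accum addInput(Accum acc, int input) {
        acc.sum += input;
        acc.count++;
        return acc;
    }

    // Step 3: merge accumulators that were built independently.
    public static Accum mergeAccumulators(Iterable<Accum> accums) {
        Accum merged = createAccumulator();
        for (Accum a : accums) {
            merged.sum += a.sum;
            merged.count += a.count;
        }
        return merged;
    }

    // Step 4: turn the final accumulator into the output value.
    public static double extractOutput(Accum acc) {
        return acc.count == 0 ? 0.0 : (double) acc.sum / acc.count;
    }

    // Hypothetical helper: simulates two workers that each accumulate
    // one bundle before their partial results are merged.
    public static double averageInTwoBundles(List<Integer> first, List<Integer> second) {
        Accum a = createAccumulator();
        for (int x : first) {
            a = addInput(a, x);
        }
        Accum b = createAccumulator();
        for (int x : second) {
            b = addInput(b, x);
        }
        return extractOutput(mergeAccumulators(Arrays.asList(a, b)));
    }

    public static void main(String[] args) {
        System.out.println(averageInTwoBundles(Arrays.asList(1, 2, 3), Arrays.asList(4, 5)));
    }
}
```

Keeping the sum and count in the accumulator, rather than a running average, is what makes the merge step correct: partial averages cannot be combined without knowing how many elements each one covers.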
Flatten
BeamSql API
BeamRecord (converted from a POJO entity)
PipelineRunner
DataflowPipelineOptions
Project ID
Staging location
Runner
From the command line
PipelineOptionsFactory.fromArgs(args).withValidation().create();