DP-203 - Chapter 8 - Ingesting and Transforming Data
Transforming data by using
Apache Spark
Resilient Distributed Dataset (RDD)
:star:
DataFrame
:star:
Dataset
T-SQL
ADF
Schema transformations
select
aggregate
derived columns
Row transformations
filter
alter row
sort
IO transformations
union
join
conditional split
ADF templates
Synapse Pipelines
Stream Analytics
Chapter 10, Designing and Developing a Stream Processing Solution
Scala
Cleansing data
missing/null
substituting with default
filtering out
trimming
derived column -> trim()
standardizing
outliers
derived column: substitute with the average or median value
deduping
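The cleansing steps above (default substitution, filtering, trimming, standardizing, outlier replacement, deduping) can be sketched in plain Python. This is only an illustration of the logic, not ADF data flow or Spark code; the column names `id`, `city`, `amount` and the outlier bounds `low`/`high` are assumptions made up for the example.

```python
from statistics import median

def cleanse(rows, default_city="unknown", low=0.0, high=1000.0):
    """Apply the outline's cleansing steps to a list of dict rows (hypothetical schema)."""
    cleaned = []
    for row in rows:
        city = row.get("city")
        # Trim whitespace; substitute a default when the value is missing or blank.
        city = city.strip() if city and city.strip() else default_city
        # Standardize casing so that "Seattle" and "seattle" compare equal.
        cleaned.append({"id": row["id"], "city": city.lower(), "amount": row.get("amount")})

    # Filter out rows whose amount is null.
    cleaned = [r for r in cleaned if r["amount"] is not None]

    # Replace out-of-range amounts with the median of the in-range amounts.
    in_range = [r["amount"] for r in cleaned if low <= r["amount"] <= high]
    if in_range:
        mid = median(in_range)
        for r in cleaned:
            if not (low <= r["amount"] <= high):
                r["amount"] = mid

    # Dedupe on the whole row, keeping the first occurrence.
    seen, result = set(), []
    for r in cleaned:
        key = (r["id"], r["city"], r["amount"])
        if key not in seen:
            seen.add(key)
            result.append(r)
    return result
```

The order matters here: trimming and standardizing run before deduping so that rows differing only in whitespace or casing collapse into one.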
Splitting data
Conditional split
Cloning
(new branch)
File splits
round-robin
hash
Dynamic range
Fixed range
Key
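A rough plain-Python sketch of four of the file-split strategies listed above: round-robin, hash, fixed range, and key. It only illustrates how rows are assigned to partitions; it is not how ADF or Spark implement them. The function names, the use of CRC32 as the hash, and the `boundaries` parameter are assumptions. Dynamic range is left out because its boundaries are derived from the data at run time.

```python
import bisect
import zlib
from collections import defaultdict

def round_robin(rows, n):
    """Deal rows out evenly across n partitions, ignoring their content."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts

def hash_partition(rows, n, column):
    """Send rows with equal column values to the same partition via a stable hash."""
    parts = [[] for _ in range(n)]
    for row in rows:
        digest = zlib.crc32(str(row[column]).encode("utf-8"))
        parts[digest % n].append(row)
    return parts

def fixed_range(rows, column, boundaries):
    """Partition by caller-supplied ascending boundaries.

    Partition 0 holds values below boundaries[0]; the last partition holds
    values at or above boundaries[-1].
    """
    parts = [[] for _ in range(len(boundaries) + 1)]
    for row in rows:
        parts[bisect.bisect_right(boundaries, row[column])].append(row)
    return parts

def key_partition(rows, column):
    """Create one partition per distinct value of the column."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[column]].append(row)
    return dict(parts)
```

Round-robin gives the most even file sizes, while hash and key keep related rows together at the cost of possible skew.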
Shredding JSON
extracting values from JSON using
Spark
SQL
ADF
flatten transformation
unroll by
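The idea behind a flatten transformation with an "unroll by" array can be sketched in plain Python as follows. This is an illustration of the concept only, not the ADF implementation; the `flatten` helper, the dotted output column names, and the sample order document are all hypothetical.

```python
import json

def flatten(record, unroll_by):
    """Emit one output row per element of record[unroll_by].

    Fields outside the unrolled array are copied onto every output row.
    """
    parent = {k: v for k, v in record.items() if k != unroll_by}
    rows = []
    for item in record.get(unroll_by, []):
        row = dict(parent)
        if isinstance(item, dict):
            # Promote nested keys to top-level columns, prefixed with the array name.
            for k, v in item.items():
                row[f"{unroll_by}.{k}"] = v
        else:
            row[unroll_by] = item
        rows.append(row)
    return rows

doc = json.loads(
    '{"order_id": 7, "customer": "Ada", '
    '"items": [{"sku": "A1", "qty": 2}, {"sku": "B2", "qty": 1}]}'
)
flat_rows = flatten(doc, "items")
```

Here the single order document becomes two rows, one per element of `items`, each carrying `order_id` and `customer`.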
Encoding and decoding data
Error handling
ADF
Sink
Settings tab:
error row handling
activity
Failure output
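The general pattern behind error row handling, where good rows continue to the sink and failing rows are diverted instead of aborting the run, can be sketched in plain Python. This is a conceptual illustration under assumed names (`write_with_error_rows`, `convert`), not the ADF sink's behavior or API.

```python
def write_with_error_rows(rows, convert):
    """Convert each row; collect failures separately instead of stopping.

    Returns (written, rejected). Each rejected entry keeps the original row
    and the error message so it can be inspected or replayed later.
    """
    written, rejected = [], []
    for row in rows:
        try:
            written.append(convert(row))
        except (ValueError, KeyError, TypeError) as exc:
            rejected.append({"row": row, "error": str(exc)})
    return written, rejected

def to_typed(row):
    # Hypothetical conversion: the sink expects an integer id.
    return {"id": int(row["id"])}

written, rejected = write_with_error_rows([{"id": "1"}, {"id": "x"}, {}], to_typed)
```

Only the first row converts cleanly; the other two land in `rejected` with the reason attached.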
Normalizing and denormalizing values
denormalizing
(rows to columns)
Pivot
normalizing
(columns to rows)
Unpivot
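Pivot (rows to columns) and unpivot (columns to rows) can be sketched in plain Python. This is a minimal illustration of the reshaping, not the ADF transformations themselves; the helper names and the `store`/`quarter`/`sales` columns are assumptions. A real pivot also applies an aggregate when several rows share the same key and column value; this sketch simply lets the last one win.

```python
def pivot(rows, key, column, value):
    """Turn distinct values of `column` into columns, one output row per `key`."""
    out = {}
    for row in rows:
        out.setdefault(row[key], {key: row[key]})[row[column]] = row[value]
    return list(out.values())

def unpivot(rows, key, column, value):
    """Turn every non-key column back into a (column name, value) row."""
    result = []
    for row in rows:
        for name, val in row.items():
            if name != key:
                result.append({key: row[key], column: name, value: val})
    return result

long_rows = [
    {"store": "S1", "quarter": "Q1", "sales": 10},
    {"store": "S1", "quarter": "Q2", "sales": 15},
    {"store": "S2", "quarter": "Q1", "sales": 7},
]
wide_rows = pivot(long_rows, "store", "quarter", "sales")
round_trip = unpivot(wide_rows, "store", "quarter", "sales")
```

In this example unpivoting the pivoted rows returns the original long-form rows, which shows that the two operations are inverses when no aggregation collapses data.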
Exploratory data analysis (EDA)