EMR - Coggle Diagram
EMR
Apache Spark on EMR
Running EMR
Job orchestration (Oozie, Airflow, AWS Step Functions, etc.)
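A minimal sketch of what an orchestrator ultimately hands to EMR when it runs a Spark job: a step definition that invokes `spark-submit` through `command-runner.jar`. The helper name `spark_submit_step`, the bucket path, the step name, and the job arguments are all hypothetical placeholders, not values from this diagram.

```python
def spark_submit_step(name, script_uri, app_args=(), deploy_mode="cluster"):
    """Build a step definition in the shape the EMR AddJobFlowSteps API accepts."""
    return {
        "Name": name,
        # Keep the cluster running if this step fails (assumed policy).
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            # command-runner.jar lets a step run a command such as spark-submit.
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", deploy_mode,
                script_uri,
                *app_args,
            ],
        },
    }


# Hypothetical job: script location and arguments are illustrative only.
step = spark_submit_step(
    "daily-aggregation",
    "s3://example-bucket/jobs/aggregate.py",
    ["--date", "2024-01-01"],
)

# Submitting the step needs boto3, AWS credentials, and a real cluster id:
# import boto3
# boto3.client("emr").add_job_flow_steps(JobFlowId="j-XXXXXXXXXXXXX", Steps=[step])
```

Oozie, Airflow, or Step Functions would each wrap a call like this in their own workflow definition; the step payload itself stays roughly the same.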
Intro to Apache Spark
Architecture
Spark Core
Advanced file formats such as Parquet, ORC, etc.
API - DataFrames
are successors to RDDs - available in Scala, Java, PySpark, R
more optimisations than RDDs - Catalyst optimiser, reduced serialisation overhead, better GC
RDDs are the oldest Spark data structure - available in Scala, Java, PySpark
What is EMR
Easily run Spark, Hive, Presto, HBase, Flink and more big data apps on AWS
Support for popular OSS like Spark, Flink, Hudi, Iceberg