Day1, Day4 - Coggle Diagram
Day1
RDD
Lineage Graph
Action and Transformation
Lazy Evaluation
Spark
Features of Spark
Benefits of Spark Over MapReduce
Big Data, HDFS, MapReduce, Cluster, Hadoop, YARN
Batch and Stream Processing
Day4
PySpark
Day2
Cores and Threads, Partitions and Slots
PySpark
Spark Session
Spark Context
Spark Architecture
Task
DAG
Cache
Task Scheduler
Transformation
Day3
Create Spark Session
Default
Set Partition
Existing
Shared Variable
Broadcast
Accumulator
RDD
Create
Delete
Transform
Day5
Split
MapType
ArrayType in PySpark
Row method
Name alignment
Class based
Select
Day6
ACID Transaction
withColumn
ORC file
PySpark Filter
GroupBy
Day7
Casting
Applying conditions in a DataFrame
GroupBy on multiple columns
Window Function
Day8
Caching
Persisting
Union
Different storage levels
Sorting
User Defined Function (UDF)
DataFrame
JSON
Read from RDD
CSV
Parquet
DataFrame to Parquet file
Convert PySpark DataFrame to pandas DataFrame
Create RDD from text file
Repartition
Coalesce
Nested StructType
StructType and StructField