APACHE SPARK
Solve a ML problem
- A DATAFRAME with a SCHEMA will be returned
3a. The data type of each column will be string, float, etc.
- Note that an SQLContext is needed to read a CSV file. This is similar to reading a CSV file in pandas
- Call methods on a DataFrame to process the data
- Download the necessary packages, such as spark-csv, ...
- DataFrame with continuous and categorical features
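The idea that reading a CSV file yields a table whose columns each carry a data type can be sketched in plain Python. This is only an analogy, not Spark code: `infer_type`, `infer_schema`, and the sample data are hypothetical names made up for illustration. With the spark-csv package, the analogous PySpark call should be roughly `sqlContext.read.format("com.databricks.spark.csv").options(header="true", inferSchema="true").load(path)`, followed by `df.printSchema()`.

```python
import csv
import io

# Hypothetical sample data: continuous (age, income) and categorical (color) features.
SAMPLE_CSV = "age,color,income\n31,red,1200.5\n45,blue,980.0\n"


def infer_type(values):
    """Return 'int', 'float', or 'string' for a column of raw strings."""
    for name, caster in (("int", int), ("float", float)):
        try:
            for value in values:
                caster(value)
            return name  # every value parsed with this caster
        except ValueError:
            continue  # try the next, more permissive type
    return "string"


def infer_schema(csv_text):
    """Read CSV text and return a list of (column_name, type_name) pairs."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    columns = list(zip(*data))  # transpose rows into columns
    return [(name, infer_type(col)) for name, col in zip(header, columns)]


schema = infer_schema(SAMPLE_CSV)
print(schema)
```

Spark's own type inference covers more types and edge cases (nulls, dates, and so on); the sketch only shows why each column ends up with one type.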
a program
- Apply MAP/REDUCE functions by calling methods of an RDD
2a. We can specify the number of partitions OR Spark will automatically choose the number of partitions of an RDD for us
- Use the SparkContext to call a function that converts the input to an RDD
- Create a SparkContext (built into the Spark shell OR imported in PySpark)
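The program steps above (split the input into partitions, then apply map and reduce) can be sketched in plain Python. This is an analogy under stated assumptions, not how Spark is implemented: `partition` is a hypothetical helper, and Spark's real slicing of the data may differ. In PySpark the analogous calls would be `sc.parallelize(data, 4).map(lambda x: x * x).reduce(lambda a, b: a + b)`, where the second argument to `parallelize` is the optional number of partitions.

```python
from functools import reduce


def partition(data, num_partitions):
    """Split a list into num_partitions contiguous, roughly equal chunks."""
    size, extra = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `extra` chunks each take one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


data = list(range(1, 11))      # the input: 1..10
parts = partition(data, 4)     # step 2a: choose 4 partitions explicitly

# MAP: square every element, partition by partition.
squared = [[x * x for x in part] for part in parts]

# REDUCE: sum inside each partition, then combine the partial results.
partials = [sum(part) for part in squared]
total = reduce(lambda a, b: a + b, partials)

print(parts)
print(total)
```

The reduce function has to be associative and commutative for the per-partition results to combine correctly, which is why a plain sum works here.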
DataStructure
RDD
Operations
Transformations
only applied to an RDD, do NOT change the RDD
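The point that a transformation is applied to an RDD without changing it can be illustrated with a small plain-Python stand-in. `TinyRDD` is a hypothetical class written only for this sketch, not part of Spark. In PySpark, `rdd.map(f)` and `rdd.filter(f)` likewise return a new RDD and leave the original as it was.

```python
class TinyRDD:
    """Hypothetical, minimal stand-in for an RDD: its contents never change."""

    def __init__(self, items):
        self._items = tuple(items)  # a tuple cannot be modified in place

    def map(self, fn):
        # Build and return a NEW dataset; self is left untouched.
        return TinyRDD(fn(x) for x in self._items)

    def filter(self, predicate):
        # Same idea: keep matching elements in a NEW dataset.
        return TinyRDD(x for x in self._items if predicate(x))

    def collect(self):
        # Return the elements as an ordinary list.
        return list(self._items)


numbers = TinyRDD([1, 2, 3, 4])
doubled = numbers.map(lambda x: x * 2)
evens = numbers.filter(lambda x: x % 2 == 0)

print(numbers.collect())
print(doubled.collect())
print(evens.collect())
```

Because each transformation returns a new object, the same source dataset can safely feed several different pipelines.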