Apache Spark, Francisco Osejo - Coggle Diagram
Apache Spark
Data Types
DataFrames
Datasets
RDD
Accumulator
Broadcast Variable
Vectors
Primary Types
ByteType
ShortType
IntegerType
LongType
Setup
Cluster Mode
Stand-alone Mode
Local Mode
Managed Services
Databricks
Google Dataproc
Machine Learning (MLlib)
Supervised Learning
Classification
Logistic Regression
DecisionTreeClassifier
GBTClassifier
MultilayerPerceptronClassifier
LinearSVC
OneVsRest
NaiveBayes
Unsupervised Learning
Regression
Linear Regression
GeneralizedLinearRegression
DecisionTreeRegressor
RandomForestRegressor
GBTRegressor
AFTSurvivalRegression
IsotonicRegression
DeepLearning
MultilayerPerceptronClassifier
TensorFrames
BigDL
Deeplearning4j
DeepLearning Pipelines
NLP
Spark NLP
Tokenization
Normalizer
Stemmer
Lemmatizer
Regex Matching
Date Matcher
Chunking
Data Source
Streaming
Kafka
file
socket
SQL
Relational DBs
Hive Tables
NoSQL
MongoDB
Cassandra
HDFS
Cloud Storage
Amazon S3
Azure Blob Storage
Google Cloud Storage
Programming Languages
Scala
Python
Java
R
Spark Streaming
Structured Streaming
Legacy Streaming
Use Case
Real Time Reporting
Incremental ETL
Notifications And Alerting
Serving ML model
Spark API
Structured API
DataFrames
Datasets
Spark SQL
Low Level API
RDD
Francisco Osejo