Big Data Engineering for Analytics
Big Data
Characteristics
(2 p22-27)
Variety
Volume
Velocity
Variability
Veracity
Value
Architecture / Layers
(2 p33,34)
(3 p8,18, 19)
Storage
NoSQL
(6 p23)
Types
Document store
(6 p26, 32, 33)
Graph store
(6 p27, 34)
Column family store
(6 p25, 31)
Key-value store
(6 p24, 29, 30)
Data distribution styles
(6 p36-39)
Sharding
Replication
Master-slave
Peer-to-peer
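The sharding and replication nodes above can be illustrated with a minimal sketch. It assumes hash-based sharding and a simple "next nodes in ring order" replica placement. Both are illustrative choices and are not taken from the cited slides. The names `shard_for` and `replicas_for` are hypothetical.

```python
import hashlib
from typing import List


def shard_for(key: str, num_shards: int) -> int:
    """Sharding: map a key to exactly one shard using a stable hash.

    hashlib is used instead of the built-in hash(), which is salted
    per process for strings and so is not stable across runs.
    """
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def replicas_for(key: str, num_nodes: int, replication_factor: int) -> List[int]:
    """Replication: return the nodes that hold a copy of the key.

    Assumption for this sketch: the first node is the one chosen by
    sharding, and the copies go to the following nodes in ring order.
    """
    if replication_factor > num_nodes:
        raise ValueError("replication_factor cannot exceed num_nodes")
    first = shard_for(key, num_nodes)
    return [(first + i) % num_nodes for i in range(replication_factor)]


if __name__ == "__main__":
    nodes = replicas_for("user:42", num_nodes=5, replication_factor=3)
    print("nodes holding user:42:", nodes)
```

Under a master-slave style, writes for a key would go to the first node in the returned list and be copied to the others. Under a peer-to-peer style, any node in the list could accept the write.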
File formats
(13 p4-41)
CSV
JSON
Avro
Parquet
ORC
Ingest
Batch
Real time
Processing
Analytics
Hadoop
(3 p30-43)
Apache Spark
(4 p5-8)
Spark Core
(4 p12)
Abstraction layers
RDD
(4 p13-14)
(8 p5-58)
Operations
(4 p30-32)
(8 p6-32)
Transformations
Actions
fault-tolerant collection of data elements partitioned across the cluster nodes
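The split between transformations and actions can be sketched in plain Python. This is a toy model, not the Spark API: `MiniRDD` and `parallelize` are hypothetical names used only for illustration. Transformations only record what to do, and an action triggers the computation on each partition.

```python
from typing import Any, Callable, Iterable, List, Tuple


class MiniRDD:
    """Toy stand-in for an RDD: partitions plus a recorded chain of operations."""

    def __init__(self, partitions: List[list], ops: Tuple = ()):
        self.partitions = partitions
        self.ops = tuple(ops)  # the recorded lineage of transformations

    # Transformations: return a new MiniRDD and compute nothing yet.
    def map(self, fn: Callable[[Any], Any]) -> "MiniRDD":
        return MiniRDD(self.partitions, self.ops + (("map", fn),))

    def filter(self, pred: Callable[[Any], bool]) -> "MiniRDD":
        return MiniRDD(self.partitions, self.ops + (("filter", pred),))

    def _compute(self, partition: list) -> list:
        """Apply the recorded operations to one partition.

        Because this depends only on the source partition and the
        lineage, a lost result can be recomputed by calling it again.
        """
        items: Iterable = iter(partition)
        for kind, fn in self.ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: run the recorded operations and return a result.
    def collect(self) -> list:
        return [x for p in self.partitions for x in self._compute(p)]

    def count(self) -> int:
        return sum(len(self._compute(p)) for p in self.partitions)


def parallelize(data: Iterable, num_partitions: int) -> MiniRDD:
    """Split data into contiguous chunks, one per partition."""
    data = list(data)
    size = max(1, -(-len(data) // num_partitions))  # ceiling division
    return MiniRDD([data[i:i + size] for i in range(0, len(data), size)])


if __name__ == "__main__":
    rdd = parallelize(range(10), 3)
    even_squares = rdd.filter(lambda x: x % 2 == 0).map(lambda x: x * x)
    print(even_squares.collect())  # [0, 4, 16, 36, 64]
    print(even_squares.count())    # 5
```

In this sketch the calls to `filter` and `map` do no work until `collect` or `count` runs, which mirrors the lazy evaluation of RDD transformations.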
DAG
implements stage-oriented scheduling
Architecture
(4 p21-27)
Spark Driver
Spark Executors
Cluster Manager
Standalone
YARN
Kubernetes
Execution modes
Cluster
Client
Local
Spark Streaming
(4 p17)
Spark ML and MLlib
(4 p16)
(10 p20-26)
Spark SQL
(4 p15)
(7 p3-14)
Kubernetes
(3 p45-49)
Business Analytics
(2 p11-19)
Analytics Lifecycle
(14 p14-26)