Big Data - Coggle Diagram
Big Data
Evolution of Big Data Technologies
Traditional Databases
Data Warehouses
Hadoop + MapReduce
Hive/Pig
Spark
Kafka/Streaming
Cloud Analytics
AI + Real-Time ML
Big Data Pipeline
Data Collection
Data Storage
Data Processing
Analytics
Visualization
Decision Making
Intro
Why Traditional Systems Fail
Problem
Vertical Scaling (Scale-Up)
Solution
Horizontal Scaling (Scale-Out)
Distributed Systems
What is Distributed Computing
Why Distributed Systems Exist
What is a Distributed File System
Traditional Databases vs Big Data Systems
Characteristics of Big Data
Volume
Variety
Velocity
Veracity
Value
Variability
What is Big Data
Why Big is Relative
Data vs Information
Hadoop
Distributed storage
Distributed processing
Core Components
HDFS
Responsibilities
Replication
Fault tolerance
Storage
Architecture
NameNode
DataNode
Blocks
Why Blocks
Replication
Why Replication Matters
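The block and replication nodes above can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `hdfs_footprint` is hypothetical, and the 128 MB block size and replication factor of 3 are assumed as common HDFS defaults rather than taken from this diagram.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Estimate how a file is laid out on HDFS (assumed defaults, not from the diagram)."""
    # A file is cut into fixed-size blocks; the last block may be only partly full.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is copied to `replication` different DataNodes.
    block_replicas = blocks * replication
    # Blocks are not padded, so raw disk usage is simply file size times replication.
    raw_storage_mb = file_size_mb * replication
    return blocks, block_replicas, raw_storage_mb

blocks, block_replicas, raw_storage_mb = hdfs_footprint(1000)
print(blocks, block_replicas, raw_storage_mb)
```

Under these assumptions, losing a single DataNode leaves two other copies of each affected block, which is why replication underpins fault tolerance.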
MapReduce
MapReduce Stages
Input Split
Mapping
Shuffling
Reducing
Limitations of MapReduce
Spark
What is MapReduce
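The four MapReduce stages listed above (Input Split, Mapping, Shuffling, Reducing) can be sketched as a single-process word count. This is a minimal illustration in plain Python, not Hadoop's API: all function names are hypothetical, and a real job would run the map and reduce steps on separate cluster nodes.

```python
from collections import defaultdict

def input_split(text, num_splits):
    """Input Split: divide the input lines into roughly equal chunks."""
    lines = text.splitlines()
    return [lines[i::num_splits] for i in range(num_splits)]

def map_phase(split):
    """Mapping: emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for line in split for word in line.split()]

def shuffle(mapped_outputs):
    """Shuffling: group values by key across the output of every mapper."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducing: collapse each key's list of values into one count."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(text, num_splits=2):
    splits = input_split(text, num_splits)
    mapped = [map_phase(split) for split in splits]  # one mapper per split
    return reduce_phase(shuffle(mapped))

print(word_count("big data\nbig cluster\ndata data"))
```

In Hadoop itself, the output of the map step is written to disk before the shuffle, which is the slow-processing limitation that in-memory engines such as Spark address.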
YARN
Responsibilities
Resource management
CPU allocation
Memory allocation
Job scheduling
Cluster monitoring
Important Components
ResourceManager
NodeManager
Container
ApplicationMaster
Hadoop Ecosystem
Storage
HDFS
Batch processing
MapReduce
SQL analytics
Hive
Why Hive
HiveQL
SQL-like language
Internal Workflow
Hive Query
Hive Compiler
MapReduce/Spark Job
HDFS
Data scripting
Pig
Why Pig
Pig vs Hive
NoSQL database
HBase
Why HBase
Streaming
Kafka
Resource management
YARN
Data transfer (Import/Export Data)
Sqoop
Fast Processing
Spark
Log Collection
Flume
Why Hadoop was created
Why Hadoop Ecosystem Expanded
Slow processing: writes intermediate data to disk
Complex Java coding: difficult for analysts
Batch-only processing: no real-time analytics
Real-Time Analytics
Batch Analytics
Streaming Analytics
Cloud Big Data
Why Cloud
Popular Platforms