BIG DATA (Only adds value when analysed (Predictive (neural networks,…
-
Hadoop Ecosystem
Data management
HDFS, HBase, YARN
Data access
MapReduce, Hive, Pig
Data ingestion and integration
Flume, Sqoop, Kafka, Storm
Data monitoring
Ambari, Zookeeper, Oozie
Data governance and security
Falcon, Ranger, Knox
-
NoSQL
CAP
Consistency, Availability, Partition tolerance
ACID
Atomicity, Consistency, Isolation, Durability
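Atomicity, the "A" in ACID, can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in for a transactional store; the accounts table, the names and the transfer helper are hypothetical and are not taken from the mind map.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one all-or-nothing unit."""
    try:
        # "with conn" commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False

def balances(conn):
    """Return the current balances as a plain dict."""
    return dict(conn.execute("SELECT id, balance FROM accounts"))

# Hypothetical sample data, in memory only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

A failed transfer leaves both balances untouched, because the two updates are undone together.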
Schema-less, no strict data structure has to be defined up front
-
Key-value, Redis is an example
Document, e.g. XML, JSON, EDI, SWIFT
Column, a row with many columns
Graph, nodes, edges and properties on both nodes and edges
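The four NoSQL data models listed above can be compared by modelling one record in each shape. This is a rough sketch in plain Python structures, not the API of any particular database; the customer, account and field names are made up for illustration.

```python
import json

# Key-value: an opaque value looked up by a single key (the style Redis uses).
key_value = {"customer:42": "Ada Lovelace"}

# Document: a self-describing, nested record, here parsed from JSON.
document = json.loads(
    '{"id": 42, "name": "Ada Lovelace",'
    ' "accounts": [{"type": "savings", "balance": 1200}]}'
)

# Column: a row key pointing at many columns; rows need not share the same columns.
column_family = {
    "42": {"name": "Ada Lovelace", "city": "London", "segment": "retail"},
    "43": {"name": "Alan Turing"},
}

# Graph: nodes and edges, with properties on both.
nodes = {
    "c42": {"label": "Customer", "name": "Ada Lovelace"},
    "a7": {"label": "Account", "type": "savings"},
}
edges = [("c42", "a7", {"relation": "OWNS", "since": 2019})]

def neighbours(node_id):
    """Return (target node, edge properties) for every edge leaving node_id."""
    return [(dst, props) for src, dst, props in edges if src == node_id]
```

None of the four structures needed a schema declared beforehand, which is the schema-less property noted above.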
Apache Spark
Spark SQL, Spark Streaming, MLlib, GraphX
-
Use Cases
Big Data in Banking
Customer management
using a 360-degree customer view to drive sales
boosting retention
improving service + identifying needs
Data ingestion and integration
Flume, Sqoop, Kafka, Storm
-
Sqoop (SQL to Hadoop)
supports bulk import of data into HDFS from structured data stores such as RDBMS
Sqoop has connectors for popular RDBMS
Flume, streaming data ingestion, e.g. log files or social media data into HDFS
Kafka, distributed streaming platform
lets you publish and subscribe to streams of records
Storm, real-time message computation system
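The publish and subscribe idea noted for Kafka above can be sketched with an in-memory append-only log per topic and one read position per subscriber. This is a toy model only: TinyBroker, publish and poll are hypothetical names, not Kafka's client API, and nothing here is distributed or persistent.

```python
from collections import defaultdict

class TinyBroker:
    """In-memory stand-in for a streaming platform (an assumption, not Kafka)."""

    def __init__(self):
        self._topics = defaultdict(list)   # topic -> append-only list of records
        self._offsets = defaultdict(int)   # (subscriber, topic) -> next index to read

    def publish(self, topic, record):
        """Append a record to the topic and return its offset."""
        self._topics[topic].append(record)
        return len(self._topics[topic]) - 1

    def poll(self, subscriber, topic):
        """Return the records this subscriber has not seen yet."""
        log = self._topics[topic]
        start = self._offsets[(subscriber, topic)]
        self._offsets[(subscriber, topic)] = len(log)
        return log[start:]

broker = TinyBroker()
broker.publish("payments", {"id": 1, "amount": 25})
broker.publish("payments", {"id": 2, "amount": 40})
```

Because each subscriber keeps its own offset, two consumers can read the same stream independently and at different speeds.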
Hadoop
Apache Hadoop
open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware
Data monitoring
Ambari, Zookeeper, Oozie
Provision, manage, monitor and operate
Ambari, provisions, manages and monitors Hadoop clusters
Oozie, workflow engine that schedules, runs and manages jobs on Hadoop
Zookeeper, provides coordination and operational services between distributed processes across the nodes of a Hadoop cluster
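What a workflow engine such as Oozie does when it schedules dependent jobs can be approximated by ordering a dependency graph. The sketch below uses Python's standard graphlib module (Python 3.9 or later); the job names are hypothetical and the code does not use or imitate Oozie's own workflow definitions.

```python
from graphlib import TopologicalSorter

# Each job maps to the set of jobs that must finish before it can start.
workflow = {
    "ingest": set(),
    "clean": {"ingest"},
    "aggregate": {"clean"},
    "archive": {"clean"},
    "report": {"aggregate"},
}

def execution_order(dag):
    """Return one valid order in which the jobs could be run."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(workflow)
```

Any order returned respects the dependencies; jobs with no path between them, such as aggregate and archive here, may come out in either order.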
-
Real-time analytics
-
Real-time data sources include social media, sensors, IoT devices, business transaction activity
Data access
MapReduce, Hive, Pig
Hive, data warehousing (DWH) and ETL
SQL-like query language, HiveQL
-
Pig, high-level programming language used to analyse large data sets
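MapReduce, the first data access option listed above, follows a map, shuffle, reduce pattern that a word count shows well. This is a single-process sketch in plain Python with hypothetical function names; it illustrates the programming model only and does not use the Hadoop API or run on a cluster.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    """Run the three phases in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

On a real cluster the map and reduce calls would run in parallel on different nodes, with the shuffle moving data between them.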
-