Hadoop: open-source Java framework for distributed, data-intensive applications
History:
2008: Cloudera founded
2018: merger with Hortonworks announced
2021: bought by private equity (PE)
Advantages
handles vast amounts of data
processing software moves to the data
Disadvantage
Ecosystem
Abstraction languages
SQL on Hadoop
Hive
Pig
Computational models
MapReduce
Tez
Real-time processing tools
Storm
Spark Streaming
Databases
HBase
Cassandra
Streaming ingestion tools
Kafka
Flume
Data integration tools
Sqoop
Talend
Workflow coordination tools
Oozie
Control-M
Distributed service coordination tools
ZooKeeper
Cluster administration tools
Ranger
Sentry
UI tools
Hue
Jupyter
Content indexing tools
Elasticsearch
Splunk
Distributed file systems
HDFS
Resource managers
YARN
Mesos
Features
public cloud
private cloud
Cloudera Data Platform
Self-service (core)
Multi-cloud
Security
Cloudera Distribution of Hadoop (CDH)
Impala
interactive SQL query engine
HDFS, HBase, S3
Search
Solr
full-text searches
Hortonworks Data Platform (HDP)
public cloud
Machine learning
Public cloud services
data engineering
management tool
pipeline monitoring
based on Spark
visual debugging
streamline ETL processes
Airflow
data hub
machine learning
data marts
db
machine learning
data visualization
operational database
model training
data replication
model
data warehouse
model management
PaaS
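
Example: MapReduce sketch
The MapReduce entry under Computational models can be illustrated with a small, single-process word count. This is only a sketch of the map, shuffle, and reduce phases in plain Python, not Hadoop's actual Java API; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration. On a real cluster the map tasks would run on the nodes that hold the data blocks, which is the "processing software moves to the data" idea listed under Advantages.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values collected for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence, in one process (illustration only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["Hive Pig Hive", "pig Tez"]))
```

In this sketch each input line stands in for a data block; a distributed run would execute map_phase in parallel across blocks and send each key group to a reducer.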