Architecture:
Ingestion Layer: takes data from its source and makes it available for storage/processing: Sqoop (moves data from relational databases into HDFS), Flume (very old), StreamSets, NiFi (Cloudera), Kinesis (only in the Amazon cloud)
Storage Layer: file storage, distributed database systems (NoSQL)
Processing/Computing Layer: MapReduce, Spark, Tez
Analysis Layer: puts SQL on top of the data so we don't have to write Java, e.g.: Hive (needs MapReduce or Spark), Spark SQL, Impala, Drill, Presto, Superset (used on top of the Druid database)
Visualization Layer: Zoomdata (paid), Tableau, Zeppelin, Qlik
Other Layers: resource manager, security (user authorization and authentication; Apache Atlas and Apache Ranger are the best known)
Scheduling Layer: e.g. Airflow, similar to Oozie
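A scheduler's main job is to run tasks in dependency order. The sketch below is a toy illustration of that idea using only the Python standard library (Python 3.9+). It is not the Airflow or Oozie API, and the task names are hypothetical, chosen to mirror the layers listed above.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "ingest": set(),
    "store": {"ingest"},
    "process": {"store"},
    "analyze": {"process"},
    "visualize": {"analyze"},
}


def execution_order(dag):
    """Return the tasks in an order that respects every dependency."""
    return list(TopologicalSorter(dag).static_order())


if __name__ == "__main__":
    for task in execution_order(pipeline):
        print("run:", task)
```

A real scheduler adds timing (cron-like triggers), retries and monitoring on top of this ordering.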
Some SQL engines do the processing themselves, while others rely on external solutions (e.g. Apache Hive)
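To illustrate why a SQL layer saves writing code by hand, here is a minimal sketch comparing a hand-written map/reduce-style aggregation with the same query in SQL. The assumptions: the sample data and function names are hypothetical, and `sqlite3` is only a stand-in for an engine such as Hive or Spark SQL so that the example runs without a cluster.

```python
import sqlite3
from collections import Counter

# Hypothetical sample data: (user_name, page) click events.
events = [
    ("ana", "home"),
    ("ana", "cart"),
    ("dan", "home"),
    ("ana", "home"),
]


def count_by_page_manual(rows):
    """Hand-written version: emit (page, 1) pairs, then sum them per key."""
    mapped = ((page, 1) for _user, page in rows)  # map step
    counts = Counter()
    for page, one in mapped:  # reduce step
        counts[page] += one
    return dict(counts)


def count_by_page_sql(rows):
    """The same aggregation expressed declaratively in SQL."""
    conn = sqlite3.connect(":memory:")  # stand-in engine, not Hive or Spark SQL
    conn.execute("CREATE TABLE events (user_name TEXT, page TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", rows)
    result = dict(conn.execute("SELECT page, COUNT(*) FROM events GROUP BY page"))
    conn.close()
    return result


if __name__ == "__main__":
    print(count_by_page_manual(events))
    print(count_by_page_sql(events))
```

Both functions return the same counts. In the SQL version the engine decides how to execute the query, which is the part that Hive delegates to MapReduce or Spark.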