Big Data / ETL and Business Intelligence/Data Visualization
(2) ETL Tools
Amazon Kinesis (https://aws.amazon.com/kinesis/)
Apache NiFi (https://nifi.apache.org/minifi/) - powerful, scalable directed graphs of data routing and transformation
Elastic Kibana (https://www.elastic.co/products/kibana)
Apache Sqoop - efficiently transfer bulk data between Hadoop and relational databases
Apache Kafka - real-time data pipelines and streaming apps
Apache Flink - distributed, high performance data streaming applications
Spark Streaming - scalable, fault-tolerant streaming applications
Oracle Warehouse Builder (OWB)
Top 14 (Legacy) ETL Tools
Oracle Warehouse Builder
SAP Data Services
IBM InfoSphere Information Server
SAS Data Management
Informatica PowerCenter
Elixir Repertoire for Data ETL
Data Migrator (IBI)
SQL Server Integration Services (SSIS)
Talend Studio for Data Integration
Sagent Data Flow
Pervasive Data Integrator
OpenText Integration Center
Oracle Data Integrator
Cognos Data Manager
ETL Modern Tools List
(1) Incumbent/Legacy Batch ETL Tools
IBM InfoSphere DataStage
Informatica PowerCenter
Microsoft SSIS
Oracle Data Integrator Enterprise Edition
(2) Cloud Native ETL Tools
Alooma
Fivetran
Matillion
SnapLogic
Stitch Data
(3) Open Source ETL Tools
Apache Airflow
Apache Kafka
Apache NiFi
Talend Open Studio
(4) Real-Time ETL Tools
Alooma
Confluent
StreamSets
Striim
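As a rough illustration of what the tools listed above automate, here is a minimal extract-transform-load sketch in Python. It does not use any of the listed products; the sample data, the `orders` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the target store.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read from a file, API, or queue.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert column types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].title(), float(row["amount"]))
        )
    return cleaned


def load(rows, conn):
    """Load: write the cleaned rows into the (hypothetical) target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline():
    """Run extract -> transform -> load against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    load(transform(extract(RAW_CSV)), conn)
    return conn


if __name__ == "__main__":
    connection = run_pipeline()
    query = "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
    print(connection.execute(query).fetchall())
```

Batch tools typically run steps like these on a schedule, while the real-time tools apply them continuously to a stream of records.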
(5) Data Visualization/Business Intelligence
Yellowfin
D3.js
Tableau
Qlik Sense
Plot.ly
Graphviz.org
Coggle.it
Rawgraphs.io
Microsoft Visio
draw.io
Matplotlib
(3) Data Storage
Vocabulary
Database - a structured set of data held in a computer, especially one that is accessible in various ways.
RDBMS
Microsoft SQL Server (Port 1433)
SQLite
MySQL (Port 3306)
Microsoft Access
Microsoft Excel
PostgreSQL
MariaDB (MySQL fork)
Oracle DB Server
Amazon Aurora (Port 3306)
Non-Relational Database (NoSQL)
Amazon DynamoDB
Apache Cassandra
Apache HBase
Oracle NoSQL
MongoDB
Neo4j
Microsoft HDInsight
Microsoft DocumentDB (now Azure Cosmos DB)
Microsoft Azure (Key/Value Pairing)
Data Warehouse - a large store of data accumulated from a wide range of sources within a company and used to guide management decisions. Usually designed for OLAP (online analytical processing) workloads.
Data Lake - a storage repository that holds a vast amount of raw data in its native format until it is needed. While a hierarchical data warehouse stores data in files or folders, a data lake uses a flat architecture to store data.
Data Mart - a subset of a data warehouse
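The relationship between a data warehouse and a data mart can be sketched with SQLite, which appears in the RDBMS list above. This is only an illustration under stated assumptions: the `warehouse_sales` table, the `marketing_mart` view, and the sample rows are hypothetical, and a production data mart is often a separate physical store rather than a view.

```python
import sqlite3


def build_marketing_mart():
    """Create a tiny 'warehouse' table and expose one department as a 'mart'."""
    conn = sqlite3.connect(":memory:")

    # Hypothetical warehouse table holding data for every department.
    conn.execute(
        "CREATE TABLE warehouse_sales (region TEXT, department TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
        [
            ("EU", "marketing", 100.0),
            ("EU", "finance", 250.0),
            ("US", "marketing", 75.0),
        ],
    )

    # The data mart is modeled as a view: a subset of the warehouse
    # restricted to the rows and columns one department needs.
    conn.execute(
        "CREATE VIEW marketing_mart AS "
        "SELECT region, amount FROM warehouse_sales "
        "WHERE department = 'marketing'"
    )
    return conn


if __name__ == "__main__":
    connection = build_marketing_mart()
    query = "SELECT region, amount FROM marketing_mart ORDER BY region"
    print(connection.execute(query).fetchall())
```

A view keeps the mart in sync with the warehouse automatically; copying the subset into its own table trades that freshness for isolation and query speed.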
File System
Microsoft FAT32 (File Allocation Table)
Microsoft NTFS (New Technology File System)
Microsoft exFAT (Extended File Allocation Table)
Microsoft ReFS (Resilient File System)
Google Bigtable
Apache Hadoop Distributed File System (HDFS)
Linux Ext2 (Second Extended File System)
Linux Ext3 (Third Extended File System)
Linux Ext4 (Fourth Extended File System)
Linux Btrfs (B-tree File System)
Amazon S3 (Simple Storage Service)
Amazon EBS (Elastic Block Store)
Amazon EFS (Elastic File System)
(1) Application Orchestration
Apache Oozie - Workflow scheduling system
Amazon SWF - Simple Workflow Service
Microsoft SSIS (SQL Server Integration Services)
(4) Analytics Engines
Apache Storm - Distributed realtime computation system
Apache Spark - Analytics engine for big data processing
Splunk - software for searching, monitoring, and analyzing machine-generated big data via a web-style interface
Elasticsearch - real-time search for machine-generated big data
(6) Artificial Intelligence
TensorFlow - open-source software library for dataflow programming across a range of tasks
Google Machine Learning Crash Course