Data Engineering Roadmap

Suggested Prerequisites

👥 Data Engineer Roadmap

Cluster Computing Fundamentals

HDFS

MapReduce

Lambda & Kappa Architectures

Managed Hadoop

Cloud-based

Amazon EMR (Elastic MapReduce)

Google Dataproc

Azure Data Lake

Data Processing

Batch

Apache Pig

Data Build Tool

Hybrid

Apache Spark

Apache Beam

Streaming

Apache Kafka

Amazon Kinesis

Apache Storm

IaC - Infrastructure as Code

Containers

Docker

LXC

Container Orchestration

Kubernetes

Google Kubernetes Engine (GKE)

Apache Mesos

CI/CD

GitHub Actions

Jenkins

Database

NoSQL Database

Document

MongoDB

Elasticsearch

Azure Cosmos DB

Key-Value

Redis

DynamoDB

Wide Column

Apache Cassandra

Google Bigtable

Graph

Neo4j

SQL Database

MySQL

PostgreSQL

MariaDB

Amazon Aurora

Database Fundamentals

SQL

Normalization

CAP Theorem

OLTP vs OLAP

Horizontal vs Vertical scaling

Data Warehouse

Snowflake

Amazon Redshift

Google BigQuery

Azure Synapse

Cloud Provider Certification

Azure

Azure Database Administrator Associate

GCP

Professional Data Engineer

AWS

AWS Certified Big Data - Specialty

AWS Certified Solutions Architect - Associate

IBM

IBM Certified Data Architect - Big Data

Cloud Fundamentals

Basic Terminal Usage

APIs / REST APIs

Overview

Data Structures & Algorithms

🐍 Python

☕ Java

Go

Scala

R Programming

Git - Version Control

Shell Scripting

Apache Hadoop

Networking Basics

Operating System Basics / Virtual Machines

Linux CLI Basics

Programming Languages

...

Python

Easy to learn

Applicable to many domains

Java

R Programming