AWS Machine Learning Specialty
Data Engineering (20%)
Storage
AWS S3
Overview
S3 stores objects (files) in buckets (directories)
Buckets must have a globally unique name
Objects have a key; the key is the full path. Max object size is 5 TB
Partitioning by Date / Product ID
Object Tags (key/value pairs, up to 10) - useful for security / lifecycle
Object storage supports any file format for ML, e.g. CSV, JSON, Parquet, ORC, Avro, Protobuf
Data Partitioning
Pattern for speeding up range queries, e.g. in AWS Athena
Data partitioning can be handled by tools like AWS Glue
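The date / product ID partitioning noted above can be sketched as a small key builder. This is a minimal sketch, assuming a Hive-style `name=value` prefix layout, which query tools such as Athena can treat as partitions. The function name `partitioned_key`, the `sales` prefix, and the file name are hypothetical and not from the original notes.

```python
from datetime import date


def partitioned_key(prefix: str, day: date, product_id: str, filename: str) -> str:
    """Build an S3 object key whose path encodes date and product ID partitions.

    Assumed layout: prefix/year=YYYY/month=MM/day=DD/product_id=ID/filename
    """
    return (
        f"{prefix}/year={day.year:04d}/month={day.month:02d}/day={day.day:02d}"
        f"/product_id={product_id}/{filename}"
    )


# Hypothetical example values, for illustration only.
example_key = partitioned_key("sales", date(2024, 3, 7), "p-123", "part-0000.parquet")
print(example_key)
```

A range query over one month then only needs to read keys under the matching `year=/month=` prefix instead of scanning the whole bucket.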
S3 Storage Tiers
S3 Standard - General Purpose (used for frequently accessed data)
S3 Standard - Infrequent Access (IA) (data stored across at least 3 Availability Zones)
S3 One Zone - Infrequent Access (data stored in 1 Availability Zone)
S3 Intelligent Tiering
Amazon Glacier (Archives)
S3 Lifecycle Rules
Transition Actions
Expiration Action
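A lifecycle rule combining transition actions and an expiration action might look like the following. This is a minimal sketch that only builds the rule as a Python dict, in the shape expected by the `LifecycleConfiguration` argument of boto3's `put_bucket_lifecycle_configuration`. The rule ID, the prefix, and the 30 / 90 / 365 day thresholds are assumptions chosen for illustration.

```python
def lifecycle_configuration(prefix: str) -> dict:
    """Return a lifecycle configuration with two transitions and one expiration.

    All thresholds below are illustrative assumptions, not recommendations.
    """
    return {
        "Rules": [
            {
                "ID": "archive-then-expire",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Transition actions: move objects to cheaper storage classes.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Expiration action: delete objects after one year.
                "Expiration": {"Days": 365},
            }
        ]
    }


config = lifecycle_configuration("logs/")
print(config["Rules"][0]["Transitions"])
```

With AWS credentials configured, a dict like this could be applied with `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`.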
S3 Encryption for Objects
SSE-S3: keys handled and managed by AWS
SSE-KMS: keys managed through AWS Key Management Service (KMS)
SSE-C: customer-provided encryption keys
Client Side Encryption
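The four encryption options above differ mainly in which extra parameters accompany an upload. The sketch below maps each option to the keyword arguments that boto3's `put_object` accepts for it. The helper name `encryption_kwargs` and the mode labels are hypothetical; no AWS call is made here.

```python
from typing import Optional


def encryption_kwargs(mode: str, key: Optional[str] = None) -> dict:
    """Return the extra put_object arguments for a given encryption option."""
    if mode == "SSE-S3":
        # S3 manages the keys; only the algorithm is named.
        return {"ServerSideEncryption": "AES256"}
    if mode == "SSE-KMS":
        # `key` is a KMS key ID or ARN; if omitted, the AWS managed key is used.
        kwargs = {"ServerSideEncryption": "aws:kms"}
        if key is not None:
            kwargs["SSEKMSKeyId"] = key
        return kwargs
    if mode == "SSE-C":
        # The caller supplies the key with every request.
        if key is None:
            raise ValueError("SSE-C requires a customer-provided key")
        return {"SSECustomerAlgorithm": "AES256", "SSECustomerKey": key}
    if mode == "CLIENT":
        # Client-side encryption: data is encrypted before upload,
        # so no server-side parameters are sent.
        return {}
    raise ValueError(f"unknown encryption mode: {mode}")


print(encryption_kwargs("SSE-KMS", "alias/my-key"))  # hypothetical key alias
```

The returned dict could be unpacked into an upload call, for example `s3.put_object(Bucket=..., Key=..., Body=..., **encryption_kwargs("SSE-S3"))`.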
S3 Security
User Based
IAM Policies
Resource Based
Bucket Policies
Object Access Control List (ACL) - finer-grained
Bucket Access Control List (ACL) - less common
Others
Networking - VPC Gateway Endpoint
Logging & Audit
Tag Based (combined with IAM policies & bucket policies)
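Resource-based and network controls can be combined in a single bucket policy. The sketch below builds a policy document that denies any request not arriving through a given VPC endpoint, using the `aws:SourceVpce` condition key. The bucket name, the endpoint ID, and the helper name `vpce_only_policy` are hypothetical placeholders.

```python
import json


def vpce_only_policy(bucket: str, vpce_id: str) -> str:
    """Return a bucket policy (JSON string) that denies access outside one VPC endpoint."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAccessOutsideVpcEndpoint",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both the bucket itself and every object in it.
                "Resource": [
                    f"arn:aws:s3:::{bucket}",
                    f"arn:aws:s3:::{bucket}/*",
                ],
                "Condition": {"StringNotEquals": {"aws:SourceVpce": vpce_id}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


# Placeholder identifiers, for illustration only.
policy_json = vpce_only_policy("example-ml-data", "vpce-1a2b3c4d")
print(policy_json)
```

Because an explicit deny overrides any allow, a policy like this also blocks console and administrator access from outside the endpoint, so it is worth testing on a non-critical bucket first.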
Data Transformation
Streaming
AWS Kinesis
Kinesis Data Streams: Low Latency streaming ingest at scale
Kinesis Analytics: Perform real-time analytics on streams using SQL
Kinesis Firehose: Load streams into S3, Redshift, Elasticsearch & Splunk
Kinesis Video Streams: Meant for streaming video in real-time
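For ingest into Kinesis Data Streams, producers typically send records in batches. The sketch below prepares events in the entry shape used by the `PutRecords` API (`Data` plus `PartitionKey`) and splits them into batches of at most 500, the per-request record limit. The event fields (`device_id`, `value`) and the helper name are hypothetical, and no AWS call is made.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def to_put_records_batches(events: list, key_field: str) -> list:
    """Convert dict events into PutRecords entries, grouped into batches."""
    entries = [
        {
            "Data": json.dumps(event).encode("utf-8"),
            # Records with the same partition key go to the same shard.
            "PartitionKey": str(event[key_field]),
        }
        for event in events
    ]
    return [
        entries[i : i + MAX_RECORDS_PER_CALL]
        for i in range(0, len(entries), MAX_RECORDS_PER_CALL)
    ]


# Hypothetical sensor events, for illustration only.
events = [{"device_id": i % 3, "value": i} for i in range(1200)]
batches = to_put_records_batches(events, "device_id")
print([len(batch) for batch in batches])
```

Each batch could then be sent with `boto3.client("kinesis").put_records(StreamName=..., Records=batch)`, checking `FailedRecordCount` in the response to retry any rejected records.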
Workflows
Exploratory Data Analysis (24%)
Data Science
Analysis Tools
Python
Feature Engineering
Modeling (36%)
Deep Learning
SageMaker
High-Level AI Services
Evaluating & Tuning
Machine Learning Implementation and Operations (20%)
SageMaker Operations