ML Certification
Kinesis
= streaming/realtime
replicated synchronously to 3 AZs
Kinesis Data Streams
low latency streaming ingest at scale
realtime
for building realtime applications
automatic scaling w/ on-demand
replay capability due to storage
Shards (partitions)
Data retention 24h default
up to 365d
immutable data
records up to 1MB in size
capacity modes
provisioned
choose # shards
shard -> 1 MB/s in (or 1,000 records/s)
shard -> 2 MB/s out
pay per shard per hour
on-demand
default capacity 4 MB/s in
scales automatically based on peak throughput observed in the last 30 days
Scale by adding shards
1 MB/s write per shard
2 MB/s read per shard
5 read (GetRecords) API calls/s per shard
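A minimal sketch of how the per-shard limits above translate into a shard count for provisioned mode. The function name and the example workload are hypothetical; only the 1 MB/s write and 2 MB/s read limits come from the notes.

```python
import math

# Per-shard limits for provisioned mode, as listed in the notes above.
WRITE_MB_PER_SHARD = 1.0  # MB/s ingested per shard
READ_MB_PER_SHARD = 2.0   # MB/s read per shard


def shards_needed(write_mb_per_s: float, read_mb_per_s: float) -> int:
    """Return the smallest shard count that covers both write and read load."""
    for_writes = math.ceil(write_mb_per_s / WRITE_MB_PER_SHARD)
    for_reads = math.ceil(read_mb_per_s / READ_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, for_writes, for_reads)


# Hypothetical workload: 3.5 MB/s in, 5 MB/s out.
print(shards_needed(write_mb_per_s=3.5, read_mb_per_s=5.0))  # 4
```

Here the write side dominates (4 shards for writes, 3 for reads), so scaling means adding shards until both limits are covered.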
Kinesis Analytics
realtime analytics on streams using SQL
streaming ETL
continuous metric generation
responsive analytics
Data Stream or Firehose
use SQL on the stream
or Flink
Machine Learning
RANDOM_CUT_FOREST
SQL function used for anomaly detection on numeric columns in a stream
uses recent data & changes
HOTSPOTS
locate & return info about dense regions in your data
not changing
Managed Service for Apache Flink
formerly Kinesis Data Analytics for Apache Flink
connect to reference table
output stream
error stream
Kinesis Data Firehose
load streams into S3, Redshift, Elasticsearch/OpenSearch & Splunk
ingestion service
for putting data into AWS
near realtime
batch writes into target DB
Amazon S3
3rd party partner destinations
including Splunk
Elasticsearch/OpenSearch
Redshift (loaded via COPY from an intermediate S3 bucket)
failed records written to an S3 backup bucket
fully managed
automatic scaling
data conversions
data transformations via Lambda
supports compression into S3
GZIP, ZIP, SNAPPY
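A sketch of the "data transformations via Lambda" item. Firehose passes the function a batch of base64-encoded records and expects each one back with its `recordId`, a `result` status and re-encoded `data`. The `message` field and the uppercase transform are hypothetical, chosen only to show the shape of the exchange.

```python
import base64
import json


def handler(event, context):
    """Uppercase a hypothetical 'message' field in each Firehose record."""
    output = []
    for record in event["records"]:
        try:
            payload = json.loads(base64.b64decode(record["data"]))
            payload["message"] = str(payload.get("message", "")).upper()
            # Newline-delimit the output so objects in S3 hold one JSON per line.
            encoded = base64.b64encode((json.dumps(payload) + "\n").encode("utf-8"))
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "Ok",
                    "data": encoded.decode("utf-8"),
                }
            )
        except (ValueError, AttributeError, TypeError):
            # Unparseable records are flagged rather than dropped silently;
            # the original data is returned unchanged.
            output.append(
                {
                    "recordId": record["recordId"],
                    "result": "ProcessingFailed",
                    "data": record["data"],
                }
            )
    return {"records": output}
```

Every incoming `recordId` appears exactly once in the response, which is how the results are matched back to the input batch.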
Kinesis Video Streams
streaming video in realtime
for ML - detecting a burglar
Producers
one producer per video stream
video playback
Consumers
Storage = S3
Object store - any data format
Max object size is 5TB
Data partitioning speeds up range queries
organise data so can find quickly
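One common way to partition data in S3 is to encode the partition values in the object key, so a query for a date range only touches matching prefixes. A minimal sketch, assuming Hive-style `key=value` prefixes; the dataset name and file name are hypothetical.

```python
from datetime import datetime, timezone


def partitioned_key(dataset: str, event_time: datetime, filename: str) -> str:
    """Build an S3 object key with Hive-style date partitions."""
    return (
        f"{dataset}/year={event_time:%Y}/month={event_time:%m}/"
        f"day={event_time:%d}/{filename}"
    )


key = partitioned_key(
    "clickstream",
    datetime(2024, 3, 7, tzinfo=timezone.utc),
    "part-0000.json.gz",
)
print(key)  # clickstream/year=2024/month=03/day=07/part-0000.json.gz
```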
Storage classes
All classes: 11 nines (99.999999999%) durability
S3 Standard
99.99% availability
S3 Infrequent Access
lower storage cost; pay on retrieval
99.9% availability
Glacier
pay storage + retrieval
Instant Retrieval
ms retrieval
min storage 90d
Flexible Retrieval
Expedited 1-5 mins
Standard 3-5h
Bulk 5-12h (free retrieval)
min storage 90d
Deep Archive
Standard 12h
Bulk 48h
Min storage 180d
S3 One Zone Infrequent Access
same durability
but data is lost if the AZ is destroyed
99.5% availability
You can use lifecycle rules to move objects between classes
transition
expiration/deletion
S3 Intelligent Tiering
also use Amazon S3 Analytics for storage class analysis
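A sketch of a lifecycle rule combining transition and expiration, written as the dictionary shape that boto3's `put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=...)` accepts. The rule ID, prefix and day counts are hypothetical. The helper is not an AWS API; it only checks the rule against the minimum storage durations listed above, since leaving a Glacier class early incurs a charge.

```python
# Minimum storage durations (days) for the Glacier classes, from the notes above.
MIN_STORAGE_DAYS = {"GLACIER_IR": 90, "GLACIER": 90, "DEEP_ARCHIVE": 180}

lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-then-expire-logs",  # hypothetical rule name
            "Filter": {"Prefix": "logs/"},     # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},  # Flexible Retrieval
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
            "Expiration": {"Days": 730},
        }
    ]
}


def avoids_early_deletion_charges(rule: dict) -> bool:
    """Check that objects stay in each Glacier class for its minimum duration."""
    steps = sorted(rule["Transitions"], key=lambda t: t["Days"])
    expiry = rule.get("Expiration", {}).get("Days")
    for i, step in enumerate(steps):
        # The object leaves this class at the next transition, or at expiry.
        leaves_on = steps[i + 1]["Days"] if i + 1 < len(steps) else expiry
        minimum = MIN_STORAGE_DAYS.get(step["StorageClass"], 0)
        if leaves_on is not None and leaves_on - step["Days"] < minimum:
            return False
    return True


print(avoids_early_deletion_charges(lifecycle_configuration["Rules"][0]))  # True
```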
Security
user based (IAM)
resource-based
bucket access control list (ACL)
object access control list
bucket policies
JSON: Resource, Effect, Action, Principal
encrypt objects at rest using encryption keys
SSE
SSE-S3: Amazon S3-managed keys
enabled by default for new buckets & new objects
SSE-KMS: keys stored in AWS KMS
user control & audit key usage
specify header "x-amz-server-side-encryption": "aws:kms"
limited by KMS throughput
throttled by KMS request quotas (uploads and downloads call the KMS API)
SSE-C: customer-provided keys
HTTPS required
encryption key sent in the headers of every request
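The three SSE options differ mainly in the request headers sent with an upload. A sketch of those headers; the function name and `mode` labels are hypothetical, while the header names are the ones defined by the S3 REST API.

```python
import base64
import hashlib


def sse_headers(mode: str, kms_key_id: str = "", customer_key: bytes = b"") -> dict:
    """Return the S3 request headers for the chosen server-side encryption mode."""
    if mode == "SSE-S3":
        return {"x-amz-server-side-encryption": "AES256"}
    if mode == "SSE-KMS":
        headers = {"x-amz-server-side-encryption": "aws:kms"}
        if kms_key_id:  # omit to use the AWS managed key for S3
            headers["x-amz-server-side-encryption-aws-kms-key-id"] = kms_key_id
        return headers
    if mode == "SSE-C":
        if len(customer_key) != 32:
            raise ValueError("SSE-C expects a 256-bit (32-byte) key")
        # The key travels with every request, which is why HTTPS is required.
        key_b64 = base64.b64encode(customer_key).decode("ascii")
        md5_b64 = base64.b64encode(hashlib.md5(customer_key).digest()).decode("ascii")
        return {
            "x-amz-server-side-encryption-customer-algorithm": "AES256",
            "x-amz-server-side-encryption-customer-key": key_b64,
            "x-amz-server-side-encryption-customer-key-MD5": md5_b64,
        }
    raise ValueError(f"unknown mode: {mode}")


print(sse_headers("SSE-KMS"))
```

With SSE-C the same key headers must be supplied again on every download, because S3 does not store the key.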
client side encryption
encrypted before sending to S3
keys and encryption managed by client
encryption in transit (SSL/TLS)
can be forced by bucket policy
To access from a private VPC, use a gateway VPC endpoint
allows custom access controls
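A sketch of a bucket policy that forces encryption in transit, using the Resource, Effect, Action and Principal elements listed under bucket policies. It denies any request made without TLS via the `aws:SecureTransport` condition key. The function name and bucket name are hypothetical.

```python
import json


def deny_insecure_transport_policy(bucket_name: str) -> str:
    """Return a bucket policy (JSON string) that rejects non-HTTPS requests."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                "Resource": [
                    f"arn:aws:s3:::{bucket_name}",    # the bucket itself
                    f"arn:aws:s3:::{bucket_name}/*",  # every object in it
                ],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }
    return json.dumps(policy, indent=2)


print(deny_insecure_transport_policy("example-ml-bucket"))
```

An explicit Deny overrides any Allow, so this statement applies regardless of what IAM policies grant.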