AWS messaging
Kinesis
To collect, process, and analyze streaming data
Data stream
Shard: unit of stream capacity / scaling
Producer (flow: app, client, SDK, agent)
-( record {partitionKey, dataBlob}, 1 MB/s | 1,000 msg/s per shard )->
Stream
-( record {partitionKey, seqNo, dataBlob}, 2 MB/s shared | enhanced fan-out )->
App, Lambda
retention: 1 day -> 365 days
replay capability, immutable (data can't be deleted once inserted), same partition key -> same shard (ordering)
Producer: AWS SDK, Kinesis Producer Library (KPL), Kinesis Agent
Consumer: Kinesis Client Library (KCL), AWS SDK
Capacity mode
Provisioned mode
choose the number of shards
1 shard -> 1 MB/s in, 2 MB/s out
pay per shard per hour
On-demand mode
Default: 4 MB/s in | 4,000 records/s
scales automatically based on the observed peak throughput over the last 30 days
pay per stream per hour & per GB of data in/out
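Worked example (provisioned mode, illustrative numbers): to ingest 6 MB/s or 6,000 records/s you need at least 6 shards (each shard takes 1 MB/s | 1,000 records/s in), and those 6 shards together give 6 x 2 = 12 MB/s of shared read throughput for consumers.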
Security
HTTPS, KMS, client side
VPC Endpoints available
Monitor API calls with CloudTrail
Producer
PutRecord API | PutRecords (batch) - see the sketch below
partition key -> hash function -> shard; poorly distributed keys create a hot partition
ProvisionedThroughputExceeded -> fixes:
use a highly distributed partition key
retry with exponential backoff
increase the number of shards (shard splitting)
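A minimal producer sketch, assuming boto3; the stream name "orders-stream" and the payload are illustrative:

import time
import boto3
from botocore.exceptions import ClientError

kinesis = boto3.client("kinesis")

def put_with_backoff(partition_key, data, max_retries=5):
    # Same partition key -> same shard, so ordering per key is preserved.
    for attempt in range(max_retries):
        try:
            return kinesis.put_record(
                StreamName="orders-stream",   # hypothetical stream name
                Data=data,                    # data blob (bytes)
                PartitionKey=partition_key,   # hashed to pick the shard
            )
        except ClientError as err:
            if err.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
                time.sleep(2 ** attempt)      # exponential backoff on throttling
            else:
                raise
    raise RuntimeError("still throttled after retries")

put_with_backoff("user-123", b'{"event": "click"}')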
Consumer
Shared
GetRecords - pull
2 MB/s per shard shared across all consumers; latency ~200 ms; max 5 GetRecords calls/s per shard; lower $; max 10 MB per call (then throttled 5 s) | max 10,000 records per call
Enhanced
SubscribeToShard - push
2 MB/s per consumer per shard; latency ~70 ms; higher $; push over HTTP/2 (SubscribeToShard); limit 5 consumers (KCL) per stream (soft limit)
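A minimal shared fan-out (GetRecords pull) consumer sketch, assuming boto3 and the hypothetical "orders-stream":

import time
import boto3

kinesis = boto3.client("kinesis")

# Pick one shard and open an iterator at the oldest available record.
shard_id = kinesis.list_shards(StreamName="orders-stream")["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName="orders-stream",
    ShardId=shard_id,
    ShardIteratorType="TRIM_HORIZON",
)["ShardIterator"]

while iterator:
    resp = kinesis.get_records(ShardIterator=iterator, Limit=1000)  # pull model
    for record in resp["Records"]:
        print(record["SequenceNumber"], record["Data"])
    iterator = resp.get("NextShardIterator")
    time.sleep(0.2)  # stay under 5 GetRecords calls/s per shard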
AWS Lambda
Support both
config: batch size, batch window
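A sketch of wiring a Lambda consumer with batch size and batch window, assuming boto3; the function name, stream ARN, and values are illustrative:

import boto3

lambda_client = boto3.client("lambda")

# Event source mapping: Lambda polls the shards and invokes the function with batches.
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:us-east-1:123456789012:stream/orders-stream",
    FunctionName="process-orders",
    StartingPosition="LATEST",
    BatchSize=100,                       # batch size
    MaximumBatchingWindowInSeconds=5,    # batch window
)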
KCL
each shard is read by at most 1 KCL instance
progress is checkpointed to DynamoDB
Operation
Shard Splitting
increase stream capacity
divide hot shard
no auto scaling
one split operation turns 1 shard into 2 only (no more)
the old shard is removed once its data expires
Merge shard
used to group 2 low-traffic shards (cold shards)
decrease capacity & cost
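Resharding sketch, assuming boto3; the stream name, shard IDs, and hash key are illustrative:

import boto3

kinesis = boto3.client("kinesis")

# Split a hot shard in two at a chosen hash key (one operation -> 2 shards only).
kinesis.split_shard(
    StreamName="orders-stream",
    ShardToSplit="shardId-000000000000",
    NewStartingHashKey="170141183460469231731687303715884105728",  # e.g. midpoint of the hash range
)

# Merge two adjacent cold shards to cut cost.
kinesis.merge_shards(
    StreamName="orders-stream",
    ShardToMerge="shardId-000000000001",
    AdjacentShardToMerge="shardId-000000000002",
)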
Data Firehose
Data Stream -> Data Firehose -> Lambda (optional transform) -> batch write -> S3, OpenSearch, MongoDB..., HTTP endpoint
fully managed, serverless
near real-time: 60 s buffer | 1 MB batch
vs Data Stream
no data storage, no replay, auto scaling
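Direct-put sketch, assuming boto3 and a hypothetical delivery stream "orders-to-s3":

import boto3

firehose = boto3.client("firehose")

# Firehose buffers records (e.g. 60 s / 1 MB) and batch-writes them to the destination.
firehose.put_record(
    DeliveryStreamName="orders-to-s3",
    Record={"Data": b'{"event": "click"}\n'},
)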
Data Analytics
(for SQL app)
Data Stream | Firehose -> Data Analytics (SQL, + reference data from S3) -> Data Stream | Firehose
fully managed, auto scaling
use cases: time-series analytics, real-time metrics, dashboards
Managed service for
Apache Flink
Data Stream | MSK -> Flink (Java, Scala, SQL)
cannot read from Firehose
Ordering data into Kinesis
Partition key to split shard
(same key -> same shard)
vs SQS FIFO: Group ID
SNS
(PubSub)
limits: 12.5M subscriptions per topic, 100k topics
many AWS services -> SNS -> SQS, Lambda, HTTP endpoints, ...
Publish
Topic Publish (SDK)
create topic
Create sub
Publish
Direct publish (mobile app SDK)
Create platform app, endpoint
(Google GCM, Apple APNS, Amazon ADM)
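Topic publish sketch, assuming boto3; the topic name and subscriber ARN are illustrative:

import boto3

sns = boto3.client("sns")

# 1. create topic, 2. create subscription, 3. publish
topic_arn = sns.create_topic(Name="order-events")["TopicArn"]

sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",  # could also be lambda, email, http/https, ...
    Endpoint="arn:aws:sqs:us-east-1:123456789012:order-queue",
)

sns.publish(TopicArn=topic_arn, Message='{"orderId": 42}')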
Security
HTTPS, KMS, client side
Access control: IAM
SNS access policy
Fan out
SNS FIFO
same throughput limits as SQS FIFO
Message filtering
JSON filter policy on the subscription to filter messages
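Filter policy sketch, assuming boto3; the "state" attribute and ARNs are illustrative:

import json
import boto3

sns = boto3.client("sns")

# Only deliver messages whose "state" attribute equals "Placed" to this subscriber.
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-1:123456789012:order-events",
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:placed-orders-queue",
    Attributes={"FilterPolicy": json.dumps({"state": ["Placed"]})},
)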
SQS
Producers -(SendMessage API)-> SQS (messages persisted) -(ReceiveMessage, poll up to 10 messages)-> Consumers -> DeleteMessage
Characteristic
fully managed
unlimited throughput & unlimited number of messages in queue
default retention: 4 days, max: 14 days, min: 1 min
low latency (< 10 ms)
limit: 256 KB / msg
can have duplicate messages (at-least-once delivery)
can have out-of-order messages (best-effort ordering)
ASG
CloudWatch metric: ApproximateNumberOfMessagesVisible (queue length)
Security
Encryption
In-flight: HTTPS
At-rest: KMS key, SSE-SQS
Client encryption
Access control
IAM policy
SQS Access Policies
cross-account
allow other services (SNS, S3, ...)
to write to queue
SQS - FIFO Queue
Limited throughput: 300 msg/s without batching; 3,000 msg/s with batching
exactly once delivery
name: must end with .fifo
Advanced
Deduplication
(deduplication interval: 5 min)
MUST use one of two methods:
Content-based: SHA-256 hash of the message body
Explicit Message Deduplication ID (MessageDeduplicationId)
a message sent again with the same dedup ID | content within the interval is discarded as a duplicate
Message grouping
MessageGroupId: messages with the same group ID go to 1 consumer, in order (sketch below)
Multiple consumer -> multiple message group id
1 consumer <-> 1 group id
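FIFO send sketch, assuming boto3 and a hypothetical queue "orders.fifo":

import boto3

sqs = boto3.client("sqs")
queue_url = sqs.get_queue_url(QueueName="orders.fifo")["QueueUrl"]

# Same MessageGroupId -> strict ordering within the group, one consumer at a time.
# MessageDeduplicationId suppresses duplicates sent within the 5-minute interval.
sqs.send_message(
    QueueUrl=queue_url,
    MessageBody='{"orderId": 42}',
    MessageGroupId="customer-123",
    MessageDeduplicationId="order-42-created",
)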
Dead Letter Queue
(DLQ)
processing fails -> message reappears after the Visibility Timeout -> once over MaximumReceives -> moved to DLQ
Standard queue -> DLQ: standard; FIFO queue -> DLQ: FIFO
Feature: Redrive to Source (DLQ -> source queue)
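DLQ wiring sketch, assuming boto3; the queue names and DLQ ARN are placeholders:

import json
import boto3

sqs = boto3.client("sqs")

# After 3 failed receives, SQS moves the message to the (pre-existing) dead letter queue.
sqs.create_queue(
    QueueName="orders-queue",
    Attributes={
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789012:orders-dlq",
            "maxReceiveCount": "3",
        })
    },
)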
SQS Developer concepts
SQS LongPolling
reduces the number of API calls (and latency)
wait time: 1 s -> 20 s
enable at queue level:
ReceiveMessageWaitTimeSeconds (or per call via WaitTimeSeconds)
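Queue-level long polling sketch, assuming boto3; the queue URL is a placeholder:

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"

# Queue-level default: every ReceiveMessage call long-polls for up to 20 s.
sqs.set_queue_attributes(
    QueueUrl=queue_url,
    Attributes={"ReceiveMessageWaitTimeSeconds": "20"},
)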
SQS Extended Client
To send large message
Producer -> large payload to S3 + small metadata message to SQS -> Consumer reads the message from SQS, then fetches the payload from S3
Must know API
CreateQueue (MessageRetentionPeriod), DeleteQueue
PurgeQueue
SendMessage (DelaySeconds), ReceiveMessage, DeleteMessage
MaxNumberOfMessages: default 1, max 10 (for ReceiveMessage API)
ReceiveMessageWaitTimeSeconds: Long Polling (queue level)
ChangeMessageVisibility: change a message's visibility timeout
Batch APIs (SendMessageBatch, DeleteMessageBatch, ChangeMessageVisibilityBatch) to decrease cost
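Send/receive/delete sketch, assuming boto3; the queue URL is a placeholder:

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"

# SendMessage with an optional per-message delay (0 s - 15 min).
sqs.send_message(QueueUrl=queue_url, MessageBody="hello", DelaySeconds=0)

# ReceiveMessage: up to 10 messages, long poll for up to 20 s.
resp = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20,
)

for msg in resp.get("Messages", []):
    print(msg["Body"])  # placeholder for real processing
    # Delete once processed, otherwise the message reappears after the visibility timeout.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"])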
SQS - Delay Queue
time: 0s -> 15 min
DelaySeconds parameter
SQS Queue - Access policy
S3 write to queue
"ArnLike": {
"aws:SourceArn": "arn:aws:s3:
:
:bucket-name"
},
"StringEquals": {
"aws:SourceAccount": "bucket-owner-account-id"
}
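How such a policy might be attached to the queue, assuming boto3; all ARNs, the bucket name, and the account ID are placeholders:

import json
import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"

policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "s3.amazonaws.com"},
        "Action": "sqs:SendMessage",
        "Resource": "arn:aws:sqs:us-east-1:123456789012:orders-queue",
        "Condition": {
            "ArnLike": {"aws:SourceArn": "arn:aws:s3:::bucket-name"},
            "StringEquals": {"aws:SourceAccount": "bucket-owner-account-id"},
        },
    }],
}

# Attach the access policy so S3 event notifications can write to the queue.
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={"Policy": json.dumps(policy)})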
Visibility Timeout
time a received message stays invisible
to other consumers
need more time? call the ChangeMessageVisibility API (sketch below)
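Sketch of extending the timeout mid-processing, assuming boto3; the queue URL is a placeholder:

import boto3

sqs = boto3.client("sqs")
queue_url = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-queue"

resp = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=1)
for msg in resp.get("Messages", []):
    # Give this consumer 60 more seconds before the message becomes visible to others again.
    sqs.change_message_visibility(
        QueueUrl=queue_url,
        ReceiptHandle=msg["ReceiptHandle"],
        VisibilityTimeout=60,
    )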