Kinesis Data Stream terminology
Data Record
unit of data stored
composed of
sequence number
unique identifier for each record
assigned when
PutRecord(s)
generally increases over time for the same partition key
partition key
used to
segregate
route records to different
shards
specified by data producer
data blob
immutable sequence of bytes
max size 1MB
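As an illustration of how a partition key routes a record to a shard: Kinesis maps each partition key to a 128-bit integer with MD5, and each shard owns a range of those hash keys. The sketch below assumes the hash key space is split into equal contiguous ranges, which holds only for a stream that has not been resharded unevenly; `hash_key` and `shard_index` are hypothetical helper names, not part of any AWS SDK.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 yields a 128-bit value


def hash_key(partition_key: str) -> int:
    """Map a partition key to its 128-bit integer hash key via MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, byteorder="big")


def shard_index(partition_key: str, shard_count: int) -> int:
    """Return the index of the shard whose hash range covers the key.

    Assumption: the hash key space is divided into `shard_count` equal,
    contiguous ranges (illustrative layout, not read from a real stream).
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return hash_key(partition_key) * shard_count // HASH_KEY_SPACE
```

Because the mapping is deterministic, every record with the same partition key lands on the same shard, which is what keeps per-key ordering by sequence number meaningful.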
Data Stream
group of data records
distributed in
data shards
Shard
sequence of data records
base throughput unit
pricing is per shard basis
supports
reads
5 transactions per second
up to a maximum total data read rate of 2MB per second
shared by all the consumers reading from a given shard
write
1000 records per second
maximum of 1MB per second
fixed unit of capacity
if exceeded
ProvisionedThroughputExceededException
retry on the data producer side
scaling the number of shards
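The per-shard limits above translate into a rough sizing calculation. The following is a minimal sketch, assuming the figures listed in this diagram (1 MB/s and 1000 records/s for writes, 2 MB/s for reads shared across all consumers); `estimate_shards` is a hypothetical helper, not an AWS API.

```python
import math

# Per-shard limits as listed in the Shard node above.
WRITE_MB_PER_SHARD = 1.0
WRITE_RECORDS_PER_SHARD = 1000
READ_MB_PER_SHARD = 2.0


def estimate_shards(write_mb_per_s: float,
                    records_per_s: float,
                    total_read_mb_per_s: float) -> int:
    """Estimate the shard count needed to stay under every per-shard limit.

    `total_read_mb_per_s` is the combined read rate of all consumers,
    because the 2 MB/s read limit is shared per shard.
    """
    needed = max(
        write_mb_per_s / WRITE_MB_PER_SHARD,
        records_per_s / WRITE_RECORDS_PER_SHARD,
        total_read_mb_per_s / READ_MB_PER_SHARD,
    )
    return max(1, math.ceil(needed))
```

For example, a workload writing 3.5 MB/s in 2000 records/s and read at 4 MB/s in total is bound by write bandwidth, so the estimate is 4 shards. The estimate assumes partition keys spread load evenly; a hot key can still throttle a single shard.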
Retention Period
data stored for
min
24 hours
max
365 days (7 days before long-term retention was introduced)
Consumers
applications built to use data produced by KDS
built using
API
Amazon Kinesis Client Library
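A consumer built directly on the API could look roughly like the sketch below. It assumes a boto3-style Kinesis client passed in as `client` (for example `boto3.client("kinesis")`) and uses its `get_shard_iterator` and `get_records` calls; `read_shard`, `max_polls`, and `poll_interval` are hypothetical names chosen for illustration. It reads a single shard and does no checkpointing, which is the part the KCL would normally handle.

```python
import time


def read_shard(client, stream_name, shard_id, max_polls=10, poll_interval=0.2):
    """Yield (partition_key, sequence_number, data) tuples from one shard.

    Starts at the oldest record still retained (TRIM_HORIZON) and polls
    at most `max_polls` times. The default interval of 0.2 s keeps one
    consumer at the 5 reads per second allowed per shard.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    for _ in range(max_polls):
        if iterator is None:
            break  # no further iterator: the shard has been closed
        response = client.get_records(ShardIterator=iterator, Limit=100)
        for record in response["Records"]:
            yield record["PartitionKey"], record["SequenceNumber"], record["Data"]
        iterator = response.get("NextShardIterator")
        time.sleep(poll_interval)
```

Passing the client in as a parameter keeps the function easy to exercise with a stub; a real application would also need to discover shards and persist its position.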
Amazon Kinesis Client Library (KCL)
pre-built library with multiple language support
delivers all records for a given partition key to the same record processor
handles complex issues
changes in stream volume
load-balancing streaming data
coordinating distributed services
processing data with fault tolerance
uses a unique DynamoDB table to keep track of the application’s state
Amazon Kinesis Connector Library
pre-built library
integrate Amazon Kinesis Streams with other AWS services and third-party tools
can be replaced by AWS Lambda
Producers
who
puts data into KDS
specify
name of the stream
partition key
determines which shard in the stream the data record is added to
data blob
added via
API/SDK
synchronous operations
PutRecord(s)
HTTP request
Kinesis Producer Library
highly configurable library
provides
layer of abstraction for ingesting data
asynchronous
interface
achieve high producer throughput with minimal client resources
batches messages
aggregates records to increase payload size
improve throughput
integrates with Kinesis Client Library
Writes to one or more Kinesis data streams
with an automatic and configurable retry mechanism
submits
CloudWatch metrics
Kinesis Agent
pre-built Java application
installed on
Linux-based server environments
web servers
log servers
database servers
configured to
monitor certain files on the disk
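For the API/SDK path under Producers, producer-side retry can be sketched as follows. This is a minimal example, assuming a boto3-style Kinesis client passed in as `client` and its `put_records` call, whose response lists one result per submitted record in request order, with an `ErrorCode` field on the ones that failed. `put_records_with_retry`, `max_attempts`, and `base_delay` are hypothetical names; the Kinesis Producer Library offers its own configurable retry mechanism and is not what is shown here.

```python
import time

MAX_RECORDS_PER_REQUEST = 500  # PutRecords accepts up to 500 records per call


def put_records_with_retry(client, stream_name, records,
                           max_attempts=5, base_delay=0.1):
    """Send records with PutRecords, resending only the entries that failed.

    `records` is a list of {"Data": bytes, "PartitionKey": str} dicts.
    Returns the records still unsent after the final attempt.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        if not pending:
            break
        if attempt:
            # Exponential backoff before each retry round.
            time.sleep(base_delay * 2 ** (attempt - 1))
        failed = []
        for start in range(0, len(pending), MAX_RECORDS_PER_REQUEST):
            batch = pending[start:start + MAX_RECORDS_PER_REQUEST]
            response = client.put_records(StreamName=stream_name, Records=batch)
            # Results come back in request order; failures carry an ErrorCode
            # such as ProvisionedThroughputExceededException.
            for record, result in zip(batch, response["Records"]):
                if "ErrorCode" in result:
                    failed.append(record)
        pending = failed
    return pending
```

Resending only the failed entries avoids duplicating records that were already accepted. Retried records can arrive out of their original order, so consumers that need strict ordering should not rely on this approach alone.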