DP-203 - Chapter 10 - Designing and Developing a Stream Processing Solution
Architecture
Event ingestion service
Event hub
IoT hub
Kafka
Stream processing system
Azure Stream Analytics (ASA)
Spark Streaming
output is stream of mini batches
Apache Storm
Apache Flink
reads data from
delivers data to
Analytical data store
Synapse dedicated SQL pool
CosmosDB
HBase
Reporting system
Event hubs
Event generators
HTTP(S)
Advanced Message Queuing Protocol (AMQP)
data distribution
Partitions
views
Consumer groups
contain
Event receivers
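The partition and consumer group nodes above can be illustrated with a small in-memory sketch. It assumes a hash of the partition key picks the partition and that each consumer group tracks its own read position. The names `assign_partition` and `ConsumerGroup` are hypothetical, and the hash used here is not the one Event Hubs uses internally.

```python
import hashlib
from typing import List


def assign_partition(partition_key: str, partition_count: int) -> int:
    """Map a partition key to a partition index with a stable hash (illustrative only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % partition_count


class ConsumerGroup:
    """Hypothetical consumer group: an independent view with its own offset per partition."""

    def __init__(self, name: str, partition_count: int) -> None:
        self.name = name
        self.offsets = [0] * partition_count

    def read(self, partitions: List[List[str]], partition_id: int) -> List[str]:
        """Return the events this group has not yet seen on one partition."""
        events = partitions[partition_id][self.offsets[partition_id]:]
        self.offsets[partition_id] = len(partitions[partition_id])
        return events
```

Events that share a partition key always land on the same partition, and two consumer groups reading that partition each receive every event, because their offsets are tracked separately.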
Spark Structured Streaming
modes
complete
append
update
formats
delta
ACID transactions
Time series data
types of timestamps
Event time
Processing time
windowed aggregates
tumbling
non-overlapping windows
hopping
fixed-size windows that overlap
sliding
fixed size
moves forward on event
snapshot
holds all events at a single point in time
Session
max size with timeout
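The tumbling, hopping, and session windows above can be sketched in plain Python. This is a minimal illustration, not how ASA or Spark implement them. It assumes integer timestamps and half-open `[start, end)` windows, and real engines may treat window boundaries differently. All function names are hypothetical.

```python
from typing import Iterable, List, Tuple


def tumbling_window(ts: int, size: int) -> Tuple[int, int]:
    """Return the single [start, end) tumbling window that contains ts."""
    start = (ts // size) * size
    return (start, start + size)


def hopping_windows(ts: int, size: int, hop: int) -> List[Tuple[int, int]]:
    """Return every [start, end) hopping window that contains ts."""
    windows = []
    start = (ts // hop) * hop  # latest hop-aligned start at or before ts
    while start > ts - size:
        if start >= 0:  # assumption: windows begin at time 0
            windows.append((start, start + size))
        start -= hop
    return sorted(windows)


def session_windows(timestamps: Iterable[int], timeout: int, max_duration: int) -> List[List[int]]:
    """Group events into sessions that close after a gap longer than timeout
    or once the session would reach max_duration."""
    sessions: List[List[int]] = []
    current: List[int] = []
    for ts in sorted(timestamps):
        if current and (ts - current[-1] > timeout or ts - current[0] >= max_duration):
            sessions.append(current)
            current = []
        current.append(ts)
    if current:
        sessions.append(current)
    return sessions
```

With `size=10` and `hop=5`, an event at time 7 belongs to two hopping windows, `(0, 10)` and `(5, 15)`, but to only one tumbling window of size 5, `(5, 10)`.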
Checkpointing
ASA
internal checkpointing
Event hubs
expensive operation
best after a batch of event processing
establish restart point
Spark
end-to-end exactly-once
replaying from point in time
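Because checkpointing is described above as an expensive operation that is best done after a batch of events, one way to picture it is a loop that records a restart point once per batch instead of once per event. This is a sketch under that assumption. `process_with_checkpoints`, `handle`, and `checkpoint` are hypothetical names, not SDK calls.

```python
from typing import Callable, Iterable, Tuple


def process_with_checkpoints(
    events: Iterable[Tuple[int, object]],
    handle: Callable[[object], None],
    checkpoint: Callable[[int], None],
    batch_size: int = 100,
) -> None:
    """Handle (offset, event) pairs and checkpoint the last offset once per batch."""
    pending = 0
    last_offset = None
    for offset, event in events:
        handle(event)
        last_offset = offset
        pending += 1
        if pending >= batch_size:
            checkpoint(last_offset)  # restart point: everything up to here is done
            pending = 0
    if pending:
        checkpoint(last_offset)  # flush the final partial batch
```

After a failure, processing would resume from the last recorded offset, so up to one batch of events may be handled again. A larger `batch_size` means fewer checkpoint writes but more replay.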
Transformations using streaming analytics
COUNT and DISTINCT
CAST
LIKE
Handling schema drift
Event hubs
Azure schema registry
register schema
retrieve schema
Spark
Schema evolution
DF.writeStream.option("mergeSchema", "true")
Processing data across partitions
Scaling resources
Event hubs
partitioning
auto-inflate
automatically increases # throughput units
1 throughput unit =
max ingress = 1MB/sec or 1000 events per second
max egress = 2MB/sec or 4096 events/sec
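The per-unit limits above allow a rough sizing calculation: the number of throughput units needed is set by whichever limit is hit first. The helper below is a back-of-the-envelope sketch using only the figures listed in these notes. `required_throughput_units` is a hypothetical name, not part of any Azure SDK.

```python
import math

# Limits per throughput unit, as listed in the notes above.
INGRESS_MB_PER_TU = 1.0
INGRESS_EVENTS_PER_TU = 1000
EGRESS_MB_PER_TU = 2.0
EGRESS_EVENTS_PER_TU = 4096


def required_throughput_units(
    ingress_mb_per_sec: float,
    ingress_events_per_sec: float,
    egress_mb_per_sec: float = 0.0,
    egress_events_per_sec: float = 0.0,
) -> int:
    """Estimate the throughput units needed so that no per-unit limit is exceeded."""
    demand = max(
        ingress_mb_per_sec / INGRESS_MB_PER_TU,
        ingress_events_per_sec / INGRESS_EVENTS_PER_TU,
        egress_mb_per_sec / EGRESS_MB_PER_TU,
        egress_events_per_sec / EGRESS_EVENTS_PER_TU,
    )
    return max(1, math.ceil(demand))
```

For example, 3.5 MB/sec of ingress at 2000 events/sec is bound by volume rather than event count, so the estimate is 4 units.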
Databricks Spark Streaming
ASA
increase # streaming units
(trial-and-error)
No internal scaling
use Azure Automation
Handling interruptions
Availability Zones
ASA
Event hubs
choose a region that supports availability zones upon creation of the resource
implement back-off and retry logic
use Event Hubs SDKs
What?
physically isolated locations
provide resilience to local outages
paired regions
coordinated maintenance across locations
Designing and configuring
exception handling
EventHubsException class
properties
Reason
IsTransient
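The back-off and retry guidance and the transient flag on the exception fit together: retry only when the failure is marked transient, and wait longer after each attempt. The sketch below uses a hypothetical `HubError` class as a stand-in for an SDK exception. It is not the real `EventHubsException`, and the Event Hubs SDKs already ship their own retry policies.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class HubError(Exception):
    """Hypothetical stand-in for an SDK exception with a reason and a transient flag."""

    def __init__(self, reason: str, is_transient: bool) -> None:
        super().__init__(reason)
        self.reason = reason
        self.is_transient = is_transient


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying transient failures with exponentially growing delays."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except HubError as err:
            # Give up at once on non-transient errors or when attempts run out.
            if not err.is_transient or attempt == max_attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the function easy to test. Production code would usually also add random jitter to the delay so that many clients do not retry in lockstep.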
Upserting data
using ASA
ASA supports UPSERT with CosmosDB
compatibility
level
1.2
support for AMQP
try INSERT
if failed
(ID conflict)
UPDATE
1.0
1.1
partial update as PATCH operation
insert or update at property level within document
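The upsert flow above (try INSERT, then UPDATE on an ID conflict) can be mimicked with an in-memory dictionary. This is a sketch, not ASA or Cosmos DB code. `IdConflict`, `insert`, and `upsert` are hypothetical names. It also assumes that the non-PATCH update replaces the whole document, which these notes describe only as UPDATE.

```python
from typing import Any, Dict

Document = Dict[str, Any]


class IdConflict(Exception):
    """Hypothetical error raised when a document with the same id already exists."""


def insert(store: Dict[str, Document], doc: Document) -> None:
    """Insert a new document, failing if its id is already present."""
    if doc["id"] in store:
        raise IdConflict(doc["id"])
    store[doc["id"]] = dict(doc)


def upsert(store: Dict[str, Document], doc: Document, partial: bool = True) -> None:
    """Optimistic upsert: try the insert first, update only on an id conflict."""
    try:
        insert(store, doc)
    except IdConflict:
        if partial:
            # PATCH-style: add or overwrite individual properties, keep the rest.
            store[doc["id"]].update(doc)
        else:
            # Assumed replace-style update: the new document wins entirely.
            store[doc["id"]] = dict(doc)
```

With `partial=True`, upserting `{"id": "a", "temp": 25}` over `{"id": "a", "temp": 20, "unit": "C"}` keeps the `unit` property. With `partial=False`, it is dropped.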