9 Azure Synapse Analytics (Coggle Diagram)
enterprise data warehouse built on SQL:
relational tables with columnar storage.
SQL Pools
CPU, memory, and IO are bundled into units of compute scale
the size of a SQL pool is determined by
Data Warehouse Units (DWU)
changing the service level alters the number of Data Warehouse Units (DWUs)
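Scaling is done by changing the service objective. A minimal T-SQL sketch, assuming a dedicated SQL pool named `MySqlPool` (a placeholder) and the Gen2 service level `DW400c`:

```sql
-- Run from the master database: scale the pool by changing its
-- service level, i.e. its DWU count. Names here are placeholders.
ALTER DATABASE MySqlPool
MODIFY (SERVICE_OBJECTIVE = 'DW400c');
```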
why
Size compute power independently of storage needs.
Grow or shrink compute power without moving data.
Pause compute capacity while leaving data intact, so you only pay for storage.
Resume compute capacity during operational hours.
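Pause and resume can be scripted, for example with the Azure CLI. A sketch, assuming placeholder pool, workspace, and resource-group names:

```shell
# Pause the pool outside business hours (storage is still billed),
# then resume it when operations start. All names are placeholders.
az synapse sql pool pause  --name MySqlPool --workspace-name MyWorkspace --resource-group MyRg
az synapse sql pool resume --name MySqlPool --workspace-name MyWorkspace --resource-group MyRg
```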
Workload Groups
define the resources that are isolated and reserved for a group of requests
Reserve resources for a group of requests
Limit the amount of resources a group of requests can consume
Shared resources accessed based on importance level
Set a query timeout value, getting DBAs out of the business of killing runaway queries
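The points above map directly onto `CREATE WORKLOAD GROUP` options. A hypothetical group for dashboard queries (group name and percentages are illustrative assumptions):

```sql
-- Reserve 25% of resources for this group, cap it at 50%,
-- and time out queries that run longer than 5 minutes.
CREATE WORKLOAD GROUP wgDashboards
WITH (
    MIN_PERCENTAGE_RESOURCE            = 25,  -- reserved for this group
    CAP_PERCENTAGE_RESOURCE            = 50,  -- hard limit for this group
    REQUEST_MIN_RESOURCE_GRANT_PERCENT = 5,   -- grant per individual request
    QUERY_EXECUTION_TIMEOUT_SEC        = 300  -- runaway-query timeout
);
```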
Classification
define the level of importance of the request
Importance
defined in the CREATE WORKLOAD CLASSIFIER command; allows higher-priority queries to receive resources ahead of lower-priority queries
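A sketch of the classifier the text refers to, assuming a hypothetical login `ExecReports` and a hypothetical workload group `wgDashboards`:

```sql
-- Route requests from the 'ExecReports' login into the dashboards
-- workload group at high importance. Names are placeholders.
CREATE WORKLOAD CLASSIFIER wcExecReports
WITH (
    WORKLOAD_GROUP = 'wgDashboards',
    MEMBERNAME     = 'ExecReports',
    IMPORTANCE     = HIGH
);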
Result-set Cache
caches query results when the same results are requested on a regular basis
improves performance
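Result-set caching is switched on per pool. A minimal sketch, with `MySqlPool` as a placeholder pool name:

```sql
-- Run from the master database: repeated identical queries can then
-- be served from the result-set cache instead of being re-executed.
ALTER DATABASE MySqlPool SET RESULT_SET_CACHING ON;
```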
Materialized Views
pre-compute, store, and maintain data like a table
updated automatically as the underlying tables change
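A hypothetical materialized view pre-aggregating sales by region (table, view, and column names are assumptions; Synapse requires an aggregation and a distribution option in the definition):

```sql
-- Pre-compute the per-region totals once; queries hitting this shape
-- can be answered from the view instead of scanning the fact table.
CREATE MATERIALIZED VIEW mvSalesByRegion
WITH (DISTRIBUTION = HASH(Region))
AS
SELECT Region,
       SUM(Amount)  AS TotalAmount,
       COUNT_BIG(*) AS RowCnt
FROM dbo.FactSales
GROUP BY Region;
```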
solutions
Modern Data Warehouse workloads
centralized data store that provides analytics and decision support services across the whole enterprise using structured, unstructured, or streaming data sources
historical and trend analysis reporting.
reduce stress on production systems
Advanced Analytical Workloads
predictive or preemptive analytics using
Azure Synapse Analytics
Once in a big data store, Hadoop, Spark, and machine learning algorithms prepare the data and train models. When the data is ready for complex analysis, SQL Analytics uses PolyBase to query the big data stores.
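Querying a big data store through PolyBase typically goes through an external table. A sketch, assuming a pre-created external data source and file format (all names and the path are placeholders):

```sql
-- Expose files in the big data store as a queryable table.
CREATE EXTERNAL TABLE dbo.ExtWebLogs (
    EventTime DATETIME2,
    Url       NVARCHAR(2000)
)
WITH (
    LOCATION    = '/logs/',
    DATA_SOURCE = MyAzureStorage,   -- hypothetical EXTERNAL DATA SOURCE
    FILE_FORMAT = MyParquetFormat   -- hypothetical EXTERNAL FILE FORMAT
);

SELECT TOP 10 * FROM dbo.ExtWebLogs;
```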
massively parallel processing concepts
separates the computation from the underlying storage
node-based architecture
The Control node runs the Massively Parallel Processing (MPP) engine, which optimizes queries for parallel processing.
The Data Movement Service (DMS) is a system-level internal service that moves data across the nodes as necessary to run queries in parallel and return accurate results.
Control node
brain of the data warehouse.
Compute nodes
provide the computational power
Table geometries
Azure Storage
to store data.
--> storage consumption is charged separately.
sharding patterns:
Hash
: large table
Round Robin
: temporary table
Replicate
: small table
Start with Round Robin, but aspire to a Hash
Hash-distributed tables
highest query performance
hash function to assign each row to one distribution deterministically
Round-robin distributed tables
distributes data evenly across the nodes
Replicated Tables
fastest query performance for small tables.
caches a full copy on each compute node
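The three table geometries correspond to the `DISTRIBUTION` option in `CREATE TABLE`. A side-by-side sketch (table and column names are illustrative):

```sql
-- Large fact table: hash-distribute on a column with many distinct values.
CREATE TABLE dbo.FactSales ( SaleId BIGINT, CustomerId INT, Amount DECIMAL(18,2) )
WITH ( DISTRIBUTION = HASH(CustomerId) );

-- Staging/temporary data: spread rows evenly with round robin.
CREATE TABLE dbo.StageLoad ( SaleId BIGINT, CustomerId INT, Amount DECIMAL(18,2) )
WITH ( DISTRIBUTION = ROUND_ROBIN );

-- Small dimension table: cache a full copy on each compute node.
CREATE TABLE dbo.DimRegion ( RegionId INT, RegionName NVARCHAR(50) )
WITH ( DISTRIBUTION = REPLICATE );
```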