DP-900
Explore Core Data Concepts
Concepts of relational data
Concepts of non-relational data
Roles and Responsibilities
Database Administrators
Manage databases
Assigning permissions to users
Storing backup copies of data
Restore data in case of any failures
Data Engineers
applying data cleaning routines
identifying business rules
turning data into useful information
Data Analysts
create visualizations and charts to enable organizations to make informed decisions.
Azure Database Administrator role
responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions built on Azure data services and SQL Server.
availability and consistent performance and optimizations of the database solutions
managing the security of the data in the database, granting privileges over the data, granting or denying access to users as appropriate.
Azure Data Engineer role
design and implement data-related assets that include data ingestion pipelines
cleansing and transformation activities and data stores for analytical workloads
They use a wide range of data platform technologies, including relational and nonrelational databases, file stores, and data streams.
ensuring that data privacy is maintained across cloud data stores and data stores spanning from on-premises to the cloud
management and monitoring of data stores and data pipelines to ensure that data loads perform as expected.
Azure Data Analyst role
enables businesses to maximize the value of their data assets
responsible for designing and building scalable models, cleaning and transforming data
enabling advanced analytics capabilities through reports and visualizations.
processes raw data into relevant insights based on identified business requirements.
Database Administrator
tasks and responsibilities
Installing and upgrading the database server and application tools.
Allocating system storage and planning storage requirements for the database system.
Modifying the database structure, as necessary, from information given by application developers.
Enrolling users and maintaining system security.
Ensuring compliance with database vendor license agreement.
Controlling and monitoring user access to the database.
Monitoring and optimizing the performance of the database.
Planning for backup and recovery of database information.
Maintaining archived data.
Backing up and restoring databases.
Contacting database vendor for technical support.
Generating various reports by querying the database as needed.
Managing and monitoring data replication.
database administrator tools
Azure Data Studio
graphical user interface for managing many different database systems. It currently provides connections to on-premises SQL Server databases, Azure SQL Database, PostgreSQL, Azure SQL Data Warehouse, and SQL Server Big Data Clusters, amongst others
SQL Server Management Studio
enabling you to query data, perform general database administration tasks, and generate scripts for automating database maintenance and support operations
ability to generate Transact-SQL scripts for almost all of the functionality that SQL Server Management Studio provides. This gives the DBA the ability to schedule and automate many common tasks.
Azure portal to manage Azure SQL Database
increasing the database size, creating a new database, and deleting an existing database are done using the Azure portal.
manage and adjust resources such as the data storage size and the number of cores available for the database processing
Data Engineer tasks and responsibilities
Developing, constructing, testing, and maintaining databases and data structures.
Aligning the data architecture with business requirements.
Data acquisition.
Developing processes for creating and retrieving information from data sets.
Using programming languages and tools to examine the data.
Identifying ways to improve data reliability, efficiency, and quality.
Conducting research for industry and business questions.
Deploying sophisticated analytics programs, machine learning, and statistical methods.
Preparing data for predictive and prescriptive modeling.
Using data to discover tasks that can be automated.
data engineering tools
fluent in SQL
create databases, tables, indexes, views, and the other objects required by the database
interact with a database from the command line
sqlcmd utility to connect to Microsoft SQL Server and Azure SQL Database
primary data manipulation tool might be Transact-SQL
Azure Databricks, and Azure HDInsight to generate and test predictive models.
non-relational field, you might use Azure Cosmos DB as your primary data store
use languages such as HiveQL, R, or Python
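As a rough illustration of this kind of tooling, the sketch below queries an Azure SQL Database from Python using pyodbc, the same sort of task sqlcmd or Transact-SQL would be used for; the server name, database, credentials, and Orders table are placeholder assumptions, not part of these notes.

```python
# Minimal sketch: running a Transact-SQL query against Azure SQL Database from Python.
# Server, database, credentials, and the Orders table are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"   # hypothetical server
    "DATABASE=SalesDb;"                       # hypothetical database
    "UID=data_engineer;PWD=<password>"        # placeholder credentials
)

cursor = conn.cursor()
cursor.execute("SELECT TOP 10 OrderId, OrderDate, Amount FROM dbo.Orders")
for row in cursor.fetchall():
    print(row.OrderId, row.OrderDate, row.Amount)

conn.close()
```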
Concepts of data analytics
Core Data Concepts
What is data?
structured
tabular data that is represented by rows and columns
databases that store this kind of tabular data are called relational databases
Each row in a table has the same set of columns.
SQL Server, Azure SQL Database
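A minimal sketch of structured data, using an in-memory SQLite database purely for illustration (in Azure this role is played by relational services such as SQL Server or Azure SQL Database); the Customer table and its rows are invented.

```python
# Structured (tabular) data: every row has exactly the same set of columns.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerId INTEGER PRIMARY KEY,
        FirstName  TEXT NOT NULL,
        LastName   TEXT NOT NULL,
        City       TEXT
    )
""")
conn.executemany(
    "INSERT INTO Customer (CustomerId, FirstName, LastName, City) VALUES (?, ?, ?, ?)",
    [(1, "Jay", "Adams", "Seattle"), (2, "Francis", "Adams", "Seattle")],
)
for row in conn.execute("SELECT * FROM Customer"):
    print(row)   # each row exposes the same columns
```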
semi-structured
doesn't reside in a relational database
has some structure to it
include documents held in JavaScript Object Notation (JSON) format
key-value stores
store associative arrays, in which a key serves as a unique identifier used to retrieve a specific value. Values can be anything from a number or a string to a complex object, such as a JSON document (see the sketch below).
stores data as a single collection without structure or relation.
graph databases
contains nodes (information about objects), and edges (information about the relationships between objects).
Cosmos DB
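A minimal sketch of the semi-structured formats above, using an invented customer record: a JSON document, the same document held in a key-value store, and a tiny node/edge view of a graph.

```python
# Semi-structured data: structured, but not bound to a fixed relational schema.
import json

# A JSON document (hypothetical customer record).
document = json.loads("""
{
  "customerId": "C001",
  "name": "Jay Adams",
  "orders": [
    {"orderId": 1, "amount": 42.50},
    {"orderId": 2, "amount": 17.99}
  ]
}
""")

# Key-value view: the key is the unique identifier, the value can be any object.
store = {"C001": document}
print(store["C001"]["name"])

# Graph view: nodes describe objects, edges describe relationships between them.
nodes = {"C001": "Jay Adams", "P100": "Standard Wheel"}
edges = [("C001", "PURCHASED", "P100")]
```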
collection of facts such as numbers, descriptions, and observations used in decision making
unstructured
has no predefined structure; examples include audio and video files, binary data, and free-form documents
Azure Blob storage
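A minimal sketch of storing an unstructured file in Azure Blob storage with the azure-storage-blob SDK; the connection string, container name, and file name are placeholder assumptions.

```python
# Unstructured data: the blob service just stores the file as opaque bytes.
from azure.storage.blob import BlobServiceClient

# Hypothetical connection string, container, and file names.
service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
blob = service.get_blob_client(container="images", blob="site-photo.jpg")

with open("site-photo.jpg", "rb") as data:
    blob.upload_blob(data, overwrite=True)
```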
Provisioned Services
Access Levels
Read/write
Owner
Read-Only
If the data is sensitive (or secret), you may want to restrict access to a few select users.
Data Processing Solutions
analytical systems
capturing raw data and using it to generate insights
for example, data captured by a manufacturing company might reveal trends that help it determine which product lines to focus on for profitability
designed to support business users who need to query data and gain a big picture view of the information held in a database
Data Ingestion
is the process of capturing the raw data. This data could be taken from control devices measuring environmental information such as temperature and pressure, point-of-sale devices recording the items purchased by a customer in a supermarket, financial data recording the movement of money between bank accounts, and weather data from weather stations.
Data Transformation / Data Processing
The raw data might not be in a format that is suitable for querying. The data might contain anomalies that should be filtered out, or it may require transforming in some way
Data Querying
You may be looking for trends, or attempting to determine the cause of problems in your systems. Many database management systems provide tools to enable you to perform ad-hoc queries against your data and generate regular reports.
Data Visualization
You can generate charts such as bar charts, line charts, plot results on geographical maps, pie charts, or illustrate how data changes over time. Microsoft offers visualization tools like Power BI to provide rich graphical representation of your data.
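A minimal sketch of the ingestion, transformation, and querying steps described above, using invented sensor readings rather than a real data source; the resulting summaries are the kind of output a visualization tool such as Power BI would chart.

```python
# Ingest -> transform -> query, on made-up sensor readings.
from collections import defaultdict

raw_readings = [
    {"sensor": "s1", "temperature": 21.5},
    {"sensor": "s2", "temperature": -999.0},   # an anomaly to filter out
    {"sensor": "s1", "temperature": 22.1},
]

# Data transformation: drop anomalous values before querying.
clean = [r for r in raw_readings if -50 <= r["temperature"] <= 60]

# Data querying: an ad-hoc aggregation (average temperature per sensor).
totals, counts = defaultdict(float), defaultdict(int)
for r in clean:
    totals[r["sensor"]] += r["temperature"]
    counts[r["sensor"]] += 1
averages = {s: totals[s] / counts[s] for s in totals}
print(averages)
```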
transaction processing systems
A transaction is a small, discrete unit of work.
high-volume, sometimes handling many millions of transactions in a single day
Splitting tables out into separate groups of columns like this is called normalization. It can make querying more complex.
types of data and data storage
non-relational
store data in a format that more closely matches the original structure
If two customers cohabit and have the same address, in a relational database you would only need to store the address information once
the address would be duplicated in the documents for Jay and Francis Adams. This duplication not only increases the storage required, but can also make maintenance more complex (if the address changes, you must modify it in two documents).
Relational
The simple structure of tables and columns makes them easy to use initially, but the rigid structure can cause some problems.
normalization
data is split into a large number of narrow, well-defined tables (a narrow table is a table with few columns), with references from one table to another
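A minimal sketch of normalization, using SQLite and invented customer/address tables: the shared address is stored once in a narrow Address table and referenced from Customer, and querying then needs a join.

```python
# Normalization: narrow, well-defined tables with references between them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (
        AddressId INTEGER PRIMARY KEY,
        Street    TEXT,
        City      TEXT
    );
    CREATE TABLE Customer (
        CustomerId INTEGER PRIMARY KEY,
        Name       TEXT,
        AddressId  INTEGER REFERENCES Address(AddressId)
    );
    INSERT INTO Address VALUES (1, '1 Main Street', 'Seattle');
    INSERT INTO Customer VALUES (1, 'Jay Adams', 1), (2, 'Francis Adams', 1);
""")

# Querying now requires joining the normalized tables back together.
for row in conn.execute("""
    SELECT c.Name, a.Street, a.City
    FROM Customer AS c JOIN Address AS a ON c.AddressId = a.AddressId
"""):
    print(row)
```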
transactional workloads
Relational Database
A primary use of relational databases is to handle transaction processing.
A transaction is a sequence of operations that are atomic
all operations in the sequence must be completed successfully, or if something goes wrong, all operations run so far in the sequence must be undone.
A transactional database must adhere to
the ACID (Atomicity, Consistency, Isolation, Durability)
Atomicity guarantees that each transaction is treated as a single unit
Consistency ensures that a transaction can only take the data in the database from one valid state to another.
Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
Durability guarantees that once a transaction has been committed, it will remain committed even if there's a system failure such as a power outage or crash.
Many systems implement relational consistency and isolation by applying locks to data when it is updated
Extensive locking can lead to poor performance, while applications wait for locks to be released.
many distributed database management systems relax the strict isolation requirements of transactions and implement "eventual consistency."
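A minimal sketch of an atomic transaction, again using SQLite and invented account data: a transfer between two accounts either commits as a whole or is rolled back as a whole.

```python
# Atomicity: both updates succeed together, or neither is applied.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (AccountId INTEGER PRIMARY KEY, Balance REAL NOT NULL)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    conn.execute("UPDATE Account SET Balance = Balance - 30 WHERE AccountId = 1")
    conn.execute("UPDATE Account SET Balance = Balance + 30 WHERE AccountId = 2")
    conn.commit()      # the whole transfer is made durable ...
except Exception:
    conn.rollback()    # ... or every operation run so far is undone

print(conn.execute("SELECT * FROM Account").fetchall())
```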
analytical workloads
Analytical workloads are typically read-only systems that store vast volumes of historical data or business metrics, such as sales performance and inventory levels
are used for data analysis and decision making
Analytics are generated by aggregating the facts presented by the raw data into summaries, trends, and other kinds of “Business information.”
can be based on a snapshot of the data at a given point in time, or a series of snapshots; users of analytical systems want the bigger picture.
a monthly sales report to identify trends and to make decisions (analytical information).
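A minimal sketch of an analytical query: aggregating raw sales facts into a monthly summary, the kind of read-only reporting an analytical workload serves. The Sales data is invented for illustration.

```python
# Aggregating facts into a monthly trend for decision making.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SaleDate TEXT, Amount REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)", [
    ("2023-01-15", 120.0), ("2023-01-28", 80.0), ("2023-02-03", 200.0),
])

for month, total in conn.execute("""
    SELECT substr(SaleDate, 1, 7) AS Month, SUM(Amount) AS TotalSales
    FROM Sales GROUP BY Month ORDER BY Month
"""):
    print(month, total)
```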
batch data
Buffering and processing the data in groups is called batch processing.
processing can run on a scheduled time interval (for example, every hour), be triggered when a certain amount of data has arrived, or start as the result of some other event
Advantages
Large volumes of data can be processed at a convenient time
It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight, or during off-peak hours.
Disadvantages
The time delay between ingesting the data and getting the results
All of a batch job's input data must be ready before a batch can be processed
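A minimal sketch of batch processing: records are buffered and processed together as a group, here triggered when a certain amount of data has arrived. All values are invented.

```python
# Batch processing: buffer records, then process the whole group at once.
buffer = []
BATCH_SIZE = 3

def process_batch(batch):
    # In a real system this might load the group into a data store overnight.
    print(f"processing {len(batch)} records, total = {sum(batch)}")

def ingest(value):
    buffer.append(value)
    if len(buffer) >= BATCH_SIZE:      # trigger: enough data has arrived
        process_batch(buffer)
        buffer.clear()

for reading in [10, 20, 30, 40, 50, 60]:
    ingest(reading)
```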
Streaming Data
Processing data as it arrives is called streaming
each new piece of data is processed when it arrives
Examples
A financial institution tracks changes in the stock market in real time
An online gaming company collects real-time data about player-game interactions, and feeds the data into its gaming platform
A real-estate website that tracks a subset of data from consumers’ mobile devices, and makes real-time recommendations of properties to visit based on their geo-location.
ideal for time-critical operations that require an instant real-time response.
For example, a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors to allow residents to escape immediately in the event of a fire.
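A minimal sketch of stream processing: each event is handled as it arrives, and a rolling average is kept over a short time window (the rolling-window idea also appears in the comparison below). The events and timings are invented.

```python
# Stream processing: handle each new piece of data immediately on arrival.
from collections import deque
import time

WINDOW_SECONDS = 30
window = deque()   # (timestamp, value) pairs inside the rolling window

def on_event(value, now=None):
    now = now if now is not None else time.time()
    window.append((now, value))
    # Drop readings that have fallen outside the rolling window.
    while window and now - window[0][0] > WINDOW_SECONDS:
        window.popleft()
    rolling_avg = sum(v for _, v in window) / len(window)
    print(f"value={value}, rolling average over last {WINDOW_SECONDS}s = {rolling_avg:.2f}")

for price in [101.2, 101.5, 100.9]:   # e.g. stock ticks arriving in real time
    on_event(price)
```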
differences between batch and streaming
Data Size
Batch processing is suitable for handling large datasets efficiently
Stream processing is intended for individual records or micro-batches consisting of a few records.
Performance
The latency for batch processing is typically a few hours
Stream processing typically occurs immediately, with latency in the order of seconds or milliseconds
Data Scope
Batch processing can process all the data in the dataset.
Stream processing typically only has access to the most recent data received, or within a rolling time window (the last 30 seconds, for example).
Analysis
batch processing for performing complex analytics
Stream processing is used for simple response functions, aggregates, or calculations such as rolling averages.
Latency is the time taken for the data to be received and processed.
Documentation
https://docs.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description
https://docs.microsoft.com/en-us/azure/azure-sql/database/sql-database-paas-overview
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
https://docs.microsoft.com/en-us/azure/cosmos-db/introduction
Explore non-relational data in Azure
Explore relational data in Azure
Explore modern data warehouse analytics in Azure