DP-900
Explore Core Data Concepts
Concepts of relational data
Concepts of non-relational data
Roles and Responsibilities
Database Administrators
Manage databases
Assigning permissions to users
Storing backup copies of data
Restore data in case of any failures
Data Engineers
applying data cleaning routines
identifying business rules
turning data into useful information
Data Analysts
create visualizations and charts to enable organizations to make informed decisions.
Azure Database Administrator role
responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions built on Azure data services and SQL Server.
availability and consistent performance and optimizations of the database solutions
managing the security of the data in the database, granting privileges over the data, granting or denying access to users as appropriate.
Azure Data Engineer role
design and implement data-related assets that include data ingestion pipelines
cleansing and transformation activities and data stores for analytical workloads
They use a wide range of data platform technologies, including relational and nonrelational databases, file stores, and data streams.
ensuring that data privacy is maintained across cloud data stores and data stores spanning from on-premises to the cloud
management and monitoring of data stores and data pipelines to ensure that data loads perform as expected.
Azure Data Analyst role
enables businesses to maximize the value of their data assets
responsible for designing and building scalable models, cleaning and transforming data
enabling advanced analytics capabilities through reports and visualizations.
processes raw data into relevant insights based on identified business requirements.
Database Administrator
tasks and responsibilities
Installing and upgrading the database server and application tools.
Allocating system storage and planning storage requirements for the database system.
Modifying the database structure, as necessary, from information given by application developers.
Enrolling users and maintaining system security.
Ensuring compliance with database vendor license agreement.
Controlling and monitoring user access to the database.
Monitoring and optimizing the performance of the database.
Planning for backup and recovery of database information.
Maintaining archived data.
Backing up and restoring databases.
Contacting database vendor for technical support.
Generating various reports by querying the database as needed.
Managing and monitoring data replication.
database administrator tools
Azure Data Studio
graphical user interface for managing many different database systems. It currently provides connections to on-premises SQL Server databases, Azure SQL Database, PostgreSQL, Azure SQL Data Warehouse, and SQL Server Big Data Clusters, amongst others
SQL Server Management Studio
enabling you to query data, perform general database administration tasks, and generate scripts for automating database maintenance and support operations
ability to generate Transact-SQL scripts for almost all of the functionality that SQL Server Management Studio provides. This gives the DBA the ability to schedule and automate many common tasks.
Azure portal to manage Azure SQL Database
increasing the database size, creating a new database, and deleting an existing database are done using the Azure portal.
manage and adjust resources such as the data storage size and the number of cores available for the database processing
Data Engineer tasks and responsibilities
Developing, constructing, testing, and maintaining databases and data structures.
Aligning the data architecture with business requirements.
Data acquisition.
Developing processes for creating and retrieving information from data sets.
Using programming languages and tools to examine the data.
Identifying ways to improve data reliability, efficiency, and quality.
Conducting research for industry and business questions.
Deploying sophisticated analytics programs, machine learning, and statistical methods.
Preparing data for predictive and prescriptive modeling.
Using data to discover tasks that can be automated.
data engineering tools
fluent in SQL
create databases, tables, indexes, views, and the other objects required by the database
interact with a database from the command line
sqlcmd utility to connect to Microsoft SQL Server and Azure SQL Database
primary data manipulation tool might be Transact-SQL
Azure Databricks, and Azure HDInsight to generate and test predictive models.
non-relational field, you might use Azure Cosmos DB as your primary data store
use languages such as HiveQL, R, or Python
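As a rough illustration of this kind of tooling, the sketch below queries an Azure SQL Database from Python using pyodbc, the same sort of task sqlcmd or Transact-SQL would be used for; the server name, database, credentials, and Orders table are placeholder assumptions, not part of these notes.

```python
# Minimal sketch: running a Transact-SQL query against Azure SQL Database from Python.
# Server, database, credentials, and the Orders table are hypothetical placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myserver.database.windows.net;"   # hypothetical server
    "DATABASE=SalesDb;"                       # hypothetical database
    "UID=data_engineer;PWD=<password>"        # placeholder credentials
)

cursor = conn.cursor()
cursor.execute("SELECT TOP 10 OrderId, OrderDate, Amount FROM dbo.Orders")
for row in cursor.fetchall():
    print(row.OrderId, row.OrderDate, row.Amount)

conn.close()
```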
Concepts of data analytics
Core Data Concepts
What is data?
structured
tabular data that is represented by rows and columns
databases that store this kind of tabular data are called relational databases
Each row in a table has the same set of columns.
SQL Server, Azure SQL Database
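A minimal sketch of structured data, using an in-memory SQLite database purely for illustration (in Azure this role is played by relational services such as SQL Server or Azure SQL Database); the Customer table and its rows are invented.

```python
# Structured (tabular) data: every row has exactly the same set of columns.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customer (
        CustomerId INTEGER PRIMARY KEY,
        FirstName  TEXT NOT NULL,
        LastName   TEXT NOT NULL,
        City       TEXT
    )
""")
conn.executemany(
    "INSERT INTO Customer (CustomerId, FirstName, LastName, City) VALUES (?, ?, ?, ?)",
    [(1, "Jay", "Adams", "Seattle"), (2, "Francis", "Adams", "Seattle")],
)
for row in conn.execute("SELECT * FROM Customer"):
    print(row)   # each row exposes the same columns
```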
semi-structured
doesn't reside in a relational database
has some structure to it
include documents held in JavaScript Object Notation (JSON) format
key-value stores
store associative arrays, in which a key serves as a unique identifier used to retrieve a specific value. Values can be anything from a number or a string to a complex object, such as a JSON document (see the sketch below).
stores data as a single collection without structure or relation.
graph databases
contains nodes (information about objects), and edges (information about the relationships between objects).
Cosmos DB
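A minimal sketch of the semi-structured formats above, using an invented customer record: a JSON document, the same document held in a key-value store, and a tiny node/edge view of a graph.

```python
# Semi-structured data: structured, but not bound to a fixed relational schema.
import json

# A JSON document (hypothetical customer record).
document = json.loads("""
{
  "customerId": "C001",
  "name": "Jay Adams",
  "orders": [
    {"orderId": 1, "amount": 42.50},
    {"orderId": 2, "amount": 17.99}
  ]
}
""")

# Key-value view: the key is the unique identifier, the value can be any object.
store = {"C001": document}
print(store["C001"]["name"])

# Graph view: nodes describe objects, edges describe relationships between them.
nodes = {"C001": "Jay Adams", "P100": "Standard Wheel"}
edges = [("C001", "PURCHASED", "P100")]
```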
collection of facts such as numbers, descriptions, and observations used in decision making
unstructured
has no predefined structure; examples include audio and video files, binary data, and free-form documents
Azure Blob storage
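A minimal sketch of storing an unstructured file in Azure Blob storage with the azure-storage-blob SDK; the connection string, container name, and file name are placeholder assumptions.

```python
# Unstructured data: the blob service just stores the file as opaque bytes.
from azure.storage.blob import BlobServiceClient

# Hypothetical connection string, container, and file names.
service = BlobServiceClient.from_connection_string("<storage-account-connection-string>")
blob = service.get_blob_client(container="images", blob="site-photo.jpg")

with open("site-photo.jpg", "rb") as data:
    blob.upload_blob(data, overwrite=True)
```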
Provisioned Services
Access Levels
Read/write
Owner
Read-Only
If the data is sensitive (or secret), you may want to restrict access to a few select users.
Data Processing Solutions
analytical systems
capturing raw data and using it to generate insights
for example, data captured by a manufacturing company might reveal trends that help it determine which product lines to focus on for profitability
designed to support business users who need to query data and gain a big picture view of the information held in a database
Data Ingestion
is the process of capturing the raw data. This data could be taken from control devices measuring environmental information such as temperature and pressure, point-of-sale devices recording the items purchased by a customer in a supermarket, financial data recording the movement of money between bank accounts, and weather data from weather stations.
Data Transformation / Data Processing
The raw data might not be in a format that is suitable for querying. The data might contain anomalies that should be filtered out, or it may require transforming in some way
Data Querying
You may be looking for trends, or attempting to determine the cause of problems in your systems. Many database management systems provide tools to enable you to perform ad-hoc queries against your data and generate regular reports.
Data Visualization
You can generate charts such as bar charts, line charts, plot results on geographical maps, pie charts, or illustrate how data changes over time. Microsoft offers visualization tools like Power BI to provide rich graphical representation of your data.
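A minimal sketch of the ingestion, transformation, and querying steps described above, using invented sensor readings rather than a real data source; the resulting summaries are the kind of output a visualization tool such as Power BI would chart.

```python
# Ingest -> transform -> query, on made-up sensor readings.
from collections import defaultdict

raw_readings = [
    {"sensor": "s1", "temperature": 21.5},
    {"sensor": "s2", "temperature": -999.0},   # an anomaly to filter out
    {"sensor": "s1", "temperature": 22.1},
]

# Data transformation: drop anomalous values before querying.
clean = [r for r in raw_readings if -50 <= r["temperature"] <= 60]

# Data querying: an ad-hoc aggregation (average temperature per sensor).
totals, counts = defaultdict(float), defaultdict(int)
for r in clean:
    totals[r["sensor"]] += r["temperature"]
    counts[r["sensor"]] += 1
averages = {s: totals[s] / counts[s] for s in totals}
print(averages)
```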
transaction processing systems
A transaction is a small, discrete unit of work.
high-volume, sometimes handling many millions of transactions in a single day
Splitting tables out into separate groups of columns like this is called normalization. It can make querying more complex.
types of data and data storage
non-relational
store data in a format that more closely matches the original structure
If two customers cohabit and have the same address, in a relational database you would only need to store the address information once
the address would be duplicated in the documents for Jay and Francis Adams. This duplication not only increases the storage required, but can also make maintenance more complex (if the address changes, you must modify it in two documents).
Relational
The simple structure of tables and columns makes them easy to use initially, but the rigid structure can cause some problems.
normalization
data is split into a large number of narrow, well-defined tables (a narrow table is a table with few columns), with references from one table to another
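A minimal sketch of normalization, using SQLite and invented customer/address tables: the shared address is stored once in a narrow Address table and referenced from Customer, and querying then needs a join.

```python
# Normalization: narrow, well-defined tables with references between them.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address (
        AddressId INTEGER PRIMARY KEY,
        Street    TEXT,
        City      TEXT
    );
    CREATE TABLE Customer (
        CustomerId INTEGER PRIMARY KEY,
        Name       TEXT,
        AddressId  INTEGER REFERENCES Address(AddressId)
    );
    INSERT INTO Address VALUES (1, '1 Main Street', 'Seattle');
    INSERT INTO Customer VALUES (1, 'Jay Adams', 1), (2, 'Francis Adams', 1);
""")

# Querying now requires joining the normalized tables back together.
for row in conn.execute("""
    SELECT c.Name, a.Street, a.City
    FROM Customer AS c JOIN Address AS a ON c.AddressId = a.AddressId
"""):
    print(row)
```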
transactional workloads
Relational Database
A primary use of relational databases is to handle transaction processing.
A transaction is a sequence of operations that are atomic
all operations in the sequence must be completed successfully, or if something goes wrong, all operations run so far in the sequence must be undone.
A transactional database must adhere to
the ACID (Atomicity, Consistency, Isolation, Durability)
Atomicity guarantees that each transaction is treated as a single unit
Consistency ensures that a transaction can only take the data in the database from one valid state to another.
Isolation ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially
Durability guarantees that once a transaction has been committed, it will remain committed even if there's a system failure such as a power outage or crash.
Many systems implement relational consistency and isolation by applying locks to data when it is updated
Extensive locking can lead to poor performance, while applications wait for locks to be released.
many distributed database management systems relax the strict isolation requirements of transactions and implement "eventual consistency."
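A minimal sketch of an atomic transaction, again using SQLite and invented account data: a transfer between two accounts either commits as a whole or is rolled back as a whole.

```python
# Atomicity: both updates succeed together, or neither is applied.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Account (AccountId INTEGER PRIMARY KEY, Balance REAL NOT NULL)")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

try:
    conn.execute("UPDATE Account SET Balance = Balance - 30 WHERE AccountId = 1")
    conn.execute("UPDATE Account SET Balance = Balance + 30 WHERE AccountId = 2")
    conn.commit()      # the whole transfer is made durable ...
except Exception:
    conn.rollback()    # ... or every operation run so far is undone

print(conn.execute("SELECT * FROM Account").fetchall())
```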
analytical workloads
Analytical workloads are typically read-only systems that store vast volumes of historical data or business metrics, such as sales performance and inventory levels
are used for data analysis and decision making
Analytics are generated by aggregating the facts presented by the raw data into summaries, trends, and other kinds of “Business information.”
can be based on a snapshot of the data at a given point in time, or a series of snapshots; users of analytical systems want the bigger picture.
a monthly sales report to identify trends and to make decisions (analytical information).
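A minimal sketch of an analytical query: aggregating raw sales facts into a monthly summary, the kind of read-only reporting an analytical workload serves. The Sales data is invented for illustration.

```python
# Aggregating facts into a monthly trend for decision making.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sales (SaleDate TEXT, Amount REAL)")
conn.executemany("INSERT INTO Sales VALUES (?, ?)", [
    ("2023-01-15", 120.0), ("2023-01-28", 80.0), ("2023-02-03", 200.0),
])

for month, total in conn.execute("""
    SELECT substr(SaleDate, 1, 7) AS Month, SUM(Amount) AS TotalSales
    FROM Sales GROUP BY Month ORDER BY Month
"""):
    print(month, total)
```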
batch data
Buffering and processing the data in groups is called batch processing.
processing can run on a scheduled time interval (for example, every hour), be triggered when a certain amount of data has arrived, or start as the result of some other event
Advantages
Large volumes of data can be processed at a convenient time
It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight, or during off-peak hours.
Disadvantages
The time delay between ingesting the data and getting the results
All of a batch job's input data must be ready before a batch can be processed
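A minimal sketch of batch processing: records are buffered and processed together as a group, here triggered when a certain amount of data has arrived. All values are invented.

```python
# Batch processing: buffer records, then process the whole group at once.
buffer = []
BATCH_SIZE = 3

def process_batch(batch):
    # In a real system this might load the group into a data store overnight.
    print(f"processing {len(batch)} records, total = {sum(batch)}")

def ingest(value):
    buffer.append(value)
    if len(buffer) >= BATCH_SIZE:      # trigger: enough data has arrived
        process_batch(buffer)
        buffer.clear()

for reading in [10, 20, 30, 40, 50, 60]:
    ingest(reading)
```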
Streaming Data
Processing data as it arrives is called streaming
each new piece of data is processed when it arrives
Examples
A financial institution tracks changes in the stock market in real time
An online gaming company collects real-time data about player-game interactions, and feeds the data into its gaming platform
A real-estate website that tracks a subset of data from consumers’ mobile devices, and makes real-time recommendations of properties to visit based on their geo-location.
ideal for time-critical operations that require an instant real-time response.
For example, a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors to allow residents to escape immediately in the event of a fire.
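A minimal sketch of stream processing: each event is handled as it arrives, and a rolling average is kept over a short time window (the rolling-window idea also appears in the comparison below). The events and timings are invented.

```python
# Stream processing: handle each new piece of data immediately on arrival.
from collections import deque
import time

WINDOW_SECONDS = 30
window = deque()   # (timestamp, value) pairs inside the rolling window

def on_event(value, now=None):
    now = now if now is not None else time.time()
    window.append((now, value))
    # Drop readings that have fallen outside the rolling window.
    while window and now - window[0][0] > WINDOW_SECONDS:
        window.popleft()
    rolling_avg = sum(v for _, v in window) / len(window)
    print(f"value={value}, rolling average over last {WINDOW_SECONDS}s = {rolling_avg:.2f}")

for price in [101.2, 101.5, 100.9]:   # e.g. stock ticks arriving in real time
    on_event(price)
```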
differences between batch and streaming
Data Size
Batch processing is suitable for handling large datasets efficiently
Stream processing is intended for individual records or micro-batches consisting of a few records.
Performance
The latency for batch processing is typically a few hours
Stream processing typically occurs immediately, with latency in the order of seconds or milliseconds
Data Scope
Batch processing can process all the data in the dataset.
Stream processing typically only has access to the most recent data received, or within a rolling time window (the last 30 seconds, for example).
Analysis
batch processing for performing complex analytics
Stream processing is used for simple response functions, aggregates, or calculations such as rolling averages.
Latency is the time taken for the data to be received and processed.
Documentation
https://docs.microsoft.com/en-us/office/troubleshoot/access/database-normalization-description
https://docs.microsoft.com/en-us/azure/azure-sql/database/sql-database-paas-overview
https://docs.microsoft.com/en-us/azure/storage/blobs/storage-blobs-introduction
https://docs.microsoft.com/en-us/azure/cosmos-db/introduction
Explore non-relational data in Azure
Explore relational data in Azure
Explore modern data warehouse analytics in Azure