Data Management for Analytics
Data Management Models
DAMA-DMBOK2
(1.0L p36-40)
CMMI Data Management Maturity (DMM) Model
(1.0L p42-44)
DCAM
Gartner
Data Modeling and Design
Techniques
Entity-Relationship (ER) data model
Cardinality
(2.0L p30-42)
Minimum
Optional-to-mandatory relationship
Optional-to-optional relationship
Mandatory-to-mandatory relationship
Maximum
Many-to-many relationship
One-to-many relationship
One-to-one relationship
Describes entities, their attributes, and the relationships between entities
Converted to a relational data model for implementation (see the sketch below)
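A minimal sketch of the conversion, using a hypothetical Customer–Order example (not from the source): a one-to-many relationship that is mandatory on the customer side becomes a NOT NULL foreign key on the "many" side.

```python
import sqlite3

# Hypothetical ER fragment: Customer (1) --places--> (0..N) Order,
# converted to a relational model for implementation.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- One-to-many, mandatory on the customer side: every order row must
-- reference exactly one existing customer (NOT NULL + FOREIGN KEY).
CREATE TABLE customer_order (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    order_date  TEXT NOT NULL
);
""")
```

A many-to-many relationship would instead be resolved with an associative (junction) table holding a foreign key to each side.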
ER Diagram
(2.0L p22-24)
Crow’s Feet Notation
Chen Notation
Unified Modelling Language (UML) Notation
Relationship
Weak
Represented by dotted lines
Entity's existence is independent of other entities
Strong
Represented by solid lines
Child entity's existence is dependent on the parent
Relational data model
Dimensional data model
(2.0L p84-102)
Elements
Fact
A business measure, normally numeric, stored in the fact table
Dimension
Gives the who, what, where, and when of a fact
Attribute
Various characteristics of the dimension
Fact table
Holds the data to be analyzed
Dimension table
Stores data about the ways in which the data in the fact table can be analyzed
Schema
(2.0L p95-98)
Star
Snowflake
Steps
Identify Business Process
Identify Grain (level of detail)
Identify Dimensions
Identify Facts
Build Schema
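Following the steps above, a minimal star-schema sketch using a hypothetical retail-sales example (table and column names are illustrative, not from the source): one fact table holds the numeric measures at the chosen grain, surrounded by denormalized dimension tables.

```python
import sqlite3

# Grain chosen for the hypothetical example: one row per product, store and day.
conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: the who / what / where / when of each fact.
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE dim_store   (store_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: numeric business measures at the chosen grain,
-- with one foreign key per dimension (star schema).
CREATE TABLE fact_sales (
    date_key     INTEGER REFERENCES dim_date(date_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    store_key    INTEGER REFERENCES dim_store(store_key),
    units_sold   INTEGER,
    sales_amount REAL
);
""")
```

A snowflake schema would further normalize the dimensions, e.g. splitting dim_product into separate product and category tables.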
Normalization
(2.0L p5-15)
(2.0A p4-10)
Structuring a database in accordance with a series of normal forms to reduce data redundancy and improve data integrity
Used mainly for relational data models or ER data models
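A small illustration of the idea, assuming a made-up order/customer data set: the unnormalized rows repeat customer details on every order, and normalizing splits them out so each fact is stored only once.

```python
# Unnormalized: customer name and city are repeated on every order row,
# so changing Ada's city would have to be done in several places (update anomaly).
orders_unnormalized = [
    {"order_id": 1, "customer": "Ada",  "city": "London", "amount": 120.0},
    {"order_id": 2, "customer": "Ada",  "city": "London", "amount":  80.0},
    {"order_id": 3, "customer": "Bran", "city": "Paris",  "amount":  45.5},
]

# Normalized (roughly 3NF): each customer fact is stored exactly once,
# and orders reference customers by key, removing the redundancy.
customers = {
    1: {"name": "Ada",  "city": "London"},
    2: {"name": "Bran", "city": "Paris"},
}
orders = [
    {"order_id": 1, "customer_id": 1, "amount": 120.0},
    {"order_id": 2, "customer_id": 1, "amount":  80.0},
    {"order_id": 3, "customer_id": 2, "amount":  45.5},
]
```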
Types
(2.0L p19-21)
Physical
Provides a schema for how the data will be physically stored within a database
Describes the base relations, file organizations, and indexes used to achieve efficient access to the data, and any associated integrity constraints and security measures
Logical
Defines the structure of the data and the relationships among them; the entire database plan is laid out, with the connections between entities diagrammed
Conceptual
Goal is to organize ideas and define business rules
Business stakeholders outline what they need the data to provide, and data architects specify ways data can be organized to provide it.
Database
Relational Database (RDB)
(2.0L p54-61)
Mainly uses an Online Transaction Processing (OLTP) system
Properties of transactions: ACID
(2.0L p46-48)
Durability
The effects of a successfully completed (committed) transaction are permanently recorded in the database and must not be lost because of a subsequent failure
Isolation
Transactions execute independently of one another
Atomicity
The entire transaction takes place at once or doesn't happen at all
Consistency
The database must be consistent before and after the transaction
From relational data model
Used for processing a massive number of transactions
Provides data to data marts / data warehouses
CRUD: Create, Read, Update, Delete
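A minimal sketch of the ACID and CRUD ideas above, using a hypothetical account-transfer example with Python's built-in sqlite3 module: the two balance updates form one transaction that either commits as a whole or is rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL NOT NULL)")
conn.execute("INSERT INTO account VALUES (1, 100.0), (2, 50.0)")            # Create
conn.commit()

try:
    # Atomicity: both updates belong to one transaction; all or nothing.
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")   # Update
    conn.execute("UPDATE account SET balance = balance + 30 WHERE id = 2")
    conn.commit()    # Durability: once committed, the change survives failures
except sqlite3.Error:
    conn.rollback()  # Consistency: on error, return to the previous valid state

print(conn.execute("SELECT id, balance FROM account").fetchall())            # Read
conn.execute("DELETE FROM account WHERE id = 2")                             # Delete
```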
Data Mart
Data Warehouse
(2.0L p64-82)
Mainly uses Online Analytical Processing (OLAP) system
From dimensional data model
Ideal for data mining, business intelligence and complex analytical calculations
Subject oriented
Integrated
Non-volatile
Time variant
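For contrast with row-at-a-time OLTP, a sketch of an OLAP-style aggregation over a tiny, made-up fact/dimension pair: the analytical query scans the fact table and rolls the measure up along a dimension attribute.

```python
import sqlite3

dw = sqlite3.connect(":memory:")
dw.executescript("""
CREATE TABLE dim_date   (date_key INTEGER PRIMARY KEY, year INTEGER, month TEXT);
CREATE TABLE fact_sales (date_key INTEGER, sales_amount REAL);
INSERT INTO dim_date   VALUES (1, 2024, 'Jan'), (2, 2024, 'Feb');
INSERT INTO fact_sales VALUES (1, 100.0), (1, 250.0), (2, 75.0);
""")

# OLAP-style roll-up: aggregate the fact-table measure along a
# dimension attribute instead of reading individual transaction rows.
query = """
SELECT d.year, d.month, SUM(f.sales_amount) AS total_sales
FROM fact_sales AS f
JOIN dim_date  AS d ON d.date_key = f.date_key
GROUP BY d.year, d.month
"""
print(dw.execute(query).fetchall())
```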
Data Lake
(2.0L p104-118)
Mainly for unstructured data / big data or as a repository
Enterprise Architecture
Framework
Zachman Framework
(1.0L p17-19)
The Open Group Architecture Framework (TOGAF)
(1.0L p12-15)
Key components
Business architecture
Data architecture
Applications architecture
Technology architecture
Gartner
Federal Enterprise Architecture Framework (FEAF)
Practice of analyzing, designing, planning and implementing enterprise analysis to successfully execute on business strategies.
Helps lay out how information, business and technology flow together
Data Integration and Interoperability
Data Lakehouse
(3.0L2 p27-34)
Merges the data lake and the data warehouse
Addresses the key challenges of current data architectures by building on top of existing data lakes
Data Hub
(3.0L2 p51-57)
Simplifies and scales data flows while ensuring control and consistency
A logical architecture that enables data sharing by connecting producers of data with consumers of data.
Integration
Consolidates data into consistent forms, either physical or virtual
Interoperability
The ability of multiple systems to communicate and exchange data
Data Pipeline
(3.0L2 p75-90)
ETL and ELT
(3.0L2 p15-20)
Data acquisition into data mart or data warehouse
Building process
Ingest
Explore
Model
Curate
Catalog
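A minimal ETL-style sketch of the acquisition flow above, assuming a made-up CSV source and an in-memory SQLite target: extract the raw rows, transform them, then load them into the warehouse table (ELT would load the raw rows first and transform them inside the target platform).

```python
import csv, io, sqlite3

# Extract: read raw rows from a source system (here, a made-up in-memory CSV).
raw = io.StringIO("order_id,amount,currency\n1,100.0,usd\n2,80.5,usd\n")
rows = list(csv.DictReader(raw))

# Transform: clean and standardize before loading (ETL); in ELT this step
# would run inside the target platform after loading the raw rows.
cleaned = [
    (int(r["order_id"]), round(float(r["amount"]), 2), r["currency"].upper())
    for r in rows
]

# Load: write the curated rows into the data mart / warehouse table.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE orders (order_id INTEGER, amount REAL, currency TEXT)")
dw.executemany("INSERT INTO orders VALUES (?, ?, ?)", cleaned)
dw.commit()
print(dw.execute("SELECT * FROM orders").fetchall())
```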
Cloud Data Platform
(3.0L2 p36-49)
Avoid data silos
Combining data
Uniting the ecosystem
Opening access
Data Virtualization
(3.0L2 p61-64)
Integrates data sources across multiple data types and locations into a single logical view, without the need for any data replication or movement.
Data Federation
(3.0L2 p66-69)
Provides a single form of access to virtual databases with strict data models.
In contrast, data virtualization does not require a data model and can access a variety of data types, with extra features, applications, and functions.
DataOps
(3.0L2 p92-101)
An agile, process-oriented methodology for developing and delivering analytics
Improves the communication, integration, and automation of data flows between data managers and data consumers across an organization
Data Fabric
(3.0L p103-112)
An architecture that facilitates the end-to-end integration of various data pipelines and cloud environments through the use of intelligent and automated systems.
Uses data virtualization
Follows a metadata-driven approach
Data Replication
(3.0L2 p71-72)
Data Governance
(4.0L p3-24)
Data Quality
When data fits the purpose that it was intended for
Metrics
(4.0L p37-40)
Primary Dimensions
(4.0L p27-28)
Consistency
Accuracy
Format
Timeframe
Integrity
Comprehensiveness
Data Profiling
(4.0L p43-50)
Analysis of data to clarify its structure, content, relationships, and derivation rules
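A small profiling sketch over made-up records: row counts, null counts, distinct values, value ranges, and the dominant value per column are enough to clarify basic structure and content.

```python
from collections import Counter

# Made-up records to profile.
records = [
    {"id": 1, "country": "SG", "age": 34},
    {"id": 2, "country": "SG", "age": None},
    {"id": 3, "country": "MY", "age": 51},
]

for col in records[0]:
    values = [r[col] for r in records]
    non_null = [v for v in values if v is not None]
    print(col, {
        "rows": len(values),
        "nulls": len(values) - len(non_null),        # completeness
        "distinct": len(set(non_null)),              # candidate keys / code sets
        "min": min(non_null) if non_null else None,  # value range
        "max": max(non_null) if non_null else None,
        "top": Counter(non_null).most_common(1),     # dominant value
    })
```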
Data Dictionary
(4.0L p52-53)
A specification and description of data structures in a database, data model or data source
Data Catalog
(4.0L p54-55)
An inventory of data assets in an organization
Data Lineage
(4.0L p78-80)
Provides details on the data's origin, what happens to it, and where it moves over time
Data Security
(4.0L p63-
Planning, development and execution of security policies and procedures to provide proper authentication, authorisation, access and auditing of data assets
Data Security Diagram
(4.0L p72-75)
To depict which actor (person, organization or system) can access which enterprise data.
Identity and Access Management (IAM)
Data Classification
The process of organizing information assets using an agreed-upon categorization, taxonomy, or ontology
Allows organizations to have the required knowledge about the sensitivity of the data they process
Data Privacy
(4.0L p82-95)
Data Masking
Removes or modifies personally identifiable information
Methods
Substitution
Encryption
Reversible
Scrambling
Number and date variance
Nulling out or deletion
Shuffling
To comply with data protection regulations
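A short sketch of two of the methods above (substitution and nulling out / deletion) applied to hypothetical customer records; a production setup would normally rely on a dedicated masking tool.

```python
import hashlib

# Made-up records containing personally identifiable information.
customers = [
    {"name": "Ada Lovelace", "email": "ada@example.com",   "balance": 120.0},
    {"name": "Grace Hopper", "email": "grace@example.com", "balance": 80.0},
]

def mask(record):
    masked = dict(record)
    # Substitution: replace the name with an irreversible pseudonym.
    masked["name"] = "user_" + hashlib.sha256(record["name"].encode()).hexdigest()[:8]
    # Nulling out / deletion: drop the direct identifier entirely.
    masked["email"] = None
    # Non-identifying measures are kept so the data remains useful for analytics.
    return masked

print([mask(c) for c in customers])
```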
Metadata Management
(4.0L p97-98)
Master Data Management
(4.0L p100-103)