Please enable JavaScript.
Coggle requires JavaScript to display documents.
Data Modeling, image - Coggle Diagram
Data Modeling
Data Factory Basic Concepts
What is EPAM Data Factory and its reference architecture? How does it use data?
What is the difference between a data product and a data source?
What are the paths for data sources and data products in EPAM Data Factory reference architecture?
What are the stages of data product development?
Approaches
What’s the difference between data and information?
When can I use different types of databases?
Relational
for analytical platforms and building charts.
Non-Relational
for streaming processes and data processing on the go
Why is data modeling important?
Levels
Conceptual
Audience: business stakeholders
Purpose: to define business rules
Logic
Audience: data architects and business analysts
Purpose: to clarify business rules and create data structures for a specific DBMS
Physical
Audience: database architects and developers
Purpose: to prepare the model for technical implementation on a specific DBMS
Techniques
ER
ER modeling can be used for any level of abstraction.
3NF Modeling
more related to building operational data storage.
3NF modeling is great for storing operational data for ERP and CRM.
To get a 3NF model, you need to put your data through a normalization process to reduce duplication, avoid anomalies, ensure referential integrity, and simplify data management.
Normalization
is the process of organizing the columns (attributes) and tables (relations) of a relational database to reduce data redundancy and improve data integrity. Without normalization, it becomes difficult to handle and update a database without data loss.
The database is considered normalized if it reaches at least the 3NF stage.
first normal form—1NF
4 more items...
second normal form—2NF
2 more items...
third normal form—3NF
1 more item...
Boyce-Codd normal form—BCNF
fourth normal form—4N
fifth normal form—5NF
sixth normal form—6NF
entities, attributes, and the relationships
Dimentional Modeling
measures, facts, and dimensions
Fact is the main table. It contains data about the business process at the same grain level. These are values that are linked to dimensions using relationships with primary keys.
accumulating snapshot
periodic snapshot
transaction
Measures are values in the fact tables.
Dimensions are tables in the structure of a database that contain attributes of actions stored in fact tables. Attributes of dimensions are textual or consist of other logically united values.
Types
Junk dimension
Degenerate dimension
Role-playing dimension
Conformed dimension
Functions
filtering
grouping
labeling
SCDs ( changing dimensions)
Type 0 : The starting point
Type 0 applies to most date dimension attributes.
The dimension attribute value never changes, so facts in the fact table are always grouped by this original value, and the attribute does not have a history, for example, names of countries and continents.
Natural keys are used in type 0 dimensions.
You can add new rows while keeping existing rows unchanged.
Type 1 overwrites the old value
type 2 adds a new record
Type 3: Add a new column
Type 4: Use a historical table
Type 5: A combination of types 1, 2, and 3
A dimensional model is great for consolidating data from multiple sources for planning, problem solving, and decision support.
Star schema
Snowflake
Data Valt
Anchor