Quality-Oriented Data Warehouse Design
Design Approaches DW
Demand-driven or top-down
- conceptual schema without considering data sources
Supply driven or bottom-up
- conceptual design is based on data sources
Hybrid
Attribute vs. Dimension
Difference in realizing real- world scenarios
Summarizability
Conditions
Completeness
- e.g. one customer has at least one job
Disjointness/Strictness
- e.g. every customer has at most one job
Type compatibility
- e.g. do not SUM account balances over time
Drill-Down Incompleteness
- element has no entry in lower hierarchy
(e.g. Andorra has no regions, this leads to SUM of sales are different in regions and countries - 1, to 0, into 1 to 1,
Non-Strictness
- week can be in multiple months, but
never many to many relationships inside dimensions
Roll-Up Incompleteness
- element has no super hierarchy
- e.g. Napkin is not assigned to super group
- introduce category "other"
Multidimensional Normal Forms
Aims
Reasonable
Control over optional dimensions levels
Efficient physical design
1st MNF
- formal definition via restrictions on FDs
- Faithfulness
- Completeness
- Avoidance of redundancies
2nd MNF
- addresses optional levels
- formal definition via context-dependency for every optional level
- conctext-sensititve summarizability
- efficient physical design
3rd MNF
- addresses implicit class hierarchies
- defined via restrictions on context-dependencies
- construction of explicit class hierarchies
DB and DW Design Process
- Requirements analysis and specification
- Conceptual design
- (Logical design)
- (Physical design)
2. Conceptual design
- synthesize fact schema
- synthesize hierarchies
- add context dependencies
- define summarizability constraints
Resulting fact schema are in 3rd MNF