Data Warehouse Architecture
Data Flow Architecture
1. Single DDS (Dimensional Data Store):
2 data stores: Stage & DDS
❤Advantages
Optimize query and data analysis performance
Minimizing system response time
⚠Disadvantages
Not suitable for data dimensions of unformatted or heterogeneous nature.
Not suitable for data dimensions that change continuously.
2. NDS (Normalized Data Store) + DDS
3 data stores: Stage, NDS & DDS
❤Advantages
it is easier to maintain master data in a normalized store like the NDS and publish it from there (because it contains little or no data redundancy)
easily rebuild the main DDS, build a new,smaller DDS
⚠Disadvantages
lexibility for creating and maintaining data stores
the data from the stage needs to be put into the NDS first before it is uploaded into the DDS --> requires more effort compared to the single DDS architecture
3. ODS (Operational Data Store) + DDS
3 data stores: stage, ODS & DDS
⚠Disadvantages
❤ Advantages
The third standard form is thinner than NDS (because it only contains current values)
--> The performance of both ODS ETL and DDS ETL is better than that in the NDS + DDS architecture.
Have a central location to integrate, maintain, and publish master data
Normalized relational data can be updated by end-user applications, making it capable of supporting applications that operate at the transactional level.
To build a new, small DDS,need to get it from the main DDS and can't use the existing DDS ETL to do that.
--> need to write custom queries (create table from select....), which is not preferred for reasons of standardization and consistency, or to build a new ETL, which is also not preferred because effort, especially if it's a disposable, throw-away thing.
4. Fedarate data warehouse
❤ Advantage
⚠Disadvantage
✅ would use an ODS + DDS architecture when you only need a one-way data store and you need a centralized, standardized data store to use for operational purposes such as CRM
can accommodate existing data stores and thus development time will be shorter.
It is very difficult to build a good quality inventory from such diverse stalls found in a data mart or data warehouse.
System Architecture
is about the physical configuration of the servers,
network, software, storage, and clients
2 types
Three-tier architecture:
- Data acquisition software (back-end)
- The data warehouse that contains the data & software
- Client (front-end) software that allows users to access
and analyze data from the warehouse
Two-tier architecture: First two tiers in three-tier architecture is combined into one
❤Advantages
click to edit
click to edit
Improved scalability
Improved Data Integrity
Enhanced Security
⚠Disadvantages
Complexity of Communication
❤Advantages
⚠Disadvantages
Not suitable for complex systems
Less scalable
Limited user interface
Reduced Distribution
Simpler to manage and control
Reduced system complexity
Faster communication process
Data Architecture
click to edit
Bill Inmon’s CIF (Corporate Information Factory
Cons
Expensive development time and maintenance cost: Normalized schemas exhibit greater complexity both in design and maintenance than their denormalized counterparts
More ETL is needed
click to edit
Pros
Single source of truth => avoid data redundancy, less errors
Flexibility. flexible to modifications to the data sources. Because of the ETL process design that leads to normalized data
Ensure data integration
CIF is a top-down design creating an enterprise-wide data model from the source systems to design the EDW. Then build data marts for specific departments
Independent data marts
Cons
Overlapping and redundant activities
It's not integrating data
Create many data silos => inconsistent reports
Pros
Easy to implement and use provided information
Cheaper and faster to deliver
Data marts are limited to a single business group
EDW (Enterprise Data Warehouse)
DW covering all subjects of the entire organization
Cons
inflexible with inevitable changes
It's overloaded and difficult for business people to use
Pros
Reduce redundant and overlapping data
EDWs enable data integration by allowing organizations to bring together data from disparate sources
Consistent and conformed data
Higher-level view of how the enterprise handles its data,
such as how it is categorized, integrated, and stored
Kimball’s Data Bus
Kimball uses a bottom-up business requirements-driven approach to design schemas
for the data marts. Kimball’s architecture assumed the ETL system could perform the 5C’s
Pros
Simplicity and speed. Kimball’s architecture is much simpler and faster to design and set up
Relevancy and understandable: makes all data integrations relevant to the business needs
Cons
No single source of truth => Data redundancy, more errors
Kimball’s architecture is less flexible to changes and adapts more slowly.
Does not offer a complete 360 view of enterprise data
✅ would use a single DDS architecture when you need only a one dimensional store
and you don’t need a normalized data store.