Data Warehouse Architecture

Data Flow Architecture

1. Single DDS (Dimensional Data Store):

2 data stores: Stage & DDS

Advantages

Optimize query and data analysis performance

Minimizing system response time

Disadvantages

Not suitable for data dimensions of unformatted or heterogeneous nature.

Not suitable for data dimensions that change continuously.

2. NDS (Normalized Data Store) + DDS

3 data stores: Stage, NDS & DDS

❤Advantages

it is easier to maintain master data in a normalized store like the NDS and publish it from there (because it contains little or no data redundancy)

easily rebuild the main DDS, build a new,smaller DDS

⚠Disadvantages

lexibility for creating and maintaining data stores

the data from the stage needs to be put into the NDS first before it is uploaded into the DDS --> requires more effort compared to the single DDS architecture

3. ODS (Operational Data Store) + DDS

3 data stores: stage, ODS & DDS

⚠Disadvantages

❤ Advantages

The third standard form is thinner than NDS (because it only contains current values)

--> The performance of both ODS ETL and DDS ETL is better than that in the NDS + DDS architecture.

Have a central location to integrate, maintain, and publish master data

Normalized relational data can be updated by end-user applications, making it capable of supporting applications that operate at the transactional level.

To build a new, small DDS,need to get it from the main DDS and can't use the existing DDS ETL to do that.

--> need to write custom queries (create table from select....), which is not preferred for reasons of standardization and consistency, or to build a new ETL, which is also not preferred because effort, especially if it's a disposable, throw-away thing.

4. Fedarate data warehouse

❤ Advantage

⚠Disadvantage


✅ would use an ODS + DDS architecture when you only need a one-way data store and you need a centralized, standardized data store to use for operational purposes such as CRM

can accommodate existing data stores and thus development time will be shorter.

It is very difficult to build a good quality inventory from such diverse stalls found in a data mart or data warehouse.

System Architecture

is about the physical configuration of the servers,
network, software, storage, and clients

2 types

Three-tier architecture:

  • Data acquisition software (back-end)
  • The data warehouse that contains the data & software
  • Client (front-end) software that allows users to access
    and analyze data from the warehouse

Two-tier architecture: First two tiers in three-tier architecture is combined into one

❤Advantages

click to edit

click to edit

Improved scalability

Improved Data Integrity

Enhanced Security

⚠Disadvantages

Complexity of Communication

❤Advantages

⚠Disadvantages

Not suitable for complex systems

Less scalable


Limited user interface

Reduced Distribution

Simpler to manage and control

Reduced system complexity

Faster communication process

Data Architecture

click to edit

Bill Inmon’s CIF (Corporate Information Factory

Cons

Expensive development time and maintenance cost: Normalized schemas exhibit greater complexity both in design and maintenance than their denormalized counterparts

More ETL is needed

DW_3

click to edit

Pros

Single source of truth => avoid data redundancy, less errors

Flexibility. flexible to modifications to the data sources. Because of the ETL process design that leads to normalized data

Ensure data integration

CIF is a top-down design creating an enterprise-wide data model from the source systems to design the EDW. Then build data marts for specific departments

Independent data marts

Cons

Overlapping and redundant activities

It's not integrating data

Create many data silos => inconsistent reports

Pros

Easy to implement and use provided information

Cheaper and faster to deliver

DW_1

Data marts are limited to a single business group

EDW (Enterprise Data Warehouse)

DW covering all subjects of the entire organization

Cons

inflexible with inevitable changes

It's overloaded and difficult for business people to use

Pros

Reduce redundant and overlapping data

EDWs enable data integration by allowing organizations to bring together data from disparate sources

Consistent and conformed data

DW_0

Higher-level view of how the enterprise handles its data,
such as how it is categorized, integrated, and stored

Kimball’s Data Bus

Kimball uses a bottom-up business requirements-driven approach to design schemas
for the data marts. Kimball’s architecture assumed the ETL system could perform the 5C’s

DW_4

Pros

Simplicity and speed. Kimball’s architecture is much simpler and faster to design and set up

Relevancy and understandable: makes all data integrations relevant to the business needs

Cons

No single source of truth => Data redundancy, more errors

Kimball’s architecture is less flexible to changes and adapts more slowly.

Does not offer a complete 360 view of enterprise data


✅ would use a single DDS architecture when you need only a one dimensional store
and you don’t need a normalized data store.