Please enable JavaScript.
Coggle requires JavaScript to display documents.
Tenth reading - Data Warehousing and Business Intelligence, Ariana…
Tenth reading - Data Warehousing and Business Intelligence
Decision support processes aid knowledge workers in reporting, querying, and analysis.
Business
Drivers
Inputs:
Business Requirements
IT Strategy
Internal Data Feeds
Primary Deliverables:
Data Products
DW and BI
Architecture
Suppliers:
Business Executive
Governance Body
Enterprise Architecture
Participants:
Sponsors & Product Owner
Change Management
Consumers:
Information
Consumers
Managers and
Executives
Technical
Drivers
Techniques:
Prototypes to Drive
Requirements
Self Service BI
Tools:
Metadata Repositories
Data Integration Tools
Metrics:
Usage Metrics
Customer/User Satisfaction
Business Drivers
Operational Functions: Data warehousing supports operational functions.
Compliance Requirements: Warehouses provide evidence for regulatory compliance.
Business Intelligence (BI): BI activities are a primary reason for data warehousing.
Goals and Principles
Objectives:
BI Support: Facilitate Business Intelligence activities.
Enhance Decision-Making: Enable effective business analysis and decision-making.
Drive Innovation: Innovate based on insights derived from data.
Guiding Principles:
Business-Focused: Align DW with business goals to solve organizational problems.
End-Driven Design: Let BI priorities guide DW content creation.
Detail-First Approach: Summarize and optimize after building on atomic data.
Tailored Solutions: Use appropriate tools for different data consumer groups.
Business Intelligence
BI Analysis:
Understand organizational activities and opportunities.
Improve organizational success through data analysis.
Key to gaining a competitive advantage.
BI Technologies:
Set of tools supporting data analysis.
Evolution of decision support tools.
Used for budgeting to advanced analytics.
Data Warehouse
Data Warehouse (DW):
Integrated decision support database.
Gathers data from various operational and external sources.
Data Marts:
Subset copies of data from the warehouse.
Support historical, analytical, and BI requirements.
Enterprise Data Warehouse (EDW):
Centralized warehouse for the entire organization.
Adheres to an enterprise data model.
Data Warehousing
Data Warehousing Process:
Extract, cleanse, transform, control, and load operations.
Maintains data integrity and historical context.
Enforces business rules and relationships.
Data Types in BI and DW:
Traditionally focuses on structured data.
Governing analytic workloads with unstructured data remains a challenge.
Approaches to Data Warehousing
Bill Inmon:
Utilizes a normalized relational model.
Subject-oriented, integrated, time-variant, and non-volatile data collection.
Ralph Kimball:
Emphasizes a dimensional model.
A copy of transaction data structured for query and analysis.
Core Ideas Recognized by Both:
Stores data from various systems.
Organizes data to enhance value.
Makes data accessible for analysis.
Inmon's Corporate Information Factory (CIF):
Subject-oriented, integrated, time-variant, and non-volatile data.
Historical focus, supporting both tactical and strategic decision-making.
Data Warehouse and Marts Characteristics:
Abundant historical data available.
Higher latency compared to applications.
Time-variant data.
Movement Within CIF:
Shift from operational execution to analysis.
Evolves from fixed operations to ad hoc uses.
Transition from front-line workers to decision-makers.
Dimensional DW (Kimball)
A copy of transaction data structured for query and analysis.
Utilizes a dimensional data model, often referred to as Star Schema.
Components:
Facts: Contain quantitative data about business processes.
Fact Table: Joins with many dimension tables, forming a star schema.
Bus Matrix:
Identifies opportunities for conformed dimensions shared across processes.
Guides scoping and development efforts for an integrated enterprise data warehouse.
DW/BI Architecture View:
Highlights an integrating paradigm with a 'DW Bus' for shared or conformed dimensions.
Key Differences with Inmon's Approach:
Focuses on efficient end-delivery of analytical data.
Emphasizes user-driven data requirements.
DW Architecture Components
Components:
Data Warehouse/BI:
Architectural elements organized to meet enterprise needs.
Big Data:
Represents an additional path for bringing data into the enterprise.
Impact of Big Data Evolution:
Alters the DW/BI landscape.
Adds new avenues for data integration.
Source Systems
Encompass operational systems and external data for integration into the DW/BI environment.
CRM, Accounting, Human Resources applications.
Industry-specific operational systems.
Data Integration
In a Service-Oriented Architecture (SOA) environment, includes the data services layers.
Encompasses Extract, Transform, and Load (ETL) processes, data virtualization, and other techniques.
Data Storage Areas
Staging Area:
Intermediate data store for transformation and integration.
Reference and Master Data Conformed Dimensions:
May reside in separate repositories.
Central Warehouse:
Persists transformed and prepped data, maintaining historical and latest batch data.
Operational Data Store (ODS):
Supports lower latencies for operational use.
Data Marts:
Supports presentation layers, departmental or functional subsets.
Cubes (OLAP):
Types include Relational, Multi-dimensional, and Hybrid.
Types of Load Processing
Historical Loads:
Typically loaded once or a few times for initial setup.
Ongoing Updates:
Consistent, scheduled updates to maintain current warehouse data.
Historical Data
Capture Methods:
Different methods for capturing detailed history.
Design based on specific requirements.
Inmon Approach:
Single data warehouse layer.
Accessed via a star-structured data mart.
Common integration and transformation layer for reuse.
Kimball Approach:
Stores history at the atomic level.
Contains cleansed, standardized, and governed data.
Data Vault:
Cleanses and standardizes in the staging process.
History stored in a normalized atomic structure.
Batch Change Data Capture
Frequency:
Often daily with a nightly batch window.
Change Detection:
Database log techniques common for in-house applications.
Accommodates various change capture techniques.
Load Techniques:
Time-stamped or log table loads are prevalent.
Full loads for legacy systems without native time stamping.
Change Data Capture (CDC) Techniques:
Overlap indicates potential data duplication.
Delete indicator tracks deletes for expiring dimensions.
Near-real-time and Real-time
Objective:
Lower latency and increased integration of real-time or near-real-time data.
Application Example:
Automated banking machine data provisioning for immediate customer updates.
Design Concepts:
Isolation of Change:
Achieved through building partitions and union queries for different partitions.
Alternatives to Batch Processing:
Address shorter latency requirements for data availability in the DW.
Types of Alternatives:
Trickle Feeds (Source Accumulation):
Executed more frequently (e.g., hourly or every 5 minutes).
Batch loads triggered by schedule or threshold (e.g., 300 transactions).
Messaging (Bus Accumulation):
Independent source and target systems.
Frequently used in Data-as-a-Service (DaaS).
Streaming (Target Accumulation):
Target system collects and processes data as received.
Data appears in the warehouse after processing.
Understand Requirements
Key Differences:
Operational systems rely on precise, specific requirements.
Initial Phases Focus:
Prioritize understanding capabilities and data sources.
Invest time in design to minimize rework costs during testing.
Requirement Gathering for DW/BI Projects:
Start with business goals and strategy.
Scope business areas and interview relevant stakeholders.
Capture current and desired data-related questions.
Requirements Cataloging:
Prioritize into essential for production and those that can wait.
DW/BI Project Write-Up:
Frame the context of business areas and processes in scope.
Ariana Alvarado Molina - 2021089068