Data warehouse infrastructure
INFRASTRUCTURE SUPPORTING ARCHITECTURE
Operational infrastructure
supports each architectural component; consists of
People
Procedures
Training
Management software
Physical Infrastructure
includes
server hardware, operating system, network software, database software, the LAN and WAN, vendor tools for every architectural component, people, procedures, and training
HARDWARE SYSTEMS and OPERATING SYSTEMS
Hardware and operating systems make up the computing environment for the data warehouse
Guidelines for hardware
Vendor Reference
Scalability
Support
Vendor Stability
Guidelines for the OS
Scalability
Security
Reliability
Availability
Preemptive Multitasking
Use multithreaded approach
Memory protection
Platform Options
Single Platform Option
most straightforward and simplest option
all functions from the backend data extraction to the front-end query processing are performed on a single computing platform.
Hybrid Option
includes:
Source Data platform
Staging Area platform
Data Movement Considerations
for
data acquisition
data storage
data has to move across platforms
data transportation across different platforms
Depends on
source platforms in your company
choice of the platform for data staging and data storage
options
Shared disk
Mass transmission
Real time connection
Manual method
Client Server Architecture for the Data Warehouse
Today’s warehouses are built using the client/server architecture
Most of these are multitiered, second-generation client/server architectures
Desktop Client
Presentation logic
Presentation Service
Nowadays mostly replaced by the Web Client
Application Server
Middleware
Connectivity
Control
Metadata management
Web access
Authorization
Query and report management
OLAP
Database Server
DBMS
Primary Data Repository
Server Hardware
Selecting the server hardware is among the most important decisions
server hardware selection can be a “bet your bottom dollar” decision
Scalability and optimal query performance are the key phrases
Options
SMP (Symmetric Multiprocessing)
Benefits
It gives scalable performance; simply add more processors to the system bus
It balances workload very well
It provides high concurrency
Limitations
Available memory may be limited
Performance may be limited by bandwidth for processor-to-processor communication, I/O, and bus communication
Availability is limited; the system behaves like a single computer with many processors
Clusters
Benefits
Provides high availability; all data is accessible even if one node fails
It preserves the concept of one database
This option is good for incremental growth
Limitations
Bandwidth of the bus could limit the scalability of the system
This option comes with a high operating system overhead
Each node has a data cache; the architecture needs to maintain cache consistency for internode synchronization
MPP (Massively Parallel Processing)
Benefits
This option provides fast access between nodes
Any failure is local to the failed node; this improves system availability
This architecture is highly scalable
Generally, the cost per node is low
Limitations
The architecture requires rigid data partitioning
Data access is restricted
Workload balancing is limited
Cache consistency must be maintained
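The "rigid data partitioning" limitation of MPP can be illustrated with a minimal sketch, assuming hash partitioning on a single key. The node count, the row layout, and the names `node_for` and `partition_rows` are hypothetical and do not come from any particular MPP product.

```python
import zlib
from collections import defaultdict

NUM_NODES = 4  # assumed number of MPP nodes (illustrative only)

def node_for(key: str, num_nodes: int = NUM_NODES) -> int:
    """Map a partitioning key to a node using a stable hash."""
    # crc32 is deterministic across runs, unlike Python's built-in hash() for str
    return zlib.crc32(key.encode("utf-8")) % num_nodes

def partition_rows(rows, key_field, num_nodes=NUM_NODES):
    """Group rows by owning node; each row lives on exactly one node."""
    nodes = defaultdict(list)
    for row in rows:
        nodes[node_for(row[key_field], num_nodes)].append(row)
    return dict(nodes)

# Hypothetical rows, partitioned on customer_id
rows = [{"customer_id": f"C{i:03d}", "amount": i * 10} for i in range(12)]
placement = partition_rows(rows, "customer_id")
for node, owned in sorted(placement.items()):
    print(node, [r["customer_id"] for r in owned])
```

In this sketch a lookup by `customer_id` touches one node, while a query on any other column has to visit every node. If many rows share the same key, one node ends up with more work than the others, which is one way limited workload balancing shows up.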
DATABASE SOFTWARE
Data warehouse-related add-ons are becoming part of the database offerings
DBMSs have also been scaled up to support very large databases
Parallel processing options in database software are intended only for machines with multiple processors
Most of the current database software can parallelize a large number of operations
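As a rough illustration of how an operation can be parallelized, the sketch below splits an aggregation into slices, computes partial results concurrently, and merges them. It is a simplified assumption about the general split-and-merge pattern, not a description of any specific DBMS. The names `partial_sum` and `parallel_sum` are hypothetical, and Python threads stand in for the database's parallel workers.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    """Aggregate one slice of the data, as a single parallel worker would."""
    return sum(chunk)

def parallel_sum(values, workers=4):
    """Split the input, aggregate the slices concurrently, then merge the partial results."""
    size = max(1, -(-len(values) // workers))  # ceiling division, at least 1
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)

sales = list(range(1, 101))  # hypothetical sales amounts
total = parallel_sum(sales)
print(total)  # 5050
```

On a machine with a single processor the slices would simply run one after another, which is why such options only pay off on multiprocessor hardware.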
COLLECTION OF TOOLS
Data Acquisition
Data Extraction
Data Transformation
Data Quality
Data Storage
Data Modeling
Data Loading
Information Delivery
Queries and Reports
Dashboards
Scorecards
OLAP
Alert System
DSS applications
Data Mining
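The acquisition, storage, and delivery categories above can be sketched end to end in a few lines. This is a minimal sketch under stated assumptions: the source records, the `sales_fact` table, and the function names are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical source records; in practice these come from operational systems.
SOURCE_ROWS = [
    {"order_id": "1", "region": " east ", "amount": "120.50"},
    {"order_id": "2", "region": "WEST", "amount": "not-a-number"},
    {"order_id": "3", "region": "West", "amount": "75"},
]

def extract():
    """Data extraction: read raw records from the source."""
    return list(SOURCE_ROWS)

def transform(row):
    """Data transformation: standardize formats and types."""
    return {
        "order_id": int(row["order_id"]),
        "region": row["region"].strip().lower(),
        "amount": float(row["amount"]),
    }

def clean(rows):
    """Data quality: keep rows that transform cleanly, set aside the rest."""
    good, rejected = [], []
    for row in rows:
        try:
            good.append(transform(row))
        except ValueError:
            rejected.append(row)
    return good, rejected

def load(conn, rows):
    """Data loading: write cleaned rows into a warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales_fact "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales_fact VALUES (:order_id, :region, :amount)", rows
    )
    conn.commit()

conn = sqlite3.connect(":memory:")
good, rejected = clean(extract())
load(conn, good)

# Information delivery: a simple report query.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales_fact GROUP BY region ORDER BY region"
).fetchall()
print(report)
print(len(rejected), "row(s) rejected")
```

Real tools in each category do far more than this, but the order of the steps (extract, transform, check quality, load, then query) matches the grouping in the outline.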