Data Warehousing and Mining
Decision-Support System
Transaction Processing
Systems that record information about transactions
Decision-support systems
Extract high-level information from the detailed information stored in transaction-processing systems
Help managers to decide
What products to manufacture in a factory
What applicants should be admitted to a university
What products to stock in a shop
The storage and retrieval of data
Decision-support queries can be written in SQL
SQL extensions have been proposed to make data analysis easier
Database query languages are not suited to statistical analyses of data
Several packages, such as SAS and S++, help with statistical analysis
Data sources may not permit other parts of the company to retrieve data on demand
Companies build data warehouses so that such queries can be answered efficiently
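A minimal sketch of a decision-support query written in SQL, run through Python's built-in sqlite3 module. The `sales` table, its columns, and the rows are all hypothetical. The query condenses detailed transaction records into per-product totals, which is the kind of high-level summary a manager might consult when deciding what to stock.

```python
import sqlite3
from typing import List, Tuple

def sales_by_product(rows: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Aggregate detailed sales records into per-product totals.

    `rows` are (product, quantity) pairs; the `sales` table is hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (product TEXT, quantity INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    # Decision-support style query: summarize, then rank by total sold.
    result = conn.execute(
        "SELECT product, SUM(quantity) AS total "
        "FROM sales GROUP BY product ORDER BY total DESC, product"
    ).fetchall()
    conn.close()
    return result

print(sales_by_product([("shirt", 3), ("jeans", 2), ("shirt", 4), ("socks", 5)]))
# [('shirt', 7), ('socks', 5), ('jeans', 2)]
```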
Data Warehousing
A repository (or archive) of information gathered from multiple sources
Stored under a unified schema at a single site
Components of a Data Warehouse
Input: Data sources
In a source-driven architecture, the sources transmit new information either continually or periodically
In a destination-driven architecture the data warehouse periodically sends requests for new data to the sources.
What schema to use
Data sources that have been constructed independently are likely to have different schemas
Schema integration
Data loaders
DBMS
Output: Query and analysis tools
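The sketch below shows one destination-driven refresh cycle combined with schema integration. Everything in it is hypothetical: the two source schemas, the unified warehouse schema, and the function names. Each source is modeled as a Python function that returns its new rows. Each source also has a converter that maps its rows onto the unified schema before loading.

```python
from typing import Any, Callable, Dict, List, Tuple

Row = Dict[str, Any]
# (fetch_new_rows, to_unified_schema) pair for one hypothetical source
Source = Tuple[Callable[[], List[Row]], Callable[[Row], Row]]

# Hypothetical source schemas (illustration only):
#   source A rows: {"cust_name": str, "amt": str}
#   source B rows: {"customer": str, "amount_cents": int}
# Unified warehouse schema: {"customer": str, "amount": float}
def from_source_a(row: Row) -> Row:
    return {"customer": row["cust_name"], "amount": float(row["amt"])}

def from_source_b(row: Row) -> Row:
    return {"customer": row["customer"], "amount": row["amount_cents"] / 100}

def refresh_warehouse(sources: List[Source], warehouse: List[Row]) -> int:
    """One destination-driven refresh: the warehouse requests new rows from
    each source, maps them onto the unified schema, and loads them.
    Returns the number of rows loaded."""
    loaded = 0
    for fetch_new_rows, to_unified in sources:
        for raw in fetch_new_rows():
            warehouse.append(to_unified(raw))
            loaded += 1
    return loaded

warehouse: List[Row] = []
sources: List[Source] = [
    (lambda: [{"cust_name": "Ann", "amt": "12.50"}], from_source_a),
    (lambda: [{"customer": "Bob", "amount_cents": 800}], from_source_b),
]
print(refresh_warehouse(sources, warehouse))  # 2
print(warehouse)
```

A real warehouse would run this refresh periodically, for example nightly. It would also track which source rows are new. Both are left out to keep the sketch short.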
Data transformation and cleansing
Data cleansing: Task of correcting and preprocessing data
Data sources often deliver data with minor inconsistencies
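Here is a small cleansing illustration. The sketch first normalizes spacing and capitalization in a city name. It then corrects near-miss spellings against a reference list using `difflib.get_close_matches` from the Python standard library. The reference list and the 0.8 similarity cutoff are assumptions chosen for the example.

```python
import difflib
from typing import List, Optional

# Hypothetical reference list of valid city names.
KNOWN_CITIES = ["Boston", "Chicago", "Denver", "Seattle"]

def clean_city(raw: str, known: List[str] = KNOWN_CITIES) -> Optional[str]:
    """Normalize whitespace and capitalization, then snap a misspelled name
    to the closest known city. Returns None if nothing is close enough."""
    candidate = " ".join(raw.split()).title()
    if candidate in known:
        return candidate
    matches = difflib.get_close_matches(candidate, known, n=1, cutoff=0.8)
    return matches[0] if matches else None

print(clean_city("  boston "))  # Boston
print(clean_city("Chicgo"))     # Chicago
```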
Warehouse Schemas
Designed for data analysis, usually with tools such as OLAP tools
Fact tables usually have short identifiers that are foreign keys into other tables, called dimension tables
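Below is a minimal star-schema sketch in SQLite, run through Python's sqlite3 module. It has one fact table, `sales`, whose short identifiers are foreign keys into two dimension tables, `item_info` and `store`. All table names, columns, and rows are hypothetical. The analysis query joins the fact table to its dimensions and aggregates by dimension attributes.

```python
import sqlite3
from typing import List, Tuple

def build_star_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE item_info (item_id INTEGER PRIMARY KEY, item_name TEXT, category TEXT);
        CREATE TABLE store (store_id INTEGER PRIMARY KEY, city TEXT);
        -- Fact table: short identifiers that reference the dimension tables
        CREATE TABLE sales (
            item_id INTEGER REFERENCES item_info(item_id),
            store_id INTEGER REFERENCES store(store_id),
            quantity INTEGER
        );
        INSERT INTO item_info VALUES (1, 'shirt', 'clothes'), (2, 'lamp', 'home');
        INSERT INTO store VALUES (10, 'Boston'), (20, 'Denver');
        INSERT INTO sales VALUES (1, 10, 5), (1, 20, 3), (2, 10, 2);
    """)
    return conn

def units_by_category_and_city(conn: sqlite3.Connection) -> List[Tuple[str, str, int]]:
    # Join the fact table to both dimension tables, then aggregate.
    return conn.execute("""
        SELECT i.category, s.city, SUM(f.quantity)
        FROM sales AS f
        JOIN item_info AS i ON f.item_id = i.item_id
        JOIN store AS s ON f.store_id = s.store_id
        GROUP BY i.category, s.city
        ORDER BY i.category, s.city
    """).fetchall()

conn = build_star_schema()
print(units_by_category_and_city(conn))
# [('clothes', 'Boston', 5), ('clothes', 'Denver', 3), ('home', 'Boston', 2)]
conn.close()
```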
Data Mining
Process of semiautomatically analyzing large databases to find useful patterns
Discover rules and patterns from data
Knowledge discovery in databases (KDD)
Discovered knowledge can also take the form of equations relating different variables to each other
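Linear regression is one common way to obtain an equation relating two variables. The source does not name a method, so the ordinary least-squares fit of y = a + b*x below is an assumption for illustration.

```python
from typing import List, Tuple

def fit_line(xs: List[float], ys: List[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b*x. Returns (a, b)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```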
Association Rules
Can be used in several ways, e.g. an online shop suggesting items often bought together with the one a customer is buying
A rule must have an associated population
Population: set of instances
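Association rules are usually judged by two measures. Support is the fraction of the population that satisfies both sides of the rule. Confidence is how often the right-hand side holds when the left-hand side does. The sketch below treats the population as a list of purchase transactions, each a set of items; the item names are made up.

```python
from typing import List, Set

def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(lhs: Set[str], rhs: Set[str], transactions: List[Set[str]]) -> float:
    """Confidence of lhs => rhs: support(lhs | rhs) / support(lhs)."""
    base = support(lhs, transactions)
    if base == 0:
        return 0.0
    return support(lhs | rhs, transactions) / base

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
print(support({"bread", "milk"}, transactions))       # 0.5
print(confidence({"bread"}, {"milk"}, transactions))  # about 0.667
```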
Other Types of Associations
Using plain association rules has several shortcomings
One of the major shortcomings is that many associations are not very interesting, since they can be predicted
Correlations between items are more interesting, and they can be positive or negative
There are standard measures of correlation widely used in statistics, which can help identify more interesting associations between items
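Lift is one standard measure of this kind. It compares how often two items occur together with how often they would co-occur if they were independent. The source does not name a specific measure, so using lift here is an assumption. Values above 1 suggest a positive correlation and values below 1 a negative one. The transactions are hypothetical sets of items.

```python
from typing import List, Set

def support(itemset: Set[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(a: Set[str], b: Set[str], transactions: List[Set[str]]) -> float:
    """Observed co-occurrence divided by the co-occurrence expected under
    independence: > 1 positive correlation, < 1 negative, = 1 none."""
    expected = support(a, transactions) * support(b, transactions)
    if expected == 0:
        raise ValueError("both itemsets must occur at least once")
    return support(a | b, transactions) / expected

baskets = [{"cereal", "bread"}, {"cereal", "bread"}, {"cereal"}, {"bread"}, {"milk"}]
print(round(lift({"cereal"}, {"bread"}, baskets), 3))  # 1.111
```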
Clustering
Clustering is the problem of finding clusters of similar points in given data
Clustering is also used in classification systems in biology to group related items together
The biological classification of species is a complex hierarchical clustering scheme, grouping related species together at different levels of the hierarchy based on their characteristics
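k-means is one widely used algorithm for finding clusters of similar points. The source does not prescribe it, so it appears here only as an illustration. This sketch works on 2-D points and seeds the centroids with the first k points for determinism; practical implementations usually use random or k-means++ seeding. It then alternates assignment and update steps until the assignments stop changing.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def kmeans(points: List[Point], k: int, max_iterations: int = 100) -> List[int]:
    """Return a cluster index (0..k-1) for each point."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centroids = list(points[:k])  # deterministic seeding: first k points
    labels: List[int] = []
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels

points = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
print(kmeans(points, 2))  # [0, 1, 0, 0, 1, 1]
```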
Other Forms of Data Mining
Text mining applies data-mining techniques to textual documents
Tools can cluster the pages a user has visited based on the words they have in common, helping users find pages they visited earlier
Pages can be automatically classified into a Web directory according to their similarity with other pages
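Both clustering pages and classifying them into a directory need a notion of similarity between documents. A common choice is cosine similarity between word-count vectors; this is an assumption, not something the source specifies. The sketch assigns a page to the directory category whose sample text it most resembles. The categories and sample texts are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict

def word_vector(text: str) -> Counter:
    """Bag-of-words counts over lower-cased alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def classify(page: str, directory: Dict[str, str]) -> str:
    """Return the category whose sample text is most similar to `page`."""
    vec = word_vector(page)
    return max(directory, key=lambda cat: cosine(vec, word_vector(directory[cat])))

directory = {
    "sports": "football match goal team score",
    "cooking": "recipe oven bake flour sugar",
}
print(classify("The team scored a late goal in the match", directory))  # sports
```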
Visual displays of data, such as maps and charts, allow data to be presented compactly to users
Data-visualization systems provide system support for users to detect patterns and are an important component of data mining