Data Warehousing and Mining - Coggle Diagram
Data Warehousing and Mining
Decision-Support Systems
Built on the records kept by transaction-processing systems
transaction information (e.g., product sales)
The storage and retrieval of data for decision support raises several issues
Many decision-support queries cannot be expressed in SQL, or cannot be expressed easily.
Database query languages are not suited to the performance of detailed statistical analyses of data.
Large companies have diverse sources of data that they need to use for making business decisions.
Knowledge-discovery techniques attempt to automatically discover statistical rules and patterns in data.
Data Mining
Process of semiautomatically analyzing large databases to find useful patterns.
Some types of knowledge discovered from a database can be represented by a set of rules.
Applications
Prediction
Association
Association Rules
Each rule has an associated population
Population consists of a set of instances
Support
Measure of what fraction of the population satisfies both the antecedent and the consequent of the rule.
Confidence
Measure of how often the consequent is true when the antecedent is true.
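A minimal sketch of the two measures, assuming each instance of the population is a purchase transaction represented as a set of item names (the basket data and function names are hypothetical, chosen for illustration):

```python
from typing import List, Set

# Hypothetical population: each instance is one transaction (a set of items).
Transaction = Set[str]


def support(population: List[Transaction], items: Set[str]) -> float:
    """Fraction of instances that contain every item in `items`."""
    if not population:
        return 0.0
    hits = sum(1 for t in population if items <= t)
    return hits / len(population)


def confidence(population: List[Transaction],
               antecedent: Set[str], consequent: Set[str]) -> float:
    """Of the instances satisfying the antecedent, the fraction that
    also satisfy the consequent."""
    ante = support(population, antecedent)
    if ante == 0:
        return 0.0
    return support(population, antecedent | consequent) / ante


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    # Rule: bread => milk
    print(support(baskets, {"bread", "milk"}))       # 0.5
    print(confidence(baskets, {"bread"}, {"milk"}))  # about 0.667
```

Here bread appears in three of the four baskets and milk accompanies it in two of those, so the rule bread => milk has support 2/4 and confidence 2/3.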
Clustering
Refers to the problem of finding clusters of points in the given data.
Can be formalized in several ways, using distance metrics.
Types
Agglomerative clustering
Start by building small clusters, then merge them to create higher levels
Hierarchical clustering
Divisive clustering
First create higher levels of the hierarchical clustering, then refine each resulting cluster into lower-level clusters.
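A minimal sketch of agglomerative clustering, assuming 2-D points, Euclidean distance, and single linkage (cluster distance = distance between the closest members). The linkage choice and the stopping parameter `k` are assumptions for illustration; other linkages (complete, average) are equally common.

```python
import math
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, float]


def single_link(a: List[Point], b: List[Point]) -> float:
    """Distance between two clusters = distance between their closest members."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points: List[Point], k: int) -> List[List[Point]]:
    """Start with one cluster per point and repeatedly merge the closest
    pair of clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    clusters = [[p] for p in points]
    while len(clusters) > k:
        # combinations yields index pairs (i, j) with i < j
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is still valid
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11), (5, 5)]
    print(agglomerative(pts, 2))
```

Recording each merge instead of only returning the final level would give the full hierarchy; a divisive method would build the same kind of tree from the top down.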
Other Forms of Data Mining
Text mining
Applies data-mining techniques to textual documents.
Data-visualization
Helps users examine large volumes of data and detect patterns visually.
Data Warehousing
Components of a Data Warehouse
data loaders
DBMS
Warehouse Schemas
Data warehouses typically have schemas designed for data analysis with OLAP and similar tools.
Column-Oriented Storage
When a query needs to access only a few attributes of a relation with a large number of attributes, the remaining attributes need not be fetched from disk into memory.
Storing values of the same type together increases the effectiveness of compression.
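A minimal sketch of both benefits, assuming a relation held as a list of row dictionaries. Converting it to one list per attribute lets a query read only the column it needs, and run-length encoding is used here as one simple example of a compression scheme that benefits from same-typed, repetitive values (real column stores use a variety of schemes; the function names are hypothetical):

```python
from itertools import groupby
from typing import Any, Dict, List, Sequence, Tuple


def to_columns(rows: Sequence[Dict[str, Any]],
               attributes: Sequence[str]) -> Dict[str, List[Any]]:
    """Convert row-oriented records into one list (column) per attribute."""
    return {a: [r[a] for r in rows] for a in attributes}


def rle_encode(column: Sequence[Any]) -> List[Tuple[Any, int]]:
    """Run-length encode a column: each run of equal values becomes (value, count)."""
    return [(v, len(list(g))) for v, g in groupby(column)]


def rle_decode(runs: Sequence[Tuple[Any, int]]) -> List[Any]:
    """Expand (value, count) runs back into the original column."""
    return [v for v, n in runs for _ in range(n)]


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "region": "east", "amount": 10.0},
        {"order_id": 2, "region": "east", "amount": 5.0},
        {"order_id": 3, "region": "west", "amount": 7.5},
        {"order_id": 4, "region": "west", "amount": 2.5},
    ]
    cols = to_columns(rows, ["order_id", "region", "amount"])
    # A query summing `amount` touches only that one column.
    print(sum(cols["amount"]))          # 25.0
    print(rle_encode(cols["region"]))   # [('east', 2), ('west', 2)]
```

Run-length encoding pays off most when a column is sorted or clustered on its values, which is easier to arrange when each column is stored separately.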
Classification
Classification can be done by finding rules that partition the given data into disjoint groups.
Decision-Tree Classifiers
Each leaf node has an associated class, and each internal node has a predicate.
The most common way of building them is a greedy algorithm that works recursively, starting at the root and building the tree downward.
Attributes
Continuous-valued
often split with binary splits (value below vs. above a threshold)
Categorical
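A minimal sketch of one greedy step for a continuous-valued attribute: try every binary split between consecutive distinct values and keep the one whose children are purest. Gini impurity is assumed as the purity measure here (entropy is another common choice), and the function names are hypothetical. A full tree builder would apply this to every attribute at a node, pick the best, and recurse on each child.

```python
from collections import Counter
from typing import Optional, Sequence, Tuple


def gini(labels: Sequence[str]) -> float:
    """Gini impurity: 1 - sum of squared class fractions (0 means pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(values: Sequence[float],
                      labels: Sequence[str]) -> Optional[Tuple[float, float]]:
    """Return (threshold, weighted child impurity) for the binary split
    value <= threshold vs. value > threshold that leaves the purest children,
    or None if no split is possible."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = None
    for k in range(1, n):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates equal values
        threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
        left = [lab for _, lab in pairs[:k]]
        right = [lab for _, lab in pairs[k:]]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
        if best is None or impurity < best[1]:
            best = (threshold, impurity)
    return best


if __name__ == "__main__":
    print(best_binary_split([1, 2, 3, 10, 11, 12],
                            ["a", "a", "a", "b", "b", "b"]))  # (6.5, 0.0)
```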
Other Types of Classifiers
Bayesian classifiers
Find the distribution of attribute values for each class in the training data
Support Vector Machine
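A minimal sketch of a Bayesian classifier for categorical attributes. It assumes the "naive" simplification that attributes are independent given the class, so the per-class distribution of each attribute can be estimated separately from training counts; add-one (Laplace) smoothing is an added assumption so an unseen value does not force a probability of zero. The class and data are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Sequence


class NaiveBayes:
    """Categorical naive Bayes: estimates P(class) and P(value | class)
    from counts, assuming attributes are independent given the class."""

    def fit(self, rows: Sequence[Sequence[str]],
            labels: Sequence[str]) -> "NaiveBayes":
        self.class_counts = Counter(labels)
        self.n = len(labels)
        # value_counts[cls][attribute_index][value] = count in training data
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        self.values_seen = defaultdict(set)  # distinct values per attribute
        for row, cls in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[cls][i][v] += 1
                self.values_seen[i].add(v)
        return self

    def predict(self, row: Sequence[str]) -> str:
        def score(cls: str) -> float:
            count = self.class_counts[cls]
            p = count / self.n  # prior P(class)
            for i, v in enumerate(row):
                k = len(self.values_seen[i])
                # add-one smoothed estimate of P(value | class)
                p *= (self.value_counts[cls][i][v] + 1) / (count + k)
            return p
        return max(self.class_counts, key=score)


if __name__ == "__main__":
    rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
    labels = ["no", "no", "yes", "yes"]
    model = NaiveBayes().fit(rows, labels)
    print(model.predict(("rainy", "mild")))  # yes
```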
Regression
Deals with the prediction of a value, rather than a class
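A minimal sketch of the simplest case, assuming linear regression on a single input attribute: ordinary least squares fits y ≈ a + b·x, and the fitted line then predicts a numeric value for a new x (the function name is hypothetical):

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y ~ a + b*x; returns (a, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    return a, b


if __name__ == "__main__":
    a, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
    print(a, b)        # 1.0 2.0
    print(a + b * 10)  # predicted value at x = 10: 21.0
```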
Validating a Classifier
True positive
False positive
True negative
False negative
Quality measures
Recall
Precision
Specificity
Accuracy
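The four quality measures are ratios of the four outcome counts: recall = TP / (TP + FN), precision = TP / (TP + FP), specificity = TN / (TN + FP), and accuracy = (TP + TN) / (TP + FP + TN + FN). A minimal sketch, assuming binary labels given as booleans; returning NaN for an empty denominator is a design choice made here, not a standard:

```python
from dataclasses import dataclass
from typing import Sequence


def _ratio(num: int, den: int) -> float:
    """num / den, or NaN when the denominator is zero."""
    return num / den if den else float("nan")


@dataclass
class Confusion:
    tp: int  # predicted positive, actually positive
    fp: int  # predicted positive, actually negative
    tn: int  # predicted negative, actually negative
    fn: int  # predicted negative, actually positive

    @classmethod
    def from_labels(cls, actual: Sequence[bool],
                    predicted: Sequence[bool]) -> "Confusion":
        pairs = list(zip(actual, predicted))
        return cls(
            tp=sum(1 for a, p in pairs if a and p),
            fp=sum(1 for a, p in pairs if not a and p),
            tn=sum(1 for a, p in pairs if not a and not p),
            fn=sum(1 for a, p in pairs if a and not p),
        )

    def recall(self) -> float:       # share of actual positives found
        return _ratio(self.tp, self.tp + self.fn)

    def precision(self) -> float:    # share of positive predictions that are right
        return _ratio(self.tp, self.tp + self.fp)

    def specificity(self) -> float:  # share of actual negatives found
        return _ratio(self.tn, self.tn + self.fp)

    def accuracy(self) -> float:     # share of all predictions that are right
        return _ratio(self.tp + self.tn, self.tp + self.fp + self.tn + self.fn)


if __name__ == "__main__":
    c = Confusion(tp=40, fp=10, tn=45, fn=5)
    print(c.precision(), c.accuracy())  # 0.8 0.85
```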
Other Types of Associations
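Many associations are not very interesting, since they can be predicted: if many people buy cereal and many buy bread, a fair number will buy both even if the two purchases are unrelated.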
Deviation
Deviations from temporal patterns are often interesting
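One way to separate predictable associations from interesting ones is to compare how often two items actually co-occur with how often they would co-occur if they were independent. A minimal sketch, assuming independence as the baseline for "expected" (an assumption; the ratio observed / expected is commonly called lift, and a value near 1 signals a predictable, uninteresting association). The function name is hypothetical.

```python
from typing import List, Set, Tuple


def observed_vs_expected(population: List[Set[str]],
                         a: str, b: str) -> Tuple[float, float]:
    """Return (observed support of {a, b}, support expected if a and b
    occurred independently of each other)."""
    n = len(population)
    if n == 0:
        raise ValueError("population must not be empty")
    sa = sum(1 for t in population if a in t) / n
    sb = sum(1 for t in population if b in t) / n
    observed = sum(1 for t in population if a in t and b in t) / n
    return observed, sa * sb


if __name__ == "__main__":
    baskets = [{"cereal", "bread"}, {"cereal"}, {"bread"}, set()]
    obs, exp = observed_vs_expected(baskets, "cereal", "bread")
    print(obs, exp)  # 0.25 0.25 -> no deviation, so not interesting
```

A deviation in either direction (observed well above or well below expected) is what marks the pair as worth reporting.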