Data Mining
process that uses statistical, mathematical and AI techniques to extract and identify useful information (business rules, trends, affinities, correlations and prediction models) from large data sets.
-
-
-
-
-
-
Process
-
-
Consolidation: select, filter, integrate and unify
Cleaning: blanks, outliers and inconsistencies
Transformation: normalization, discretization, aggregation, new attributes
-
-
-
-
-
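The transformation step listed above can be illustrated with a minimal sketch. It assumes min-max scaling for normalization and equal-width bins for discretization; the map names neither method, and the function names and the `ages` sample are hypothetical.

```python
def min_max_normalize(values):
    """Rescale numeric values to the range [0, 1] (assumed min-max scaling)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def discretize(values, bins):
    """Map numeric values to bin indices 0..bins-1 (assumed equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    if width == 0:
        return [0 for _ in values]
    # The maximum value would fall just past the last bin, so clamp it.
    return [min(int((v - lo) / width), bins - 1) for v in values]


ages = [20, 30, 40, 60]  # hypothetical sample attribute
normalized = min_max_normalize(ages)
binned = discretize(ages, 2)
print(normalized)
print(binned)
```

Normalization keeps the relative spacing of the values, while discretization trades that detail for a small set of categories.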
Tasks
-
-
Clustering: partitioning a collection of things into segments/groupings based on similar characteristics
-
Methods
Classification
analyses historical data stored in a DB and automatically generates a model that can predict future behavior
learns patterns from previous labeled data to place new data instances into their respective groups or classes
-
If the prediction is a label (e.g. yes/no, good/bad, high/low), the problem is called a classification problem
Accuracy
Metrics
-
-
-
Recall: TP/(TP+FN). Ratio of correctly classified positives to all actual positives (correctly classified positives + positives incorrectly classified as negatives)
-
-
-
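The recall ratio TP/(TP+FN) listed above can be computed directly from actual and predicted labels. This is a minimal sketch: the `recall` function, the `"yes"` positive label and the sample data are hypothetical.

```python
def recall(actual, predicted, positive="yes"):
    """Recall = TP / (TP + FN) for the given positive label."""
    pairs = list(zip(actual, predicted))
    # True positives: actual positives that were predicted positive.
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    # False negatives: actual positives that were predicted negative.
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    if tp + fn == 0:
        return 0.0  # no actual positives in the data
    return tp / (tp + fn)


actual = ["yes", "yes", "yes", "no", "no"]     # hypothetical true labels
predicted = ["yes", "no", "yes", "no", "yes"]  # hypothetical model output
score = recall(actual, predicted)              # TP = 2, FN = 1
print(score)
```

The false positive in the last position does not affect recall, because recall only looks at instances that are actually positive.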
Techniques
Decision tree analysis
-
- Select best splitting attribute
-
- Add a branch to the node for each value of the split
- Repeat steps 2 and 3 until a stopping criterion is reached or the node is dominated by a single class label
-
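The decision tree steps above (select the best splitting attribute, add a branch per value, repeat until a node holds a single class) can be sketched as follows. Information gain is assumed as the splitting criterion, since the map does not say which one is used; all function names and the sample data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def best_split(rows, labels, attributes):
    """Pick the attribute with the highest information gain (assumed criterion)."""
    base = entropy(labels)

    def gain(attr):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[attr], []).append(label)
        remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
        return base - remainder

    return max(attributes, key=gain)


def build_tree(rows, labels, attributes):
    """Recursively split until a node has one class or no attributes remain."""
    if len(set(labels)) == 1 or not attributes:
        # Leaf node: return the majority class label.
        return Counter(labels).most_common(1)[0][0]
    attr = best_split(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    branches = {}
    # Add one branch per observed value of the chosen attribute.
    for value in {row[attr] for row in rows}:
        subset = [(r, l) for r, l in zip(rows, labels) if r[attr] == value]
        branches[value] = build_tree(
            [r for r, _ in subset], [l for _, l in subset], remaining
        )
    return {attr: branches}


rows = [  # hypothetical training instances
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["yes", "yes", "no", "no"]
tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

In this sample, `outlook` separates the two classes perfectly, so it is chosen at the root and both branches end in leaves.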
Cluster analysis
-
-
Methods
Divisive: start with one class, then break it apart.
Agglomerative: start with individual classes, then join them.
-
-
-
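The agglomerative method above can be sketched on one-dimensional data. Single linkage (the distance between the closest members of two clusters) and a target cluster count `k` as the stopping rule are both assumptions; the map specifies neither, and the names and sample points are hypothetical.

```python
def agglomerative(points, k):
    """Merge the two closest clusters until only k clusters remain."""
    if k < 1:
        raise ValueError("k must be at least 1")
    # Start with every point in its own cluster.
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Join cluster j into cluster i, then drop cluster j.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


points = [1.0, 1.5, 5.0, 5.2, 9.0]  # hypothetical one-dimensional data
result = agglomerative(points, 3)
print(result)
```

A divisive method would run in the opposite direction, starting from one cluster that holds every point and splitting it repeatedly.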