DATA MINING - Coggle Diagram
DATA MINING
TASK & TECHNIQUE
Classification (Decision Trees)
- Given a training set in which each record contains a set of attributes, one of which is the class.
- The training set is used to build a model.
- The test set is used to validate the model.
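The train/test workflow above can be sketched with a one-level decision tree (a "decision stump") in plain Python. The toy records, the attribute names, and the helpers `majority`, `train_stump`, and `predict` are illustrative assumptions, not part of the original diagram; a full decision tree would repeat the split recursively.

```python
from collections import Counter

# Hypothetical toy records: (attributes, class label).
train = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "no"}, "play"),
]
test = [
    ({"outlook": "rainy", "windy": "no"}, "stay"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
]

def majority(labels):
    """Return the most frequent label in a list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(records):
    """Pick the one attribute whose value -> majority-class rule fits the training set best."""
    default = majority([label for _, label in records])
    best = None
    for attr in records[0][0]:
        # Group the class labels by the value this attribute takes.
        groups = {}
        for attrs, label in records:
            groups.setdefault(attrs[attr], []).append(label)
        rule = {value: majority(labels) for value, labels in groups.items()}
        correct = sum(rule[attrs[attr]] == label for attrs, label in records)
        if best is None or correct > best[0]:
            best = (correct, attr, rule)
    _, attr, rule = best
    return {"attr": attr, "rule": rule, "default": default}

def predict(model, attrs):
    """Classify one record; fall back to the overall majority class for unseen values."""
    return model["rule"].get(attrs[model["attr"]], model["default"])

# Build the model on the training set, then validate it on the test set.
model = train_stump(train)
accuracy = sum(predict(model, attrs) == label for attrs, label in test) / len(test)
```

Keeping the test records out of `train_stump` is the point of the split: accuracy is measured on data the model has not seen.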
Clustering (k-means)
- Similar data is grouped in the same cluster.
- Dissimilar data is grouped in different clusters.
Similarity Measure
- Euclidean Distance if attributes are continuous.
- Other Problem-specific Measures.
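A minimal k-means sketch using Euclidean distance on continuous attributes, as described above. The sample points, the choice of initial centroids, and the fixed iteration count are assumptions made for illustration; practical implementations usually pick initial centroids more carefully and stop when assignments no longer change.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with continuous attributes."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_means(points, centroids, iterations=10):
    """Alternate between assigning points to the nearest centroid and recomputing centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # New centroid = mean of the cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else old
            for cluster, old in zip(clusters, centroids)
        ]
    return centroids, clusters

# Hypothetical data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, clusters = k_means(points, centroids=[points[0], points[3]])
```

Similar points end up in the same cluster because each point is assigned to whichever centroid is closest under the chosen distance measure.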
Association Rules (Association Rule)
- Are "if-then" statements that help show the probability of relationships between data items within large data sets in various types of databases.
- Each rule has an antecedent (if) and a consequent (then).
- Produce dependency rules that predict the occurrence of an item based on the occurrences of other items.
- Ex: “If a customer buys bread, there is a 70% likelihood that they also buy milk.”
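The percentage attached to a rule such as "bread → milk" is usually its confidence: of the transactions containing the antecedent, the fraction that also contain the consequent. The sketch below computes support and confidence over a small, made-up list of transactions; the data and the helper names `count`, `support`, and `confidence` are assumptions for illustration.

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]

def count(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions with the antecedent, the fraction that also have the consequent.

    Raises ZeroDivisionError if the antecedent never occurs.
    """
    return count(antecedent | consequent, transactions) / count(antecedent, transactions)

rule_support = support({"bread", "milk"}, transactions)
rule_confidence = confidence({"bread"}, {"milk"}, transactions)
```

In this toy data, bread appears in four transactions and three of those also contain milk, so the rule "if bread then milk" has a confidence of 3/4.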
Prediction
- Predicting continuous (numeric) values in a specific dataset.
- Types of Regression
- Linear Regression Model ( Y = bX + A )
- Multiple Regression Model
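A minimal sketch of fitting the linear model Y = bX + A to data. It assumes ordinary least squares as the fitting method, which the diagram does not specify; the sample values and the function name `fit_line` are illustrative.

```python
def fit_line(xs, ys):
    """Least-squares estimates of slope b and intercept A for Y = bX + A."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx              # slope
    a = mean_y - b * mean_x    # intercept
    return b, a

# Hypothetical observations that lie exactly on Y = 2X + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
b, a = fit_line(xs, ys)

# Predict a continuous value for an unseen X.
prediction = b * 5.0 + a
```

A multiple regression model follows the same idea with more than one predictor, Y = b1X1 + b2X2 + ... + A, and is normally solved with a linear algebra routine rather than these closed-form sums.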
Deviation analysis
- Discovering the most significant changes in data from previously measured or normative values.
- Can reveal surprising facts hidden inside data.
- Can be used for knowledge discovery, auditing, fraud detection and data cleaning.
- Modifications of classification, clustering, and time series analysis can be used to achieve this goal.
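One simple way to flag significant changes from previously measured values is a z-score test: measure how many standard deviations each new value lies from the baseline mean. This is one common choice and an assumption here, not the only form of deviation analysis; the baseline values, the new readings, and the threshold of 3 are all made up for illustration.

```python
import statistics

# Hypothetical previously measured (normative) values.
baseline = [100.0, 102.0, 98.0, 101.0, 99.0]
# Hypothetical new measurements to check against the baseline.
current = {"mon": 100.5, "tue": 97.0, "wed": 130.0}

baseline_mean = statistics.mean(baseline)
baseline_stdev = statistics.stdev(baseline)  # sample standard deviation

def find_deviations(values, mean, stdev, threshold=3.0):
    """Return the entries whose z-score magnitude exceeds the threshold."""
    flagged = {}
    for key, value in values.items():
        z = (value - mean) / stdev
        if abs(z) > threshold:
            flagged[key] = z
    return flagged

flagged = find_deviations(current, baseline_mean, baseline_stdev)
```

Only the "wed" reading is flagged in this toy data; in a fraud detection or auditing setting, the flagged entries are the ones passed on for closer inspection.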
PROBLEMS & CHALLENGES
- Data accuracy will be poor if there is no data cleaning.
- Data mining algorithms must be efficient and scalable to extract information from large data sets effectively.
- Handling high-dimensionality
- The patterns discovered should be interesting.
- Mining diverse and heterogeneous kinds of data
CLASSIFICATION
Data Mining strategies
Supervised Learning: Train a model on known input and output data so that it can predict outputs for new data.
Classification
Naive Bayes, Model Seeker, Adaptive Bayes
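As an illustration of supervised classification, here is a minimal Naive Bayes sketch in plain Python. It assumes a multinomial word-count model with add-one (Laplace) smoothing, which is one common variant; the training messages, the labels, and the helpers `train_nb` and `classify` are hypothetical. Model Seeker and Adaptive Bayes are not covered.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labeled examples: known inputs (words) with known outputs (class).
train = [
    (["win", "money", "now"], "spam"),
    (["win", "prize"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["lunch", "at", "noon"], "ham"),
]

def train_nb(examples):
    """Count classes and per-class word frequencies from labeled examples."""
    class_counts = Counter(label for _, label in examples)
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in examples:
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def classify(model, words):
    """Return the class with the highest log-probability, using add-one smoothing."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in words:
            score += math.log((word_counts[label][w] + 1) / denom)  # log likelihood
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb(train)
label_a = classify(model, ["win", "money"])
label_b = classify(model, ["meeting", "noon"])
```

The "naive" part is the assumption that words are independent given the class, which is why the per-word log-probabilities can simply be added.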