3 CLASSIFIER
MODEL EVALUATION
Estimate accuracy
Cross-validation – k-fold, LOOCV and Stratified CV
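A minimal sketch of how the three cross-validation schemes named above differ in the way they build folds. The function names are illustrative and not taken from any library, and the stratified variant uses a simple round-robin deal per class, which is only one of several possible ways to keep class ratios similar across folds.

```python
from collections import defaultdict


def k_fold_indices(n_samples, k):
    """Split indices 0..n_samples-1 into k contiguous folds of near-equal size."""
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def stratified_k_fold_indices(labels, k):
    """Deal each class's indices round-robin so folds keep similar class ratios."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return [sorted(fold) for fold in folds]


def train_test_splits(folds):
    """Yield (train, test) index lists, holding out one fold at a time."""
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# LOOCV is k-fold with k equal to the number of samples.
loocv_folds = k_fold_indices(5, 5)
```

Accuracy would then be estimated by training on each `train` list, scoring on the matching `test` list, and averaging the scores over all folds.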
Ensemble Methods – Random Forest, Bagging and Boosting
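As an illustration of one of the ensemble methods listed, the sketch below shows bagging only: each base learner is trained on its own bootstrap sample and the final label is a majority vote. The base learner (a 1-nearest-neighbour rule on a single feature), the toy data, and all function names are assumptions made for the example. Random Forest and Boosting are not shown.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def nearest_neighbour_predict(train, x):
    """Base learner: label of the closest training point (1-D feature)."""
    _, label = min(train, key=lambda pair: abs(pair[0] - x))
    return label


def majority_vote(votes):
    """Most common label among the votes."""
    return Counter(votes).most_common(1)[0][0]


def bagging_predict(data, x, n_models=25, seed=0):
    """Train each base learner on its own bootstrap sample, then vote."""
    rng = random.Random(seed)
    votes = []
    for _ in range(n_models):
        sample = bootstrap_sample(data, rng)
        votes.append(nearest_neighbour_predict(sample, x))
    return majority_vote(votes)


# Hypothetical toy data: (feature, label) pairs.
toy_data = [(1, 0), (2, 0), (3, 0), (10, 1), (11, 1), (12, 1)]
prediction = bagging_predict(toy_data, 1.5)
```

Because every learner sees a slightly different resampling of the data, the vote tends to be more stable than any single learner's prediction.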
Clustering
iv) When to use it
- To discover distinct groups in a customer base and characterize them by their purchasing patterns.
- To discover information by classifying documents on the internet.
- To support detection applications, such as credit card fraud detection.
iii) How to use it
Clustering Approaches
1) Partitioning Approach.
Construct several partitions of the data and evaluate them by some criterion, such as the sum of squared errors. Typical methods: k-means, k-medoids, CLARANS.
2) Hierarchical Approach
Create a hierarchical decomposition of the set of data (or objects) using some criterion. Main types: agglomerative and divisive.
-
4) Grid-Based Approach.
Based on a multi-level granularity structure that divides the data space into a finite number of cells.
5) Model-Based Approach
A model is hypothesized for each cluster, and the data are fitted to that model (for example, a density function) to locate the groups.
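To make the partitioning approach concrete, here is a minimal k-means sketch that also computes the sum of squared errors mentioned as an evaluation criterion. Seeding the centroids with the first k points is an assumption chosen to keep the example deterministic, and the toy points are hypothetical. Practical implementations usually pick random or spread-out seeds.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Index of the nearest centroid for every point."""
    return [
        min(range(len(centroids)), key=lambda c: squared_distance(p, centroids[c]))
        for p in points
    ]


def update(points, labels, centroids):
    """Move each centroid to the mean of its points; keep it if the cluster is empty."""
    new_centroids = []
    for c, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if members:
            new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def k_means(points, k, max_iter=100):
    """Alternate assignment and centroid updates until the labels stop changing."""
    centroids = [tuple(p) for p in points[:k]]  # assumption: first k points as seeds
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


def sum_squared_errors(points, labels, centroids):
    """Total squared distance from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))


# Hypothetical toy data: two visibly separate groups.
points = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0)]
labels, centroids = k_means(points, k=2)
sse = sum_squared_errors(points, labels, centroids)
```

Running the same routine for several values of k and comparing the resulting sum of squared errors is one way to assess alternative partitions.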
i) What is it :
Cluster -
A collection of data objects that are similar (or related) to one another within the same group and dissimilar (or unrelated) to the objects in other groups.
ii) Where to use it
- Generally used for market research, pattern recognition, data analysis, and image processing.
- Widely used for data visualization to gain insight into the data distribution.
- Also used as a preprocessing step for other algorithms, such as regression, classification, and association analysis.
Decision Tree
i) What is it ?
A type of supervised machine learning in which the data is repeatedly split according to a certain parameter.
ii) Where to use it ?
Often used in operations research, especially in decision analysis
iii) How to use it ?
- First, identify what we would like to classify based on the dataset
- Look at the training dataset.
For example :
- Choose the best attribute to split the data on, that is, the one that best separates the two different labels into two sets.
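The attribute-selection step can be sketched as follows. The source does not say how "best" is measured, so Gini impurity is used here as an assumption (information gain is another common choice). The toy training set and the function names are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a pure set, 0.5 for an even two-class mix."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())


def split_impurity(rows, labels, attribute, value):
    """Weighted Gini impurity after splitting on rows[attribute] == value."""
    left = [lab for row, lab in zip(rows, labels) if row[attribute] == value]
    right = [lab for row, lab in zip(rows, labels) if row[attribute] != value]
    total = len(labels)
    return len(left) / total * gini(left) + len(right) / total * gini(right)


def best_split(rows, labels):
    """Return the (attribute, value) pair with the lowest weighted impurity."""
    candidates = {(attr, row[attr]) for row in rows for attr in row}
    # Sorting makes tie-breaking deterministic.
    return min(sorted(candidates), key=lambda c: split_impurity(rows, labels, *c))


# Hypothetical toy training set.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]
attribute, value = best_split(rows, labels)
```

In this toy set, splitting on `outlook` separates the two labels perfectly, while splitting on `windy` leaves both sides evenly mixed. A full tree would repeat the same selection on each resulting subset.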
Cluster Analysis -
is an unsupervised learning method that has no predefined classes or prior group information.
The data can consist of different types of attributes: nominal, numeric, and vectors.