DECISION TREE, EVALUATION METHOD, CLUSTERING - Coggle Diagram
DECISION TREE
WHAT IS IT?
A decision tree is a supervised learning algorithm that classifies data by building a tree-shaped model.
It has a flowchart-like tree structure, where:
- each internal node (nonleaf node) denotes a test on an attribute
- each branch represents an outcome of the test
- each leaf node (or terminal node) holds a class label
- the topmost node in the tree is the root node
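The node, branch and leaf structure above can be sketched in Python. The class names `Node` and `Leaf`, the function `classify`, and the toy weather attributes are hypothetical and chosen only for illustration:

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    label: str  # class label held by a leaf (terminal) node


@dataclass
class Node:
    attribute: str  # attribute tested at this internal node
    # one branch per possible outcome of the test
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def classify(tree, record):
    """Start at the root node and follow one branch per test outcome until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.branches[record[tree.attribute]]
    return tree.label


# Hypothetical toy tree: the root node tests "outlook"
root = Node("outlook", {
    "sunny": Node("humidity", {"high": Leaf("no"), "normal": Leaf("yes")}),
    "overcast": Leaf("yes"),
    "rainy": Node("windy", {"true": Leaf("no"), "false": Leaf("yes")}),
})

print(classify(root, {"outlook": "sunny", "humidity": "normal"}))  # yes
```

Each call follows exactly one root-to-leaf path, so classifying a record takes at most as many tests as the tree is deep.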
It can be used with both categorical and numerical data:
- Categorical data, e.g. gender, marital status
- Numerical data, e.g. age, temperature
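With a categorical attribute, a node typically tests whether the record has a particular category. With a numerical attribute, it compares the value against a threshold. The sketch below uses midpoints between consecutive distinct sorted values as candidate thresholds. That is one common choice, not the only one. The function names are hypothetical:

```python
def categorical_test(record, attribute, value):
    # Categorical attribute, e.g. marital_status == "married"
    return record[attribute] == value


def numerical_test(record, attribute, threshold):
    # Numerical attribute, e.g. age <= 32.5
    return record[attribute] <= threshold


def candidate_thresholds(values):
    # Midpoints between consecutive distinct sorted values
    v = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(v, v[1:])]


record = {"marital_status": "married", "age": 29}
print(categorical_test(record, "marital_status", "married"))  # True
print(candidate_thresholds([25, 40, 25, 60]))                  # [32.5, 50.0]
print(numerical_test(record, "age", 32.5))                     # True
```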
WHEN TO USE IT?
Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal
Also when we want to build models that predict class labels (classification) or numerical values (regression) to support decision-making.
HOW TO USE IT?
Decision trees use one of several algorithms to decide how to split a node into two or more sub-nodes. Each split should increase the homogeneity (purity) of the resulting sub-nodes with respect to the target variable. The tree evaluates candidate splits on all available variables and selects the split that produces the most homogeneous sub-nodes.
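The following is a minimal sketch of the exhaustive split search described above. It assumes numerical features, binary `<=` splits as in CART, and Gini impurity as the homogeneity measure. `best_split`, `gini` and the toy data are hypothetical:

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 0.0 for a perfectly homogeneous node."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Try every feature and every observed value as a threshold; return
    (feature index, threshold, weighted impurity) of the most homogeneous split."""
    best = (None, None, float("inf"))
    n = len(rows)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty sub-nodes
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best[2]:
                best = (f, t, score)
    return best


rows = [[25, 1], [32, 0], [47, 1], [51, 0]]  # hypothetical [age, owns_car]
labels = ["no", "no", "yes", "yes"]
print(best_split(rows, labels))  # (0, 32, 0.0): "age <= 32" gives two pure sub-nodes
```

A full tree builder would apply `best_split` recursively to each sub-node until the nodes are pure or a stopping rule is met.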
The choice of algorithm also depends on the type of target variable. Some algorithms used in decision trees:
- ID3 → (Iterative Dichotomiser 3)
- C4.5 → (successor of ID3)
- CART → (Classification And Regression Tree)
- CHAID → (Chi-square Automatic Interaction Detection; performs multi-level splits when computing classification trees)
- MARS → (Multivariate Adaptive Regression Splines)
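As an example of one algorithm from the list, here is a minimal ID3-style sketch. It handles only categorical attributes and picks the attribute with the highest information gain at each node. It does no pruning and does not handle attribute values unseen during training. The function names and toy data are hypothetical:

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    # Parent entropy minus the weighted entropy of the sub-nodes, one per attribute value
    n = len(labels)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [y for r, y in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attrs):
    """Return a class label (leaf) or a nested dict {attr: {value: subtree}}."""
    if len(set(labels)) == 1:
        return labels[0]                              # pure node -> leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]   # no attributes left -> majority leaf
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                                [a for a in attrs if a != best])
    return tree


rows = [
    {"outlook": "sunny", "windy": "false"},
    {"outlook": "sunny", "windy": "true"},
    {"outlook": "overcast", "windy": "false"},
    {"outlook": "rainy", "windy": "false"},
    {"outlook": "rainy", "windy": "true"},
]
labels = ["no", "no", "yes", "yes", "no"]
print(id3(rows, labels, ["outlook", "windy"]))
```

On this toy data "outlook" has the higher information gain, so it becomes the root node, and "windy" is only tested under the "rainy" branch.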
Attribute Selection Measures:
- Entropy
- Information gain
- Gini index
- Gain Ratio
- Reduction in Variance
- Chi-Square
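Most of the measures listed above can be computed in a few lines. This sketch scores one hypothetical binary split. Gain ratio follows C4.5's definition (information gain divided by split information). Reduction in variance applies to a numerical target, as in regression trees. Chi-square is left out for brevity:

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


def gain_ratio(parent, children):
    # Normalise information gain by the split information (entropy of the partition sizes)
    n = len(parent)
    split_info = -sum(len(ch) / n * math.log2(len(ch) / n) for ch in children)
    return information_gain(parent, children) / split_info


def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def variance_reduction(parent, children):
    # For a numerical target: parent variance minus the weighted variance of the sub-nodes
    n = len(parent)
    return variance(parent) - sum(len(ch) / n * variance(ch) for ch in children)


parent = ["yes"] * 4 + ["no"] * 4
children = [["yes"] * 3 + ["no"], ["yes"] + ["no"] * 3]
print(entropy(parent), gini(parent))                 # 1.0 0.5
print(round(information_gain(parent, children), 3))  # 0.189
print(round(gain_ratio(parent, children), 3))        # 0.189 (split information is 1.0 here)
print(round(variance_reduction([1, 2, 3, 10, 11, 12], [[1, 2, 3], [10, 11, 12]]), 2))  # 20.25
```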
WHERE TO USE IT?
Applications of classification arise in diverse fields, such as retail target marketing, customer retention, fraud detection, and medical diagnosis
EVALUATION METHOD
HOW TO USE IT?
Cross-Validation Method
It lets the model train and test on multiple train-test splits, which gives a better indication of how well the model will perform on unseen data.
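The following is a minimal k-fold cross-validation sketch. The data is shuffled and split into k folds. Each fold is held out once as the test set while the model trains on the remaining folds, and the k scores are averaged. `fit` and `score` are placeholders for any model. The majority-class baseline and the toy labels are hypothetical and only make the example runnable:

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, k, fit, score):
    """Train on k-1 folds, test on the held-out fold, repeat k times, return the mean score."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        scores.append(score(model, [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / len(scores)


# Hypothetical baseline model: always predict the most frequent training label
def fit_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def accuracy(model, X, y):
    return sum(label == model for label in y) / len(y)


X = list(range(10))
y = ["spam"] * 7 + ["ham"] * 3
print(cross_validate(X, y, k=5, fit=fit_majority, score=accuracy))  # 0.7
```

Every data point is used for testing exactly once, so the averaged score does not hinge on one lucky or unlucky train-test split.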
CLUSTERING
WHAT?
One of the unsupervised learning methods: the data has no predefined classes and no prior information about group membership.
HOW?
Partitional Clustering: a division of data objects into non-overlapping subsets (clusters) such that each data object belongs to exactly one subset.
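k-means is one well-known partitional clustering algorithm. The text above does not name a specific one, so this is an illustrative choice. The following is a minimal sketch of Lloyd's k-means iteration, assuming points are numeric tuples and Euclidean distance is used. Every point ends up with exactly one cluster label:

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: returns one cluster label per point, plus the k centroids."""
    centroids = random.Random(seed).sample(points, k)   # initial centroids: k random points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its member points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new_centroids.append(centroids[j])       # empty cluster: keep old centroid
        if new_centroids == centroids:
            break                                        # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Because each point gets a single label, the clusters are non-overlapping by construction, which matches the definition of partitional clustering.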
WHEN?
As a pre-processing tool for regression, classification, and association analysis (e.g. to reduce the size of large data sets), and for compression, such as compressing images through vector quantization.
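The vector-quantization use can be sketched with a tiny codebook. The idea is to cluster the pixel values, then store each pixel as the index of its nearest centroid instead of its full value. This sketch rests on several simplifying assumptions:
- It uses scalar grayscale intensities. Real vector quantization usually clusters blocks of pixels or colour vectors.
- The codebook is built with a simple 1-D k-means that starts from evenly spaced centroids and requires k >= 2.
- The function names and pixel values are hypothetical.

```python
def build_codebook(values, k, iters=20):
    # 1-D k-means (assumes k >= 2): start from k evenly spaced values, then alternate assign/update
    lo, hi = min(values), max(values)
    codebook = [lo + (hi - lo) * j / (k - 1) for j in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            clusters[min(range(k), key=lambda j: abs(v - codebook[j]))].append(v)
        codebook = [sum(c) / len(c) if c else codebook[j] for j, c in enumerate(clusters)]
    return codebook


def encode(values, codebook):
    # Compression: keep only a small codebook index per pixel
    return [min(range(len(codebook)), key=lambda j: abs(v - codebook[j])) for v in values]


def decode(codes, codebook):
    # Lossy reconstruction: each pixel becomes its cluster centroid
    return [codebook[c] for c in codes]


pixels = [12, 15, 10, 200, 205, 198, 14, 202]  # hypothetical grayscale intensities
codebook = build_codebook(pixels, k=2)
codes = encode(pixels, codebook)
print(codebook)  # [12.75, 201.25]
print(codes)     # [0, 0, 0, 1, 1, 1, 0, 1]
print(decode(codes, codebook))
```

With k = 2 each pixel needs only one bit plus the shared two-entry codebook, at the cost of losing the small variations within each cluster.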