Machine Learning
Supervised Learning
Decision Tree
Goodness measure
Info gain
Info gain is biased towards choosing features with a large number of values (e.g., an ID column), which leads to overfitting
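A minimal sketch (hypothetical toy data) of how information gain is computed, and why an ID-like feature with one value per record gets the maximum possible gain:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_gain(feature, labels):
    """Entropy of the parent minus the weighted entropy of the children."""
    total = entropy(labels)
    weighted = 0.0
    for v in np.unique(feature):
        mask = feature == v
        weighted += mask.mean() * entropy(labels[mask])
    return total - weighted

# Toy data: the ID feature puts every record in its own child node,
# so the children are pure and it gets the maximum possible gain.
labels  = np.array(["yes", "no", "yes", "no", "yes", "no"])
outlook = np.array(["sun", "sun", "rain", "rain", "sun", "rain"])
ids     = np.arange(6)

print(info_gain(outlook, labels))  # modest gain
print(info_gain(ids, labels))      # maximal gain -> overfitting risk
```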
Greedy search, no backtracking
Support Vector Machine
architecture
always 3 layers - input, hidden and output
Penalty parameter C: the bigger C is, the less tolerant the model is of misclassifications
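A small, hedged sketch with scikit-learn's SVC on synthetic data, just to illustrate the effect of C (the values are arbitrary):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic, slightly noisy two-class data (illustrative only)
X, y = make_classification(n_samples=300, n_features=5, flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for C in (0.01, 1.0, 100.0):
    model = SVC(kernel="linear", C=C).fit(X_train, y_train)
    print(f"C={C:>6}: train={model.score(X_train, y_train):.2f} "
          f"test={model.score(X_test, y_test):.2f}")
```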
Linear Regression
- Fit a line to the data
- Gradient descent for each parameter e.g., m and c for y = mx + c
- Update m = m - learning rate * (dE/dm), i.e., move against the gradient
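A minimal gradient-descent sketch for y = mx + c with mean squared error; the toy data and learning rate are made up:

```python
import numpy as np

# Toy data roughly following y = 2x + 1 (illustrative only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 3.0, 4.9, 7.2, 9.1])

m, c = 0.0, 0.0
lr = 0.02

for _ in range(2000):
    pred = m * x + c
    error = pred - y
    dE_dm = 2 * np.mean(error * x)   # dE/dm for mean squared error
    dE_dc = 2 * np.mean(error)       # dE/dc
    m -= lr * dE_dm                  # move against the gradient
    c -= lr * dE_dc

print(m, c)  # should approach m ≈ 2, c ≈ 1
```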
K-nearest neighbours
Given a new point, look at the classes of its k nearest neighbours (majority vote)
Needs a distance metric, the training data and a value of k
If k is too small, the classifier is sensitive to noise; if k is too big, the neighbourhood may include irrelevant points. Choose an odd value of k to avoid ties
Advantages - simple, building the model is inexpensive, well suited for multi-class data or records with multiple class labels
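A from-scratch sketch of k-nearest neighbours with Euclidean distance and a majority vote (the toy points are hypothetical):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x_new, k=3):
    """Classify x_new by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x_new, axis=1)  # Euclidean distance
    nearest = np.argsort(distances)[:k]
    votes = Counter(y_train[nearest])
    return votes.most_common(1)[0][0]

X_train = np.array([[1.0, 1.0], [1.2, 0.8], [4.0, 4.2], [3.8, 4.0], [4.1, 3.9]])
y_train = np.array(["A", "A", "B", "B", "B"])

print(knn_predict(X_train, y_train, np.array([1.1, 1.0]), k=3))  # -> "A"
```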
Naive Bayes
When a feature count is zero (e.g., in the fruits example the count of "long" is zero for one class), the whole probability product becomes zero
Hence the Laplace correction: add 1 to each count so that the overall probability never becomes zero
Pros
When the assumption of independent variables holds, naive Bayes performs better than logistic regression
Does well for categorical features; for numerical features, need to assume a normal distribution
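A minimal sketch of the Laplace (add-one) correction; the fruit counts are hypothetical, mirroring the "long = 0" case above:

```python
# Hypothetical counts of the feature "long = yes" per fruit class
counts = {"banana": 400, "orange": 0, "other": 100}   # orange has zero "long" fruit
totals = {"banana": 500, "orange": 300, "other": 200}

def p_long_given_class(cls, laplace=True):
    if laplace:
        # add 1 to the count and 2 to the total (2 possible values: long / not long)
        return (counts[cls] + 1) / (totals[cls] + 2)
    return counts[cls] / totals[cls]

print(p_long_given_class("orange", laplace=False))  # 0.0 -> whole product collapses
print(p_long_given_class("orange", laplace=True))   # small but non-zero
```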
Unsupervised Learning
Evaluate cluster results
Interpreting
Clusters can be explained in practical terms by the distinguishing features across cluster profiles
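One hedged way to evaluate and interpret clusters, assuming k-means and the silhouette score (the map does not prescribe a particular algorithm; the data is synthetic):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with 3 groups (illustrative only)
X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
labels = kmeans.labels_

print("silhouette:", silhouette_score(X, labels))  # closer to 1 = better separated

# Interpreting: look at the distinguishing feature values of each cluster profile
for c in range(3):
    print(f"cluster {c}: mean feature values = {X[labels == c].mean(axis=0)}")
```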
Machine Learning process
Preprocessing
Data Cleaning
Fill in missing values, smooth noisy data, remove outliers, resolve inconsistencies
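A small pandas sketch of these cleaning steps; the column names, fill strategy and outlier rule are assumptions:

```python
import pandas as pd

# Hypothetical raw data with a missing value, noise, and an outlier
df = pd.DataFrame({
    "age":    [25, 32, None, 41, 29, 30],
    "income": [50_000, 52_000, 51_000, 1_000_000, 49_000, 53_000],
})

# Fill in missing values (here: with the column median)
df["age"] = df["age"].fillna(df["age"].median())

# Smooth noisy data (here: a simple rolling mean)
df["income_smooth"] = df["income"].rolling(window=3, min_periods=1).mean()

# Remove outliers (here: values outside 1.5 * IQR)
q1, q3 = df["income"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[(df["income"] >= q1 - 1.5 * iqr) & (df["income"] <= q3 + 1.5 * iqr)]

print(df)
```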
Data reduction
Dimensionality reduction
General methods
Aggregation
Ratios, e.g., income-to-debt ratio
Feature subset selection
filter approach
e.g., select attributes that have as low correlation with each other as possible
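A hedged sketch of a correlation-based filter that drops one of each highly correlated pair of features (the threshold and column names are assumptions):

```python
import numpy as np
import pandas as pd

def drop_highly_correlated(df, threshold=0.9):
    """Filter approach: keep features whose pairwise correlation stays below threshold."""
    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return df.drop(columns=to_drop)

# Hypothetical data: height_cm and height_in are almost perfectly correlated
df = pd.DataFrame({
    "height_cm": [150, 160, 170, 180, 190],
    "height_in": [59, 63, 67, 71, 75],
    "weight":    [70, 55, 90, 60, 80],
})
print(drop_highly_correlated(df).columns.tolist())  # one of the height columns is dropped
```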
wrapper approach
backward feature elimination
start with all features, remove one at a time
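A hedged sketch of backward elimination as a wrapper: start with all features and repeatedly drop the one whose removal hurts cross-validated accuracy the least (the model and scoring choices are assumptions):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)
features = list(range(X.shape[1]))          # start with all features

def cv_score(cols):
    model = LogisticRegression(max_iter=1000)
    return cross_val_score(model, X[:, cols], y, cv=5).mean()

# Wrapper: remove one feature at a time while it does not hurt performance
while len(features) > 1:
    best_subset, best_score = None, cv_score(features)
    for f in features:
        subset = [c for c in features if c != f]
        score = cv_score(subset)
        if score >= best_score:             # dropping f is at least as good
            best_subset, best_score = subset, score
    if best_subset is None:                 # every removal hurts -> stop
        break
    features = best_subset

print("selected feature indices:", features)
```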
Model evaluation
Overfitting
When a model has high training accuracy but low testing accuracy
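A quick sketch of spotting overfitting by comparing training and testing accuracy; the unconstrained decision tree here is just an easy way to provoke it:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, flip_y=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)  # no depth limit
print("train accuracy:", tree.score(X_train, y_train))  # ~1.0
print("test accuracy: ", tree.score(X_test, y_test))    # noticeably lower -> overfitting
```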
Ensemble
Method 2: Random Forest
Need to tune the number of candidate features considered at each split; a common suggestion is sqrt(total no. of features)
Testing random forest
No need for a separate test set; just test each tree on its left-over (out-of-bag) data
Each sample is out-of-bag for roughly a third of the trees (assuming about two thirds of the data are used to train each tree). That sample, together with the other out-of-bag samples for a tree, is used to get that tree's predictions for evaluation
For regression, sum up (prediction - actual)^2
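A hedged sketch with scikit-learn's RandomForestClassifier: max_features="sqrt" follows the sqrt suggestion above, and oob_score=True evaluates each sample only with the trees that did not train on it, so no separate test set is needed:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=600, n_features=25, random_state=0)

forest = RandomForestClassifier(
    n_estimators=200,
    max_features="sqrt",   # consider sqrt(total no. of features) at each split
    oob_score=True,        # out-of-bag evaluation on left-over samples
    random_state=0,
).fit(X, y)

print("out-of-bag accuracy:", forest.oob_score_)
```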
Method 3: Boosting
- Build a model (but don't overfit)
- Increase weights of examples the model got wrong ("Look at what you got wrong. Look! Look!")
- Retrain a new model using the weighted training set
- Repeat (e.g., 100+ iterations)
- Weighted prediction of each model
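A minimal from-scratch sketch of these steps for binary labels in {-1, +1}, using decision stumps as the weak models (dataset and number of rounds are arbitrary):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=0)
y = np.where(y == 1, 1, -1)                 # labels in {-1, +1}

n_rounds = 50
weights = np.full(len(y), 1 / len(y))       # start with uniform weights
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)          # weak model: don't overfit
    stump.fit(X, y, sample_weight=weights)
    pred = stump.predict(X)
    err = np.clip(weights[pred != y].sum(), 1e-10, 1 - 1e-10)
    alpha = 0.5 * np.log((1 - err) / err)                # model's vote strength
    weights *= np.exp(alpha * (pred != y) - alpha * (pred == y))  # up-weight mistakes
    weights /= weights.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Weighted prediction of each model
scores = sum(a * s.predict(X) for a, s in zip(alphas, stumps))
print("training accuracy:", (np.sign(scores) == y).mean())
```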
Gradient Boosting
Different from AdaBoost: AdaBoost up-weights the points the previous model got wrong, whereas Gradient Boosting fits each new model to the residuals (gradient) of a user-defined loss function
- Predict using a first decision tree stump (or a simple initial guess such as the mean)
- Compute the residuals (actual - prediction); these residuals become the target for the next tree
- Fit the new tree on the same input variables with the residuals as the new target
- Update the prediction (original prediction + the new tree's prediction of the residuals) and compute the residuals again
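A minimal from-scratch sketch of these steps for regression with squared error, where the residuals are actual - prediction (learning rate and tree depth are arbitrary):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy regression data (illustrative only)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=200)

prediction = np.full_like(y, y.mean())      # initial prediction
learning_rate = 0.1
trees = []

for _ in range(100):
    residuals = y - prediction                          # actual - prediction
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residuals)                              # residuals are the new target
    prediction += learning_rate * tree.predict(X)       # update the running prediction
    trees.append(tree)

print("mean squared error:", np.mean((prediction - y) ** 2))
```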
Hybrid models
Difference with ensemble
Ensemble combines multiple homogeneous, weak models
Hybrid combines different types of models