Machine Learning Algorithms
Background on ML Algorithms
Terminology
Row: A row describes a single entity or observation
Cell: A cell is a single value at the intersection of a row and a column
The aim of an ML algorithm is to estimate the mapping function (f) from input variables (X) to output variables (Y), i.e. Y = f(X)
Column: A column describes data of a single type
Classification Based on Prediction Type:
Nonparametric Algorithms
: Free to learn any functional form from the training data; they make no strong assumptions about the form of the mapping function
Decision Trees
Naive Bayes
Support Vector Machines
Neural Networks
Parametric ML Algorithms
: Simplify the mapping function to a known form with a fixed set of parameters
Linear Discriminant Analysis
Perceptron
Logistic Regression
Classification based on Output Data Type (Labelled or not)
Unsupervised
: You only have input data (X) and no corresponding output (Y). Here the goal is to model the underlying structure or distribution of the data
Clustering
: Where you want to discover the inherent groupings in the data, eg: the K-Means algorithm, grouping customers by purchasing behaviour
Association
: Where you want to discover rules that describe large portions of your data, eg: the Apriori algorithm, people that buy A also tend to buy B
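The clustering idea above can be sketched in plain Python. This is a minimal illustration of K-Means on 2-D points, not a production implementation; the function name `kmeans`, the `customers` data, and the fixed iteration count are assumptions made for the example.

```python
import random

def kmeans(points, k, iterations=20, seed=0):
    """Group 2-D points into k clusters by repeatedly reassigning and re-centering."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [(p[0] - c[0]) ** 2 + (p[1] - c[1]) ** 2 for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: each centroid moves to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters

# Hypothetical customers: (visits per month, average basket value)
customers = [(1, 10), (2, 12), (1, 11), (9, 80), (10, 85), (11, 82)]
centroids, clusters = kmeans(customers, k=2)
print(clusters)
```

No labels (Y) are supplied: the two groups emerge from the input data (X) alone.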
Semi-Supervised
: Problems where you have a large amount of input data (X) and only some of it is labeled (Y)
eg: a photo archive where only some of the images are labeled
Supervised
: Where you have both (X) and (Y) and use an algorithm to learn the mapping function from input to output, Y = f(X)
Regression
: Output is a real value
Linear Regression
eg: time series prediction
Classification
: Output is a category
Support Vector Machines
eg: recommendation, topic classification
Random forest
for Classification & Regression Problems
ML Prediction Errors
Variance Error
: Sensitivity of a model to changes in the training data
Low Variance
: Suggests small changes to the estimate of the target function when the training data set changes
Linear regression,
LDA,
Logistic regression
High Variance
: Suggests large changes to the estimate of the target function when the training data set changes
Decision Trees,
KNN,
SVM
Bias Error
: The simplifying assumptions made by an algorithm to make the target function easier to learn
Low Bias
: Suggests fewer assumptions about the form of the target function
Decision Trees,
K-Nearest Neighbors,
Support Vector Machines
High Bias
: Suggests more assumptions about the form of the target function
Linear Regression,
Linear Discriminant Analysis,
Logistic Regression
Irreducible Error
: Cannot be reduced regardless of which algorithm is used
Bias-Variance Trade-Off
: The goal of any supervised ML algorithm is to achieve low bias and low variance
Increasing the bias will decrease the variance
Increasing the variance will decrease the bias
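The trade-off can be made concrete with a small simulation. The sketch below is an illustration under stated assumptions, not a standard benchmark: the target function (sin), the noise level, the test point `x0`, and the two toy models (`predict_mean`, a high-bias model that ignores X, and `predict_1nn`, a 1-nearest-neighbor model) are all chosen for the example. Each model is refit on many freshly sampled training sets and its prediction at one point is compared with the true value.

```python
import math
import random
import statistics

def true_f(x):
    return math.sin(x)  # assumed target function for this illustration

def make_training_set(rng, n=30, noise=0.3):
    xs = [rng.uniform(0, 6) for _ in range(n)]
    return [(x, true_f(x) + rng.gauss(0, noise)) for x in xs]

def predict_mean(train, x):
    # High bias, low variance: ignores x and predicts the average output.
    return statistics.fmean([y for _, y in train])

def predict_1nn(train, x):
    # Low bias, high variance: copies the output of the closest training row.
    return min(train, key=lambda row: abs(row[0] - x))[1]

def bias_and_variance(predict, x0=1.5, trials=500, seed=1):
    rng = random.Random(seed)
    preds = [predict(make_training_set(rng), x0) for _ in range(trials)]
    bias = statistics.fmean(preds) - true_f(x0)
    variance = statistics.pvariance(preds)
    return bias, variance

bias_mean, var_mean = bias_and_variance(predict_mean)
bias_nn, var_nn = bias_and_variance(predict_1nn)
print(f"mean model: bias={bias_mean:.3f} variance={var_mean:.3f}")
print(f"1-NN model: bias={bias_nn:.3f} variance={var_nn:.3f}")
```

With these settings the mean model should show the larger bias and the 1-NN model the larger variance, which is the pattern the lists above describe.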
Overfitting
: Learning the training data too well, at the expense of generalizing to new data
More likely with nonparametric and nonlinear models
How to Limit Overfitting
Use a resampling technique to estimate model accuracy
eg: k-fold cross-validation
Hold back a validation dataset
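As a rough sketch of the resampling idea, k-fold cross-validation splits the row indices into k folds and uses each fold once as the held-out test set. The helper name `k_fold_splits` is an assumption for this example, and it does not shuffle; in practice rows are usually shuffled first.

```python
def k_fold_splits(n_rows, k):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_rows))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_rows // k + (1 if i < n_rows % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(n_rows=10, k=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

A model would be fit on each `train` set and scored on the matching `test` set; the average score over the k folds is the accuracy estimate.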
Underfitting
: Failing to learn the problem from the training data sufficiently
Based on Model Used to Predict
Nonlinear Machine Learning Algorithms
Classification & Regression Trees
Naive Bayes, using probability for classification
K-Nearest Neighbors
Learning Vector Quantization, an extension of KNN
Support Vector Machines
Ensemble Machine Learning Algorithms
Bagging & Random Forests
Boosting Ensemble & AdaBoost
Linear Machine Learning Algorithms
Linear regression for predicting real values
Simple Linear Regression
Ordinary Least Squares
Gradient Descent
Regularized Linear Regression
Tips to Prepare Data for Linear Regression
Logistic regression for classification with two categories
Linear discriminant analysis with more than two categories
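For simple linear regression with ordinary least squares, the coefficients can be calculated analytically. The sketch below assumes one input variable and uses the textbook closed form (slope = covariance of x and y divided by variance of x); the function name and the sample data are illustrative.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing squared error for y = b0 + b1 * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance_x = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance_x
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data lying exactly on y = 1 + 2x
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
intercept, slope = fit_simple_linear_regression(xs, ys)
print(intercept, slope)
```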
Gradient Descent Optimization Procedure:
Used to find the values of a function's parameters that minimize a cost function. Gradient descent is best used when the parameters cannot be calculated analytically and must be searched for by an optimization algorithm
Gradient Descent
: The goal is to minimize a given function, in this case the loss function
Compute the slope (gradient), i.e. the derivative of the function at the current point
Move from the current point in the direction opposite to the slope, by a step proportional to the slope (scaled by the learning rate)
Eg:
Linear & Logistic regressions
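The two steps above (compute the slope, then move against it) can be sketched on a one-parameter function. The example assumes the function f(x) = (x - 3)^2, whose derivative is 2(x - 3); the function name, starting point, learning rate, and step count are illustrative choices.

```python
def gradient_descent(gradient, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        slope = gradient(x)          # slope at the current point
        x -= learning_rate * slope   # move in the opposite direction
    return x

# Minimize f(x) = (x - 3)^2, whose derivative is 2 * (x - 3)
minimum = gradient_descent(lambda x: 2 * (x - 3), start=0.0)
print(minimum)
```

The result approaches 3, the point where the slope is zero.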
Batch Gradient Descent
: We take the average of the gradients over all the training examples and use that mean gradient to update the parameters. That is one step of gradient descent per epoch
The cost decreases smoothly, but the slope must be calculated over all the data for every step, which can be compute-intensive
Stochastic Gradient Descent
: In Stochastic Gradient Descent (SGD), we consider just one example at a time to take a single step. In one epoch, for each training example in turn: calculate the slope on that one example and move in the opposite direction
SGD can be used for larger datasets. It converges faster when the dataset is large
Because it processes one example at a time, it cannot take advantage of vectorized computation
Mini Batch Gradient Descent
: We neither use the whole dataset at once nor a single example at a time. We use a batch of a fixed number of training examples, smaller than the full dataset, and call it a mini-batch
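The three variants differ only in how many examples feed each update, so one sketch can cover them all. The code below assumes a linear model y = b0 + b1 * x trained on half the mean squared error; the function name `fit_line`, the data, and the hyperparameters are illustrative. A `batch_size` equal to the dataset size gives batch gradient descent, 1 gives SGD, and anything in between gives mini-batch gradient descent.

```python
import random

def fit_line(xs, ys, batch_size, learning_rate=0.1, epochs=2000, seed=0):
    """Fit y = b0 + b1 * x by gradient descent on half the mean squared error."""
    rng = random.Random(seed)
    rows = list(zip(xs, ys))
    b0, b1 = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(rows)  # visit the examples in a new order each epoch
        for start in range(0, len(rows), batch_size):
            batch = rows[start:start + batch_size]
            errors = [(b0 + b1 * x) - y for x, y in batch]
            # Mean gradient over the examples in this batch
            grad_b0 = sum(errors) / len(batch)
            grad_b1 = sum(e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            b0 -= learning_rate * grad_b0
            b1 -= learning_rate * grad_b1
    return b0, b1

# Hypothetical noise-free data on y = 1 + 2x, with x already in [0, 1]
xs = [i / 10 for i in range(11)]
ys = [1 + 2 * x for x in xs]

batch_fit = fit_line(xs, ys, batch_size=len(xs))  # batch: 1 update per epoch
sgd_fit = fit_line(xs, ys, batch_size=1)          # SGD: 11 updates per epoch
mini_fit = fit_line(xs, ys, batch_size=4)         # mini-batch: 3 updates per epoch
print(batch_fit, sgd_fit, mini_fit)
```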
Tips for Gradient descent
Plot Cost vs Time:
Plot Cost values for each iteration and if cost doesn't decrease, try reducing your learning rate.
Plot Mean Cost:
The per-example cost in SGD is noisy, so plot the cost averaged over several updates
SGD usually takes 1-10 passes through the training data set
Learning Rate:
eg: 0.1, 0.001, 0.0001
Rescale Inputs:
The algorithm will reach the minimum cost faster if the shape of the cost function is not skewed and distorted. You can achieve this by rescaling all input variables (X) to the same range, eg: between 0 and 1
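Rescaling to the range 0 to 1 is commonly done with min-max normalization. A minimal sketch for one input column, where the function name `rescale` and the sample values are assumptions for the example:

```python
def rescale(values):
    """Map values linearly so the smallest becomes 0 and the largest becomes 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]  # constant column: nothing to spread out
    return [(v - lowest) / (highest - lowest) for v in values]

scaled = rescale([10, 20, 30])
print(scaled)
```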
Math
Eg:
=1+1
=SUM(A7:C7)
=COUNT(A7:C7)
=LOG(2,2)
=2^2
=SQRT(4)
=EXP(2)
=LN()
=PI()
Statistical:
=AVERAGE()
=MODE()
=STDEV()
=PEARSON()
=RAND()
=NORMINV(RAND(),10,1) --(Gaussian Random Number)
=IF()
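For readers working outside a spreadsheet, several of the formulas above have close equivalents in Python's standard library. The mapping below is a rough sketch: the list `values` stands in for a cell range such as A7:C7 and is a made-up example.

```python
import math
import random
import statistics

values = [2, 4, 4, 5]  # hypothetical stand-in for a cell range

total = sum(values)                   # =SUM(...)
count = len(values)                   # =COUNT(...)
log_result = math.log(2, 2)           # =LOG(2,2)
power = 2 ** 2                        # =2^2
root = math.sqrt(4)                   # =SQRT(4)
exponential = math.exp(2)             # =EXP(2)
pi = math.pi                          # =PI()
average = statistics.mean(values)     # =AVERAGE(...)
mode = statistics.mode(values)        # =MODE(...)
std_dev = statistics.stdev(values)    # =STDEV(...), sample standard deviation
uniform = random.random()             # =RAND()

# =NORMINV(RAND(),10,1): push a uniform random number through the inverse
# cumulative distribution of a normal with mean 10 and standard deviation 1.
normal = statistics.NormalDist(mu=10, sigma=1)
gaussian = normal.inv_cdf(random.random())

print(total, count, average, mode, std_dev, gaussian)
```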