Supervised Learning, Model Performance Metrics, Cross Validation…
Supervised Learning
Regression (Continuous)
Linear Regression (predicting house prices)
Simple Linear Regression
models relationship between 1 independent and 1 dependent variable
y = β0 + β1x
(β0 = intercept, β1 = slope)
Multiple Linear Regression
models relationship between 2+ independent variables and 1 dependent variable
y = β0 + β1x1 + β2x2 + … + βnxn
Polynomial Regression
relationship between independent variable and dependent variable is modeled as an nth degree polynomial
KNN (heart disease prediction)*
Decision Trees(rule based decision making)*
Random Forest (credit scoring)*
Support Vector Machines [SVM] (social media monitoring)*
Non-Linear Regression
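The y = β0 + β1x model above can be sketched in a few lines. A minimal example, assuming scikit-learn and synthetic data (neither is specified in the notes):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data generated from y = 2 + 3x (illustrative values, not from the notes)
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
y = 2.0 + 3.0 * X.ravel()

model = LinearRegression().fit(X, y)
b0 = model.intercept_  # estimated β0 (intercept)
b1 = model.coef_[0]    # estimated β1 (slope)
```

On noiseless data like this, the fit recovers the generating intercept and slope.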
Classification (Categorical)
KNN
Decision Trees
Support Vector Machines [SVM]
Logistic Regression (customer purchase prediction)
Random Forest
Naïve Bayes (email spam detection)
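For the classification branch, a hedged sketch of the "customer purchase prediction" use of logistic regression, assuming scikit-learn and a toy single-feature dataset (both assumptions, not from the notes):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: hypothetical feature (e.g. number of site visits) vs. purchased (1) or not (0)
X = np.array([[0], [1], [2], [3], [4], [5]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression().fit(X, y)
pred = clf.predict([[5]])[0]  # predicted class for a new customer with feature value 5
```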
Model Performance Metrics
Mean Squared Error
Avg of the squared errors (actual vs predicted)
Root Mean Squared Error
Square root of the MSE; expressed in the same units as the target variable
R² (R-Squared)
R² = 0.8 means the independent variables explain 80% of the variability in the dependent variable
High train R², Low test R² = Overfitting
Low train R², Low test R² = Underfitting
Mean Absolute Error (MAE)
Avg of the absolute errors between actual and predicted values
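The four metrics above can be computed directly. A sketch assuming scikit-learn's `sklearn.metrics` helpers and made-up values:

```python
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Illustrative actual vs predicted values (not from the notes)
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

mse = mean_squared_error(y_true, y_pred)     # average of squared errors
rmse = np.sqrt(mse)                          # same units as the target
mae = mean_absolute_error(y_true, y_pred)    # average of absolute errors
r2 = r2_score(y_true, y_pred)                # fraction of variance explained
```

For these values the errors are 0.5, 0, -1.5, -1.0, so MSE = 0.875 and MAE = 0.75.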
Cross Validation Techniques
Holdout Method
Leave One Out Cross Validation (LOOCV)
Stratified K-Fold CV
K-Fold CV
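K-Fold CV, the most common of the techniques listed, splits the data into k folds, trains on k-1, and validates on the held-out fold, rotating k times. A sketch assuming scikit-learn and a synthetic regression dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic, nearly noise-free regression data (illustrative)
X, y = make_regression(n_samples=100, n_features=3, noise=0.1, random_state=0)

# 5-fold CV: each fold serves once as the validation set
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(LinearRegression(), X, y, cv=cv, scoring="r2")
```

`scores` holds one R² per fold; averaging them gives a less optimistic estimate than the Holdout Method's single split.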
Regularization in Regression
Lasso (L1) Regression
Least Absolute Shrinkage & Selection Operator
Adds a penalty, scaled by λ, on the absolute values of the coefficients
Penalty: Adds the sum of the absolute values of the coefficients to the loss function
Effect: Can set some coefficients to zero, effectively performing feature selection
Useful to reduce the number of features
hyperparameters: λ and max_iterations
Ridge (L2) Regression
Penalty: Adds the sum of the squared coefficients to the loss function
Effect: Shrinks the coefficients but does not set them to zero
Useful when the dataset contains many features that are not highly correlated
hyperparameters: λ and max_iterations
Elastic Net
Penalty: Combines both L1 and L2 regularization
Effect: Balances between Ridge and Lasso, allowing for both coefficient shrinkage and feature selection
Useful when the dataset contains many features and some are highly correlated
hyper parameters: λ & L1_ratio (mix of L1 & L2)
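The contrast between the L1 and L2 penalties above can be seen on data where only some features matter. A sketch assuming scikit-learn, where the `alpha` parameter plays the role of λ in these notes:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
# Only the first two features drive y; the other three are pure noise (illustrative)
y = 4.0 * X[:, 0] + 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.5).fit(X, y)  # L1: can zero out coefficients
ridge = Ridge(alpha=0.5).fit(X, y)  # L2: shrinks but keeps all coefficients nonzero
```

Lasso drives the noise features' coefficients to exactly zero (feature selection), while Ridge leaves them small but nonzero.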
Hyperparameter Tuning
GridSearchCV
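GridSearchCV exhaustively tries every combination in a parameter grid, scoring each by cross-validation. A sketch assuming scikit-learn, tuning Ridge's `alpha` (the λ above) over a small illustrative grid:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative)
X, y = make_regression(n_samples=100, n_features=5, noise=1.0, random_state=0)

# Candidate λ values to search over (arbitrary illustrative grid)
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

# 5-fold CV for each candidate; best_params_ holds the winner
search = GridSearchCV(Ridge(), param_grid, cv=5, scoring="r2").fit(X, y)
best_alpha = search.best_params_["alpha"]
```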
* commonly associated with classification