Machine Learning
Types of ML systems
Supervised vs. Unsupervised
Unsupervised
Reinforcement Learning
Supervised
LFDA
Neural Networks
PCA is used for dimensionality and noise reduction
K-means for feature learning by clustering the data
Auto-encoder for feature learning
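A minimal sketch of PCA for dimensionality and noise reduction, using scikit-learn on synthetic data (the data and parameter values here are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-D data that really lives on a 2-D subspace, plus small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3))  # rank-2 data in 3-D
X += 0.01 * rng.normal(size=X.shape)

# Project onto the top 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_.sum())  # near 1.0: little information lost
```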
Optimization
In machine learning, the aim is usually to find optimal parameters θ∗ of a function or model fθ that minimize a cost function J(θ).
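The idea above can be sketched in pure NumPy: batch gradient descent searching for θ* that minimises an MSE cost J(θ) of a linear model f_θ (synthetic data; the learning rate and iteration count are illustrative choices):

```python
import numpy as np

# Synthetic data: y = X @ theta_true + noise, with a bias column in X
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-1, 1, size=100)]
true_theta = np.array([2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=100)

def cost(theta):
    return np.mean((X @ theta - y) ** 2)  # J(theta) = mean squared error

theta = np.zeros(2)
eta = 0.1                                 # learning rate (assumed value)
for _ in range(500):
    grad = 2 / len(y) * X.T @ (X @ theta - y)  # gradient of J at theta
    theta -= eta * grad                        # descend along -gradient

print(theta)  # close to [2.0, -3.0]
```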
Generalisation
Generalisation refers to a model's ability to perform well on unseen data.
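One way to see the gap between fitting and generalising, sketched with scikit-learn on synthetic data (an unconstrained decision tree, an illustrative choice, memorises its training set):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

# An unconstrained tree memorises the training set...
tree = DecisionTreeRegressor(random_state=7).fit(X_tr, y_tr)
print(tree.score(X_tr, y_tr))  # R^2 of 1.0: perfect fit on seen data
print(tree.score(X_te, y_te))  # ...but scores noticeably lower on unseen data
```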
Model Performance Measures
Hyper-parameters
Hyperparameter tuning
Semi-supervised
Definitions
Applications
image classification
natural language understanding (NLU) / natural language processing (NLP)
semantic segmentation
text classification
SVM
Batch and Online Learning
Instance-Based vs. Model-Based Learning
Challenges
Bad data
Bad learning algorithms
Overfitting/Underfitting
Machine Learning project checklist
Prepare the Data
Download/load/fetch the data
Look at the data structure
Create a Test set
explore the training set
Prepare the data for ML Algorithms
Data cleaning
missing data/features
Handling text and categorical attributes
Custom Transformers
Feature Scaling
Accuracy
Accuracy using Cross-Validation
Select and Train Models:
Sampling data into training/test sets
RMSE
one RMSE measure/score
Multiple RMSE measures/scores = Cross-Validation
Cross-Validation
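The two bullets above can be sketched with scikit-learn's `cross_val_score` on synthetic data: one RMSE from a single split, versus a distribution of RMSE scores from K folds (fold count and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] + rng.normal(scale=0.2, size=200)

# cross_val_score returns *negative* MSE (scikit-learn maximises scores),
# so negate before taking the square root to get one RMSE per fold.
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)
print(rmse_scores.mean(), rmse_scores.std())  # mean RMSE and its spread
```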
Fine-Tune your selected Model
Grid Search
Randomized Search
Ensemble Method
Relative importance of attributes
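A minimal fine-tuning sketch with scikit-learn: grid search over a random forest's hyperparameters, then reading the relative importance of attributes from the best model (the grid values and synthetic data are illustrative, not recommendations):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       random_state=0)

# Hypothetical small grid; a real project would search wider ranges.
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}
grid = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                    cv=3, scoring="neg_mean_squared_error")
grid.fit(X, y)

print(grid.best_params_)                           # winning combination
print(grid.best_estimator_.feature_importances_)   # one weight per attribute
```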
Evaluate on the Test set
Stratified Sampling
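Stratified sampling sketched with scikit-learn's `train_test_split` (the imbalanced labels here are synthetic): passing `stratify=y` preserves the class ratio in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced labels: 90% class 0, 10% class 1
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the 90/10 ratio in both the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(y_train.mean(), y_test.mean())  # both exactly 0.1
```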
Confusion Matrix
Other metrics (precision, recall, F1 score, etc.)
Precision Recall trade-off
The ROC curve
AUC
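The classification metrics above, sketched with scikit-learn on a synthetic binary problem (the classifier choice is illustrative; AUC uses the decision scores rather than the hard predictions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_score = clf.decision_function(X_te)   # scores for the ROC curve / AUC

print(confusion_matrix(y_te, y_pred))   # rows: actual, columns: predicted
print(precision_score(y_te, y_pred), recall_score(y_te, y_pred),
      f1_score(y_te, y_pred))
print(roc_auc_score(y_te, y_score))     # area under the ROC curve
```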
Training models
Linear Regression model
Training
Closed-form equation
iterative optimization
Gradient descent
Batch GD (Full GD)
Mini-Batch GD
Stochastic GD (random Instance GD/iteration)
Polynomial Regression
Logistic Regression
Softmax Regression
Classification
Learning rate
Learning Curve(s)
Regularisation
Cost function
Early stopping
SVM
linear SVM
Hard/Soft margin classification
Non-linear SVM classification
ex: polynomial features added to the original data to handle a non-linear dataset while still using a linear SVM.
Polynomial kernel trick
Similarity features, ex: Gaussian RBF
Gaussian RBF Kernel (trick)
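A minimal sketch of the kernel trick with scikit-learn on a synthetic non-linear dataset (the `gamma` and `C` values are illustrative): the RBF kernel acts like similarity features to every training instance without ever materialising them.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by any straight line
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Gaussian RBF kernel trick: fits a non-linear boundary implicitly
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=2, C=1))
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy, close to 1.0
```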
Out-of-core learning
LinearSVC (for classification)
LinearSVR (for regression)
Non-linear SVM regression
outlier detection
Kernelized SVMs
Decision Trees
Model Interpretation (White box vs black box)
Ensemble learning and Random Forest
Voting classifiers
Majority voting
Hard voting
Soft Voting
Bagging and Pasting
Aggregation
Bagging
Pasting
Out-of-Bag Evaluation
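Out-of-bag evaluation sketched with scikit-learn (synthetic data; the ensemble size is illustrative): with bootstrap sampling each predictor never sees roughly 37% of the training instances, and those instances give a free validation estimate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# oob_score=True evaluates each tree on the training instances it did
# NOT see in its bootstrap sample
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        bootstrap=True, oob_score=True, random_state=0)
bag.fit(X, y)
print(bag.oob_score_)  # validation-style accuracy without a held-out set
```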
Random Patches / Random Subspaces
Random Forest Classifier/regressor
Extra-Trees
Feature Importance
Boosting
AdaBoost
Gradient Boosting
Stacking
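A minimal boosting sketch with scikit-learn's AdaBoost on synthetic data (the stump depth and ensemble size are illustrative): sequentially trained weak learners, each re-weighting the instances its predecessors misclassified.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=1)

# 100 decision stumps (depth-1 trees) combined sequentially
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, random_state=1)
ada.fit(X, y)
print(ada.score(X, y))  # training accuracy of the boosted ensemble
```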
Random sampling techniques
Data Splitting
Data Augmentation