Machine Learning
Types of ML systems
Supervised vs. Unsupervised
Unsupervised
Reinforcement Learning
Supervised
LFDA
Neural Networks
PCA is used for dimensionality and noise reduction
K-means for feature learning by clustering the data
Auto-encoder for feature learning
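A minimal sketch of PCA for dimensionality and noise reduction, using scikit-learn on synthetic data (the data and parameter values here are illustrative):

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic 3-D data that really lives on a 2-D subspace, plus small noise
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 3))  # rank-2 data in 3-D
X += 0.01 * rng.normal(size=X.shape)

# Project onto the top 2 principal components
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)                      # (200, 2)
print(pca.explained_variance_ratio_.sum())  # near 1.0: little information lost
```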
Optimization
In machine learning, the aim is usually to find optimal parameters θ∗ of a function or model fθ that minimize a cost function J(θ).
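The idea above can be sketched in pure NumPy: batch gradient descent searching for θ* that minimises an MSE cost J(θ) of a linear model f_θ (synthetic data; the learning rate and iteration count are illustrative choices):

```python
import numpy as np

# Synthetic data: y = X @ theta_true + noise, with a bias column in X
rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(-1, 1, size=100)]
true_theta = np.array([2.0, -3.0])
y = X @ true_theta + 0.1 * rng.normal(size=100)

def cost(theta):
    return np.mean((X @ theta - y) ** 2)  # J(theta) = mean squared error

theta = np.zeros(2)
eta = 0.1                                 # learning rate (assumed value)
for _ in range(500):
    grad = 2 / len(y) * X.T @ (X @ theta - y)  # gradient of J at theta
    theta -= eta * grad                        # descend along -gradient

print(theta)  # close to [2.0, -3.0]
```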
Generalisation
Generalisation refers to a model's ability to perform well on unseen data.
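One way to see the gap between fitting and generalising, sketched with scikit-learn on synthetic data (an unconstrained decision tree, an illustrative choice, memorises its training set):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(7)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=7)

# An unconstrained tree memorises the training set...
tree = DecisionTreeRegressor(random_state=7).fit(X_tr, y_tr)
print(tree.score(X_tr, y_tr))  # R^2 of 1.0: perfect fit on seen data
print(tree.score(X_te, y_te))  # ...but scores noticeably lower on unseen data
```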
Model Performance Measures
Hyper-parameters
Hyperparameter tuning
Semi-supervised
Definitions
Applications
image classification
natural language understanding (NLU) / natural language processing (NLP)
semantic segmentation
text classification
SVM
Batch and Online Learning
Instance-Based vs. Model-Based Learning
Challenges
Bad data
Bad learning algorithms
Overfitting/Underfitting
Machine Learning project checklist
Prepare the Data
Download/load/fetch the data
Look at the data structure
Create a Test set
explore the training set
Prepare the data for ML Algorithms
Data cleaning
missing data/features
Handling text and categorical attributes
Custom Transformers
Feature Scaling
Accuracy
Accuracy using Cross-Validation
Select and Train Models:
Sampling data into training/test sets
RMSE
one RMSE measure/score
Multiple RMSE measures/scores = Cross-Validation
Cross-Validation
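The two bullets above can be sketched with scikit-learn's `cross_val_score` on synthetic data: one RMSE from a single split, versus a distribution of RMSE scores from K folds (fold count and data are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] + rng.normal(scale=0.2, size=200)

# cross_val_score returns *negative* MSE (scikit-learn maximises scores),
# so negate before taking the square root to get one RMSE per fold.
scores = cross_val_score(LinearRegression(), X, y,
                         scoring="neg_mean_squared_error", cv=10)
rmse_scores = np.sqrt(-scores)
print(rmse_scores.mean(), rmse_scores.std())  # mean RMSE and its spread
```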
Fine-Tune your selected Model
Grid Search
Randomized Search
Ensemble Method
Relative importance of attributes
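A minimal fine-tuning sketch with scikit-learn: grid search over a random forest's hyperparameters, then reading the relative importance of attributes from the best model (the grid values and synthetic data are illustrative, not recommendations):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=5, n_informative=3,
                       random_state=0)

# Hypothetical small grid; a real project would search wider ranges.
param_grid = {"n_estimators": [10, 30], "max_features": [2, 4]}
grid = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                    cv=3, scoring="neg_mean_squared_error")
grid.fit(X, y)

print(grid.best_params_)                           # winning combination
print(grid.best_estimator_.feature_importances_)   # one weight per attribute
```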
Evaluate on the Test set
Stratified Sampling
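Stratified sampling sketched with scikit-learn's `train_test_split` (the imbalanced labels here are synthetic): passing `stratify=y` preserves the class ratio in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Imbalanced labels: 90% class 0, 10% class 1
y = np.array([0] * 90 + [1] * 10)
X = np.arange(100).reshape(-1, 1)

# stratify=y keeps the 90/10 ratio in both the train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

print(y_train.mean(), y_test.mean())  # both exactly 0.1
```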
Confusion Matrix
Other metrics (precision, recall, F1 score, etc.)
Precision Recall trade-off
The ROC curve
AUC
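The classification metrics above, sketched with scikit-learn on a synthetic binary problem (the classifier choice is illustrative; AUC uses the decision scores rather than the hard predictions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, precision_score, recall_score,
                             f1_score, roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_pred = clf.predict(X_te)
y_score = clf.decision_function(X_te)   # scores for the ROC curve / AUC

print(confusion_matrix(y_te, y_pred))   # rows: actual, columns: predicted
print(precision_score(y_te, y_pred), recall_score(y_te, y_pred),
      f1_score(y_te, y_pred))
print(roc_auc_score(y_te, y_score))     # area under the ROC curve
```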
Training models
Linear Regression model
Training
Closed-form equation
iterative optimization
Gradient descent
Batch GD (Full GD)
Mini-Batch GD
Stochastic GD (random Instance GD/iteration)
Polynomial Regression
Logistic Regression
Softmax Regression
Classification
Learning rate
Learning Curve(s)
Regularisation
Cost function
Early stopping
SVM
linear SVM
Hard/Soft margin classification
Non-linear SVM classification
ex: polynomial features added to the original data to handle a non-linear dataset while still using a linear SVM.
Polynomial kernel trick
Similarity features, ex: Gaussian RBF
Gaussian RBF Kernel (trick)
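A minimal sketch of the kernel trick with scikit-learn on a synthetic non-linear dataset (the `gamma` and `C` values are illustrative): the RBF kernel acts like similarity features to every training instance without ever materialising them.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Two interleaving half-moons: not separable by any straight line
X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

# Gaussian RBF kernel trick: fits a non-linear boundary implicitly
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=2, C=1))
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy, close to 1.0
```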
Out-of-core learning
LinearSVC (for classification)
LinearSVR (for regression)
Non-linear SVM regression
outlier detection
Kernelized SVMs
Decision Trees
Model Interpretation (White box vs black box)
Ensemble learning and Random Forest
Voting classifiers
Majority voting
Hard voting
Soft Voting
Bagging and Pasting
Aggregation
Bagging
Pasting
Out-of-Bag Evaluation
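Out-of-bag evaluation sketched with scikit-learn (synthetic data; the ensemble size is illustrative): with bootstrap sampling each predictor never sees roughly 37% of the training instances, and those instances give a free validation estimate.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# oob_score=True evaluates each tree on the training instances it did
# NOT see in its bootstrap sample
bag = BaggingClassifier(DecisionTreeClassifier(), n_estimators=100,
                        bootstrap=True, oob_score=True, random_state=0)
bag.fit(X, y)
print(bag.oob_score_)  # validation-style accuracy without a held-out set
```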
Random Patches / Random Subspaces
Random Forest Classifier/regressor
Extra-Trees
Feature Importance
Boosting
AdaBoost
Gradient Boosting
Stacking
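A minimal boosting sketch with scikit-learn's AdaBoost on synthetic data (the stump depth and ensemble size are illustrative): sequentially trained weak learners, each re-weighting the instances its predecessors misclassified.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, random_state=1)

# 100 decision stumps (depth-1 trees) combined sequentially
ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=1),
                         n_estimators=100, random_state=1)
ada.fit(X, y)
print(ada.score(X, y))  # training accuracy of the boosted ensemble
```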
Random sampling techniques
Data Splitting
Data Augmentation