Machine Learning
METHODOLOGY
Starting questions
Convergence: does much more data imply better accuracy?
Bias: can we find the true value in expectation?
Error, variance: what is the best we can do?
Estimate characterisation: do we know some properties? e.g., the estimator’s distribution...
Identifiability: are we sure to find the correct value of the parameter? (likelihood approach)
Confidence interval: estimate of the uncertainty (industrial applications)
Questions to ask myself:
What are my data?
What do I want to do?
Which trade-off am I looking for (performance / accuracy)?
Main principles
ERROR
TRAINING error
An error close to 0 in the TRAINING phase does not necessarily lead to an error close to 0 in the TEST phase.
TEST error
PHASES
Phase 1: TRAINING
Phase 2: TEST
Few data => no deep learning and no complex models
COMPUTATIONAL cost
Definition:
The more complex the model, the higher the computational cost.
Order of magnitude:
1) ChatGPT (~50 billion parameters):
Training cost = 1.5 billion GPU
Single inference of 30 tokens = 2 g CO2 (with 500 billion parameters: 60 g CO2) => not that high per query, but the issue is the quantity of inferences
2) Standard Google request w/o AI = 0.2 g CO2
Trade-off performance vs accuracy:
Performance = speed / reactivity
Accuracy = error of the estimate
DATASET
ORIGINAL dataset = TRAINING set + TEST set
TRAINING set = (smaller) TRAINING set + VALIDATION set
Cross-Validation process:
Training set => over iterations corresponding to different hyperparameter choices, the ML algorithm is fitted to define the predictive model
Validation set => used to estimate the performance of the predictive model at each iteration
The bias and variability of the validation can then be estimated (see the split sketch below).
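A minimal sketch of these two splits, assuming NumPy arrays X and y; split_dataset is a hypothetical helper written for this note, not a library function.

```python
import numpy as np

def split_dataset(X, y, holdout_ratio=0.2, seed=0):
    # Shuffle indices, then carve off a holdout part (test or validation).
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_holdout = int(len(X) * holdout_ratio)
    holdout, train = idx[:n_holdout], idx[n_holdout:]
    return X[train], y[train], X[holdout], y[holdout]

# ORIGINAL dataset = TRAINING set + TEST set
X, y = np.random.rand(100, 3), np.random.rand(100)
X_train, y_train, X_test, y_test = split_dataset(X, y, holdout_ratio=0.2)

# TRAINING set = (smaller) TRAINING set + VALIDATION set
X_fit, y_fit, X_val, y_val = split_dataset(X_train, y_train, holdout_ratio=0.25)
```

Repeating the second split with different seeds (or over disjoint folds) approximates the iteration described in the cross-validation process above.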
VOCABULARY
SUPERVISED ML
Loss Function (L):
input = a predicted value z corresponding to a training value y
the loss function L measures how close z and y are
L: (z, y) ∈ ℝ × Y ⟼ L(z, y) ∈ ℝ
Linear regression => least squares error: (1/2)(y − z)²
SVM => hinge loss: max(0, 1 − yz)
Logistic regression => logistic loss: log(1 + exp(−yz))
Neural network => cross-entropy: −[y log(z) + (1 − y) log(1 − z)]
(all four are sketched in code below)
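A minimal sketch of these four losses, assuming NumPy; labels y are taken in {−1, +1} for the hinge and logistic losses, and in {0, 1} for the cross-entropy.

```python
import numpy as np

def squared_loss(z, y):
    # Least squares error (linear regression): (1/2)(y - z)^2
    return 0.5 * (y - z) ** 2

def hinge_loss(z, y):
    # Hinge loss (SVM), y in {-1, +1}: max(0, 1 - yz)
    return np.maximum(0.0, 1.0 - y * z)

def logistic_loss(z, y):
    # Logistic loss (logistic regression), y in {-1, +1}: log(1 + exp(-yz))
    return np.log(1.0 + np.exp(-y * z))

def cross_entropy(z, y):
    # Cross-entropy (neural networks), y in {0, 1}, z in (0, 1)
    return -(y * np.log(z) + (1.0 - y) * np.log(1.0 - z))

print(squared_loss(0.8, 1.0))  # 0.02
print(hinge_loss(0.4, 1.0))    # 0.6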
Cost Function (J):
is based on the loss function, summed over the m training examples: J(θ) = Σ_i L(h_θ(X_i), y_i)
is used to evaluate the performance of a model
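One possible reading of this definition as code; the hypothesis h and the per-example loss are passed in as functions, and the linear h used in the example is an assumption for illustration.

```python
import numpy as np

def cost(theta, X, y, h, loss):
    # Cost J(theta): sum of per-example losses over the m training examples.
    return sum(loss(h(theta, x_i), y_i) for x_i, y_i in zip(X, y))

# Example: linear hypothesis + least squares loss (illustrative choices).
h = lambda theta, x: theta @ x
loss = lambda z, y: 0.5 * (y - z) ** 2
X = np.array([[1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0])
print(cost(np.zeros(2), X, y, h, loss))  # 0.5*(2^2 + 3^2) = 6.5
```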
Gradient descent algorithm:
α ∈ ℝ = learning rate
the update rule is expressed with J and α as follows: θ ← θ − α ∇J(θ)
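A minimal sketch of this update rule, assuming the gradient ∇J is available as a function; the toy cost J(θ) = (θ − 3)² and the fixed learning rate are assumptions for illustration.

```python
def gradient_descent(grad_J, theta0, alpha=0.1, n_iter=100):
    # Repeatedly apply the update rule: theta <- theta - alpha * grad J(theta)
    theta = theta0
    for _ in range(n_iter):
        theta = theta - alpha * grad_J(theta)
    return theta

# Minimise J(theta) = (theta - 3)^2, whose gradient is 2*(theta - 3).
theta_hat = gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=0.0)
print(theta_hat)  # close to 3.0
```

A fixed α keeps the sketch simple; in practice the learning rate is itself a hyperparameter tuned on the validation set.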
Likelihood:
For a model with parameter θ, the likelihood L(θ) is used to find θ by maximising the likelihood.
In practice the log-likelihood ℓ(θ) = log(L(θ)) is commonly used: θ̂ = argmax_θ ℓ(θ)
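A minimal sketch of maximum-likelihood estimation, assuming i.i.d. Bernoulli(θ) observations; the grid search stands in for a proper optimiser.

```python
import numpy as np

def log_likelihood(theta, data):
    # l(theta) = log L(theta) for i.i.d. Bernoulli(theta) observations.
    data = np.asarray(data)
    return np.sum(data * np.log(theta) + (1 - data) * np.log(1 - theta))

# Maximise l(theta) over a grid; for Bernoulli the MLE is the sample mean.
data = [1, 0, 1, 1, 0, 1]
grid = np.linspace(0.01, 0.99, 99)
theta_hat = grid[np.argmax([log_likelihood(t, data) for t in grid])]
print(theta_hat)  # ~ 4/6 ≈ 0.67
```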
Hypothesis (h):
corresponds to the chosen model h_θ
For an input X_i, the predicted value output by the model is h_θ(X_i).
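A minimal sketch, assuming a linear model as the chosen h_θ (an illustrative choice, not prescribed by the note).

```python
import numpy as np

# Hypothetical linear model: h_theta(x) = theta . x
theta = np.array([0.5, -1.0])

def h_theta(x):
    # Predicted value output by the model for input x.
    return theta @ x

X_i = np.array([2.0, 1.0])
print(h_theta(X_i))  # 0.5*2.0 + (-1.0)*1.0 = 0.0
```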