Estimation
In machine learning, the core objective is to estimate
the parameters of a model such that the learned parameters
fit the data well. A machine learning algorithm can therefore
be viewed as searching a large hypothesis space for a good
hypothesis. Estimating the parameters that define this
hypothesis is the central task of the learning algorithm.
In the general probabilistic setting, the goal of the
algorithm can be stated as maximizing the probability of the
hypothesis given the data. By Bayes' theorem, this posterior
is obtained by multiplying the prior probability of the
hypothesis by the likelihood of the data given the
hypothesis, and dividing by the probability of the data.
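Stated explicitly, with h denoting a hypothesis and D the data
(notation introduced here for concreteness), Bayes' theorem reads

P(h \mid D) = \frac{P(D \mid h)\, P(h)}{P(D)}.

Since P(D) does not depend on h, maximizing the posterior is
equivalent to maximizing the product P(D \mid h)\, P(h).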
Maximum Likelihood Estimation
As mentioned earlier, the posterior probability of the hypothesis
given the data depends on the likelihood and the prior probability.
If we assume that the prior over hypotheses is uniform, the
objective reduces to maximizing the likelihood. Under the further
assumption that the targets are normally distributed around the
model's prediction, the MLE estimate is equivalent to minimizing
the squared error for regression problems; with a Bernoulli or
categorical likelihood, it is equivalent to minimizing the
cross-entropy for classification problems.
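The regression case can be checked numerically. The following
sketch (names such as y_true and y_pred are illustrative, not
from the text) shows that the Gaussian negative log-likelihood
differs from the mean squared error only by an affine map with
positive slope, so both objectives share the same minimizer:

import numpy as np

def gaussian_nll(y_true, y_pred, sigma=1.0):
    # Negative log-likelihood of targets under N(y_pred, sigma^2) noise.
    n = len(y_true)
    const = 0.5 * n * np.log(2 * np.pi * sigma**2)  # independent of y_pred
    return const + np.sum((y_true - y_pred) ** 2) / (2 * sigma**2)

def mse(y_true, y_pred):
    # Mean squared error between targets and predictions.
    return np.mean((y_true - y_pred) ** 2)

rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_pred = rng.normal(size=100)

# With sigma = 1: NLL = 0.5 * n * log(2*pi) + (n / 2) * MSE,
# i.e. an affine function of the MSE with positive slope.
nll = gaussian_nll(y_true, y_pred)
affine = 0.5 * 100 * np.log(2 * np.pi) + (100 / 2) * mse(y_true, y_pred)
print(np.isclose(nll, affine))  # True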
Maximum A Posteriori Estimation
Instead of the uniform prior assumed in MLE, if we place a
normal distribution as the prior on the parameters, Bayes' rule
yields an objective equal to the loss function from MLE plus a
regularization term.
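To make this concrete (with \theta denoting the parameters and
\tau^2 the prior variance, notation introduced here), take the
negative log of the posterior and drop the data term, which is
constant in \theta:

-\log P(\theta \mid D) = -\log P(D \mid \theta) - \log P(\theta) + \text{const}.

Assuming a zero-mean Gaussian prior \theta \sim \mathcal{N}(0, \tau^2 I),

-\log P(\theta) = \frac{1}{2\tau^2} \lVert \theta \rVert^2 + \text{const},

so the MAP objective is the MLE loss plus an L2 penalty with
coefficient \lambda = 1/(2\tau^2), recovering ridge regression
or weight decay as special cases.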