Statistical Learning Theory
Focus: the deviation between the target function and the function actually realized by the network.
Regression model
Properties
Mean value of the expectational error ε, given any realization x, is zero
Expectational error ε is uncorrelated with the regression function f(X)
"Mathematical" description of a stochastic environment
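Written out, and assuming the usual additive form of a regression model, the two properties read as follows. The symbol D for the observed response is an assumption; the notes name only f, X, and the expectational error.

```latex
\begin{align}
D &= f(X) + \epsilon, \\
\mathbb{E}[\epsilon \mid x] &= 0, \\
\mathbb{E}[\epsilon\, f(X)] &= 0.
\end{align}
```

Under this form, the first property gives f(x) = E[D | x], so the regression function is the conditional mean of the response.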
Neural network to approximate the model
Terminology
B(w): bias of the average value of the approximating function F(x, w), measured with respect to the regression function f(x)
Inability of the neural network defined by the function F(x, w) to accurately approximate the regression function f(x)
Viewed as an approximation error
V(w): variance of the approximating function F(x, w), measured over the training sample T
Inadequacy of the information contained in the training sample T about the regression function f(x)
Viewed as an estimation error
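In symbols, one common way to write the two quantities is sketched below. It assumes a squared-error setting, writes E_T for the average over training samples T, and uses F(x, T) for F(x, w) with the weights determined by T; this notation is an assumption, not spelled out in these notes.

```latex
\begin{align}
B(w) &= \mathbb{E}_{T}[F(x, T)] - f(x), \\
V(w) &= \mathbb{E}_{T}\!\left[\left(F(x, T) - \mathbb{E}_{T}[F(x, T)]\right)^{2}\right], \\
\mathbb{E}_{T}\!\left[\left(f(x) - F(x, T)\right)^{2}\right] &= B^{2}(w) + V(w).
\end{align}
```

The last line follows by adding and subtracting the average E_T[F(x, T)] inside the square; the cross term averages to zero.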
Good overall performance
Both the bias B(w) and the variance V(w) of the approximating function F(x, w) = F(x, T) would have to be small
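Bias-variance dilemma: with a training sample of fixed size, a small bias can usually be achieved only at the cost of a large variance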
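The trade-off can be illustrated numerically. The sketch below is not from the notes: it assumes a hypothetical regression function f(x) = 2x - 1, Gaussian noise, and fixed inputs with only the noise resampled, and it compares a constant model against a straight line. The names f, fit_constant, fit_line, and bias_variance are illustrative only.

```python
import random

def f(x):
    """Hypothetical regression function, chosen only for illustration."""
    return 2.0 * x - 1.0

def fit_constant(xs, ds):
    """Least-squares constant model: F(x) = mean of the responses (xs unused)."""
    c = sum(ds) / len(ds)
    return lambda x: c

def fit_line(xs, ds):
    """Least-squares straight line: F(x) = a + b * x."""
    n = len(xs)
    mx, md = sum(xs) / n, sum(ds) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxd = sum((x - mx) * (d - md) for x, d in zip(xs, ds))
    b = sxd / sxx
    a = md - b * mx
    return lambda x: a + b * x

def bias_variance(fit, n_train=20, n_trials=500, noise_std=0.3, seed=0):
    """Monte Carlo estimate of (squared bias, variance), averaged over a test grid."""
    rng = random.Random(seed)
    xs = [i / (n_train - 1) for i in range(n_train)]  # fixed training inputs (assumption)
    grid = [i / 49 for i in range(50)]                # test points in [0, 1]
    preds = []
    for _ in range(n_trials):
        # Draw a fresh training sample T: same inputs, new noise.
        ds = [f(x) + rng.gauss(0.0, noise_std) for x in xs]
        model = fit(xs, ds)
        preds.append([model(x) for x in grid])
    bias_sq, variance = 0.0, 0.0
    for j, x in enumerate(grid):
        col = [p[j] for p in preds]
        mean = sum(col) / n_trials                    # estimate of E_T[F(x, T)]
        bias_sq += (mean - f(x)) ** 2
        variance += sum((v - mean) ** 2 for v in col) / n_trials
    return bias_sq / len(grid), variance / len(grid)

b0, v0 = bias_variance(fit_constant)
b1, v1 = bias_variance(fit_line)
print("constant model: bias^2 =", b0, "variance =", v0)
print("line model:     bias^2 =", b1, "variance =", v1)
```

Under these assumptions the constant model should show the larger squared bias and the line the larger variance, which is the dilemma in miniature.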
Supervised Learning components
Environment
Teacher
Learning machine (algorithm)
Empirical Risk Minimization
The empirical risk functional does not depend explicitly on the unknown distribution function
In theory, it can be minimized with respect to the weight vector w
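A minimal sketch of these two points, assuming a squared-error loss and a hypothetical two-weight model F(x, w) = w0 + w1 * x. The sample values, the learning rate, and the function names are made up for illustration. The empirical risk is computed from the training sample alone, and plain gradient descent stands in for whatever minimizer a real learning machine would use.

```python
def empirical_risk(w, sample):
    """R_emp(w): mean squared error over the training sample only."""
    w0, w1 = w
    return sum((d - (w0 + w1 * x)) ** 2 for x, d in sample) / len(sample)

def minimize_empirical_risk(sample, lr=0.1, steps=2000):
    """Gradient descent on R_emp for the hypothetical model F(x, w) = w0 + w1 * x."""
    w0, w1 = 0.0, 0.0
    n = len(sample)
    for _ in range(steps):
        # Partial derivatives of the mean squared error with respect to w0 and w1.
        g0 = sum(-2.0 * (d - (w0 + w1 * x)) for x, d in sample) / n
        g1 = sum(-2.0 * (d - (w0 + w1 * x)) * x for x, d in sample) / n
        w0 -= lr * g0
        w1 -= lr * g1
    return w0, w1

# Made-up training sample of (x, d) pairs; no distribution is ever referenced.
sample = [(0.0, 1.1), (1.0, 2.9), (2.0, 5.2), (3.0, 6.8)]
w_emp = minimize_empirical_risk(sample)
print(w_emp, empirical_risk(w_emp, sample))
```

For this sample the minimizer coincides with the ordinary least-squares line, which gives an easy cross-check on the result.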
Convergence
VC Dimension
measure of the capacity or expressive power of the family of classification functions realized by the learning machine
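Capacity in this sense can be probed by brute force on small point sets: a family shatters a set of n points if it realizes all 2^n binary labelings. The sketch below uses two toy families on the real line, thresholds and closed intervals, chosen only for illustration and assuming distinct points; neither family appears in the notes, and all function names are hypothetical.

```python
from itertools import product

def _cuts(points):
    """One candidate value below all points, between each neighbouring pair, and above all."""
    pts = sorted(points)
    mids = [(a + b) / 2 for a, b in zip(pts, pts[1:])]
    return [pts[0] - 1.0] + mids + [pts[-1] + 1.0]

def threshold_labelings(points):
    """Labelings realizable by h_t(x) = 1 if x >= t else 0."""
    return {tuple(1 if x >= t else 0 for x in points) for t in _cuts(points)}

def interval_labelings(points):
    """Labelings realizable by h_{a,b}(x) = 1 if a <= x <= b else 0."""
    cuts = _cuts(points)
    return {tuple(1 if a <= x <= b else 0 for x in points)
            for a in cuts for b in cuts}

def is_shattered(points, labelings):
    """True if every one of the 2^n binary labelings of `points` is realizable."""
    return labelings(points) == set(product((0, 1), repeat=len(points)))

print(is_shattered([0.5], threshold_labelings))            # one point
print(is_shattered([0.2, 0.8], threshold_labelings))       # two points
print(is_shattered([0.2, 0.8], interval_labelings))        # two points
print(is_shattered([0.1, 0.5, 0.9], interval_labelings))   # three points
```

Thresholds shatter one point but not two, and intervals shatter two points but not three, consistent with VC dimensions of 1 and 2 for these toy families. The check covers only the specific point sets passed in, so it illustrates the definition and does not prove a bound.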