Statistical Learning Theory

Focus: deviation between the target function and the actual function realized by the network.

Regression model

Properties

The mean value of the expectational error ε, given any realization x, is zero

The expectational error ε is uncorrelated with the regression function f(X)

"Mathematical" description of a stochastic environment
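
The model and its two properties can be written compactly as follows. This is a sketch in standard notation: the symbols D (desired response) and X (input), and the identification of f with a conditional mean, are assumptions not spelled out in these notes.

```latex
% Regression model: the desired response is the regression function plus an error term
D = f(X) + \varepsilon, \qquad f(x) = \mathbb{E}[D \mid X = x]

% Property 1: the error has zero mean given any realization x
\mathbb{E}[\varepsilon \mid x] = 0

% Property 2: the error is uncorrelated with the regression function
\mathbb{E}[\varepsilon \, f(X)] = 0
```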

Neural network to approximate the model

Terminology

B(w): bias of the average value of the approximating function F(x, T), measured with respect to the regression function f(x)

Inability of the neural network defined by the function F(x, w) to accurately approximate the regression function f(x)

approximation error

V(w): variance of the approximating function F(x, w), measured over the training sample T

Inadequacy of the information contained in the training sample T about the regression function f(x)

estimation error

Good overall performance

Both the bias B(w) and the variance V(w) of the approximating function F(x, w) = F(x, T) would have to be small
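
The reason both terms matter can be seen from the usual decomposition of the average squared error. The following is a sketch under the assumption that E_T denotes the average over training samples T of a fixed size and that f(x) is the conditional mean of the desired response.

```latex
% Average squared deviation between the regression function and the network output
\mathbb{E}_T\!\left[ \bigl( f(x) - F(x, T) \bigr)^2 \right] = B^2(w) + V(w)

% Bias: how far the average network output lies from the regression function
B(w) = \mathbb{E}_T\!\left[ F(x, T) \right] - f(x)

% Variance: how much the network output fluctuates from one training sample to another
V(w) = \mathbb{E}_T\!\left[ \bigl( F(x, T) - \mathbb{E}_T[F(x, T)] \bigr)^2 \right]
```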

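Bias-variance dilemma: for a training sample of fixed size, a small bias is typically achieved only at the cost of a large variance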
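
The dilemma can be illustrated numerically. The sketch below is not taken from these notes: it assumes a hypothetical target f(x) = sin(2πx), Gaussian noise, fixed input locations (so that only the noise varies between training samples), and polynomial least-squares fits standing in for the network F(x, T). The function name `bias_variance` and all parameter values are illustrative.

```python
import numpy as np

def f(x):
    """Hypothetical regression function used only for this illustration."""
    return np.sin(2.0 * np.pi * x)

def bias_variance(degree, n_trials=200, n_train=30, noise_std=0.3, seed=0):
    """Monte Carlo estimate of squared bias and variance of a polynomial fit.

    Each trial draws a fresh training sample T (fresh noise on fixed inputs),
    fits a polynomial of the given degree, and evaluates it on the same inputs.
    """
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n_train)          # fixed design (an assumption)
    preds = np.empty((n_trials, n_train))
    for t in range(n_trials):
        d = f(x) + rng.normal(0.0, noise_std, n_train)   # d = f(x) + epsilon
        coeffs = np.polyfit(x, d, degree)                # least-squares fit
        preds[t] = np.polyval(coeffs, x)                 # F(x, T)
    mean_pred = preds.mean(axis=0)                       # estimate of E_T[F(x, T)]
    bias_sq = float(np.mean((mean_pred - f(x)) ** 2))    # B^2, averaged over x
    variance = float(np.mean(preds.var(axis=0)))         # V, averaged over x
    return bias_sq, variance

for degree in (1, 3, 5):
    b2, v = bias_variance(degree)
    print(f"degree={degree}  bias^2={b2:.4f}  variance={v:.4f}")
```

Under these assumptions, raising the polynomial degree should lower the squared bias while raising the variance, which is the trade-off the dilemma describes.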

Supervised Learning components

Environment

Teacher

Learning machine (algorithm)

Empirical Risk Minimization

The empirical risk functional does not depend explicitly on the unknown distribution function

In theory, it can be minimized with respect to the weight vector w
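
A minimal sketch of the idea, assuming a squared loss and a linear model F(x, w) = w0 + w1·x in place of a neural network. The empirical risk is an average over the training sample only, so it can be computed and minimized without knowing the underlying distribution. The function names, the synthetic data, and the plain gradient-descent settings are all illustrative choices.

```python
import numpy as np

def empirical_risk(w, X, d):
    """Average squared loss of the linear model X @ w over the training sample."""
    residuals = d - X @ w
    return float(np.mean(residuals ** 2))

def minimize_empirical_risk(X, d, lr=0.1, steps=500):
    """Minimize the empirical risk with respect to w by plain gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = -2.0 * X.T @ (d - X @ w) / len(d)   # gradient of the mean squared loss
        w -= lr * grad
    return w

# Synthetic training sample (hypothetical): d = 0.5 + 2.0 * x + noise
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
X = np.column_stack([np.ones_like(x), x])          # column of ones for the bias weight
d = 0.5 + 2.0 * x + rng.normal(0.0, 0.1, size=200)

w_start = np.zeros(2)
w_emp = minimize_empirical_risk(X, d)
print("risk at start:", empirical_risk(w_start, X, d))
print("risk at w_emp:", empirical_risk(w_emp, X, d))
print("w_emp:", w_emp)
```

Only the sample (X, d) enters the computation; whether the minimizer of the empirical risk also does well on the true risk is the convergence question.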

Convergence

VC Dimension

A measure of the capacity, or expressive power, of the family of classification functions realized by the learning machine
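
One way to make the notion concrete is through shattering: a set of points is shattered by a family of classifiers if every possible labeling of the points is realized by some member of the family, and the VC dimension is the size of the largest set that can be shattered. The sketch below assumes a deliberately simple family, oriented thresholds on the real line, h(x) = s if x > t else -s with s in {+1, -1}; it is a toy stand-in, not the family realized by a neural network, and the function names are hypothetical.

```python
from itertools import product

def achievable_labelings(points):
    """Return every labeling of `points` realized by an oriented threshold classifier."""
    xs = sorted(points)
    # One candidate threshold below all points, one between each pair of
    # neighbours, and one above all points covers every distinct behaviour.
    thresholds = (
        [xs[0] - 1.0]
        + [(a + b) / 2.0 for a, b in zip(xs, xs[1:])]
        + [xs[-1] + 1.0]
    )
    labelings = set()
    for t in thresholds:
        for s in (+1, -1):
            labelings.add(tuple(s if x > t else -s for x in points))
    return labelings

def is_shattered(points):
    """True if all 2^n labelings of the n points are achievable."""
    achievable = achievable_labelings(points)
    return all(lab in achievable for lab in product((+1, -1), repeat=len(points)))

print(is_shattered([0.0, 1.0]))        # two points
print(is_shattered([0.0, 1.0, 2.0]))   # three points
```

For this family, two distinct points can be shattered but three cannot, because the labeling (+1, -1, +1) requires two sign changes. The VC dimension of oriented thresholds on the line is therefore 2.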