Supervised learning (The networks used for supervised learning are called…
-
"The idea is that a network with smaller weights is more robust to the effect of noise. When the weights are small, then small changes in some of the patterns do not give a substantially different training result. When the network has large weights, by contrast, it may happen that small changes in the input give significant differences in the training result that are difficult to generalise (Figure 6.9). Other regularisation schemes are discussed in Chapter 7."
But it also helps against saturation of the activation function.
Another explanation is that it leads to "simpler" functions, which should therefore generalise better.
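The weight-decay idea above can be sketched as one gradient-descent step with an added L2 penalty, w ← w − η(∇E + λw). This is a minimal illustration, not code from the source; the function name and the parameter values are my own choices, and the penalty coefficient is exaggerated to make the shrinkage visible.

```python
import numpy as np

def weight_decay_step(w, grad_E, eta=0.5, lam=0.1):
    """One gradient-descent step with L2 weight decay:
    w <- w - eta * (dE/dw + lam * w).
    The extra lam * w term pulls every weight toward zero."""
    return w - eta * (grad_E + lam * w)

# With zero training gradient, the decay term alone shrinks the
# weights geometrically by a factor (1 - eta * lam) per step.
w = np.array([1.0, -2.0, 0.5])
for _ in range(100):
    w = weight_decay_step(w, grad_E=np.zeros_like(w))
```

After 100 steps each weight has been multiplied by 0.95^100 ≈ 0.006, which is why the penalty keeps the network in the small-weight regime that the quote argues is more robust.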
-
Remember, from Khan Academy (the Hessian matrix \(\mathbf{H}_{f}\), here called \(\mathbb{M}\))
-
Weight decay eliminates the smallest weights, but small weights are often needed to achieve a low training error. That is why this (Hessian-based) method is better.
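The note seems to contrast weight decay with Hessian-based pruning in the style of Optimal Brain Damage, where each weight gets a saliency s_i = ½ H_ii w_i² estimating how much the training error grows if that weight is removed. A minimal sketch under that assumption (the function name and the numbers are illustrative, not from the source):

```python
import numpy as np

def obd_saliency(w, H_diag):
    """Optimal-Brain-Damage-style saliency: s_i = 0.5 * H_ii * w_i**2.
    H_diag holds the diagonal of the Hessian of the training error E."""
    return 0.5 * H_diag * w**2

w = np.array([0.1, 1.0, 0.2])
H_diag = np.array([100.0, 0.1, 1.0])  # curvature of E along each weight
s = obd_saliency(w, H_diag)
prune = int(np.argmin(s))  # remove the weight whose loss of which hurts E least
```

Here the smallest weight (0.1) is *not* pruned, because the error surface is steep in its direction; weight decay, by contrast, would shrink it away regardless, which is the shortcoming the note points at.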
-
Shift the data so that the mean vanishes
Scale input data to get same variance in all directions
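The two preprocessing steps above (shift to zero mean, then scale to equal variance in every input direction) can be sketched as follows; the function name and the sample data are my own, only the recipe comes from the note:

```python
import numpy as np

def standardise(X):
    """Shift each input component to zero mean, then scale to unit variance."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    return (X - mean) / std

# Toy inputs with very different spreads along the two directions.
rng = np.random.default_rng(0)
X = rng.normal(loc=5.0, scale=[0.5, 10.0], size=(1000, 2))
Xs = standardise(X)
```

After the transform, every component of `Xs` has mean 0 and variance 1, so no input direction dominates the initial gradients during training.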
-