DL 2019 Exam (Math symbols (f = y = predicted label, h = activation after…
DL 2019 Exam
Math symbols
-
-
z = pre-activation, i.e. the value before the activation function is applied, often z = Wh + b
W = weight matrix that multiplies the activations h of the previous layer (Wh)
-
-
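The relation z = Wh + b can be sketched in a few lines of plain Python. This is only an illustration: the numbers are made up, and ReLU is an assumed example of an activation function, not one named in these notes.

```python
def affine(W, h, b):
    """Pre-activation z = W h + b for one layer (W is a list of rows)."""
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) + b_i
            for row, b_i in zip(W, b)]

def relu(z):
    """Element-wise ReLU, used here only as an example activation function."""
    return [max(0.0, z_i) for z_i in z]

W = [[1.0, -2.0], [0.5, 0.5]]   # hypothetical 2x2 weight matrix
h_prev = [1.0, 2.0]             # activations h of the previous layer
b = [0.5, -1.0]                 # hypothetical bias vector

z = affine(W, h_prev, b)        # value before the activation function
h = relu(z)                     # activation after the activation function
```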
2. Optimization
Regression
Tune parameters by minimizing the mean-squared error (MSE)
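As a minimal sketch of tuning a parameter by minimizing the MSE, the example below fits a single slope w in the model f = w * x with plain gradient descent. The model, the data, the learning rate, and the step count are all assumptions chosen for illustration.

```python
def mse(f, targets):
    """Mean-squared error between predictions f and target values."""
    return sum((f_i - t_i) ** 2 for f_i, t_i in zip(f, targets)) / len(targets)

def fit_slope(xs, targets, lr=0.05, steps=200):
    """Fit w in f = w * x by gradient descent on the MSE."""
    w = 0.0
    n = len(xs)
    for _ in range(steps):
        # d(MSE)/dw = (2/n) * sum((w*x - t) * x)
        grad = sum(2 * (w * x - t) * x for x, t in zip(xs, targets)) / n
        w -= lr * grad
    return w

xs = [1.0, 2.0, 3.0]        # hypothetical inputs
targets = [2.0, 4.0, 6.0]   # hypothetical targets, generated by w = 2

w = fit_slope(xs, targets)
final_error = mse([w * x for x in xs], targets)
```

With this data the slope should approach 2 and the error should approach 0.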
Classification
Softmax
maps the outputs to a probability distribution over the given class labels: every f_j is positive and the f_j sum to 1, so every given class can be predicted and no probability is assigned outside the given classes
cross entropy loss
any negative log-likelihood loss is a cross entropy between the empirical/data distribution and the model distribution
-
-
-
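The softmax and the cross entropy loss described above can be sketched as follows. The input scores z and the choice of class 0 as the true label are made up, and subtracting the maximum score is a common numerical-stability trick rather than something stated in these notes.

```python
import math

def softmax(z):
    """Turn raw scores z into probabilities f_j that are positive and sum to 1."""
    m = max(z)                                # subtract max for numerical stability
    exps = [math.exp(z_j - m) for z_j in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(f, true_class):
    """Negative log-likelihood of the true class under the predicted distribution f."""
    return -math.log(f[true_class])

z = [2.0, 1.0, 0.1]          # hypothetical pre-activations of the output layer
f = softmax(z)               # predicted class probabilities
loss = cross_entropy(f, 0)   # loss if class 0 is the true label
```

The loss is small when the model assigns high probability to the true class and grows without bound as that probability approaches 0.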
3. Regularization
-
-
Regularization methods
-
Limit model capacity
Reduce network size
Rule of thumb: to keep the risk of overfitting low, the number of examples should be ten times larger than the number of parameters.
-
-
-
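The rule of thumb above can be checked with a quick parameter count. The sketch assumes a fully connected network in which each layer has a weight matrix and a bias vector; the layer sizes and helper names are hypothetical.

```python
def count_parameters(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer sizes."""
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

def satisfies_rule_of_thumb(n_examples, layer_sizes, factor=10):
    """True if there are at least `factor` times as many examples as parameters."""
    return n_examples >= factor * count_parameters(layer_sizes)

layer_sizes = [4, 8, 3]                      # hypothetical: 4 inputs, 8 hidden, 3 outputs
n_params = count_parameters(layer_sizes)     # (4*8 + 8) + (8*3 + 3)
enough = satisfies_rule_of_thumb(1000, layer_sizes)
too_few = satisfies_rule_of_thumb(500, layer_sizes)
```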
Increase amount of data
-
Injecting noise
Similar to data augmentation: the model learns to be resilient to such noise
-
-
-
-
-
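One way to inject noise is to perturb each training input before it is fed to the network. The sketch below assumes additive Gaussian noise on the inputs; the notes do not say where the noise is applied, and the noise level sigma is a made-up value.

```python
import random

def add_gaussian_noise(x, sigma, rng):
    """Return a copy of input x with independent Gaussian noise added to each element."""
    return [x_i + rng.gauss(0.0, sigma) for x_i in x]

rng = random.Random(0)       # seeded generator so the example is reproducible
x = [0.2, 0.5, 0.9]          # hypothetical training input

# A fresh noisy copy would be drawn each time the example is presented.
noisy = add_gaussian_noise(x, sigma=0.1, rng=rng)
```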
1. Introduction
feature engineering
-
Problems
“Every time I fire a linguist, the performance of the speech recognizer goes up”
-
-
History of deep learning
-
-
-
-
Hopfield network (1982)
Recurrent neural network
Minimize energy function to fit data into predetermined patterns
Boltzmann machines, by Hinton and Sejnowski (1986)
-
-
Backpropagation on MLP
-
Werbos (1982), later popularized by Rumelhart, Hinton & Williams (1986) for training deep NNs
Convolutional networks
-
-
Waibel (1989): time-delay NN applied to audio, using a moving window
Recurrent NN
sequential data processing, but difficult to train with backpropagation
-
-
-
Schmidhuber's group (2010) demonstrated that deep NNs don't need pre-training, only more time to train
-
-
-
-
-
-