(Activation function, LSTM internal structure, Optimization,…
LSTM internal structure
Normalization
Min-max
Scales each value to the range [0, 1].
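A minimal NumPy sketch of the min-max scaling described above; applying it per column (feature) is an assumption, since the node does not state the axis.

    import numpy as np

    def min_max_scale(x, eps=1e-12):
        # Scale each column (feature) to [0, 1] using its own min and max.
        x = np.asarray(x, dtype=float)
        x_min = x.min(axis=0)
        x_max = x.max(axis=0)
        return (x - x_min) / (x_max - x_min + eps)   # eps guards against constant columns

    data = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
    print(min_max_scale(data))   # every column now lies in [0, 1]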
Optimization
Optimizer function
Adam
Uses the first-order moment estimate and the second-order moment estimate of the gradient to dynamically adjust the learning rate.
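A minimal sketch of one Adam update following that description; the hyperparameter defaults (beta1=0.9, beta2=0.999, eps=1e-8) and the toy quadratic in the usage are assumptions.

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # First-order and second-order moment estimates of the gradient.
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad**2
        # Bias correction compensates for the zero initialization of m and v.
        m_hat = m / (1 - beta1**t)
        v_hat = v / (1 - beta2**t)
        # The effective learning rate is adjusted per parameter via sqrt(v_hat).
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v

    theta, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
    target = np.array([3.0, -1.0])
    for t in range(1, 2001):
        grad = 2 * (theta - target)               # gradient of ||theta - target||^2
        theta, m, v = adam_step(theta, grad, m, v, t, lr=0.01)
    print(theta)                                  # close to target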
GD
Solves for the optimal value by moving along the direction of steepest (gradient) descent. The method converges at a linear rate.
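A minimal sketch of plain gradient descent; the toy quadratic objective and the fixed step size are illustrative assumptions.

    import numpy as np

    def gradient_descent(grad_fn, theta0, lr=0.1, n_iters=100):
        # Repeatedly step along the negative gradient with a fixed learning rate.
        theta = np.asarray(theta0, dtype=float)
        for _ in range(n_iters):
            theta = theta - lr * grad_fn(theta)
        return theta

    # Minimize f(theta) = ||theta - target||^2, whose gradient is 2 * (theta - target).
    target = np.array([3.0, -1.0])
    print(gradient_descent(lambda th: 2 * (th - target), np.zeros(2)))   # converges to target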
SGD
The parameter update is computed from a randomly sampled mini-batch. The method converges at a sublinear rate.
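A minimal sketch of mini-batch SGD for a least-squares problem; the loss (mean of 0.5 * residual^2), batch size, and learning rate are assumptions.

    import numpy as np

    def sgd_least_squares(X, y, lr=0.05, batch_size=8, n_epochs=50, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for _ in range(n_epochs):
            idx = rng.permutation(n)
            for start in range(0, n, batch_size):
                b = idx[start:start + batch_size]            # randomly sampled mini-batch
                grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # mini-batch gradient estimate
                w -= lr * grad
        return w

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true
    print(sgd_least_squares(X, y))   # approaches w_true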
NAG
Accelerates gradient descent by accumulating previous gradients as momentum, evaluating the gradient at the momentum look-ahead point, and performing the update with that momentum.
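A minimal sketch of Nesterov accelerated gradient on a toy quadratic; the momentum coefficient 0.9 and the step size are assumed defaults.

    import numpy as np

    def nag(grad_fn, theta0, lr=0.1, momentum=0.9, n_iters=200):
        theta = np.asarray(theta0, dtype=float)
        velocity = np.zeros_like(theta)
        for _ in range(n_iters):
            lookahead = theta + momentum * velocity          # peek ahead along the momentum direction
            velocity = momentum * velocity - lr * grad_fn(lookahead)
            theta = theta + velocity                         # update with the accumulated momentum
        return theta

    target = np.array([3.0, -1.0])
    print(nag(lambda th: 2 * (th - target), np.zeros(2)))    # converges to target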
Frank-Wolfe
The method approximates the objective with a linear function, solves the resulting linear program to find a feasible descent direction, and performs a one-dimensional search along that direction within the feasible domain.
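A minimal sketch of Frank-Wolfe for a quadratic objective over the probability simplex; on the simplex the linear sub-problem reduces to picking the vertex with the smallest gradient coordinate, and the 2/(t+2) step size stands in for the one-dimensional search. Both choices are assumptions.

    import numpy as np

    def frank_wolfe_simplex(grad_fn, d, n_iters=200):
        x = np.ones(d) / d                        # start at the barycenter of the simplex
        for t in range(n_iters):
            g = grad_fn(x)
            s = np.zeros(d)
            s[np.argmin(g)] = 1.0                 # vertex minimizing the linear approximation
            gamma = 2.0 / (t + 2.0)               # diminishing step size instead of a line search
            x = (1 - gamma) * x + gamma * s       # stays inside the feasible domain by construction
        return x

    # Minimize ||x - target||^2 over the simplex (target itself lies on the simplex).
    target = np.array([0.2, 0.5, 0.3])
    print(frank_wolfe_simplex(lambda x: 2 * (x - target), d=3))   # approaches target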
SVRG
Instead of storing the gradient of every sample, the average (full) gradient is saved at regular intervals at a snapshot of the parameters. At each iteration, the update direction is formed from the gradient of a randomly selected sample at the current parameters, the gradient of the same sample at the snapshot parameters, and the stored average gradient.
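A minimal sketch of SVRG for least squares; the snapshot interval (one pass), the learning rate, and the per-sample loss 0.5 * residual^2 are assumptions.

    import numpy as np

    def svrg_least_squares(X, y, lr=0.01, n_outer=30, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)

        def grad_i(w, i):
            return X[i] * (X[i] @ w - y[i])                  # gradient for a single sample

        for _ in range(n_outer):
            w_snap = w.copy()
            avg_grad = X.T @ (X @ w_snap - y) / n            # average gradient stored at the snapshot
            for _ in range(n):                               # inner loop: one stochastic pass
                i = rng.integers(n)
                # Variance-reduced direction: current gradient minus the old one plus the stored average.
                g = grad_i(w, i) - grad_i(w_snap, i) + avg_grad
                w -= lr * g
        return w

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true
    print(svrg_least_squares(X, y))   # approaches w_true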
AdaGrad
The learning rate is adaptively adjusted according to the sum of the squares of all historical gradients.
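A minimal sketch of AdaGrad on a toy quadratic; the base learning rate is an assumption.

    import numpy as np

    def adagrad(grad_fn, theta0, lr=0.5, eps=1e-8, n_iters=500):
        theta = np.asarray(theta0, dtype=float)
        sq_sum = np.zeros_like(theta)                        # sum of squares of all historical gradients
        for _ in range(n_iters):
            g = grad_fn(theta)
            sq_sum += g**2
            theta -= lr * g / (np.sqrt(sq_sum) + eps)        # per-coordinate adaptive learning rate
        return theta

    target = np.array([3.0, -1.0])
    print(adagrad(lambda th: 2 * (th - target), np.zeros(2)))   # approaches target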
ADMM
The method solves optimization problems with linear constraints by adding a penalty term to the objective and splitting the variables into sub-problems that are solved alternately and iteratively.
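A minimal sketch of ADMM applied to the Lasso problem as a concrete instance of that splitting; the Lasso choice, the penalty parameter rho, and lambda are assumptions, not taken from the map.

    import numpy as np

    def soft_threshold(v, k):
        return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

    def admm_lasso(A, b, lam=0.1, rho=1.0, n_iters=200):
        # Split min 0.5*||Ax - b||^2 + lam*||z||_1 subject to x - z = 0 into alternating sub-problems.
        n, d = A.shape
        x, z, u = np.zeros(d), np.zeros(d), np.zeros(d)      # u is the scaled dual variable
        inv = np.linalg.inv(A.T @ A + rho * np.eye(d))
        Atb = A.T @ b
        for _ in range(n_iters):
            x = inv @ (Atb + rho * (z - u))                  # quadratic sub-problem (penalized least squares)
            z = soft_threshold(x + u, lam / rho)             # l1 sub-problem has a closed-form solution
            u = u + x - z                                    # dual update enforcing the constraint x = z
        return z

    rng = np.random.default_rng(0)
    A = rng.normal(size=(100, 5))
    x_true = np.array([2.0, 0.0, -1.5, 0.0, 0.0])            # sparse ground truth
    b = A @ x_true
    print(admm_lasso(A, b))   # roughly recovers the sparse coefficients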
SAG
The most recent gradient of each sample and the sum of these gradients are maintained in memory. For each update, one sample is randomly selected, its stored gradient is replaced with a freshly computed one, and the updated gradient sum (averaged) is used as the update direction.
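A minimal sketch of SAG for least squares; the learning rate, iteration count, and per-sample loss 0.5 * residual^2 are assumptions.

    import numpy as np

    def sag_least_squares(X, y, lr=0.01, n_iters=20000, seed=0):
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        grad_table = np.zeros((n, d))              # the old gradient of each sample, kept in memory
        grad_sum = np.zeros(d)                     # running sum of the stored gradients
        for _ in range(n_iters):
            i = rng.integers(n)
            g_new = X[i] * (X[i] @ w - y[i])       # fresh gradient for the selected sample
            grad_sum += g_new - grad_table[i]      # swap the old gradient for the new one in the sum
            grad_table[i] = g_new
            w -= lr * grad_sum / n                 # averaged gradient sum as the update direction
        return w

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.0, -2.0, 0.5])
    y = X @ w_true
    print(sag_least_squares(X, y))   # approaches w_true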
Regularization
Dropout
A higher dropout rate drops more features and may cause underfitting.
A lower dropout rate keeps most features, so overfitting may still happen.
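A minimal NumPy sketch of (inverted) dropout at training time; here rate is the probability of dropping a unit, matching the reading above, and the scaling by 1/(1 - rate) is the common inverted-dropout convention.

    import numpy as np

    def dropout(activations, rate=0.5, training=True, seed=None):
        # rate is the probability of dropping a unit: a higher rate removes more features
        # (risk of underfitting), a lower rate keeps almost all of them (overfitting may remain).
        if not training or rate == 0.0:
            return activations
        rng = np.random.default_rng(seed)
        mask = rng.random(activations.shape) >= rate
        return activations * mask / (1.0 - rate)   # rescale so the expected activation is unchanged

    h = np.ones((2, 4))
    print(dropout(h, rate=0.3, seed=0))            # surviving units are scaled up to 1/0.7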