Deep Learning
Activation Function
Sigmoid (0,1)
Derivative (0, 0.25]
Saturation :check:
Continuous :check:
Vanishing Gradient :check:
Tanh (-1,1)
Derivative (0,1]
Vanishing Gradient :check:
Saturation :check:
Continuous :check:
Softsign (-1,1)
Derivative (0,1]
Continuous :check:
Vanishing Gradient :check:
Saturation :check:
ReLU [0,∞)
Derivative {0,1}
Dead Neurons :check:
Softplus (0,∞)
Derivative (0,1)
Vanishing Gradient :check:
Saturation :check:
Continuous :check:
Softmax (0,1)
Multi-class classification (used at output layer)
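A minimal NumPy sketch of the activation functions above, so the range and derivative annotations on each node can be checked numerically; the sample grid and example logits are assumptions for illustration only:

    import numpy as np

    x = np.linspace(-10, 10, 10001)

    def sigmoid(x):            # range (0, 1), derivative (0, 0.25]
        return 1.0 / (1.0 + np.exp(-x))

    def softsign(x):           # range (-1, 1), derivative (0, 1]
        return x / (1.0 + np.abs(x))

    def relu(x):               # range [0, inf), derivative {0, 1}
        return np.maximum(0.0, x)

    def softplus(x):           # range (0, inf), derivative (0, 1)
        return np.log1p(np.exp(x))

    def softmax(z):            # each output in (0, 1), outputs sum to 1
        e = np.exp(z - z.max())         # shift for numerical stability
        return e / e.sum()

    print(sigmoid(x).min(), sigmoid(x).max())    # stays inside (0, 1)
    print(np.tanh(x).min(), np.tanh(x).max())    # stays inside (-1, 1)
    print(softmax(np.array([1.0, 2.0, 3.0])))    # probabilities for 3 classes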
Regularization
L1 Regularization
Laplace distribution prior on the weights
L2 Regularization
Gaussian distribution prior on the weights
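As a rough illustration, the L1 (Laplace prior) and L2 (Gaussian prior) penalties simply add a weight-dependent term to the loss; a small NumPy sketch in which the weight vector and coefficient are made-up examples:

    import numpy as np

    w = np.array([0.5, -1.2, 0.0, 3.0])     # example weight vector (hypothetical)
    lam = 0.01                               # regularization strength (assumed)

    l1_penalty = lam * np.sum(np.abs(w))     # L1: encourages sparse weights
    l2_penalty = lam * np.sum(w ** 2)        # L2: encourages small weights

    # total_loss = data_loss + l1_penalty   (or + l2_penalty)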
Dataset Expansion
Rotate or scale images in object recognition
Random noise in speech recognition
Replacing words with synonyms in NLP
Noise injection using label smoothing in Softmax classification
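A hedged sketch of two of the expansion tricks listed above, using only NumPy: flipping and noising an image array, and smoothing one-hot labels for a softmax classifier. The image shape, noise scale, and smoothing factor are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)

    image = rng.random((32, 32, 3))                      # dummy RGB image
    flipped = image[:, ::-1, :]                          # horizontal flip
    noisy = image + rng.normal(0.0, 0.05, image.shape)   # additive noise

    # Label smoothing: replace a hard one-hot target with a softened one.
    num_classes, eps = 10, 0.1
    one_hot = np.eye(num_classes)[3]                     # true class = 3
    smoothed = one_hot * (1.0 - eps) + eps / num_classes
    print(smoothed.sum())                                # still sums to 1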
Dropout
Early Stopping
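Dropout and early stopping can be sketched in a few lines; the keep probability, patience, and validation-loss history below are illustrative assumptions, not values from the diagram:

    import numpy as np

    rng = np.random.default_rng(0)

    def dropout(activations, p_drop=0.5, training=True):
        """Inverted dropout: zero units at random and rescale the rest."""
        if not training:
            return activations
        mask = rng.random(activations.shape) >= p_drop
        return activations * mask / (1.0 - p_drop)

    # Early stopping: stop when the validation loss has not improved
    # for `patience` consecutive epochs.
    val_losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63]   # made-up history
    patience, best, wait = 2, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                print("stop at epoch", epoch)
                break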
Types of Neural Network
Convolutional Neural Network (CNN)
Convolutional Layer
Pooling Layer
Fully Connected Layer
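A compact NumPy sketch of the three CNN layer types, with made-up sizes (5x5 input, 3x3 kernel, 2x2 pooling); real frameworks vectorize this heavily, but the data flow is the same:

    import numpy as np

    rng = np.random.default_rng(0)
    img = rng.random((5, 5))          # toy single-channel input
    kernel = rng.random((3, 3))       # convolutional filter

    # Convolutional layer: slide the kernel over the input (valid padding).
    conv = np.array([[np.sum(img[i:i+3, j:j+3] * kernel)
                      for j in range(3)] for i in range(3)])

    # Pooling layer: 2x2 max pooling (stride 1 for this tiny example).
    pooled = np.array([[conv[i:i+2, j:j+2].max()
                        for j in range(2)] for i in range(2)])

    # Fully connected layer: flatten and apply weights + bias.
    flat = pooled.reshape(-1)
    W, b = rng.random((4, flat.size)), np.zeros(4)
    logits = W @ flat + b
    print(logits.shape)               # (4,) class scores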
Long Short-term Memory Network (LSTMs)
Gated Recurrent Unit (GRU)
Recurrent Neural Network (RNN)
Type of RNN
One to one
One to many
Many to one
Many to many
Usages
Video: sequence of image frames
Audio: sequence of audio clips
Sentence: sequence of words
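The "many to one" pattern (e.g. a sequence of words mapped to a single label) can be sketched with a plain recurrent cell in NumPy; the sizes and random weights here are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    seq = rng.random((6, 8))                    # 6 time steps, 8 input features
    Wx, Wh = rng.random((16, 8)), rng.random((16, 16))
    Wy, h = rng.random((3, 16)), np.zeros(16)   # 3 output classes, hidden size 16

    for x_t in seq:                             # the same weights are reused at every step
        h = np.tanh(Wx @ x_t + Wh @ h)          # hidden state carries the "memory"

    y = Wy @ h                                  # many to one: a single output after the sequence
    print(y.shape)                              # (3,)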
Generative Adversarial Network (GAN)
Component
Generator G
Discriminator D
Usages
Image generation
Text generation
Speech enhancement
Image super resolution
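A very small sketch of the adversarial objective behind the Generator G and Discriminator D: D is trained to score real data high and generated data low, while G is trained to fool D. The discriminator outputs below are made-up numbers, not a trained model:

    import numpy as np

    def bce(p, label):
        """Binary cross-entropy for a probability p and a 0/1 label."""
        p = np.clip(p, 1e-7, 1 - 1e-7)
        return -(label * np.log(p) + (1 - label) * np.log(1 - p))

    d_real = 0.9     # D's probability that a real sample is real (assumed)
    d_fake = 0.2     # D's probability that a generated sample is real (assumed)

    d_loss = bce(d_real, 1) + bce(d_fake, 0)   # discriminator: real -> 1, fake -> 0
    g_loss = bce(d_fake, 1)                    # generator: wants D to call fakes real
    print(d_loss, g_loss)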
Common Problem
Data Imbalance
Solution
Random Undersampling
Deleting redundant samples in a category
Random Oversampling
Copying samples
Synthetic Minority Oversampling Technique (SMOTE)
Sampling neighboring minority-class points
Interpolating between samples to synthesize new ones
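A hedged NumPy sketch of the three resampling ideas above on a made-up imbalanced dataset (100 majority vs. 10 minority samples); the SMOTE step shows only the core interpolation idea, not the full nearest-neighbor algorithm:

    import numpy as np

    rng = np.random.default_rng(0)
    majority = rng.normal(0, 1, (100, 2))     # 100 samples of the frequent class
    minority = rng.normal(3, 1, (10, 2))      # 10 samples of the rare class

    # Random undersampling: drop majority samples down to the minority count.
    under = majority[rng.choice(100, size=10, replace=False)]

    # Random oversampling: copy minority samples up to the majority count.
    over = minority[rng.choice(10, size=100, replace=True)]

    # SMOTE-style synthesis: interpolate between pairs of minority samples.
    a = minority[rng.integers(0, 10, 90)]
    b = minority[rng.integers(0, 10, 90)]
    synthetic = a + rng.random((90, 1)) * (b - a)
    balanced_minority = np.vstack([minority, synthetic])   # now 100 samples
    print(under.shape, over.shape, balanced_minority.shape)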
Vanishing Gradient
Solution
ReLU activation function and LSTM used as alleviation
Exploding Gradient
Solution
Gradient clipping used as alleviation
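Gradient clipping can be sketched as rescaling the gradient whenever its norm exceeds a threshold; the gradient values and threshold below are illustrative assumptions:

    import numpy as np

    def clip_by_norm(grad, max_norm=5.0):
        """Rescale grad so its L2 norm never exceeds max_norm."""
        norm = np.linalg.norm(grad)
        if norm > max_norm:
            grad = grad * (max_norm / norm)
        return grad

    g = np.array([30.0, -40.0])        # exploding gradient, norm = 50
    print(clip_by_norm(g))             # rescaled to norm 5: [ 3. -4.]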
Overfitting
Root cause
Many feature dimensions
Many model assumptions
Many parameters
Too much noise but too little training data
Solution
Data augmentation
Regularization
Early Stopping
Dropout
Training Rules
Gradient Descent
Stochastic Gradient Descent (SGD)
Mini-Batch Gradient Descent (MBGD)
Batch Gradient Descent (BGD)
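The three gradient-descent variants differ only in how many samples feed each update: all of them (BGD), a single one (SGD), or a small batch (MBGD). A sketch on a toy linear-regression problem, where the data, batch size, and learning rate are made-up assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    X, true_w = rng.random((200, 3)), np.array([1.0, -2.0, 0.5])
    y = X @ true_w + rng.normal(0, 0.01, 200)

    def gradient(w, Xb, yb):
        """Gradient of mean squared error for a linear model."""
        return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

    w, lr = np.zeros(3), 0.1
    for step in range(1000):
        idx = rng.choice(200, size=16, replace=False)   # MBGD: batch of 16
        # BGD would use all 200 rows; SGD would use a single row.
        w -= lr * gradient(w, X[idx], y[idx])
    print(w)   # approaches [1.0, -2.0, 0.5]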
Loss Function
Quadratic Cost
Mean Square Error (regression)
Cross Entropy Error Function (classification)
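The two loss functions listed above, sketched in NumPy with made-up predictions and targets: mean squared error (quadratic cost) for regression, cross-entropy for classification.

    import numpy as np

    # Mean squared error (quadratic cost) for regression.
    y_true = np.array([2.0, 0.5, -1.0])
    y_pred = np.array([1.8, 0.7, -1.2])
    mse = np.mean((y_true - y_pred) ** 2)

    # Cross-entropy for classification, on softmax probabilities.
    p = np.array([0.7, 0.2, 0.1])          # predicted class probabilities
    t = np.array([1.0, 0.0, 0.0])          # one-hot true label
    cross_entropy = -np.sum(t * np.log(p + 1e-12))
    print(mse, cross_entropy)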
Backpropagation Algorithm
Signal (forward propagation)
Error (backpropagation)
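A small end-to-end sketch of the forward-signal / backward-error idea for a single hidden layer, using the sigmoid activation and quadratic cost from earlier nodes; the sizes, data, and learning rate are illustrative assumptions:

    import numpy as np

    rng = np.random.default_rng(0)
    x, y = rng.random(4), np.array([1.0])            # one training example
    W1, W2 = rng.random((5, 4)), rng.random((1, 5))  # hidden size 5, 1 output

    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

    for step in range(500):
        # Forward: the signal propagates layer by layer.
        h = sigmoid(W1 @ x)
        out = sigmoid(W2 @ h)

        # Backward: the error propagates back through the same weights.
        err_out = (out - y) * out * (1 - out)        # error at the output layer
        err_hid = (W2.T @ err_out) * h * (1 - h)     # error pushed back to the hidden layer

        W2 -= 0.5 * np.outer(err_out, h)             # gradient steps (learning rate 0.5)
        W1 -= 0.5 * np.outer(err_hid, x)
    print(out)                                        # approaches the target 1.0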
Optimizer
Common Optimizers
GD Optimizer
Momentum Optimizer
Nesterov (NAG)
AdaGrad
AdaDelta
RMSProp
Adam
AdaMax
Nadam
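Two of the optimizers above shown as bare update rules on a toy 1-D quadratic f(w) = w^2; the hyperparameters (learning rate, beta values) are the conventional defaults and still assumptions here:

    import numpy as np

    grad = lambda w: 2.0 * w          # gradient of f(w) = w**2

    # Momentum: accumulate a velocity and move along it.
    w, v, lr, beta = 5.0, 0.0, 0.1, 0.9
    for _ in range(100):
        v = beta * v - lr * grad(w)
        w += v

    # Adam: first/second moment estimates with bias correction.
    w2, m, s = 5.0, 0.0, 0.0
    b1, b2, eps, lr = 0.9, 0.999, 1e-8, 0.1
    for t in range(1, 101):
        g = grad(w2)
        m = b1 * m + (1 - b1) * g
        s = b2 * s + (1 - b2) * g * g
        m_hat, s_hat = m / (1 - b1 ** t), s / (1 - b2 ** t)
        w2 -= lr * m_hat / (np.sqrt(s_hat) + eps)

    print(w, w2)                      # both approach the minimum at 0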
Purposes of Optimization
Accelerating algorithm convergence
Preventing getting stuck in, or escaping from, local extrema
Simplifying manual hyperparameter setting, especially the learning rate