Deep Learning - Coggle Diagram
Deep Learning
Probability and Information Theory
degree of belief
frequentist probability
Bayesian probability
Random Variables
Probability Distributions
Discrete Variables and Probability Mass Functions
Joint probability distribution
normalized
uniform distribution
Continuous Variables and Probability Density Functions (PDF)
Marginal Probability
marginal probability distribution
sum rule
Conditional Probability
The Chain Rule of Conditional Probabilities
Chain rule
product rule
Independence and Conditional Independence
Expectation, Variance and Covariance
expectation, expected value
variance
covariance
Common Probability Distributions
Bernoulli Distribution
Multinoulli Distribution
Gaussian Distribution (normal distribution)
precision
standard normal distribution
central limit theorem
multivariate normal distribution
precision matrix
isotropic
Exponential and Laplace Distributions
exponential distribution
Laplace distribution
The Dirac Distribution and Empirical Distribution
empirical distribution
generalized function
empirical frequency
Mixtures of Distributions
mixture distribution
latent variable
Gaussian mixture model
prior probability
posterior probability
universal approximator
Useful Properties of Common Functions
logit
negative part function
positive part function
softplus function
saturates
logistic sigmoid
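The sigmoid and softplus entries above can be illustrated with a short sketch; the function names here are illustrative, not from the source, and the relation shown (softplus as a smoothed positive part, sigmoid saturating toward 0 and 1) follows the standard definitions.

```python
import math

def sigmoid(x):
    # logistic sigmoid: squashes the real line into (0, 1);
    # saturates (gradient near zero) for large |x|
    return 1.0 / (1.0 + math.exp(-x))

def softplus(x):
    # smooth approximation of the positive part function max(0, x)
    return math.log1p(math.exp(x))
```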
Bayes' Rule
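Bayes' rule combines a prior with a likelihood via the sum and product rules listed earlier; a minimal sketch with made-up diagnostic-test numbers (the function name and values are illustrative assumptions):

```python
# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B),
# where P(B) is obtained by the sum rule over A and not-A.
def bayes_posterior(prior_a, lik_b_given_a, lik_b_given_not_a):
    evidence = lik_b_given_a * prior_a + lik_b_given_not_a * (1.0 - prior_a)
    return lik_b_given_a * prior_a / evidence

# illustrative numbers: rare condition (1%), sensitive but imperfect test
posterior = bayes_posterior(0.01, 0.95, 0.05)
```

Even with a 95%-sensitive test, the posterior stays modest because the prior is small.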
Technical details of continuous variables
almost everywhere
measure theory
measure zero
Jacobian matrix
Information Theory
self-information
nats
Shannon entropy
bits or shannons
Kullback-Leibler (KL) divergence
differential entropy
cross-entropy
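The information-theory terms above (Shannon entropy, KL divergence, cross-entropy) fit in one small sketch over discrete distributions; the helper names are illustrative, and entropies are measured in nats since the natural log is used.

```python
import math

def entropy(p):
    # Shannon entropy in nats; terms with p_i = 0 contribute nothing
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    # Kullback-Leibler divergence D_KL(p || q), nonnegative, zero iff p == q
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = H(p) + D_KL(p || q)
    return entropy(p) + kl_divergence(p, q)
```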
Structured Probabilistic Models
structured probabilistic model or graphical model
Directed
Undirected
proportional
multinomial distribution
Linear Algebra
Scalars
Vectors
Matrices
Tensors
Multiplying Matrices and Vectors
element-wise product
Hadamard product
dot product
Identity and Inverse Matrices
matrix inversion
identity matrix
Linear Dependence and Span
linear combination
singular
Norms
triangle inequality
Euclidean norm
max norm
Frobenius norm
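The norms listed above can be sketched as one parametric L^p function (name illustrative): p = 1 and p = 2 follow the usual formula, and the max norm is the p → ∞ limit.

```python
import math

def lp_norm(v, p):
    # L^p norm of a vector; p = 2 is the Euclidean norm,
    # p = inf is the max norm (largest absolute entry)
    if p == float("inf"):
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)
```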
Special Kinds of Matrices and Vectors
Diagonal matrices
unit norm
orthogonal
orthonormal
Eigendecomposition
eigenvalue
positive semidefinite
negative semidefinite
Singular Value Decomposition
singular vectors
singular values
The Moore-Penrose Pseudoinverse
The Trace Operator
The Determinant
Principal Components Analysis
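Principal Components Analysis ties the eigendecomposition entries above to a concrete computation: the first principal component is the top eigenvector of the sample covariance matrix. A minimal sketch for 2-D data, using the closed form for 2x2 symmetric eigenproblems (function name and tolerance are illustrative choices):

```python
import math

def first_principal_component(points):
    # points: list of (x, y) pairs; returns the unit-norm direction
    # of maximum variance (top eigenvector of the covariance matrix)
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # sample covariance matrix [[a, b], [b, c]]
    a = sum((x - mx) ** 2 for x, _ in points) / n
    c = sum((y - my) ** 2 for _, y in points) / n
    b = sum((x - mx) * (y - my) for x, y in points) / n
    # larger eigenvalue of the 2x2 symmetric matrix
    lam = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if abs(b) < 1e-12:
        # covariance already diagonal: pick the axis with larger variance
        return (1.0, 0.0) if a >= c else (0.0, 1.0)
    vx, vy = b, lam - a          # eigenvector for lam (unnormalized)
    norm = math.hypot(vx, vy)
    return vx / norm, vy / norm
```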
Machine Learning Basics
Learning Algorithms
Task
Classification
Classification with missing inputs
Regression
Transcription
Machine translation
Structured output
Anomaly detection
Synthesis and sampling
Imputation of missing values
Denoising
Density estimation or probability mass function estimation
Performance Measure
error rate
test set
Experience
unsupervised
supervised
label or target
dataset
data points
reinforcement learning
design matrix
Linear Regression
parameters
weights
mean squared error
normal equations
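Linear regression, its weight parameters, mean squared error, and the normal equations come together in a small sketch: for one feature plus a bias, w = (XᵀX)⁻¹Xᵀy reduces to the familiar slope/intercept formulas. Function names are illustrative.

```python
# Linear regression via the normal equations, specialized to one
# feature plus a bias term.
def fit_line(xs, ys):
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    w = (n * sxy - sx * sy) / (n * sxx - sx * sx)  # weight (slope)
    b = (sy - w * sx) / n                          # bias (intercept)
    return w, b

def mean_squared_error(xs, ys, w, b):
    # the cost that the normal equations minimize
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
```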
Capacity, Overfitting and Underfitting
generalization
training error
generalization error
test error
test set
statistical learning theory
data-generating process
i.i.d. assumptions
identically distributed
data-generating distribution
underfitting
overfitting
capacity
hypothesis space
representational capacity
effective capacity
Occam's razor
Vapnik-Chervonenkis dimension
underfitting regime
overfitting regime
optimal capacity
nearest neighbor regression
Bayes error
No Free Lunch Theorem
Regularization
weight decay
regularizer
regularization
Hyperparameters and Validation Sets
capacity
validation set
Cross-Validation
Estimators, Bias and Variance
Bias
Variance and Standard Error
Bernoulli Distribution
standard error
variance
Gaussian Distribution Estimator of the Mean
Bernoulli Distribution
asymptotically unbiased
unbiased
Estimators of the Variance of a Gaussian Distribution
sample variance
unbiased sample variance
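The biased versus unbiased sample variance distinction above is small enough to show directly: dividing the sum of squared deviations by n systematically underestimates the true variance, and dividing by n − 1 removes that bias. The function name and flag are illustrative.

```python
# Sample variance of a Gaussian (or any) sample: the n-denominator
# estimator is biased; the (n - 1)-denominator estimator is unbiased.
def sample_variance(xs, unbiased=False):
    n = len(xs)
    m = sum(xs) / n
    ss = sum((x - m) ** 2 for x in xs)
    return ss / (n - 1) if unbiased else ss / n
```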
Trading Off Bias and Variance to Minimize Mean Squared Error
mean squared error (MSE)
Consistency
almost sure
Almost sure convergence
Point Estimation
statistic
Function Estimation
Maximum Likelihood Estimation
Conditional Log-Likelihood and Mean Squared Error
Likelihood
Properties of Maximum Likelihood
statistical efficiency
parametric case
Bayesian Statistics
Bayesian Linear Regression
posterior
probability distribution
Bayesian statistics
frequentist statistics
Maximum A Posteriori (MAP) Estimation
Supervised Learning Algorithms
Probabilistic Supervised Learning
logistic regression
Support Vector Machines
kernel trick
Gaussian kernel
radial basis function
template matching
kernel machines or kernel methods
Other simple Supervised Learning Algorithms
Unsupervised Learning Algorithms
Principal Components Analysis
K-means clustering
Stochastic Gradient Descent (SGD)
minibatch
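Minibatch SGD can be sketched on the linear-regression MSE cost from the Machine Learning Basics entries above; the learning rate, batch size, and epoch count here are arbitrary illustrative choices, not values from the source.

```python
import random

# Minibatch stochastic gradient descent for linear regression
# (toy sketch; hyperparameters are arbitrary).
def sgd_linear(data, lr=0.05, batch_size=4, epochs=200, seed=0):
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # fresh random minibatches each epoch
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            # gradient of the MSE on this minibatch only
            gw = sum(2 * (w * x + b - y) * x for x, y in batch) / len(batch)
            gb = sum(2 * (w * x + b - y) for x, y in batch) / len(batch)
            w -= lr * gw
            b -= lr * gb
    return w, b
```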
Building a Machine Learning Algorithm
Challenges Motivating Deep Learning
Curse of Dimensionality
Local Constancy and Smoothness Regularization
local constancy prior
local kernels
smoothness prior
Manifold Learning
manifold
manifold hypothesis
Numerical Computation
Overflow and Underflow
underflow
overflow
softmax function
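The overflow/underflow entries above are exactly what the softmax stabilization trick addresses: subtracting max(x) before exponentiating leaves the result unchanged mathematically but keeps exp from overflowing. A minimal sketch:

```python
import math

# Numerically stable softmax: subtracting the maximum prevents
# overflow in exp, and at least one exponent is exp(0) = 1,
# so the denominator cannot underflow to zero.
def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

A naive `exp(x) / sum(exp(x))` would overflow on inputs like 1000.0.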
Poor Conditioning
condition number
Gradient-Based Optimization
objective function or criterion
cost function
loss function
error function
derivative
gradient descent
critical points
stationary points
local minimum
local maximum
saddle points
global minimum
partial derivatives
gradient
directional derivative
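Gradient descent, as listed above, steps against the gradient until it nears a critical point; a minimal sketch on a convex quadratic with a known global minimum (the objective and step size are illustrative choices):

```python
# Gradient descent on f(x, y) = (x - 3)^2 + (y + 1)^2.
# The gradient points uphill, so each step moves against it;
# the global minimum is at (3, -1).
def gradient_descent(lr=0.1, steps=100):
    x, y = 0.0, 0.0
    for _ in range(steps):
        gx, gy = 2 * (x - 3.0), 2 * (y + 1.0)  # partial derivatives
        x, y = x - lr * gx, y - lr * gy
    return x, y
```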
Jacobian and Hessian Matrices
Jacobian matrix
second derivative
second derivative test
curvature
Hessian matrix
first-order optimization algorithms
second-order optimization algorithms
Lipschitz continuous
Lipschitz constant
convex optimization
Constrained Optimization
constrained optimization
feasible
Karush-Kuhn-Tucker (KKT)
Generalized Lagrangian or generalized Lagrange function
equality constraints and inequality constraints
Linear Least Squares
Lagrange multipliers