Machine Learning
Learning Styles
Supervised learning
SVM
no kernel (linear kernel)
Cost function
\( C\sum_{i=1}^m{\left[y^{(i)}cost_1(\theta^Tx^{(i)}) + (1-y^{(i)})cost_0(\theta^Tx^{(i)})\right]} + \frac{1}{2}\sum_{j=1}^n{\theta_j^2} \)
kernel
similarity functions
Polynomial
\( k(x,l) = {(x^Tl + a)}^b \)
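The polynomial similarity function above can be sketched as follows (a minimal sketch; the function name and the hyperparameter defaults `a`, `b` are illustrative assumptions):

```python
import numpy as np

def polynomial_kernel(x, l, a=1.0, b=2):
    """Polynomial similarity k(x, l) = (x^T l + a)^b between an
    example x and a landmark l; a and b are hyperparameters."""
    return (np.dot(x, l) + a) ** b

x = np.array([1.0, 2.0])
l = np.array([0.5, -1.0])
print(polynomial_kernel(x, l))  # (0.5 - 2 + 1)^2 = 0.25
```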
cost function
\( C\sum_{i=1}^m{\left[y^{(i)}cost_1(\theta^Tf^{(i)}) + (1-y^{(i)})cost_0(\theta^Tf^{(i)})\right]} + \frac{1}{2}\sum_{j=1}^m{\theta_j^2} \)
practical
usage
No. of features small, No. of examples intermediate
No. of features small, No. of examples large
Regression Algorithms
overfitting
Regularization
optimization
gradient descent:
\( \theta_0 := \theta_0 - \alpha \frac{1}{m} X_0^T(h_\theta -y) \)
\( \theta_j := \theta_j - \alpha\left[ \frac{1}{m} X_j^T(h_\theta-y) + \frac{\lambda}{m}\theta_j \right] \quad j \in \{ 1,2 \dots n\} \)
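One regularized gradient-descent step, matching the two updates above (a sketch with numpy; variable names are assumptions, and \( \theta_0 \) is deliberately left unregularized):

```python
import numpy as np

def gradient_step(theta, X, y, alpha, lam):
    """One regularized gradient-descent step for linear regression.
    X: (m, n+1) design matrix whose first column is all ones."""
    m = X.shape[0]
    h = X @ theta                   # hypothesis h_theta for all examples
    grad = (X.T @ (h - y)) / m      # (1/m) * X^T (h - y)
    reg = (lam / m) * theta
    reg[0] = 0.0                    # do not regularize theta_0
    return theta - alpha * (grad + reg)

# toy usage: fit y = 1 + x
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
theta = np.zeros(2)
for _ in range(1000):
    theta = gradient_step(theta, X, y, alpha=0.1, lam=0.0)
print(theta)  # approaches [1, 1]
```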
linear regression
normal equation: \( \theta = (X^TX + \lambda L)^{-1}X^Ty \)
\(L = \begin{pmatrix}
0 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & 1 \\
\end{pmatrix} \)
Unsupervised learning
Clustering
algorithm
K-means
procedure
randomly initialize centroids \( \mu_1, \mu_2, \dots \)
cluster assignment: assign each example to its nearest centroid
move centroids: \( \mu_k = \) mean of the points assigned to cluster k; loop back
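The K-means loop above can be sketched as follows (a minimal version assuming numpy; names and the fixed iteration count are illustrative):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """K-means: random centroid init, then alternate the cluster
    assignment step and the centroid (mean) update step."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), size=k, replace=False)]   # random init
    for _ in range(iters):
        # cluster assignment: nearest centroid for each point
        d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)
        c = d.argmin(axis=1)
        # move step: mu_k = mean of points assigned to cluster k
        for j in range(k):
            if np.any(c == j):
                mu[j] = X[c == j].mean(axis=0)
    return mu, c

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
mu, c = kmeans(X, k=2)
print(sorted(mu.tolist()))  # centroids near [0.05, 0] and [5.05, 5]
```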
Anomaly Detection
evaluation
metrics
true positives, false positives, true negatives, false negatives
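From those four counts the usual evaluation metrics follow; a small sketch using the standard definitions of precision, recall, and F1 (the example counts are made up):

```python
def metrics(tp, fp, tn, fn):
    """Precision, recall, and F1 score from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

p, r, f1 = metrics(tp=8, fp=2, tn=85, fn=5)
print(p, r, f1)  # 0.8, ~0.615, ~0.696
```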
Model
learning algorithm creates
parameters \( \theta_0, \theta_1, \dots \) that minimize the
cost function \( J(\theta_0, \theta_1, \dots) \)
Large scale
online learning
algorithm
repeat forever
get \( (x,y) \) from the user
update \( \theta \) using \( (x,y) \)
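The online-learning loop above can be sketched with a logistic-regression SGD update on each arriving example (a sketch; the simulated stream, learning rate, and names are assumptions):

```python
import numpy as np

def online_update(theta, x, y, alpha=0.1):
    """One online SGD step on the logistic-regression loss for a
    single example (x, y); the example is then discarded."""
    h = 1.0 / (1.0 + np.exp(-np.dot(theta, x)))   # sigmoid hypothesis
    return theta - alpha * (h - y) * x

theta = np.zeros(2)
# simulated user stream: label is 1 exactly when the second feature is positive
stream = [(np.array([1.0, 2.0]), 1), (np.array([1.0, -2.0]), 0)] * 200
for x, y in stream:
    theta = online_update(theta, x, y)
print(theta)  # the second weight grows positive
```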
Application
Recommender Systems
collaborative filtering
algorithm
initialize \( x, \theta \) to small random values
simultaneously optimize \( \theta, x \)
cost function
\( J = \frac{1}{2}\sum_{(i,j): r(i,j)=1}{((\theta^{(j)})^Tx^{(i)} - y^{(i,j)})^2 } + \frac{\lambda}{2} \sum_{i=1}^{n_m} \sum_{k=1}^{n}{(x_k^{(i)})^2} + \frac{\lambda}{2} \sum_{j=1}^{n_u} \sum_{k=1}^{n}{(\theta_k^{(j)})^2} \)
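The collaborative-filtering objective can be evaluated in vectorized form (a sketch including the conventional ½ factor on the squared error; rows of `X` are item features \( x^{(i)} \), rows of `Theta` are user parameters \( \theta^{(j)} \), and `R` marks which entries of `Y` are rated):

```python
import numpy as np

def cofi_cost(X, Theta, Y, R, lam):
    """Squared error over rated (i, j) pairs plus regularization on
    both the item features X and the user parameters Theta."""
    err = (X @ Theta.T - Y) * R      # zero out unrated entries
    return (0.5 * np.sum(err ** 2)
            + 0.5 * lam * np.sum(X ** 2)
            + 0.5 * lam * np.sum(Theta ** 2))

# toy check: 2 movies, 2 users, every entry rated, perfect fit
X = np.array([[1.0, 0.0], [0.0, 1.0]])
Theta = np.array([[1.0, 0.0], [0.0, 1.0]])
Y = np.array([[1.0, 0.0], [0.0, 1.0]])
R = np.ones((2, 2))
print(cofi_cost(X, Theta, Y, R, lam=0.0))  # 0.0
```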
Practice
Diagnostic
High bias (underfitting)
High Variance (overfitting)
tool
learning curve
\( J_{train}(N_{training\_examples}) , J_{CV}(N_{training\_examples}) \)
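The learning-curve diagnostic above can be sketched as follows: fit on the first N training examples, then record training and cross-validation error as N grows (a sketch; the regularized normal equation stands in for any learner, and all names are illustrative):

```python
import numpy as np

def learning_curve(X, y, Xcv, ycv, lam=0.0):
    """J_train(N) and J_CV(N): fit on the first N training examples,
    then measure error on those N and on the fixed CV set."""
    def fit(A, b):                     # regularized normal equation
        n = A.shape[1]
        L = np.eye(n)
        L[0, 0] = 0.0                  # do not regularize theta_0
        return np.linalg.solve(A.T @ A + lam * L, A.T @ b)

    def cost(theta, A, b):             # unregularized squared error
        return np.mean((A @ theta - b) ** 2) / 2.0

    j_train, j_cv = [], []
    for N in range(2, len(X) + 1):
        theta = fit(X[:N], y[:N])
        j_train.append(cost(theta, X[:N], y[:N]))
        j_cv.append(cost(theta, Xcv, ycv))
    return j_train, j_cv

# toy data that is exactly linear, so both curves stay near zero
X = np.c_[np.ones(6), np.arange(6.0)]
y = 1.0 + np.arange(6.0)
j_train, j_cv = learning_curve(X, y, X, y)
print(j_cv[-1])  # ~0 for perfectly linear data
```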