Support Vector Machine
Maximal Margin Classifier
Margin
=Shortest Distance between observations and threshold
When the threshold is halfway between the N sets of observations, the margin is as large as it can be (N = number of classes to classify)
What
Use the threshold that gives the largest margin for classification
Sensitive to OUTLIERS
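A minimal sketch of the idea in Python, assuming scikit-learn; the 1-D toy data and the very large C are assumptions chosen so that misclassifications are effectively forbidden:

```python
# Minimal sketch, assuming scikit-learn; the 1-D toy data and the very large C
# are assumptions chosen so that misclassifications are effectively forbidden.
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])  # observations on a number line
y = np.array([0, 0, 0, 1, 1, 1])                          # two classes

clf = SVC(kernel="linear", C=1e6).fit(X, y)               # huge C ~ maximal margin classifier

w = clf.coef_[0][0]
threshold = -clf.intercept_[0] / w      # decision threshold on the number line
margin = 2 / abs(w)                     # margin width = 2 / ||w||
print(threshold, margin)                # threshold lands halfway between 3 and 7
```

With this data the threshold sits halfway between the closest observations of the two classes (3 and 7); moving a single observation near the other class would drag the threshold with it, which is the outlier sensitivity noted above.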
Support Vector Classifier
Soft Margin
= the distance between the observations and the threshold when misclassifications are allowed
How
Use CV to determine how many misclassifications and observations to allow inside the Soft Margin to get the best classification
Support Vectors
= observations on the edge of and within the Soft Margin
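A hedged sketch of picking the Soft Margin with cross-validation, assuming scikit-learn; the overlapping toy data and the C grid are assumptions (in sklearn the parameter C plays the role of the misclassification budget, small C = softer margin):

```python
# Sketch assuming scikit-learn: cross-validation chooses how soft the margin is.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)        # overlapping classes, so outliers exist

search = GridSearchCV(SVC(kernel="linear"),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100]},
                      cv=5)
search.fit(X, y)

best = search.best_estimator_
print(search.best_params_)               # soft-margin setting chosen by CV
print(len(best.support_vectors_))        # observations on the edge of / inside the margin
```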
SVM
Kernels
Radial Basis Function (RBF)
(a-b)^2 = squared distance between 2 observations; the amount of influence one observation has on another is a function of this squared distance
~ Polynomial
Adding Polynomial Kernel terms with r=0 and increasing d up to d=infinity => a dot product with coordinates in an infinite number of dimensions
Calculation
Set γ=½ => apply a Taylor Series Expansion to the term e^(ab)
The value we get at the end is the relationship between the 2 points in infinite dimensions
γ scales the influence of the squared distance; its value is chosen by CV
~ behaves like a Weighted Nearest Neighbours classifier :explode:
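A small Python sketch of the calculation above; the points a, b and the truncation length are assumptions, and γ is fixed at ½ as in the Taylor-series argument:

```python
# Sketch of the RBF value and its gamma = 1/2 infinite-dimensional dot-product view.
import math
import numpy as np

def rbf(a, b, gamma=0.5):
    # Influence of one observation on another decays with the squared distance.
    return np.exp(-gamma * (a - b) ** 2)

def rbf_as_infinite_dot_product(a, b, n_terms=30):
    # With gamma = 1/2: exp(-(a-b)^2 / 2) = exp(-(a^2 + b^2) / 2) * exp(a*b).
    # Taylor-expanding exp(a*b) = sum_d (a*b)^d / d! turns this into a dot
    # product between infinite-dimensional coordinates, truncated at n_terms.
    scale = math.exp(-(a ** 2 + b ** 2) / 2)
    return scale * sum((a * b) ** d / math.factorial(d) for d in range(n_terms))

a, b = 1.0, 2.5
print(rbf(a, b), rbf_as_infinite_dot_product(a, b))  # the two values agree (~0.325)
```

Because the kernel value shrinks quickly with squared distance, far-away observations contribute almost nothing, which is the weighted-nearest-neighbours behaviour noted above.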
Polynomial
(a x b + r)^d
where a,b=2 different observations in dataset; r=coefficient of polynomial; d=degree of the polynomial
r,d = determined by CV
E.g. r=½ , d=2; (a x b + ½)^2 = a^2 b^2+ab+¼ = (a,a^2,½) . (b,b^2,½) = K
K = one of the 2-D relationships used to solve for the SVC, though the data is not actually transformed into 2-D (see the check after this branch)
Dot product gives high-dimensional coordinates for the data
(a,a^2,½) . (b,b^2,½) => 1st terms (a,b) = x-axis coordinates; 2nd terms (a^2,b^2) = y-axis coordinates; 3rd terms (½,½) = z-axis coordinates, ignored since both are constants
d=1: Polynomial kernel computes relationships between each pair of observations in 1-D, which are used to find the SVC
d=2: 2nd dimension based on x-value^2
d=3: 3rd dimension based on x-value^3
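A quick Python check of the r=½, d=2 worked example above; the test points are arbitrary:

```python
# Check of the r = 1/2, d = 2 example: the kernel value equals the dot product
# of the (x, x^2, 1/2) coordinates, without ever building those coordinates.
import numpy as np

a, b = 3.0, 5.0
kernel_value = (a * b + 0.5) ** 2                          # polynomial kernel
dot_product = np.dot([a, a ** 2, 0.5], [b, b ** 2, 0.5])   # (a, a^2, 1/2) . (b, b^2, 1/2)
print(kernel_value, dot_product)                           # both 240.25
```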
Kernel Trick
Kernel functions only calculate the relationships between every pair of observations as if they are in higher dimensions; they don't actually do the transformation
:arrow_down: Computation, by avoiding the math that transforms the data from low to high dimensions
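A sketch of the trick in practice, assuming scikit-learn and a made-up toy dataset: the classifier is trained from pairwise kernel values alone, and the observations are never actually mapped into the higher-dimensional space:

```python
# The SVC only ever sees the matrix of pairwise kernel values.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # not linearly separable in 2-D

gram = (X @ X.T + 0.5) ** 2                  # polynomial kernel values, r = 1/2, d = 2
clf = SVC(kernel="precomputed").fit(gram, y)
print(clf.score(gram, y))                    # fitted purely from kernel values
```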
Algorithm
Move the original data into a higher dimension
Find the Support Vector Classifier that separates the higher-dimensional data into C groups, where C = number of classes
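A minimal end-to-end sketch, assuming scikit-learn and a toy "circles" dataset (both assumptions): a kernel supplies the implicit higher dimension in which an SVC can separate the C = 2 classes, while a plain linear SVC cannot:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_score = cross_val_score(SVC(kernel="linear"), X, y, cv=5).mean()
rbf_score = cross_val_score(SVC(kernel="rbf", gamma=1.0), X, y, cv=5).mean()
print(linear_score, rbf_score)   # the RBF kernel separates the circles, linear does not
```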
SVM algorithm
Derivation
Optimization problem
Objective: Maximize the Margin (= the distance to the nearest points of the different classes)
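One standard way to formalise this objective for the linearly separable case (stated here as background, not spelled out in the map):

```latex
% Hard-margin primal problem: maximising the margin 2/||w|| is the same as
% minimising ||w||^2, subject to every observation being on its correct side.
\min_{w,\,b}\ \tfrac{1}{2}\lVert w\rVert^{2}
\quad \text{s.t.} \quad y_i\,(w^{\top}x_i + b) \ge 1,\ \ i = 1,\dots,n,
\qquad \text{margin} = \frac{2}{\lVert w\rVert}
```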
Linear Separation
Non-linear Separation
Transformation to a higher-dimensional space => the classes become linearly separable in this higher-dimensional feature space
Kernel Trick
Refs
SVM
https://towardsdatascience.com/the-kernel-trick-c98cdbcaeb3f
https://vitalflux.com/machine-learning-svm-kernel-trick-example/
https://ankitnitjsr13.medium.com/math-behind-support-vector-machine-svm-5e7376d0ee4d
https://www.youtube.com/watch?v=YuyeOErjrOM