Support Vector Machines
These are exam preparation notes, subpar in quality and certainly not of divine quality.
See the index with all articles in this series
If you are stuck, read Wikipedia in parallel.
The goal of SVMs is to divide two groups with a line that separates the data points as clearly as possible.
There are two cases:
- Data points can be cleanly split into their classes
- At least some data points overlap making it impossible to lay a line between datapoints.
Clean split
With an SVM you want to make a binary classification of points and predict class membership.
From ZackWeinberg Wikipedia CC BY_SA 3.0
The label is assigned by calculating on which side of a hyperplane a point lies.
\[y = sign(\underline w^T \underline x + b)\]Where \(\underline w\) and \(b\) are your set of parameters for a linear connectionist neuron.
Our goal is to find a line that cuts as “clean” as possible throught the two groups. Finding lines that separate the two groups
\[\underline w^T \underline x + b = 0\]is ambiguous as can be seen in the graphic above (H2 and H3). In this graphic H3 is the optimal solution. How do we maximize the distance to the two point clouds?
\[\underset{\alpha = 1,...,p}{min} | \underline w ^T \underline x^{(\alpha)}+b| \overset{!}{=}1\]What we thereby do is maximize the distance of the split line to the closest points. This is very prone to outliers, as the closest values determine the parameters of the line.
Optimization problems
-
Primal
\[d_w = \frac {1}{||\underline w || } \overset {!}{=} max\] -
Dual: Provides lower bound to the solution of the primal problem
\[TODO\]
Unclean split - C-SVM
When we cannot put a straight line through the data points, we cannot only optimize for the above, but need to add a regularization term. (Kind of adding prior knowledge)
\[\frac{1}{2}||\underline w || ^2 + \frac {C}{p}\sum^p_{\alpha=1}\varphi_\alpha \overset{!}{=}min\]with
- C regularization parameter
- \(\varphi\) (squared???) error
Kernels
Kernels are often used to identify more complex shapes. Typical kernel functions are:
-
polynomial of degree d
-
RBF kernel with range \(\sigma\)
-
neural network (excluded from exam)
-
plummer kernel (excluded from exam)
Multiclass classification
As SVMs were designed as binary classifiers, they cannot be directly used for multiclass classification.
What is usually done is combining multiple binary classifiers via a couple of perceptrons.