Linear Classifier Foundation
In a supervised learning problem, we are given a set of training examples drawn from an unknown distribution. Our goal is to learn a function that generalises well to unseen data.
The decision boundary for a linear classifier is the hyperplane $w^\top x + b = 0$:
- 2D space: A line ($w_1 x_1 + w_2 x_2 + b = 0$).
- 3D space: A plane.
- d-Dimensional space: A hyperplane ($w^\top x + b = 0$). The classification rule is:
- If $w^\top x + b \ge 0$: Class +1
- If $w^\top x + b < 0$: Class -1
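As a quick illustration, the decision rule above can be sketched in a few lines of Python (the weight vector and bias here are arbitrary example values, not learned ones):

```python
import numpy as np

# Hypothetical parameters of a 2D linear classifier
w = np.array([2.0, -1.0])
b = 0.5

def classify(x):
    """Apply the linear decision rule: the sign of w.x + b."""
    return 1 if np.dot(w, x) + b >= 0 else -1

print(classify(np.array([1.0, 1.0])))   # 2*1 - 1*1 + 0.5 = 1.5  -> +1
print(classify(np.array([-1.0, 2.0])))  # -2 - 2 + 0.5  = -3.5  -> -1
```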
Why Maximise the Margin
While many hyperplanes can separate a dataset, SVMs seek the one with the maximum margin.
- Overfitting Avoidance: A boundary too close to training examples is sensitive to noise. Maximising the distance helps the model generalise to new points.
- Support Vectors: Not every data point is equal. Only the points closest to the boundary (the Support Vectors) actually "support" or define the hyperplane. Moving other points doesn't change the boundary as long as they stay outside the margin.
The Optimisation Problem
To find the best hyperplane, we must maximise the perpendicular distance from the hyperplane to the closest training point.
Perpendicular Distance Formula
The distance from a point $x$ to the hyperplane $w^\top x + b = 0$ is:
$$\text{dist}(x) = \frac{|w^\top x + b|}{\|w\|}$$
where $\|w\|$ is the Euclidean norm (length) of the weight vector.
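This formula translates directly into code; a small check against a hyperplane whose geometry is easy to verify by hand (the hyperplane and point below are illustrative):

```python
import numpy as np

def distance_to_hyperplane(x, w, b):
    """Perpendicular distance from x to the hyperplane w.x + b = 0: |w.x + b| / ||w||."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Hyperplane x1 + x2 - 1 = 0; the point (1, 1) lies at distance 1/sqrt(2) from it
w = np.array([1.0, 1.0])
b = -1.0
print(distance_to_hyperplane(np.array([1.0, 1.0]), w, b))  # 0.7071...
```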
Constraints
For the classifier to be valid, all training examples $(x_i, y_i)$ with $y_i \in \{+1, -1\}$ must be correctly classified: $y_i(w^\top x_i + b) > 0$ for all $i$.
Deriving the Primal Form
- Initial Goal: Maximise the margin $\gamma = \min_i \frac{y_i(w^\top x_i + b)}{\|w\|}$.
- The Rescaling Trick: Rescaling $w$ and $b$ by a constant does not change the hyperplane. We can choose a scale such that for the closest points, $y_i(w^\top x_i + b) = 1$.
- Simplified Constraint: This implies all other points satisfy $y_i(w^\top x_i + b) \ge 1$.
- Final Objective: Maximising $\frac{1}{\|w\|}$ is mathematically identical to minimising $\frac{1}{2}\|w\|^2$.
IMPORTANT
The Primal SVM Optimisation Task:
$$\min_{w,b} \frac{1}{2}\|w\|^2$$
Subject to: $y_i(w^\top x_i + b) \ge 1$ for all $i = 1, \dots, n$.
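In practice this quadratic program is handed to a solver or library. As a rough, self-contained sketch, subgradient descent on the closely related soft-margin (hinge-loss) objective with a large penalty C approximates the hard-margin solution on a toy separable dataset (all data and hyperparameters below are illustrative):

```python
import numpy as np

# Toy linearly separable data: two clusters with labels +1 and -1
X = np.array([[2.0, 2.0], [2.5, 1.5], [-2.0, -2.0], [-1.5, -2.5]])
y = np.array([1.0, 1.0, -1.0, -1.0])

w = np.zeros(2)
b = 0.0
C = 10.0    # large C approximates the hard-margin problem
lr = 0.01   # learning rate

for epoch in range(2000):
    for xi, yi in zip(X, y):
        margin = yi * (np.dot(w, xi) + b)
        if margin < 1:
            # point violates the margin: regulariser + hinge-loss subgradient
            w -= lr * (w - C * yi * xi)
            b += lr * C * yi
        else:
            # point is safely outside the margin: only the regulariser acts
            w -= lr * w

# After training, every point should satisfy y_i (w.x_i + b) >= ~1
print([round(float(yi * (np.dot(w, xi) + b)), 2) for xi, yi in zip(X, y)])
```

The points whose constraint is active (margin close to 1) are exactly the support vectors from the earlier section.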
Handling Nonlinear Problems
When the data are not linearly separable (e.g., one class forms a circle inside the other), we apply a non-linear transformation $\phi$ to map the input data into a higher-dimensional feature space.
- New Hypothesis: $h(x) = \text{sign}(w^\top \phi(x) + b)$.
- Linear SVM: If we use $\phi(x) = x$, we have a standard linear SVM.
- Polynomial Embedding: For example, using a degree-2 polynomial can create a curved boundary (like an ellipse) in the original space to separate circular data.
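A minimal sketch of this idea, using a hypothetical degree-2 feature map $\phi(x_1, x_2) = (x_1^2, x_2^2, \sqrt{2}\,x_1 x_2)$: the squared radius $x_1^2 + x_2^2$ becomes a linear function of the features, so a plane in feature space separates two concentric classes (the data and separating plane below are hand-picked for illustration, not learned):

```python
import numpy as np

def phi(x):
    """Hypothetical degree-2 polynomial feature map: (x1^2, x2^2, sqrt(2)*x1*x2)."""
    x1, x2 = x
    return np.array([x1**2, x2**2, np.sqrt(2) * x1 * x2])

# Circular data: inner class (radius < 1) vs. outer class (radius > 2)
inner = [np.array([0.5, 0.0]), np.array([0.0, -0.5]), np.array([0.3, 0.3])]
outer = [np.array([2.5, 0.0]), np.array([0.0, 2.5]), np.array([-2.0, 2.0])]

# In feature space, x1^2 + x2^2 is linear in phi(x), so the plane with
# w = (1, 1, 0) and b = -2.25 corresponds to the circle of radius 1.5
w = np.array([1.0, 1.0, 0.0])
b = -2.25

print([np.dot(w, phi(x)) + b < 0 for x in inner])  # all True (inside the circle)
print([np.dot(w, phi(x)) + b > 0 for x in outer])  # all True (outside the circle)
```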