Standard “Hard Margin” SVMs can suffer from “overfitting” because they attempt to classify every single training point perfectly. This often leads to fitting the noise in the data rather than the underlying pattern, which degrades performance on unseen data.
By allowing a Soft Margin, the model can ignore certain outliers or noisy points, resulting in a simpler decision boundary that typically generalises better.
Slack Variables
To implement a soft margin, we introduce a slack variable $\xi_i \geq 0$ for every training example $i$. These variables measure the “error” or “displacement” of a point relative to its ideal position.
Intuition of Slack Values
The value of $\xi_i$ tells us exactly where a point lies in relation to the decision boundary and the margin.
- $\xi_i = 0$: The point is correctly classified and lies either on or outside the margin.
- $0 < \xi_i < 1$: The point is correctly classified but falls within the margin area.
- $\xi_i = 1$: The point sits exactly on the decision boundary.
- $\xi_i > 1$: The point is misclassified (on the wrong side of the decision boundary).
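The four cases above can be checked numerically. A minimal sketch (with hypothetical, not fitted, values of $\mathbf{w}$ and $b$), using the fact that the smallest feasible slack is $\xi_i = \max(0,\, 1 - y_i(\mathbf{w}^\top\mathbf{x}_i + b))$:

```python
import numpy as np

# Hypothetical (not fitted) parameters of a linear decision boundary.
w = np.array([1.0, 1.0])
b = -1.0

# One positive-class point for each slack regime.
X = np.array([
    [2.0, 2.0],    # far on the correct side         -> xi = 0
    [0.8, 0.5],    # correct side, inside the margin -> 0 < xi < 1
    [0.5, 0.5],    # exactly on the boundary         -> xi = 1
    [-1.0, -1.0],  # wrong side of the boundary      -> xi > 1
])
y = np.array([1.0, 1.0, 1.0, 1.0])

# Slack of each point: xi_i = max(0, 1 - y_i (w . x_i + b)).
xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
print(xi)
```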
Tip
In exam problems, if a point is “correctly classified but inside the margin”, its slack variable must satisfy $0 < \xi_i < 1$.
The Primal Optimisation Problem
The goal is to find a balance between a large margin and small classification errors.
The Objective Function
We minimise the following:

$$\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\; \frac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{N} \xi_i$$
Subject to:
- $y_i(\mathbf{w}^\top \mathbf{x}_i + b) \geq 1 - \xi_i$ (modified margin constraint)
- $\xi_i \geq 0$ for all $i$.
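A minimal sketch of evaluating this objective (the numbers are made up for illustration): for a fixed $(\mathbf{w}, b)$, the smallest slacks satisfying the constraints are $\xi_i = \max(0,\, 1 - y_i(\mathbf{w}^\top\mathbf{x}_i + b))$, so the objective can be computed directly:

```python
import numpy as np

def primal_objective(w, b, X, y, C):
    # Smallest slacks satisfying y_i (w . x_i + b) >= 1 - xi_i and xi_i >= 0.
    xi = np.maximum(0.0, 1.0 - y * (X @ w + b))
    # 0.5 * ||w||^2 + C * (sum of slacks).
    return 0.5 * float(w @ w) + C * float(xi.sum())

# Toy data: the third point violates the margin by 0.5.
X = np.array([[1.0, 0.0], [-1.0, 0.0], [0.25, 0.0]])
y = np.array([1.0, -1.0, 1.0])
w, b, C = np.array([2.0, 0.0]), 0.0, 1.0

print(primal_objective(w, b, X, y, C))  # 0.5 * 4 + 1.0 * 0.5 = 2.5
```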
The Role of the Hyperparameter $C$
$C$ acts as a “penalty” for errors.
- Large $C$: Penalise slack heavily, forcing the model to behave like a Hard Margin SVM with a narrower margin and fewer errors.
- Small $C$: More tolerant of slack, allowing a wider margin even if it means more training points are misclassified or fall within the margin.
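This trade-off can be observed empirically. A sketch using scikit-learn's `SVC` (assumed available) on synthetic overlapping blobs: because a smaller $C$ tolerates more slack, the optimiser settles for a smaller $\lVert\mathbf{w}\rVert$, i.e. a wider margin $2/\lVert\mathbf{w}\rVert$:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic overlapping blobs: some slack is unavoidable, so C matters.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1, 1.0, (50, 2)), rng.normal(1, 1.0, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

small_c = SVC(kernel="linear", C=0.01).fit(X, y)
large_c = SVC(kernel="linear", C=100.0).fit(X, y)

# Margin width is 2 / ||w||: the small-C model keeps ||w|| smaller.
w_small = float(np.linalg.norm(small_c.coef_))
w_large = float(np.linalg.norm(large_c.coef_))
print(w_small, w_large)
```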
The Dual Formulation
To solve the optimisation efficiently (especially with kernels), we convert the primal problem into a dual representation using Lagrange multipliers.
The Dual Objective

$$\max_{\boldsymbol{\alpha}} \;\; \sum_{i=1}^{N} \alpha_i - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \, \mathbf{x}_i^\top \mathbf{x}_j$$
Subject to (Box Constraints):
- $0 \leq \alpha_i \leq C$ for all $i$
- $\sum_{i=1}^{N} \alpha_i y_i = 0$
Important
The primary difference between the Hard Margin and Soft Margin duals is the addition of the upper bound $C$ on the Lagrange multipliers: $0 \leq \alpha_i \leq C$ instead of $\alpha_i \geq 0$.
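These constraints can be verified on a trained model. A sketch with scikit-learn's `SVC` on synthetic data, using the fact that its `dual_coef_` attribute stores $\alpha_i y_i$ for the support vectors:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-1, 0.8, (30, 2)), rng.normal(1, 0.8, (30, 2))])
y = np.array([-1] * 30 + [1] * 30)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

signed_alphas = clf.dual_coef_.ravel()  # alpha_i * y_i per support vector
alphas = np.abs(signed_alphas)          # recover alpha_i (alphas are >= 0)

# Box constraint: 0 <= alpha_i <= C for every support vector.
print(alphas.min(), alphas.max())
# Equality constraint of the dual: sum_i alpha_i y_i = 0.
print(signed_alphas.sum())
```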
Predictions and Support Vectors
Once the model is trained, we make predictions for a new point $\mathbf{x}$ using:

$$f(\mathbf{x}) = \operatorname{sign}\!\left(\sum_{i=1}^{N} \alpha_i y_i \, \mathbf{x}_i^\top \mathbf{x} + b\right)$$
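This prediction rule can be reconstructed by hand from a fitted model. A sketch with scikit-learn's `SVC` on synthetic data: only the support vectors enter the sum, and the hand-built score should match the library's decision function:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(-1, 0.7, (25, 2)), rng.normal(1, 0.7, (25, 2))])
y = np.array([-1] * 25 + [1] * 25)

clf = SVC(kernel="linear", C=1.0).fit(X, y)

sv = clf.support_vectors_       # the x_i with alpha_i > 0
coef = clf.dual_coef_.ravel()   # alpha_i * y_i for those points
b = clf.intercept_[0]

# f(x) = sum_i alpha_i y_i <x_i, x> + b, using support vectors only.
x_new = np.array([0.3, -0.2])
f = float(coef @ (sv @ x_new) + b)
print(f, np.sign(f))
```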
Identifying Support Vectors
In a Soft Margin SVM, support vectors are all examples where $\alpha_i > 0$.
- If $0 < \alpha_i < C$, the point lies exactly on the margin ($\xi_i = 0$).
- If $\alpha_i = C$, the point is a non-margin support vector: it lies inside the margin or is misclassified ($\xi_i > 0$).
Calculating the Bias
The bias term $b$ is calculated by averaging over the set $S$ of support vectors that lie exactly on the margin ($0 < \alpha_i < C$):

$$b = \frac{1}{|S|}\sum_{i \in S}\left(y_i - \sum_{j=1}^{N} \alpha_j y_j \, \mathbf{x}_j^\top \mathbf{x}_i\right)$$
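A sketch of this computation checked against scikit-learn's `SVC` (synthetic, reasonably separated data so that margin support vectors with $0 < \alpha_i < C$ exist): the averaged bias should agree with the library's fitted intercept:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(-1.5, 0.5, (25, 2)), rng.normal(1.5, 0.5, (25, 2))])
y = np.array([-1] * 25 + [1] * 25)

C = 1.0
clf = SVC(kernel="linear", C=C).fit(X, y)

coef = clf.dual_coef_.ravel()   # alpha_i * y_i
sv = clf.support_vectors_
alphas = np.abs(coef)
y_sv = np.sign(coef)            # labels of the support vectors

# Margin support vectors sit strictly inside the box: 0 < alpha_i < C.
on_margin = (alphas > 1e-6) & (alphas < C - 1e-6)

# b = average over margin SVs of  y_i - sum_j alpha_j y_j <x_j, x_i>.
K = sv @ sv.T
b_est = float(np.mean(y_sv[on_margin] - (coef @ K)[on_margin]))
print(b_est)
```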
Step-by-Step: From Primal to Dual
- Define Constraints: Express each constraint in the form $g(\cdot) \leq 0$. For the soft-margin SVM, these are $1 - \xi_i - y_i(\mathbf{w}^\top \mathbf{x}_i + b) \leq 0$ and $-\xi_i \leq 0$.
- Lagrange Relaxation: Create the Lagrangian function $\mathcal{L}$ by adding the constraints multiplied by Lagrange multipliers $\alpha_i \geq 0$ and $\mu_i \geq 0$.
- KKT Stationarity: Take derivatives of $\mathcal{L}$ with respect to the primal variables ($\mathbf{w}$, $b$, $\xi_i$) and set them to zero.
- Substitution: Substitute these back into the Lagrangian to eliminate the primal variables, resulting in the dual function.
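The steps above, written out for the soft-margin primal (standard derivation, with $\alpha_i \geq 0$ and $\mu_i \geq 0$ the multipliers on the margin and non-negativity constraints respectively):

```latex
\begin{aligned}
\mathcal{L}(\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\alpha}, \boldsymbol{\mu})
  &= \tfrac{1}{2}\lVert\mathbf{w}\rVert^2 + C\sum_{i=1}^{N}\xi_i
     - \sum_{i=1}^{N}\alpha_i\bigl[y_i(\mathbf{w}^\top\mathbf{x}_i + b) - 1 + \xi_i\bigr]
     - \sum_{i=1}^{N}\mu_i\xi_i \\
\frac{\partial \mathcal{L}}{\partial \mathbf{w}} = 0
  &\;\Rightarrow\; \mathbf{w} = \sum_{i=1}^{N}\alpha_i y_i \mathbf{x}_i \\
\frac{\partial \mathcal{L}}{\partial b} = 0
  &\;\Rightarrow\; \sum_{i=1}^{N}\alpha_i y_i = 0 \\
\frac{\partial \mathcal{L}}{\partial \xi_i} = 0
  &\;\Rightarrow\; C - \alpha_i - \mu_i = 0
\end{aligned}
```

The last condition is where the box constraint comes from: since $\mu_i \geq 0$ and $\alpha_i = C - \mu_i$, each multiplier is capped at $C$, giving $0 \leq \alpha_i \leq C$.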