08 Sequential Minimal Optimisation (SMO)

The SVM Optimisation Problem

The goal is to find the vector of Lagrange multipliers $a$ that maximises the dual objective function:

$$\tilde{L}(a) = \sum_{n=1}^{N} a^{(n)} - \frac{1}{2} \sum_{n=1}^{N} \sum_{m=1}^{N} a^{(n)} a^{(m)} y^{(n)} y^{(m)} k(x^{(n)}, x^{(m)})$$

Subject to two constraints:

Box Constraint: $0 \leq a^{(n)} \leq C$ for all $n$
Summation Constraint: $\sum_{n = 1}^{N} a^{(n)} y^{(n)} = 0$
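
As a concrete reading of the objective and its two constraints, here is a minimal sketch in plain Python. The names `linear_kernel`, `dual_objective` and `is_feasible` are illustrative and not part of these notes, and the linear kernel only stands in for whichever kernel $k$ is actually used.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(u: Vector, v: Vector) -> float:
    """Dot product; an assumed stand-in for any kernel k(x, x')."""
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_objective(a: Sequence[float], X: Sequence[Vector], y: Sequence[int],
                   kernel: Callable[[Vector, Vector], float] = linear_kernel) -> float:
    """Evaluate the dual objective: sum of a minus half the double sum."""
    n = len(a)
    quadratic = sum(
        a[p] * a[q] * y[p] * y[q] * kernel(X[p], X[q])
        for p in range(n)
        for q in range(n)
    )
    return sum(a) - 0.5 * quadratic


def is_feasible(a: Sequence[float], y: Sequence[int], C: float,
                tol: float = 1e-9) -> bool:
    """Check the box constraint and the summation constraint up to tol."""
    in_box = all(-tol <= a_n <= C + tol for a_n in a)
    balanced = abs(sum(a_n * y_n for a_n, y_n in zip(a, y))) <= tol
    return in_box and balanced
```

The double loop costs $O(N^2)$ kernel evaluations, so this is only meant for checking small examples by hand.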

Tip

Standard quadratic programming solvers need to store the full $N \times N$ kernel matrix, which is often too costly for large datasets. SMO avoids this by optimising only two multipliers at a time.

Sequential Minimal Optimisation (SMO) Logic

Why update two multipliers at once?

If we tried to update only one multiplier $a^{(i)}$ while keeping the others fixed, the summation constraint $\sum a^{(n)} y^{(n)} = 0$ would pin it to its current value, so any change would violate the constraint. Two is therefore the smallest number of multipliers we can update simultaneously while keeping both constraints satisfied.
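
The argument can be written out in two lines. The symbol $\zeta$ is introduced here only for illustration and does not appear elsewhere in these notes. Isolating one multiplier in the summation constraint gives

$$a^{(i)} y^{(i)} = -\sum_{n \neq i} a^{(n)} y^{(n)}$$

so with all other multipliers fixed the right-hand side is a constant and $a^{(i)}$ cannot move. Freeing a second multiplier gives

$$a^{(i)} y^{(i)} + a^{(j)} y^{(j)} = -\sum_{n \neq i, j} a^{(n)} y^{(n)} = \zeta$$

which is one linear equation in two unknowns: the pair can slide along a line segment inside the box $[0, C] \times [0, C]$, leaving one degree of freedom to optimise.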

The Core Algorithm:

Initialise $a$ : Usually set all $a^{(n)} = 0$
Select a pair $a^{(i)}$ and $a^{(j)}$ to update next using heuristics.
Optimise $\tilde{L} (a)$ with respect to this pair, holding all other multipliers constant.
Repeat until the multipliers converge to a solution satisfying the KKT conditions.
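
The four steps can be put together in a compact sketch. It is not a reference implementation and it rests on several assumptions: the pair update and the choice of $a^{(j)}$ follow the rules described in the later sections of these notes, the kernel is linear, and the bias `b` of $h(x)$ is updated with the rule from Platt's SMO paper, which these notes do not cover. The names `smo_sketch`, `try_pair`, `tol` and `max_sweeps` are illustrative.

```python
from typing import List, Sequence, Tuple


def linear_kernel(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def smo_sketch(X, y, C: float = 1.0, tol: float = 1e-3,
               max_sweeps: int = 1000, kernel=linear_kernel) -> Tuple[List[float], float]:
    n = len(X)
    a = [0.0] * n   # step 1: initialise all multipliers to zero
    b = 0.0         # bias of h(x); its update rule is an assumption (see above)

    def error(k: int) -> float:
        h = sum(a[m] * y[m] * kernel(X[m], X[k]) for m in range(n)) + b
        return h - y[k]

    def try_pair(i: int, j: int, E_i: float) -> bool:
        """Step 3: optimise the pair (i, j); return True if it moved."""
        nonlocal b
        E_j = error(j)
        if y[i] != y[j]:
            L, H = max(0.0, a[j] - a[i]), min(C, C + a[j] - a[i])
        else:
            L, H = max(0.0, a[i] + a[j] - C), min(C, a[i] + a[j])
        k_ii, k_jj, k_ij = kernel(X[i], X[i]), kernel(X[j], X[j]), kernel(X[i], X[j])
        eta = k_ii + k_jj - 2.0 * k_ij
        if L >= H or eta <= 0.0:
            return False  # simplification: skip pairs with no room or no curvature
        a_j = min(H, max(L, a[j] + y[j] * (E_i - E_j) / eta))
        if abs(a_j - a[j]) < 1e-8:
            return False
        a_i = a[i] + y[i] * y[j] * (a[j] - a_j)
        d_i, d_j = a_i - a[i], a_j - a[j]
        b1 = b - E_i - y[i] * d_i * k_ii - y[j] * d_j * k_ij
        b2 = b - E_j - y[i] * d_i * k_ij - y[j] * d_j * k_jj
        b = b1 if 0.0 < a_i < C else b2 if 0.0 < a_j < C else 0.5 * (b1 + b2)
        a[i], a[j] = a_i, a_j
        return True

    for _ in range(max_sweeps):          # step 4: repeat until nothing changes
        changed = False
        for i in range(n):
            E_i = error(i)
            r = y[i] * E_i               # equals y * h(x) - 1
            if (r < -tol and a[i] < C) or (r > tol and a[i] > 0.0):
                # step 2: try partners in order of decreasing |E_i - E_j|
                others = sorted((m for m in range(n) if m != i),
                                key=lambda m: -abs(E_i - error(m)))
                changed |= any(try_pair(i, j, E_i) for j in others)
        if not changed:
            break
    return a, b
```

On the four 1-D points `[[2.0], [1.0], [-1.0], [-2.0]]` with labels `[1, 1, -1, -1]`, the two inner points are the support vectors, so the returned multipliers should be close to 0.5 for them and close to 0 for the outer points.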

Analytical Update Rules

SMO solves for the new values of $a^{(i)}$ and $a^{(j)}$ in closed form, without needing a numerical optimiser.

Calculate the Bounds ( $L$ and $H$ )

Because of the box constraint and the summation constraint, the new value of $a^{(j)}$ must stay within a specific range $[L, H]$.

If $y^{(i)} \neq y^{(j)}$ :
- $L = \max (0, a^{(j)} - a^{(i)})$
- $H = \min (C, C + a^{(j)} - a^{(i)})$
If $y^{(i)} = y^{(j)}$ :
- $L = \max (0, a^{(j)} + a^{(i)} - C)$
- $H = \min (C, a^{(j)} + a^{(i)})$
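
The two cases translate directly into code. This is a small sketch, and the function name `compute_bounds` is an illustrative choice.

```python
from typing import Tuple


def compute_bounds(a_i: float, a_j: float, y_i: int, y_j: int,
                   C: float) -> Tuple[float, float]:
    """Return (L, H), the feasible range for the new a_j."""
    if y_i != y_j:
        # Labels differ: a_j - a_i stays constant along the constraint line.
        return max(0.0, a_j - a_i), min(C, C + a_j - a_i)
    # Labels agree: a_i + a_j stays constant along the constraint line.
    return max(0.0, a_i + a_j - C), min(C, a_i + a_j)
```

For example, with $C = 1$, $a^{(i)} = 0.25$ and $a^{(j)} = 0.5$, the range is $[0.25, 1]$ when the labels differ and $[0, 0.75]$ when they agree.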

Updating $a^{(j)}$

$$a_{new}^{(j)} = a^{(j)} + \frac{y^{(j)} (E^{(i)} - E^{(j)})}{\eta}$$

Where:

$E^{(n)} = h (x^{(n)}) - y^{(n)}$ (the prediction error, where $h$ is the current SVM output)
$\eta = k (x^{(i)}, x^{(i)}) + k (x^{(j)}, x^{(j)}) - 2 k (x^{(i)}, x^{(j)})$
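
A short sketch of the unclipped step follows. The names `curvature` and `unclipped_a_j` are illustrative, the linear kernel is an assumed stand-in for $k$, and raising an error when $\eta \leq 0$ is a simplification made here; these notes do not say how that case is handled.

```python
from typing import Callable, Sequence

Vector = Sequence[float]


def linear_kernel(u: Vector, v: Vector) -> float:
    return sum(ui * vi for ui, vi in zip(u, v))


def curvature(x_i: Vector, x_j: Vector,
              kernel: Callable[[Vector, Vector], float] = linear_kernel) -> float:
    """eta = k(x_i, x_i) + k(x_j, x_j) - 2 k(x_i, x_j)."""
    return kernel(x_i, x_i) + kernel(x_j, x_j) - 2.0 * kernel(x_i, x_j)


def unclipped_a_j(a_j: float, y_j: int, E_i: float, E_j: float,
                  eta: float) -> float:
    """New value of a_j before clipping to [L, H]."""
    if eta <= 0.0:
        # Assumption of this sketch: degenerate pairs are rejected outright.
        raise ValueError("eta must be positive for this update")
    return a_j + y_j * (E_i - E_j) / eta
```

With a linear kernel, $\eta$ is the squared distance between $x^{(i)}$ and $x^{(j)}$, so for $x^{(i)} = 2$ and $x^{(j)} = -1$ it equals 9.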

Clipping and updating $a^{(i)}$

To ensure the constraints are met, $a^{(j)}$ is “clipped” to the $[L, H]$ range:

If $a_{new}^{(j)} \geq H$ , then $a_{clipped}^{(j)} = H$
If $L < a_{new}^{(j)} < H$ , then $a_{clipped}^{(j)} = a_{new}^{(j)}$
If $a_{new}^{(j)} \leq L$ , then $a_{clipped}^{(j)} = L$

Finally $a^{(i)}$ is updated using the new $a^{(j)}$ to satisfy the summation constraint:

$$a_{new}^{(i)} = a^{(i)} + y^{(i)} y^{(j)} (a_{old}^{(j)} - a_{clipped}^{(j)})$$
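
Clipping and the follow-up update of $a^{(i)}$ fit in a few lines. This is a sketch with illustrative names (`clip`, `update_pair`); the bounds `L` and `H` are assumed to have been computed beforehand.

```python
from typing import Tuple


def clip(value: float, L: float, H: float) -> float:
    """Restrict value to the interval [L, H]."""
    return min(H, max(L, value))


def update_pair(a_i: float, a_j: float, y_i: int, y_j: int,
                a_j_new: float, L: float, H: float) -> Tuple[float, float]:
    """Return (new a_i, clipped a_j) for an unclipped proposal a_j_new."""
    a_j_clipped = clip(a_j_new, L, H)
    # Move a_i by the opposite signed amount so that
    # a_i * y_i + a_j * y_j keeps its old value.
    a_i_new = a_i + y_i * y_j * (a_j - a_j_clipped)
    return a_i_new, a_j_clipped
```

As a check, take $a^{(i)} = 0.25$, $a^{(j)} = 0.5$, $y^{(i)} = 1$, $y^{(j)} = -1$ and an unclipped proposal of 1.5 with $[L, H] = [0.25, 1]$. The proposal is clipped to 1 and $a^{(i)}$ becomes 0.75, so $a^{(i)} y^{(i)} + a^{(j)} y^{(j)}$ is $-0.25$ both before and after the update.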

Selection Heuristics

To speed up convergence, SMO does not pick pairs randomly.

Picking the first multiplier $a^{(i)}$

The algorithm alternates between:

Scanning the entire dataset for examples that violate the KKT conditions
Scanning only the “non-bound” examples (where $0 < a < C$ ) that violate the KKT conditions
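
One way to organise this alternation, following the outer loop in Platt's published SMO pseudo-code, is sketched below. `examine_example` is a hypothetical callback, not defined in these notes: it is assumed to check example `i` against the KKT conditions, attempt a pair update, modify `alphas` in place, and return 1 if a multiplier changed and 0 otherwise. The returned pass log exists only to make the control flow visible.

```python
from typing import Callable, List


def run_outer_loop(alphas: List[float], C: float,
                   examine_example: Callable[[int], int]) -> List[str]:
    """Alternate full passes and non-bound passes until nothing changes."""
    pass_log: List[str] = []
    examine_all = True
    num_changed = 0
    while num_changed > 0 or examine_all:
        num_changed = 0
        if examine_all:
            candidates = list(range(len(alphas)))
        else:
            # Non-bound examples: multipliers strictly inside (0, C).
            candidates = [k for k, a_k in enumerate(alphas) if 0.0 < a_k < C]
        pass_log.append("all" if examine_all else "non-bound")
        for i in candidates:
            num_changed += examine_example(i)
        if examine_all:
            examine_all = False      # after a full pass, focus on non-bound examples
        elif num_changed == 0:
            examine_all = True       # non-bound subset is quiet, re-scan everything
    return pass_log
```

The loop ends only when a full pass over the dataset changes nothing. Non-bound passes are repeated while they keep making progress, since those examples are the ones most likely to still need adjusting.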

Picking the second multiplier $a^{(j)}$

Once $a^{(i)}$ is chosen, $a^{(j)}$ is selected to maximise the step size. This is approximated by picking the example that results in the largest absolute error difference $|E^{(i)} - E^{(j)}|$.
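
Given a list of current errors, this choice is a single `max` call. In this sketch the name `pick_second` and the idea of passing precomputed errors in a list are assumptions for illustration.

```python
from typing import Sequence


def pick_second(i: int, errors: Sequence[float]) -> int:
    """Index j != i with the largest |E_i - E_j|."""
    return max(
        (j for j in range(len(errors)) if j != i),
        key=lambda j: abs(errors[i] - errors[j]),
    )
```

The heuristic ignores $\eta$, so it only approximates the true step size, but it avoids evaluating the kernel for every candidate.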

Convergence and KKT Conditions

Each step is guaranteed to increase the dual objective as long as at least one of the two selected multipliers violates the KKT conditions, which is why the algorithm converges.

The KKT conditions for Soft Margin SVM are:

$a^{(n)} = 0$ : The point is not a support vector and is correctly classified ( $y^{(n)} h (x^{(n)}) \geq 1$ )
$0 < a^{(n)} < C$ : The point is a support vector exactly on the margin ( $y^{(n)} h (x^{(n)}) = 1$ )
$a^{(n)} = C$ : The point is a support vector that either lies inside the margin or is misclassified ( $y^{(n)} h (x^{(n)}) \leq 1$ )
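
In practice the three conditions are checked up to a small tolerance. The sketch below folds them into one test; the function name `violates_kkt` and the default `tol = 1e-3` are assumptions, not values given in these notes.

```python
def violates_kkt(a_n: float, y_n: int, h_n: float, C: float,
                 tol: float = 1e-3) -> bool:
    """True if example n breaks its KKT condition by more than tol."""
    r = y_n * h_n - 1.0  # signed distance from the margin
    # r < 0: inside the margin or misclassified, so a_n should be at C.
    # r > 0: safely outside the margin, so a_n should be at 0.
    return (r < -tol and a_n < C) or (r > tol and a_n > 0.0)
```

A point with $a^{(n)} = 0$ and $y^{(n)} h (x^{(n)}) = 0.5$ is a violator, while the same margin value with $a^{(n)} = C$ is consistent with the conditions.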

Ayush Acharjya's Notes
