Making Predictions in the Dual Form

In the dual formulation of SVMs, we move away from calculating an explicit weight vector $\mathbf{w}$. Instead, we classify new instances based on their similarity to the training examples.

The Primal vs Dual Prediction

  • Primal Formula: $\hat{y} = \operatorname{sign}(\mathbf{w}^\top \mathbf{x} + b)$.
  • Dual Formula: $\hat{y} = \operatorname{sign}\left(\sum_{i=1}^{n} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x} \rangle + b\right)$.

Decision Rule

  • If $\sum_{i=1}^{n} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x} \rangle + b > 0$, classify as +1.
  • If $\sum_{i=1}^{n} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x} \rangle + b < 0$, classify as -1.
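As a concrete sketch of this rule (the function and variable names are my own, and the toy alphas are a hand-worked hard-margin solution, not from the notes):

```python
import numpy as np

def dual_predict(x_new, X_train, y_train, alphas, b):
    """Dual-form SVM prediction: sign of the alpha-weighted
    sum of inner products with the training examples, plus bias."""
    score = sum(a * y * np.dot(x_i, x_new)
                for a, y, x_i in zip(alphas, y_train, X_train)) + b
    return 1 if score > 0 else -1

# Toy 1-D problem: x = 1 labelled +1, x = -1 labelled -1.
# alphas = [0.5, 0.5], b = 0 is the hard-margin dual solution here.
X_train = np.array([[1.0], [-1.0]])
y_train = np.array([1, -1])
alphas = np.array([0.5, 0.5])
b = 0.0

print(dual_predict(np.array([2.0]), X_train, y_train, alphas, b))   # 1
print(dual_predict(np.array([-0.5]), X_train, y_train, alphas, b))  # -1
```

Note that no weight vector is ever formed: the new point is compared against each training point through the inner product.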

Tip

The dual form is powerful because it allows us to work with kernels $K(\mathbf{x}, \mathbf{z})$, which can represent infinite-dimensional feature spaces that would be impossible to compute directly in the primal form.

The Role of Support Vectors

A critical property of the dual form is sparsity. We do not actually need to sum over all training examples.

KKT Condition and Sparsity

Based on the Karush-Kuhn-Tucker (KKT) conditions:

  1. Non-Support Vectors: For most points, $\alpha_i = 0$. These points do not affect the decision boundary.
  2. Support Vectors (S): Only points where $\alpha_i > 0$ contribute to the prediction.
  3. The Margin: Support vectors satisfy the condition $y_i(\mathbf{w}^\top \mathbf{x}_i + b) = 1$, meaning they lie exactly on the margin.

Optimised Prediction Formula

$\hat{y} = \operatorname{sign}\left(\sum_{i \in S} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x} \rangle + b\right)$

Where $S$ is the set of indices of the support vectors.
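A small numerical illustration of this sparsity (the data and dual solution below are hypothetical): dropping the zero-$\alpha$ terms leaves the decision value unchanged.

```python
import numpy as np

def dual_score(x_new, X, y, alphas, b, idx=None):
    """Decision value; if idx is given, sum only over those indices."""
    if idx is None:
        idx = range(len(alphas))
    return sum(alphas[i] * y[i] * np.dot(X[i], x_new) for i in idx) + b

# Hypothetical dual solution where only two of four points are support vectors.
X = np.array([[1.0], [-1.0], [3.0], [-4.0]])
y = np.array([1, -1, 1, -1])
alphas = np.array([0.5, 0.5, 0.0, 0.0])  # alpha = 0 for non-support vectors
b = 0.0

S = [i for i, a in enumerate(alphas) if a > 0]  # support-vector indices
x_new = np.array([2.0])
full = dual_score(x_new, X, y, alphas, b)
sparse = dual_score(x_new, X, y, alphas, b, idx=S)
print(full == sparse)  # True: zero-alpha terms contribute nothing
```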

Calculating the Bias Term

The bias $b$ is not solved directly in the dual optimisation but is recovered using the support vectors.

Step-by-Step Derivation

  1. Pick any support vector $\mathbf{x}_s$, where $\alpha_s > 0$.
  2. Since it's on the margin: $y_s(\mathbf{w}^\top \mathbf{x}_s + b) = 1$.
  3. Substitute the dual form of $\mathbf{w}$: $y_s\left(\sum_{i \in S} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x}_s \rangle + b\right) = 1$.
  4. Multiply both sides by $y_s$ (since $y_s^2 = 1$): $\sum_{i \in S} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x}_s \rangle + b = y_s$.
  5. Solve for $b$: $b = y_s - \sum_{i \in S} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x}_s \rangle$.

Important

To ensure numerical stability, it is standard practice to calculate $b$ for all support vectors and take the average:

$b = \frac{1}{|S|} \sum_{s \in S} \left( y_s - \sum_{i \in S} \alpha_i y_i \langle \mathbf{x}_i, \mathbf{x}_s \rangle \right)$

Where $|S|$ is the total number of support vectors.
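The recovery-and-average procedure might be sketched as follows (the toy data and hand-worked dual solution are assumptions for illustration, not from the notes):

```python
import numpy as np

def bias_from_sv(s, X, y, alphas, S):
    """b recovered from one support vector s:
    b = y_s - sum_{i in S} alpha_i y_i <x_i, x_s>."""
    return y[s] - sum(alphas[i] * y[i] * np.dot(X[i], X[s]) for i in S)

# Toy hard-margin problem (x = +1 labelled +1, x = -1 labelled -1);
# the dual solution is alphas = [0.5, 0.5], both points support vectors.
X = np.array([[1.0], [-1.0]])
y = np.array([1, -1])
alphas = np.array([0.5, 0.5])
S = [i for i, a in enumerate(alphas) if a > 0]

# Average over all support vectors for numerical stability.
b = np.mean([bias_from_sv(s, X, y, alphas, S) for s in S])
print(b)  # 0.0 for this symmetric problem
```

In exact arithmetic every support vector gives the same $b$; averaging only matters once the alphas come from an approximate numerical solver.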

Kernels as Similarity Functions

A kernel represents the inner product of two vectors in a high-dimensional feature space:

$K(\mathbf{x}, \mathbf{z}) = \langle \phi(\mathbf{x}), \phi(\mathbf{z}) \rangle$

Mercer’s Condition

To be a valid kernel, a function must satisfy Mercer’s Condition:

  • Symmetry: $K(\mathbf{x}, \mathbf{z}) = K(\mathbf{z}, \mathbf{x})$.
  • Positive Semi-definite: The Gram matrix $\mathbf{K}$ (where $K_{ij} = K(\mathbf{x}_i, \mathbf{x}_j)$) must satisfy $\mathbf{c}^\top \mathbf{K} \mathbf{c} \ge 0$ for any vector $\mathbf{c}$.
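Mercer's condition must hold for every finite sample, so a sanity check on a handful of points is easy to sketch (the function names and sample points are my own):

```python
import numpy as np

def is_psd_gram(kernel, points, tol=1e-10):
    """Empirical Mercer check on a finite sample: build the Gram
    matrix, then test symmetry and non-negative eigenvalues."""
    n = len(points)
    G = np.array([[kernel(points[i], points[j]) for j in range(n)]
                  for i in range(n)])
    symmetric = np.allclose(G, G.T)
    psd = np.all(np.linalg.eigvalsh(G) >= -tol)
    return bool(symmetric and psd)

pts = [np.array([0.0]), np.array([1.0]), np.array([2.5])]

linear = lambda x, z: float(np.dot(x, z))
print(is_psd_gram(linear, pts))  # True: the linear kernel is valid

not_a_kernel = lambda x, z: float(-np.dot(x, z))  # negated inner product
print(is_psd_gram(not_a_kernel, pts))  # False: Gram matrix is not PSD
```

Passing this check on one sample does not prove validity, but failing it on any sample disproves it.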

Kernel Composition Rules

You can build complex kernels from simpler ones ($K_1$ and $K_2$ below are valid kernels). Valid operations include:

  • Addition: $K(\mathbf{x}, \mathbf{z}) = K_1(\mathbf{x}, \mathbf{z}) + K_2(\mathbf{x}, \mathbf{z})$.
  • Scaling: $K(\mathbf{x}, \mathbf{z}) = c\,K_1(\mathbf{x}, \mathbf{z})$ (where $c > 0$).
  • Exponentiation: $K(\mathbf{x}, \mathbf{z}) = \exp\!\left(K_1(\mathbf{x}, \mathbf{z})\right)$.
  • Product: $K(\mathbf{x}, \mathbf{z}) = K_1(\mathbf{x}, \mathbf{z}) \cdot K_2(\mathbf{x}, \mathbf{z})$.
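A quick numerical check of these rules (the two base kernels, sample points, and tolerance are my own choices): each composed Gram matrix should remain positive semi-definite.

```python
import numpy as np

# Two simple kernels on scalars, assumed valid for illustration.
k1 = lambda x, z: x * z             # linear
k2 = lambda x, z: (x * z + 1) ** 2  # degree-2 polynomial

pts = [0.0, 1.0, -2.0, 3.0]
gram = lambda k: np.array([[k(a, b) for b in pts] for a in pts])

def min_eig(G):
    """Smallest eigenvalue of a symmetric matrix; >= 0 means PSD."""
    return np.linalg.eigvalsh(G).min()

G1, G2 = gram(k1), gram(k2)
# Each composition rule, applied at the Gram-matrix level:
for G in (G1 + G2,     # addition
          3.0 * G1,    # scaling by c > 0
          np.exp(G1),  # elementwise exp corresponds to exp(K1)
          G1 * G2):    # elementwise product corresponds to K1 * K2
    print(min_eig(G) >= -1e-9)
```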

Common Kernel Functions

Linear Kernel

  • Formula: $K(\mathbf{x}, \mathbf{z}) = \mathbf{x}^\top \mathbf{z}$.
  • Intuition: No transformation; equivalent to standard linear SVM.

Polynomial Kernel

  • Formula: $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}^\top \mathbf{z} + c)^d$.
  • Intuition: Maps data into a space of polynomial combinations of features.

Gaussian / Radial Basis Function (RBF) Kernel

  • Formula: $K(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\frac{\|\mathbf{x} - \mathbf{z}\|^2}{2\sigma^2}\right)$.
  • Intuition: Measures closeness; the value is 1 if $\mathbf{x} = \mathbf{z}$ and drops toward 0 as they move apart.
  • Technical Note: The RBF kernel corresponds to an infinite-dimensional feature mapping $\phi$, which can be shown via a Taylor series expansion of the exponential.
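The three kernels can be sketched directly (the parameter defaults $c$, $d$, $\sigma$ here are arbitrary illustrative choices):

```python
import numpy as np

def linear_kernel(x, z):
    return np.dot(x, z)

def polynomial_kernel(x, z, c=1.0, d=2):
    return (np.dot(x, z) + c) ** d

def rbf_kernel(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))

x = np.array([1.0, 2.0])
z = np.array([0.0, 1.0])
print(linear_kernel(x, z))      # 2.0
print(polynomial_kernel(x, z))  # (2 + 1)^2 = 9.0
print(rbf_kernel(x, x))         # 1.0: identical inputs have similarity 1
print(rbf_kernel(x, z) < 1.0)   # True: similarity decays with distance
```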

Kernels for Structured Data

Kernels allow SVMs to handle data that isn’t naturally represented as a fixed-length vector of numbers.

Sets

For two subsets $A_1, A_2 \subseteq D$ of a domain $D$:

  • Kernel: $K(A_1, A_2) = 2^{|A_1 \cap A_2|}$, the number of subsets common to $A_1$ and $A_2$.
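Assuming the set kernel counts common subsets, $K(A_1, A_2) = 2^{|A_1 \cap A_2|}$, a brute-force check of the closed form against the feature-space inner product:

```python
from itertools import combinations

def all_subsets(s):
    """Every subset of s, including the empty set."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def set_kernel_bruteforce(A, B):
    """Inner product in the feature space indexed by all subsets:
    phi_U(A) = 1 iff U is a subset of A, so the product counts
    the subsets contained in both A and B."""
    return sum(1 for U in all_subsets(A) if U <= B)

A = {1, 2, 3}
B = {2, 3, 5}
print(set_kernel_bruteforce(A, B))  # 4: the subsets of {2, 3}
print(2 ** len(A & B))              # 4: closed form 2^{|A ∩ B|}
```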

Strings (All-Subsequence Kernel)

  • Approach: Compares strings based on the number of shared subsequences.
  • Complexity: Direct calculation is exponential in string length, but Dynamic Programming reduces it to $O(|s|\,|t|)$ for strings $s$ and $t$.
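One standard formulation of the quadratic-time dynamic programme (the recurrence is my reconstruction; the notes do not spell it out):

```python
def all_subsequence_kernel(s, t):
    """Counts pairs of matching (possibly non-contiguous) subsequence
    occurrences in s and t, including the empty subsequence.
    Runs in O(|s| * |t|) via dynamic programming."""
    n, m = len(s), len(t)
    # K[i][j] = kernel value for prefixes s[:i], t[:j];
    # an empty prefix matches only the empty subsequence, value 1.
    K = [[1] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Inclusion-exclusion over dropping s[i-1] and/or t[j-1] ...
            K[i][j] = K[i - 1][j] + K[i][j - 1] - K[i - 1][j - 1]
            # ... plus new matches pairing s[i-1] with t[j-1].
            if s[i - 1] == t[j - 1]:
                K[i][j] += K[i - 1][j - 1]
    return K[n][m]

print(all_subsequence_kernel("a", "a"))    # 2: "" and "a"
print(all_subsequence_kernel("aa", "a"))   # 3: "", plus two occurrences of "a"
print(all_subsequence_kernel("ab", "ab"))  # 4: "", "a", "b", "ab"
```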

Trees (All-Subtree Kernel)

  • Approach: Counts the number of common subtrees between two trees.
  • Efficiency: Computed in $O(n_1 n_2)$ time, for trees with $n_1$ and $n_2$ nodes, using Dynamic Programming.