Moving Beyond Finite Hypothesis Sets
The foundational problem in learning theory is balancing two questions:
- Can we ensure that $E_{out}(g)$ is close to $E_{in}(g)$?
- Can we make $E_{in}(g)$ small enough?
Previously, we used Hoeffding's Inequality, $P[|E_{in}(g) - E_{out}(g)| > \epsilon] \leq 2M e^{-2\epsilon^2 N}$, which relied on $M$ (the number of hypotheses). However, for most real-world hypothesis sets (like lines in a plane), $M = \infty$, which makes the standard bound useless.
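To see the problem numerically, here is a small sketch (the function name `hoeffding_bound` is my own) evaluating the right-hand side of the inequality: for a modest finite $M$ the bound is tiny, but for a huge $M$ it exceeds 1 and tells us nothing.

```python
import math

def hoeffding_bound(M, N, epsilon):
    """Upper bound on P[|E_in - E_out| > epsilon]: 2 * M * exp(-2 * eps^2 * N)."""
    return 2 * M * math.exp(-2 * epsilon**2 * N)

# A finite hypothesis set gives a meaningful guarantee:
print(hoeffding_bound(M=100, N=1000, epsilon=0.1))    # well below 1

# But the bound blows past 1 as M grows, becoming vacuous:
print(hoeffding_bound(M=10**9, N=1000, epsilon=0.1))  # greater than 1
```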
The Union Bound Problem
The factor $M$ comes from the Union Bound, which assumes the "bad events" (where $E_{in}$ and $E_{out}$ diverge) for different hypotheses are non-overlapping. In reality, similar hypotheses overlap significantly, meaning we overcount the "badness". To fix this, we need a way to group similar hypotheses and count the effective number of choices.
Dichotomies and the Growth Function
Instead of counting every possible line in a space, we count how many different ways those lines can classify a specific set of points.
- Dichotomy: A "mini-hypothesis" that only considers the labels assigned to the $N$ training data points.
- While there are infinitely many hypotheses, there are at most $2^N$ possible dichotomies for $N$ points.
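We can observe this collapse from infinitely many lines to finitely many dichotomies directly. The sketch below (the helper `count_dichotomies` is hypothetical) samples random 2D perceptrons $\mathrm{sign}(w_0 + w_1 x + w_2 y)$ and records the distinct label patterns they produce on a fixed point set.

```python
import random

def count_dichotomies(points, trials=100_000, seed=0):
    """Estimate how many distinct dichotomies 2D perceptrons realise on
    the given points, by sampling random weight vectors (w0, w1, w2)."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(trials):
        w0, w1, w2 = (rng.uniform(-1, 1) for _ in range(3))
        pattern = tuple(1 if w0 + w1 * x + w2 * y > 0 else -1
                        for x, y in points)
        seen.add(pattern)
    return len(seen)

three = [(0, 0), (1, 0), (0, 1)]
four  = [(0, 0), (1, 0), (0, 1), (1, 1)]
print(count_dichotomies(three))  # 8 = 2^3: all labelings of 3 points appear
print(count_dichotomies(four))   # 14 < 16: the two XOR labelings never appear
```

Infinitely many weight vectors, yet only 8 (respectively 14) effective behaviours on the data.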
The Growth Function
To remove the dependence on specific point locations, we define the growth function as the maximum number of dichotomies the hypothesis set can create for any $N$ points:

$$m_{\mathcal{H}}(N) = \max_{x_1, \dots, x_N} |\mathcal{H}(x_1, \dots, x_N)|$$
Tip
If a hypothesis set can produce all $2^N$ possible labelings for a set of $N$ points (i.e. $m_{\mathcal{H}}(N) = 2^N$), we say it shatters those points.
Defining VC Dimension
The Vapnik-Chervonenkis (VC) Dimension is the mathematical bridge that allows us to handle infinite models.
Definition: $d_{VC}(\mathcal{H})$ is the largest $N$ for which $m_{\mathcal{H}}(N) = 2^N$.
- It is the most points $\mathcal{H}$ can shatter.
- If the break point is $k$, then $d_{VC} = k - 1$.
Examples of $d_{VC}$
- 2D Perceptron: $d_{VC} = 3$ (can shatter 3 points, but the break point is 4).
- 2D Rectangle: $d_{VC} = 4$.
- General Perceptron: For a $d$-dimensional space, $d_{VC} = d + 1$ (the $+1$ accounts for the bias term).
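A finite $d_{VC}$ is what tames the growth function: by Sauer's lemma, $m_{\mathcal{H}}(N) \leq \sum_{i=0}^{d_{VC}} \binom{N}{i}$, which is polynomial in $N$ rather than exponential. A quick sketch (the function `growth_bound` is my own name) compares this bound against $2^N$ for the 2D perceptron:

```python
from math import comb

def growth_bound(N, d_vc):
    """Sauer's lemma: m_H(N) <= sum_{i=0}^{d_vc} C(N, i), polynomial in N."""
    return sum(comb(N, i) for i in range(d_vc + 1))

# For the 2D perceptron (d_vc = 3) the bound grows like N^3, not 2^N:
for N in (3, 4, 10):
    print(N, growth_bound(N, 3), 2**N)
# N=3:  bound 8   vs 2^N = 8    (3 points can be shattered)
# N=4:  bound 15  vs 2^N = 16   (4 points cannot)
# N=10: bound 176 vs 2^N = 1024 (the gap keeps widening)
```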
The VC Generalisation Bound
The VC dimension replaces $M$ in the generalisation inequalities. With probability at least $1 - \delta$, this is the VC Inequality:

$$E_{out}(g) \leq E_{in}(g) + \sqrt{\frac{8}{N} \ln \frac{4\, m_{\mathcal{H}}(2N)}{\delta}}$$

The square root term is the **Penalty for Model Complexity**, $\Omega(N, \mathcal{H}, \delta)$:
- High $d_{VC}$: $E_{in}$ will be low (the model is powerful), but $\Omega$ will be high (high risk of overfitting).
- Low $d_{VC}$: $\Omega$ will be low (good generalisation), but $E_{in}$ might be high (the model is too simple).
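This trade-off can be evaluated numerically. The sketch below (the helper `vc_penalty` is hypothetical) computes the penalty term, using the Sauer bound as a stand-in for $m_{\mathcal{H}}(2N)$: more data shrinks it, more model power inflates it.

```python
import math
from math import comb

def vc_penalty(N, d_vc, delta=0.05):
    """Complexity penalty Omega = sqrt((8/N) * ln(4 * m_H(2N) / delta)),
    bounding m_H(2N) by Sauer's lemma: sum_{i<=d_vc} C(2N, i)."""
    m = sum(comb(2 * N, i) for i in range(d_vc + 1))
    return math.sqrt((8 / N) * math.log(4 * m / delta))

# More data shrinks the penalty; a higher d_vc inflates it:
print(vc_penalty(N=1000, d_vc=3))    # moderate
print(vc_penalty(N=10000, d_vc=3))   # smaller: 10x the data
print(vc_penalty(N=1000, d_vc=10))   # larger: a more powerful model
```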
Practical Sample Complexity
How many data points do we actually need for a model to generalise?
- Theoretical Bound: For a given $\epsilon$ and $\delta$, theory often suggests $N \approx 10{,}000 \cdot d_{VC}$.
- Practical Rule of Thumb: In practice, $N \geq 10 \cdot d_{VC}$ is often sufficient to achieve good results.
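The gap between theory and practice can be made concrete: search for the smallest $N$ at which the VC penalty drops below a target $\epsilon$. Both helpers below (`vc_penalty`, `samples_needed`) are my own names, and the penalty again uses the Sauer bound for $m_{\mathcal{H}}(2N)$.

```python
import math
from math import comb

def vc_penalty(N, d_vc, delta=0.05):
    """Omega = sqrt((8/N) * ln(4 * m_H(2N) / delta)), with Sauer's bound."""
    m = sum(comb(2 * N, i) for i in range(d_vc + 1))
    return math.sqrt((8 / N) * math.log(4 * m / delta))

def samples_needed(d_vc, epsilon=0.1, delta=0.05):
    """Smallest power-of-two N at which the VC penalty falls below epsilon."""
    N = 1
    while vc_penalty(N, d_vc, delta) > epsilon:
        N *= 2
    return N

d_vc = 3  # e.g. the 2D perceptron
print(samples_needed(d_vc))  # theory demands tens of thousands of examples
print(10 * d_vc)             # the practical rule of thumb asks for only 30
```

The theoretical requirement is loose by orders of magnitude, which is exactly why the $10 \cdot d_{VC}$ rule of thumb is used in practice.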