• Confidence bounds are frequentist guarantees; this includes both the Test Set Bound and the Train Set Bound.
  • However, we will borrow and adapt a Bayesian idea.
  • Before seeing the training set $S$, we define a “prior” probability distribution $\pi(c)$ over the classifiers $c$. This represents our “bet” on the candidate classifiers.
  • Suppose a simple learning problem in which the learning algorithm chooses from a countable set of classifiers based on a training sample $S \sim D^m$.
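As a concrete illustration, a geometric prior over a countable classifier set can be sketched as follows (the indexing by $i$ and the choice $\pi(c_i) = 2^{-(i+1)}$ are assumptions for the example, not part of the bound):

```python
# A hypothetical "prior" over a countably infinite set of classifiers,
# indexed i = 0, 1, 2, ... (e.g. by description length).
# pi(c_i) = 2^{-(i+1)} bets more on low-index ("simpler") classifiers
# and sums to 1 over the whole countable set.

def prior(i):
    """Prior mass assigned to the i-th classifier."""
    return 2.0 ** -(i + 1)

# The total mass over the first 50 classifiers is already 1 - 2^{-50}:
mass = sum(prior(i) for i in range(50))
print(round(mass, 12))
```

Any nonnegative $\pi$ summing to at most 1 works; the choice encodes which classifiers we expect the algorithm to return.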

Theorem

For all “priors” $\pi(c)$ over a countable set of classifiers, for all distributions $D$, for all $\delta \in (0, 1]$:

$$\Pr_{S \sim D^m}\Big(\forall c:\ c_D \le \overline{\mathrm{Bin}}\big(m, \hat{c}_S, \delta\,\pi(c)\big)\Big) \ge 1 - \delta$$

Compare with the Test Set Bound: $\Pr_{S \sim D^m}\big(c_D \le \overline{\mathrm{Bin}}(m, \hat{c}_S, \delta)\big) \ge 1 - \delta$. The Occam’s Razor Bound must hold for every classifier simultaneously, and pays for this by splitting the confidence $\delta$ according to $\pi$.

Corollary

For all $D$, all “priors” $\pi(c)$, all $\delta \in (0, 1]$:

$$\Pr_{S \sim D^m}\left(\forall c:\ c_D \le \hat{c}_S + \sqrt{\frac{\ln\frac{1}{\pi(c)} + \ln\frac{1}{\delta}}{2m}}\,\right) \ge 1 - \delta$$
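A sketch of how the two bounds compare numerically. The binomial tail inversion $\overline{\mathrm{Bin}}$ is implemented here by bisection over the binomial CDF; the sample size, error count, and prior mass below are made-up values for illustration:

```python
import math

def binom_cdf(m, k, p):
    """P(X <= k) for X ~ Binomial(m, p): probability of k or fewer errors."""
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1))

def binom_tail_inversion(m, k, delta, tol=1e-9):
    """Bin-bar(m, k/m, delta): the largest true error rate p still consistent
    with observing k or fewer errors out of m, at confidence delta."""
    lo, hi = k / m, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(m, k, mid) >= delta:
            lo = mid  # p = mid is still consistent; search higher
        else:
            hi = mid
    return lo

m, errors, delta = 100, 10, 0.05
test_set_bound = binom_tail_inversion(m, errors, delta)
# The Occam's Razor Bound evaluates the same inversion at delta * pi(c),
# so a classifier with small prior mass pays a tightness penalty:
occam_bound = binom_tail_inversion(m, errors, delta * 2.0 ** -5)
```

Shrinking the confidence parameter from $\delta$ to $\delta\pi(c)$ can only loosen the bound, which is the price of the simultaneous guarantee over all classifiers.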

Tightness depends on the self-information $\ln\frac{1}{\pi(c)}$ of the classifier $c$ returned by the learning algorithm, with respect to our prior bets: the more prior mass we bet on $c$, the tighter the resulting bound.
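The dependence on $\ln\frac{1}{\pi(c)}$ is easy to see in the corollary’s closed form (the numbers below are illustrative, not from the text):

```python
import math

def occam_razor_bound(emp_err, m, prior_c, delta):
    """Corollary form: empirical error plus a penalty that grows with the
    self-information ln(1/pi(c)) of the returned classifier."""
    penalty = math.sqrt((math.log(1 / prior_c) + math.log(1 / delta)) / (2 * m))
    return emp_err + penalty

# Same empirical error, same m and delta; only the prior bet differs:
favored = occam_razor_bound(0.10, 1000, prior_c=2.0 ** -2, delta=0.05)
longshot = occam_razor_bound(0.10, 1000, prior_c=2.0 ** -20, delta=0.05)
print(favored < longshot)  # the long-shot classifier gets a looser bound
```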

Occam’s Razor Bound: Proof

Idea: without training, apply the Test Set Bound to each classifier $c$ in turn (treating $S$ as a test set) with confidence parameter $\delta\,\pi(c)$:

$$\Pr_{S \sim D^m}\Big(c_D \le \overline{\mathrm{Bin}}\big(m, \hat{c}_S, \delta\,\pi(c)\big)\Big) \ge 1 - \delta\,\pi(c)$$

Negate to get the equivalent statement:

$$\Pr_{S \sim D^m}\Big(c_D > \overline{\mathrm{Bin}}\big(m, \hat{c}_S, \delta\,\pi(c)\big)\Big) \le \delta\,\pi(c)$$

Apply the union bound, $\Pr(A \text{ or } B) \le \Pr(A) + \Pr(B)$, repeatedly:

$$\Pr_{S \sim D^m}\Big(\exists c:\ c_D > \overline{\mathrm{Bin}}\big(m, \hat{c}_S, \delta\,\pi(c)\big)\Big) \le \sum_c \delta\,\pi(c) \le \delta$$

Negating once more gives the theorem.
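The last step is just bookkeeping on the failure probabilities: each classifier’s Test Set Bound fails with probability at most $\delta\pi(c)$, and the prior masses sum to at most 1. A quick numeric check, assuming a geometric prior $\pi(c_i) = 2^{-(i+1)}$ purely as an example:

```python
from fractions import Fraction

delta = Fraction(1, 20)  # delta = 0.05, kept exact
# Failure probability allocated to the i-th classifier by the union bound:
# Pr(c_D > its bound) <= delta * pi(c_i), with the example prior 2^{-(i+1)}.
total_failure = sum(delta * Fraction(1, 2 ** (i + 1)) for i in range(1000))
# The total failure budget over the (truncated) countable set stays below
# delta, which is exactly the theorem's overall confidence level:
print(total_failure <= delta)  # True
```

This is why any nonnegative $\pi$ with $\sum_c \pi(c) \le 1$ suffices: the prior only controls how the fixed budget $\delta$ is divided among classifiers.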