• Confidence bounds are frequentist guarantees; this includes both the Test Set Bound and the Train Set Bound.
  • However, we will borrow and adapt a Bayesian idea.
  • Before seeing the training set $S$, we define a “prior” probability distribution $\pi(c)$ over the classifiers $c$. This represents our “bet” on the candidate classifiers.
  • Suppose a simple learning problem in which the learning algorithm chooses from a countable set of classifiers based on a training sample $S \sim D^m$.
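As a concrete illustration, a geometric prior over a countable classifier set can be sketched as follows (the indexing by $i$ and the choice $\pi(c_i) = 2^{-(i+1)}$ are assumptions for the example, not part of the bound):

```python
# A hypothetical "prior" over a countably infinite set of classifiers,
# indexed i = 0, 1, 2, ... (e.g. by description length).
# pi(c_i) = 2^{-(i+1)} bets more on low-index ("simpler") classifiers
# and sums to 1 over the whole countable set.

def prior(i):
    """Prior mass assigned to the i-th classifier."""
    return 2.0 ** -(i + 1)

# The total mass over the first 50 classifiers is already 1 - 2^{-50}:
mass = sum(prior(i) for i in range(50))
print(round(mass, 12))
```

Any nonnegative $\pi$ summing to at most 1 works; the choice encodes which classifiers we expect the algorithm to return.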

Theorem

For all “priors” $\pi(c)$ over a countable set of classifiers, for all distributions $D$, for all $\delta \in (0, 1]$:

$$\Pr_{S \sim D^m}\Big(\forall c:\ c_D \le \overline{\mathrm{Bin}}\big(m, \hat{c}_S, \delta\,\pi(c)\big)\Big) \ge 1 - \delta$$

Compare with the Test Set Bound: $\Pr_{S \sim D^m}\big(c_D \le \overline{\mathrm{Bin}}(m, \hat{c}_S, \delta)\big) \ge 1 - \delta$. The Occam’s Razor Bound must hold for every classifier simultaneously, and pays for this by splitting the confidence $\delta$ according to $\pi$.

Corollary

For all $D$, all “priors” $\pi(c)$, all $\delta \in (0, 1]$:

$$\Pr_{S \sim D^m}\left(\forall c:\ c_D \le \hat{c}_S + \sqrt{\frac{\ln\frac{1}{\pi(c)} + \ln\frac{1}{\delta}}{2m}}\,\right) \ge 1 - \delta$$
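A sketch of how the two bounds compare numerically. The binomial tail inversion $\overline{\mathrm{Bin}}$ is implemented here by bisection over the binomial CDF; the sample size, error count, and prior mass below are made-up values for illustration:

```python
import math

def binom_cdf(m, k, p):
    """P(X <= k) for X ~ Binomial(m, p): probability of k or fewer errors."""
    return sum(math.comb(m, j) * p**j * (1 - p) ** (m - j) for j in range(k + 1))

def binom_tail_inversion(m, k, delta, tol=1e-9):
    """Bin-bar(m, k/m, delta): the largest true error rate p still consistent
    with observing k or fewer errors out of m, at confidence delta."""
    lo, hi = k / m, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(m, k, mid) >= delta:
            lo = mid  # p = mid is still consistent; search higher
        else:
            hi = mid
    return lo

m, errors, delta = 100, 10, 0.05
test_set_bound = binom_tail_inversion(m, errors, delta)
# The Occam's Razor Bound evaluates the same inversion at delta * pi(c),
# so a classifier with small prior mass pays a tightness penalty:
occam_bound = binom_tail_inversion(m, errors, delta * 2.0 ** -5)
```

Shrinking the confidence parameter from $\delta$ to $\delta\pi(c)$ can only loosen the bound, which is the price of the simultaneous guarantee over all classifiers.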

Tightness depends on the self-information $\ln\frac{1}{\pi(c)}$ of the classifier $c$ returned by the learning algorithm, with respect to our prior bets: the more prior mass we bet on $c$, the tighter the resulting bound.
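The dependence on $\ln\frac{1}{\pi(c)}$ is easy to see in the corollary’s closed form (the numbers below are illustrative, not from the text):

```python
import math

def occam_razor_bound(emp_err, m, prior_c, delta):
    """Corollary form: empirical error plus a penalty that grows with the
    self-information ln(1/pi(c)) of the returned classifier."""
    penalty = math.sqrt((math.log(1 / prior_c) + math.log(1 / delta)) / (2 * m))
    return emp_err + penalty

# Same empirical error, same m and delta; only the prior bet differs:
favored = occam_razor_bound(0.10, 1000, prior_c=2.0 ** -2, delta=0.05)
longshot = occam_razor_bound(0.10, 1000, prior_c=2.0 ** -20, delta=0.05)
print(favored < longshot)  # the long-shot classifier gets a looser bound
```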

Occam’s Razor Bound: Proof

Idea: without training, apply the Test Set Bound to each classifier $c$ in turn (treating $S$ as a test set) with confidence parameter $\delta\,\pi(c)$:

$$\Pr_{S \sim D^m}\Big(c_D \le \overline{\mathrm{Bin}}\big(m, \hat{c}_S, \delta\,\pi(c)\big)\Big) \ge 1 - \delta\,\pi(c)$$

Negate to get the equivalent statement:

$$\Pr_{S \sim D^m}\Big(c_D > \overline{\mathrm{Bin}}\big(m, \hat{c}_S, \delta\,\pi(c)\big)\Big) \le \delta\,\pi(c)$$

Apply the union bound, $\Pr(A \text{ or } B) \le \Pr(A) + \Pr(B)$, repeatedly:

$$\Pr_{S \sim D^m}\Big(\exists c:\ c_D > \overline{\mathrm{Bin}}\big(m, \hat{c}_S, \delta\,\pi(c)\big)\Big) \le \sum_c \delta\,\pi(c) \le \delta$$

Negating once more gives the theorem.
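The last step is just bookkeeping on the failure probabilities: each classifier’s Test Set Bound fails with probability at most $\delta\pi(c)$, and the prior masses sum to at most 1. A quick numeric check, assuming a geometric prior $\pi(c_i) = 2^{-(i+1)}$ purely as an example:

```python
from fractions import Fraction

delta = Fraction(1, 20)  # delta = 0.05, kept exact
# Failure probability allocated to the i-th classifier by the union bound:
# Pr(c_D > its bound) <= delta * pi(c_i), with the example prior 2^{-(i+1)}.
total_failure = sum(delta * Fraction(1, 2 ** (i + 1)) for i in range(1000))
# The total failure budget over the (truncated) countable set stays below
# delta, which is exactly the theorem's overall confidence level:
print(total_failure <= delta)  # True
```

This is why any nonnegative $\pi$ with $\sum_c \pi(c) \le 1$ suffices: the prior only controls how the fixed budget $\delta$ is divided among classifiers.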