The Framework of Learning

Machine learning aims to find a final hypothesis $g$ that approximates an unknown target function $f$.

Core Components:

  • Input : Feature vector $x$ (e.g., age, salary)
  • Labels : The output $y$ we want to predict.
  • Target Function : The ideal mapping $f$, which is unknown.
  • Dataset : A limited number of samples used for training
  • Hypothesis Set : The set $\mathcal{H}$ of all possible candidate functions (decision boundaries) our algorithm can choose from
  • Learning Algorithm : The process that picks the best hypothesis $g$ from $\mathcal{H}$ based on the training data

Key Assumption

Learning is only feasible if the training and testing samples are drawn from the same input distribution $P$. If they are unrelated or drawn with different biases, learning becomes “hopeless”.

Defining Error Measures

To determine if learning is “successful”, we must measure how often our hypothesis $h$ disagrees with the target $f$.

In-Sample Error ($E_{in}$)

Also called the training error, this is the average error across the $N$ training samples:

$$E_{in}(h) = \frac{1}{N} \sum_{n=1}^{N} \left[\!\left[\, h(x_n) \neq f(x_n) \,\right]\!\right]$$

where the bracket $[\![\,\cdot\,]\!]$ is an indicator function (1 if true, 0 if false)
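A minimal sketch of this computation in Python (the hypothesis `h` and the toy sample below are hypothetical, not from the source):

```python
# In-sample error: the fraction of training points on which a candidate
# hypothesis h disagrees with the observed label. The boolean comparison
# (h(x) != y) plays the role of the indicator bracket: True counts as 1.
def in_sample_error(h, data):
    """data is a list of (x, y) pairs; h maps an input x to a label."""
    return sum(h(x) != y for x, y in data) / len(data)

# Toy example: a threshold hypothesis on a single feature.
h = lambda x: 1 if x >= 0.5 else 0
sample = [(0.2, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.3, 1)]
print(in_sample_error(h, sample))  # one disagreement out of five -> 0.2
```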

Out-of-Sample Error ($E_{out}$)

Also called the testing error, this is the probability that the hypothesis will fail on a new sample drawn from the input distribution $P$:

$$E_{out}(h) = \mathbb{P}_{x \sim P}\left[\, h(x) \neq f(x) \,\right]$$

While we cannot calculate $E_{out}$ directly (because $f$ and $P$ are unknown), we can infer it from $E_{in}$

Hoeffding Inequality: The Bridge to Generalisation

The “Bin Analogy” helps us understand this: if we draw a large enough sample of marbles from a bin, the fraction of orange marbles in our hand ($\nu$) is likely close to the actual fraction in the bin ($\mu$).

For a Single Fixed Hypothesis

For any fixed $h$ chosen before looking at the data, the probability that the gap between $E_{in}(h)$ and $E_{out}(h)$ is larger than a margin $\epsilon$ is bounded by

$$\mathbb{P}\left[\, |E_{in}(h) - E_{out}(h)| > \epsilon \,\right] \leq 2e^{-2\epsilon^2 N}$$
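The bound can be checked empirically with the bin analogy (a Monte Carlo sketch; the numbers $\mu = 0.6$, $N = 100$, $\epsilon = 0.1$ are assumed purely for illustration):

```python
import math
import random

random.seed(0)

mu, N, eps, trials = 0.6, 100, 0.1, 10_000

# Draw N marbles per trial and count how often the sample fraction nu
# lands further than eps from the true bin fraction mu.
bad = 0
for _ in range(trials):
    nu = sum(random.random() < mu for _ in range(N)) / N
    if abs(nu - mu) > eps:
        bad += 1

bound = 2 * math.exp(-2 * eps**2 * N)
print(f"empirical P[|nu - mu| > {eps}]: {bad / trials:.4f}")
print(f"Hoeffding bound:               {bound:.4f}")
# The empirical frequency sits well below the bound, which is loose
# by design: it holds for any mu, N and eps.
```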

For the Entire Hypothesis Set

In reality, we pick $g$ after looking at the data. To account for the fact that we chose the “best”-looking function from $M$ possibilities, we use the Union Bound:

$$\mathbb{P}\left[\, |E_{in}(g) - E_{out}(g)| > \epsilon \,\right] \leq 2M e^{-2\epsilon^2 N}$$
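Why picking after looking is dangerous can be felt with a coin-flip sketch (all numbers below are assumed, not from the source): flip $M$ fair coins $N$ times each. A single coin fixed in advance rarely shows an extreme head fraction, but the most extreme of the $M$ coins does so routinely.

```python
import random

random.seed(1)

M, N, trials = 1000, 10, 500
single_dev = 0  # fixed coin, chosen before flipping (the first one)
best_dev = 0    # "best-looking" coin, chosen after seeing the flips

for _ in range(trials):
    fracs = [sum(random.random() < 0.5 for _ in range(N)) / N
             for _ in range(M)]
    if abs(fracs[0] - 0.5) > 0.4:    # fixed coin: all heads or all tails
        single_dev += 1
    if abs(min(fracs) - 0.5) > 0.4:  # selected coin: some coin with 0 heads
        best_dev += 1

print(single_dev / trials)  # rare: about 2 * (1/2)**10 per trial
print(best_dev / trials)    # common: selection inflates the deviation
```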

PAC Learnability

A target function is PAC-learnable (Probably Approximately Correct) if an algorithm can find a hypothesis $g$ such that, for any accuracy $\epsilon$ and confidence $\delta$, there is a sample size $N$ that makes the inequality $2Me^{-2\epsilon^2 N} \leq \delta$ hold.
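Solving $2Me^{-2\epsilon^2 N} \leq \delta$ for $N$ gives the required sample size, $N \geq \frac{1}{2\epsilon^2}\ln\frac{2M}{\delta}$. A small sketch (the helper name is mine):

```python
import math

def sample_complexity(eps, delta, M):
    """Smallest N satisfying 2 * M * exp(-2 * eps**2 * N) <= delta,
    i.e. N >= ln(2M / delta) / (2 * eps**2)."""
    return math.ceil(math.log(2 * M / delta) / (2 * eps**2))

# 100 hypotheses, accuracy margin eps = 0.05, confidence delta = 0.05:
print(sample_complexity(eps=0.05, delta=0.05, M=100))  # -> 1659
```

Note that $N$ grows only logarithmically in $M$, which is why even large hypothesis sets can remain learnable given enough data.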

The Generalisation Bound

We can rewrite the Hoeffding inequality to provide a performance guarantee for $E_{out}$: with probability at least $1 - \delta$,

$$E_{in}(g) - \sqrt{\frac{1}{2N}\ln\frac{2M}{\delta}} \;\leq\; E_{out}(g) \;\leq\; E_{in}(g) + \sqrt{\frac{1}{2N}\ln\frac{2M}{\delta}}$$

  • Upper Limit: Provides a performance guarantee; your real-world error won’t be worse than this.
  • Lower Limit: Represents the intrinsic limit of your dataset and model complexity.
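How the error bar $\sqrt{\frac{1}{2N}\ln\frac{2M}{\delta}}$ behaves can be sketched numerically (function name and sample numbers are mine, chosen for illustration):

```python
import math

def generalisation_bar(N, M, delta):
    """Error bar eps such that, with probability at least 1 - delta,
    E_out(g) lies within eps of E_in(g)."""
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

# The bar widens with model complexity M ...
for M in (1, 100, 10_000):
    print(M, round(generalisation_bar(N=1000, M=M, delta=0.05), 3))
# ... and shrinks as the sample size N grows.
print(round(generalisation_bar(N=100_000, M=100, delta=0.05), 3))
```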

The Complexity Trade-Off

Learning involves balancing two central questions:

  1. Can we make $E_{out}$ close to $E_{in}$? (Generalisation)
  2. Can we make $E_{in}$ small enough? (Training)

The Role of $M$ (Model Complexity):

  • Small $M$ (Simple Models):
      • Generalisation: High. $E_{in}$ is very likely to be close to $E_{out}$.
      • Training: Low. Too few choices may result in a high $E_{in}$ because the model can’t “fit” the data.
  • Large $M$ (Complex Models):
      • Generalisation: Low. The bound worsens, meaning $E_{in}$ is a poor predictor of $E_{out}$.
      • Training: High. More choices make it easier to find a function with $E_{in} \approx 0$.

Common Pitfall: Overfitting

Using an extremely complex model to force training error to zero. While $E_{in}$ becomes small, the generalisation gap ($E_{out} - E_{in}$) grows, often leading to poor real-world performance (high $E_{out}$).