The Framework of Learning
Machine learning aims to find a final hypothesis $g$ that approximates an unknown target function $f$.
Core Components:
- Input $x$: A feature vector (e.g., age, salary).
- Labels $y$: The output we want to predict.
- Target Function $f$: The ideal mapping from inputs to labels, which is unknown (if we already knew $f$, there would be nothing to learn).
- Dataset $D$: A limited number of samples $(x_n, y_n)$ used for training.
- Hypothesis Set $\mathcal{H}$: The set of all possible candidate functions (decision boundaries) our algorithm can choose from.
- Learning Algorithm $A$: The process that picks the best hypothesis $g$ from $\mathcal{H}$ based on the training data.
Key Assumption
Learning is only feasible if the training and testing samples are drawn from the same input distribution $P$. If they are unrelated or drawn with different biases, learning becomes "hopeless".
Defining Error Measures
To determine if learning is "successful", we must measure how often our hypothesis $h$ disagrees with the target $f$.
In-Sample Error ($E_{in}$)
Also called the training error, this is the average error across the $N$ training samples:

$$E_{in}(h) = \frac{1}{N} \sum_{n=1}^{N} [\![\, h(x_n) \neq f(x_n) \,]\!]$$

where the bracket $[\![\,\cdot\,]\!]$ is an indicator function (1 if true, 0 if false).
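As a minimal numeric sketch, $E_{in}$ is just the fraction of training points the hypothesis gets wrong. The toy dataset and threshold hypothesis below are invented for illustration:

```python
import numpy as np

# Hypothetical toy dataset: 1-D inputs with binary labels f(x_n).
X = np.array([0.1, 0.4, 0.5, 0.8, 0.9])
y = np.array([-1, -1, 1, 1, 1])

# A simple threshold hypothesis h(x) = sign(x - 0.45).
def h(x):
    return np.where(x > 0.45, 1, -1)

# E_in(h): fraction of training points where h disagrees with the label.
E_in = np.mean(h(X) != y)
print(E_in)  # h classifies all 5 points correctly -> 0.0
```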
Out-of-Sample Error ($E_{out}$)
Also called the test error, this is the probability that the hypothesis will fail on a new sample $x$ drawn from the input distribution $P$:

$$E_{out}(h) = \mathbb{P}_{x \sim P}[\, h(x) \neq f(x) \,]$$

While we cannot calculate $E_{out}$ directly (because $f$ and $P$ are unknown), we can infer it from $E_{in}$.
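In a simulation we can cheat: we pick a known $f$ and $P$ ourselves, so $E_{out}$ can be approximated by Monte Carlo on a large fresh sample. The threshold functions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):  # the "unknown" target, known only inside this simulation
    return np.where(x > 0.5, 1, -1)

def h(x):  # our hypothesis, with a slightly wrong threshold
    return np.where(x > 0.45, 1, -1)

# E_out(h) = P[h(x) != f(x)] under x ~ Uniform(0, 1).
# h and f disagree exactly on (0.45, 0.5], so the true value is 0.05.
x = rng.uniform(0, 1, size=1_000_000)
E_out_est = np.mean(h(x) != f(x))
print(round(E_out_est, 3))  # close to 0.05
```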
Hoeffding Inequality: The Bridge to Generalisation
The "Bin Analogy" helps us understand this: if we draw a large enough sample of marbles from a bin, the fraction of orange marbles in our sample ($\nu$) is likely close to the actual fraction in the bin ($\mu$).
For a Single Fixed Hypothesis
For any fixed hypothesis $h$ chosen before looking at the data, the probability that the gap between $E_{in}(h)$ and $E_{out}(h)$ is larger than a margin $\epsilon$ is bounded by:

$$\mathbb{P}[\, |E_{in}(h) - E_{out}(h)| > \epsilon \,] \leq 2e^{-2\epsilon^2 N}$$
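A quick simulation of the bin analogy (with arbitrarily chosen $\mu$, $N$, and $\epsilon$) checks that the bound holds empirically:

```python
import numpy as np

rng = np.random.default_rng(1)

N, mu, eps, trials = 100, 0.3, 0.1, 100_000

# Draw `trials` independent samples of N marbles; nu is the sample fraction.
nu = rng.binomial(N, mu, size=trials) / N

# Empirical probability that |nu - mu| > eps ...
empirical = np.mean(np.abs(nu - mu) > eps)

# ... versus the Hoeffding bound 2 * exp(-2 * eps^2 * N).
bound = 2 * np.exp(-2 * eps**2 * N)
print(empirical <= bound)  # True: the bound holds (and is often quite loose)
```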
For the Entire Hypothesis Set
In reality, we pick $g$ after looking at the data. To account for the fact that we chose the "best"-looking function from $M$ possibilities, we use the Union Bound:

$$\mathbb{P}[\, |E_{in}(g) - E_{out}(g)| > \epsilon \,] \leq 2M e^{-2\epsilon^2 N}$$
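The effect of $M$ can be seen in a coin-flipping sketch (all parameters here are invented for illustration): for a single fair coin a large deviation of the sample frequency is rare, but among many coins, some coin almost always deviates:

```python
import numpy as np

rng = np.random.default_rng(2)

N, M, eps, trials = 20, 1000, 0.3, 200

# Each trial: M "hypotheses", each tested on its own N fair coin flips (mu = 0.5).
flips = rng.integers(0, 2, size=(trials, M, N))
nu = flips.mean(axis=2)  # sample frequencies, shape (trials, M)

# With M hypotheses we care whether ANY of them deviates by more than eps.
any_deviates = np.mean(np.any(np.abs(nu - 0.5) > eps, axis=1))

single_bound = 2 * np.exp(-2 * eps**2 * N)  # bound for one fixed hypothesis
union_bound = min(1.0, M * single_bound)    # union bound, capped at 1

# For a single coin the deviation probability is tiny (bound ~ 0.055),
# yet across M = 1000 coins a deviation somewhere is very likely.
print(round(any_deviates, 2), round(single_bound, 3))
```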
PAC Learnability
A target function is PAC-learnable (Probably Approximately Correct) if an algorithm can find a hypothesis $g$ such that for any accuracy $\epsilon > 0$ and confidence $1 - \delta$, there is a sample size $N$ that makes the inequality $2Me^{-2\epsilon^2 N} \leq \delta$ hold.
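Solving $2Me^{-2\epsilon^2 N} \leq \delta$ for $N$ gives the sample size required for a given accuracy and confidence, which a short helper (a sketch, not a library routine) can compute:

```python
import math

def sample_complexity(eps, delta, M):
    """Smallest N with 2 * M * exp(-2 * eps**2 * N) <= delta."""
    return math.ceil(math.log(2 * M / delta) / (2 * eps**2))

# Example: accuracy eps = 0.05, confidence delta = 0.05, M = 100 hypotheses.
N = sample_complexity(eps=0.05, delta=0.05, M=100)
print(N)  # 1659
```

Note how $N$ grows only logarithmically in $M$ but quadratically in $1/\epsilon$.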
The Generalisation Bound
We can rewrite the Hoeffding inequality to provide a performance guarantee for $E_{out}$: with probability at least $1 - \delta$,

$$E_{out}(g) \leq E_{in}(g) + \sqrt{\frac{1}{2N} \ln \frac{2M}{\delta}}$$
- Upper Limit: Provides a performance guarantee; your real-world error won't be worse than this.
- Lower Limit: Represents the intrinsic limit of your dataset and model complexity.
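Plugging numbers into the bound shows how the guarantee degrades as the hypothesis set grows. The training error and parameters below are hypothetical:

```python
import math

def generalisation_gap(N, M, delta):
    """Hoeffding-based gap: with prob >= 1 - delta,
    E_out(g) <= E_in(g) + sqrt(ln(2M / delta) / (2N))."""
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

E_in = 0.08  # hypothetical training error
for M in (1, 100, 10_000):
    gap = generalisation_gap(N=1000, M=M, delta=0.05)
    print(M, round(E_in + gap, 3))  # the guarantee on E_out worsens as M grows
```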
The Complexity Trade-Off
Learning involves balancing two central questions:
- Can we make $E_{out}$ close to $E_{in}$? (Generalisation)
- Can we make $E_{in}$ small enough? (Training)
The Role of $M$ (Model Complexity):
- Small $M$ (Simple Models):
  - Generalisation: High. $E_{in}$ is very likely to be close to $E_{out}$.
  - Training: Low. Too few choices may result in a high $E_{in}$ because the model can't "fit" the data.
- Large $M$ (Complex Models):
  - Generalisation: Low. The bound worsens, meaning $E_{in}$ is a poor predictor of $E_{out}$.
  - Training: High. More choices make it easier to find a function with $E_{in} \approx 0$.
Common Pitfall: Overfitting
Using an extremely complex model to force training error to zero. While $E_{in}$ becomes small, the generalisation gap ($E_{out} - E_{in}$) grows, often leading to poor real-world performance (high $E_{out}$).
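A classic sketch of this pitfall (toy linear target and noise levels invented for illustration): fitting a high-degree polynomial through a handful of noisy points drives $E_{in}$ to zero while $E_{out}$ suffers:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)

# Hypothetical setup: 12 noisy samples of a simple linear target f(x) = x.
def f(x):
    return x

x_train = np.linspace(0, 1, 12)
y_train = f(x_train) + rng.normal(0, 0.1, 12)

x_test = rng.uniform(0, 1, 10_000)
y_test = f(x_test)

for degree in (1, 11):
    p = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    E_in = np.mean((p(x_train) - y_train) ** 2)   # squared error on training set
    E_out = np.mean((p(x_test) - y_test) ** 2)    # squared error on fresh data
    print(degree, E_in, E_out)
# The degree-11 polynomial interpolates the noise exactly: E_in ~ 0,
# but it oscillates between the points, so E_out is far larger.
```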