The Framework of Learning
Machine learning aims to find a final hypothesis $g$ that approximates an unknown target function $f$.
Core Components:
- Input $x$: A feature vector (e.g., age, salary).
- Labels $y$: The output we want to predict.
- Target Function $f$: The ideal mapping from inputs to labels, which is unknown (if we already knew $f$, there would be nothing to learn).
- Dataset $D$: A limited number of samples $(x_n, y_n)$ used for training.
- Hypothesis Set $\mathcal{H}$: The set of all possible candidate functions (decision boundaries) our algorithm can choose from.
- Learning Algorithm $A$: The process that picks the best hypothesis $g$ from $\mathcal{H}$ based on the training data.
Key Assumption
Learning is only feasible if the training and testing samples are drawn from the same input distribution $P$. If they are unrelated or drawn with different biases, learning becomes "hopeless".
Defining Error Measures
To determine if learning is "successful", we must measure how often our hypothesis $h$ disagrees with the target $f$.
In-Sample Error ($E_{in}$)
Also called the training error, this is the average error across the $N$ training samples:

$$E_{in}(h) = \frac{1}{N} \sum_{n=1}^{N} [\![\, h(x_n) \neq f(x_n) \,]\!]$$

where the bracket $[\![\,\cdot\,]\!]$ is an indicator function (1 if true, 0 if false).
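As a minimal numeric sketch, $E_{in}$ is just the fraction of training points the hypothesis gets wrong. The toy dataset and threshold hypothesis below are invented for illustration:

```python
import numpy as np

# Hypothetical toy dataset: 1-D inputs with binary labels f(x_n).
X = np.array([0.1, 0.4, 0.5, 0.8, 0.9])
y = np.array([-1, -1, 1, 1, 1])

# A simple threshold hypothesis h(x) = sign(x - 0.45).
def h(x):
    return np.where(x > 0.45, 1, -1)

# E_in(h): fraction of training points where h disagrees with the label.
E_in = np.mean(h(X) != y)
print(E_in)  # h classifies all 5 points correctly -> 0.0
```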
Out-of-Sample Error ($E_{out}$)
Also called the test error, this is the probability that the hypothesis will fail on a new sample $x$ drawn from the input distribution $P$:

$$E_{out}(h) = \mathbb{P}_{x \sim P}[\, h(x) \neq f(x) \,]$$

While we cannot calculate $E_{out}$ directly (because $f$ and $P$ are unknown), we can infer it from $E_{in}$.
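In a simulation we can cheat: we pick a known $f$ and $P$ ourselves, so $E_{out}$ can be approximated by Monte Carlo on a large fresh sample. The threshold functions below are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):  # the "unknown" target, known only inside this simulation
    return np.where(x > 0.5, 1, -1)

def h(x):  # our hypothesis, with a slightly wrong threshold
    return np.where(x > 0.45, 1, -1)

# E_out(h) = P[h(x) != f(x)] under x ~ Uniform(0, 1).
# h and f disagree exactly on (0.45, 0.5], so the true value is 0.05.
x = rng.uniform(0, 1, size=1_000_000)
E_out_est = np.mean(h(x) != f(x))
print(round(E_out_est, 3))  # close to 0.05
```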
Hoeffding Inequality: The Bridge to Generalisation
The "Bin Analogy" helps us understand this: if we draw a large enough sample of marbles from a bin, the fraction of orange marbles in our sample ($\nu$) is likely close to the actual fraction in the bin ($\mu$).
For a Single Fixed Hypothesis
For any fixed hypothesis $h$ chosen before looking at the data, the probability that the gap between $E_{in}(h)$ and $E_{out}(h)$ is larger than a margin $\epsilon$ is bounded by:

$$\mathbb{P}[\, |E_{in}(h) - E_{out}(h)| > \epsilon \,] \leq 2e^{-2\epsilon^2 N}$$
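A quick simulation of the bin analogy (with arbitrarily chosen $\mu$, $N$, and $\epsilon$) checks that the bound holds empirically:

```python
import numpy as np

rng = np.random.default_rng(1)

N, mu, eps, trials = 100, 0.3, 0.1, 100_000

# Draw `trials` independent samples of N marbles; nu is the sample fraction.
nu = rng.binomial(N, mu, size=trials) / N

# Empirical probability that |nu - mu| > eps ...
empirical = np.mean(np.abs(nu - mu) > eps)

# ... versus the Hoeffding bound 2 * exp(-2 * eps^2 * N).
bound = 2 * np.exp(-2 * eps**2 * N)
print(empirical <= bound)  # True: the bound holds (and is often quite loose)
```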
For the Entire Hypothesis Set
In reality, we pick $g$ after looking at the data. To account for the fact that we chose the "best"-looking function from $M$ possibilities, we use the Union Bound:

$$\mathbb{P}[\, |E_{in}(g) - E_{out}(g)| > \epsilon \,] \leq 2M e^{-2\epsilon^2 N}$$
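The effect of $M$ can be seen in a coin-flipping sketch (all parameters here are invented for illustration): for a single fair coin a large deviation of the sample frequency is rare, but among many coins, some coin almost always deviates:

```python
import numpy as np

rng = np.random.default_rng(2)

N, M, eps, trials = 20, 1000, 0.3, 200

# Each trial: M "hypotheses", each tested on its own N fair coin flips (mu = 0.5).
flips = rng.integers(0, 2, size=(trials, M, N))
nu = flips.mean(axis=2)  # sample frequencies, shape (trials, M)

# With M hypotheses we care whether ANY of them deviates by more than eps.
any_deviates = np.mean(np.any(np.abs(nu - 0.5) > eps, axis=1))

single_bound = 2 * np.exp(-2 * eps**2 * N)  # bound for one fixed hypothesis
union_bound = min(1.0, M * single_bound)    # union bound, capped at 1

# For a single coin the deviation probability is tiny (bound ~ 0.055),
# yet across M = 1000 coins a deviation somewhere is very likely.
print(round(any_deviates, 2), round(single_bound, 3))
```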
PAC Learnability
A target function is PAC-learnable (Probably Approximately Correct) if an algorithm can find a hypothesis $g$ such that for any accuracy $\epsilon > 0$ and confidence $1 - \delta$, there is a sample size $N$ that makes the inequality $2Me^{-2\epsilon^2 N} \leq \delta$ hold.
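Solving $2Me^{-2\epsilon^2 N} \leq \delta$ for $N$ gives the sample size required for a given accuracy and confidence, which a short helper (a sketch, not a library routine) can compute:

```python
import math

def sample_complexity(eps, delta, M):
    """Smallest N with 2 * M * exp(-2 * eps**2 * N) <= delta."""
    return math.ceil(math.log(2 * M / delta) / (2 * eps**2))

# Example: accuracy eps = 0.05, confidence delta = 0.05, M = 100 hypotheses.
N = sample_complexity(eps=0.05, delta=0.05, M=100)
print(N)  # 1659
```

Note how $N$ grows only logarithmically in $M$ but quadratically in $1/\epsilon$.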
The Generalisation Bound
We can rewrite the Hoeffding inequality to provide a performance guarantee for $E_{out}$: with probability at least $1 - \delta$,

$$E_{out}(g) \leq E_{in}(g) + \sqrt{\frac{1}{2N} \ln \frac{2M}{\delta}}$$
- Upper Limit: Provides a performance guarantee; your real-world error won't be worse than this.
- Lower Limit: Represents the intrinsic limit of your dataset and model complexity.
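Plugging numbers into the bound shows how the guarantee degrades as the hypothesis set grows. The training error and parameters below are hypothetical:

```python
import math

def generalisation_gap(N, M, delta):
    """Hoeffding-based gap: with prob >= 1 - delta,
    E_out(g) <= E_in(g) + sqrt(ln(2M / delta) / (2N))."""
    return math.sqrt(math.log(2 * M / delta) / (2 * N))

E_in = 0.08  # hypothetical training error
for M in (1, 100, 10_000):
    gap = generalisation_gap(N=1000, M=M, delta=0.05)
    print(M, round(E_in + gap, 3))  # the guarantee on E_out worsens as M grows
```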
The Complexity Trade-Off
Learning involves balancing two central questions:
- Can we make $E_{out}$ close to $E_{in}$? (Generalisation)
- Can we make $E_{in}$ small enough? (Training)
The Role of $M$ (Model Complexity):
- Small $M$ (Simple Models):
  - Generalisation: High. $E_{in}$ is very likely to be close to $E_{out}$.
  - Training: Low. Too few choices may result in a high $E_{in}$ because the model can't "fit" the data.
- Large $M$ (Complex Models):
  - Generalisation: Low. The bound worsens, meaning $E_{in}$ is a poor predictor of $E_{out}$.
  - Training: High. More choices make it easier to find a function with $E_{in} \approx 0$.
Common Pitfall: Overfitting
Using an extremely complex model to force training error to zero. While $E_{in}$ becomes small, the generalisation gap ($E_{out} - E_{in}$) grows, often leading to poor real-world performance (high $E_{out}$).
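A classic sketch of this pitfall (toy linear target and noise levels invented for illustration): fitting a high-degree polynomial through a handful of noisy points drives $E_{in}$ to zero while $E_{out}$ suffers:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(3)

# Hypothetical setup: 12 noisy samples of a simple linear target f(x) = x.
def f(x):
    return x

x_train = np.linspace(0, 1, 12)
y_train = f(x_train) + rng.normal(0, 0.1, 12)

x_test = rng.uniform(0, 1, 10_000)
y_test = f(x_test)

for degree in (1, 11):
    p = Polynomial.fit(x_train, y_train, degree)  # least-squares polynomial fit
    E_in = np.mean((p(x_train) - y_train) ** 2)   # squared error on training set
    E_out = np.mean((p(x_test) - y_test) ** 2)    # squared error on fresh data
    print(degree, E_in, E_out)
# The degree-11 polynomial interpolates the noise exactly: E_in ~ 0,
# but it oscillates between the points, so E_out is far larger.
```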