Regression problem setup

  • Let $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$ be a training dataset sampled from some distribution $P$.
  • Let $x$ denote an input and $y$ the corresponding output.
  • Assume the true relationship is given by

    $$y = f(x) + \varepsilon,$$

    where $\varepsilon$ is random noise with $\mathbb{E}[\varepsilon] = 0$ and variance $\sigma^2$.

  • A learning algorithm trained on $D$ produces an estimator $\hat{f}_D(x)$.
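The setup above can be sketched with a small simulation. This is a minimal illustration under assumed choices not fixed by the text: a hypothetical true function $f(x) = \sin(x)$, Gaussian noise with standard deviation $0.3$, and a training set of 30 points.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """The (assumed) true function generating the data."""
    return np.sin(x)

def sample_training_set(n=30, sigma=0.3):
    """Draw D = {(x_i, y_i)} with y = f(x) + eps, E[eps] = 0, Var[eps] = sigma^2."""
    x = rng.uniform(0, 2 * np.pi, size=n)
    eps = rng.normal(0.0, sigma, size=n)   # zero-mean noise with variance sigma^2
    return x, f(x) + eps

x_train, y_train = sample_training_set()
print(x_train.shape, y_train.shape)
```

Each call to `sample_training_set` plays the role of one draw of the training set $D$; a learning algorithm fitted to `(x_train, y_train)` yields one realisation of $\hat{f}_D$.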

The quantity we want to analyse

We want to analyse the performance of a model at a fixed test point $x$. The total expected squared error at $x$ is:

$$\mathbb{E}_{D, \varepsilon}\Big[\big(y - \hat{f}_D(x)\big)^2\Big]$$

This expectation is taken with respect to:

  • The training set $D$: randomly drawn from the true distribution $P$.
  • The noise $\varepsilon$: variability in $y$ for a fixed $x$.

Bias

The bias is defined as the difference between the true function and the expected model prediction over all possible training sets $D$:

$$\mathrm{Bias}(x) = f(x) - \mathbb{E}_D\big[\hat{f}_D(x)\big]$$

where:

  • $f(x)$: the true function that generates the data.
  • $\hat{f}_D(x)$: the model's prediction at $x$ given training set $D$.
  • $\bar{f}(x) = \mathbb{E}_D\big[\hat{f}_D(x)\big]$: the expected prediction over different training sets.

Variance

The variance measures how much the model's predictions at $x$ vary across different training sets $D$:

$$\mathrm{Var}(x) = \mathbb{E}_D\Big[\big(\hat{f}_D(x) - \bar{f}(x)\big)^2\Big]$$
This captures the sensitivity of the model to different training sets.
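Both quantities can be approximated by Monte Carlo: draw many training sets, record each prediction $\hat{f}_D(x_0)$ at a fixed test point, and compare their mean and spread to $f(x_0)$. This sketch assumes a hypothetical setup (true function $\sin$, noise std $0.3$, and a deliberately underfitting degree-1 polynomial as the learner), none of which is prescribed by the text.

```python
import numpy as np

rng = np.random.default_rng(1)
f = np.sin
sigma, n, x0 = 0.3, 30, 2.0            # noise std, training-set size, test point

preds = []
for _ in range(2000):                  # draw many independent training sets D
    x = rng.uniform(0, 2 * np.pi, size=n)
    y = f(x) + rng.normal(0, sigma, size=n)
    coef = np.polyfit(x, y, 1)         # the learning algorithm: a line fit
    preds.append(np.polyval(coef, x0)) # \hat f_D(x0) for this D

preds = np.array(preds)
mean_pred = preds.mean()               # estimate of E_D[\hat f_D(x0)]
bias = f(x0) - mean_pred               # Bias(x0)
variance = preds.var()                 # Var(x0)
print(f"bias={bias:.3f}  variance={variance:.3f}")
```

Because a straight line cannot track $\sin$, the bias at $x_0 = 2$ is substantial while the variance stays small, illustrating the two terms measuring different failure modes.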

Noise

The irreducible noise, $\sigma^2$, is the variance in $y$ that is independent of the model:

$$\sigma^2 = \mathbb{E}_\varepsilon\Big[\big(y - f(x)\big)^2\Big]$$

This is the inherent randomness in the data, which cannot be reduced by any model.

Decomposing the Error while Ignoring the Noise

Start with the expected squared error between the true function and the estimator:

$$\mathbb{E}_D\Big[\big(f(x) - \hat{f}_D(x)\big)^2\Big]$$

Add and subtract the mean prediction $\bar{f}(x) = \mathbb{E}_D\big[\hat{f}_D(x)\big]$:

$$= \mathbb{E}_D\Big[\big(f(x) - \bar{f}(x) + \bar{f}(x) - \hat{f}_D(x)\big)^2\Big]$$

Expand the square:

$$= \big(f(x) - \bar{f}(x)\big)^2 + \mathbb{E}_D\Big[\big(\bar{f}(x) - \hat{f}_D(x)\big)^2\Big] + 2\big(f(x) - \bar{f}(x)\big)\,\mathbb{E}_D\big[\bar{f}(x) - \hat{f}_D(x)\big]$$

Eliminate the cross-term: since $\mathbb{E}_D\big[\bar{f}(x) - \hat{f}_D(x)\big] = \bar{f}(x) - \bar{f}(x) = 0$, the last term vanishes, leaving

$$\mathbb{E}_D\Big[\big(f(x) - \hat{f}_D(x)\big)^2\Big] = \mathrm{Bias}^2(x) + \mathrm{Var}(x)$$
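The noise-free identity above can be checked numerically. The check below reuses a hypothetical setup ($f = \sin$, degree-1 polynomial learner, noise std $0.3$); it holds exactly for the sample statistics because the sample mean of squared deviations decomposes the same way as the expectation.

```python
import numpy as np

rng = np.random.default_rng(2)
f, sigma, n, x0 = np.sin, 0.3, 30, 2.0

preds = []
for _ in range(5000):                        # many independent training sets D
    x = rng.uniform(0, 2 * np.pi, size=n)
    y = f(x) + rng.normal(0, sigma, size=n)
    preds.append(np.polyval(np.polyfit(x, y, 1), x0))
preds = np.array(preds)

lhs = np.mean((f(x0) - preds) ** 2)          # E_D[(f(x0) - \hat f_D(x0))^2]
bias2 = (f(x0) - preds.mean()) ** 2          # squared bias
var = preds.var()                            # variance (population, ddof=0)
print(lhs, bias2 + var)                      # agree up to floating-point error
```

Note that the agreement is exact only when the variance uses the population convention (`ddof=0`), matching the derivation's use of the mean prediction as the centring point.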

Including the Noise Term

  • Recall that the observed output is $y = f(x) + \varepsilon$, where $\varepsilon$ has mean $0$ and variance $\sigma^2$.
  • Since $\varepsilon$ is independent of $\hat{f}_D(x)$, the overall expected squared error is:

    $$\mathbb{E}_{D, \varepsilon}\Big[\big(y - \hat{f}_D(x)\big)^2\Big] = \mathbb{E}_D\Big[\big(f(x) - \hat{f}_D(x)\big)^2\Big] + \sigma^2$$

  • This gives us the full decomposition:

    $$\mathbb{E}_{D, \varepsilon}\Big[\big(y - \hat{f}_D(x)\big)^2\Big] = \mathrm{Bias}^2(x) + \mathrm{Var}(x) + \sigma^2$$
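The full decomposition can likewise be verified by simulation: draw a fresh noisy test observation $y_0 = f(x_0) + \varepsilon$ for each training set and compare the average squared error to $\mathrm{Bias}^2 + \mathrm{Var} + \sigma^2$. The setup is the same hypothetical one used above; since this identity holds only in expectation, the two sides agree approximately, up to Monte Carlo error.

```python
import numpy as np

rng = np.random.default_rng(3)
f, sigma, n, x0 = np.sin, 0.3, 30, 2.0

preds, sq_errors = [], []
for _ in range(20000):
    x = rng.uniform(0, 2 * np.pi, size=n)
    y = f(x) + rng.normal(0, sigma, size=n)
    pred = np.polyval(np.polyfit(x, y, 1), x0)   # \hat f_D(x0)
    y0 = f(x0) + rng.normal(0, sigma)            # noisy test observation
    preds.append(pred)
    sq_errors.append((y0 - pred) ** 2)

preds = np.array(preds)
total = np.mean(sq_errors)                       # E_{D,eps}[(y - \hat f_D(x0))^2]
bias2 = (f(x0) - preds.mean()) ** 2              # Bias^2(x0)
var = preds.var()                                # Var(x0)
print(total, bias2 + var + sigma**2)             # approximately equal
```

Here the $\sigma^2 = 0.09$ noise floor persists no matter how good the learner is, while bias and variance depend on the choice of model class, which is the practical content of the decomposition.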