Regression problem setup
- Let $D = \{(x_1, y_1), \dots, (x_n, y_n)\}$ be a training dataset sampled from some distribution.
- Let $x$ denote an input and $y$ the corresponding output.
- Assume the true relationship is given by
$$y = f(x) + \varepsilon,$$
where $\varepsilon$ is random noise with $\mathbb{E}[\varepsilon] = 0$ and variance $\sigma^2$.
- A learning algorithm produces an estimator $\hat{f}_D(x)$.
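The setup above can be made concrete with a small simulation. Everything here is an illustrative assumption, not part of the notes: the true function is taken to be $f(x) = \sin(2\pi x)$, the noise is Gaussian with $\sigma = 0.3$, and the learning algorithm is a least-squares polynomial fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data-generating process: f(x) = sin(2*pi*x), eps ~ N(0, sigma^2).
def f(x):
    return np.sin(2 * np.pi * x)

SIGMA = 0.3  # assumed noise standard deviation

def sample_training_set(n=30):
    """Draw one training set D = {(x_i, y_i)} with y = f(x) + eps."""
    x = rng.uniform(0, 1, size=n)
    y = f(x) + rng.normal(0, SIGMA, size=n)
    return x, y

def fit_estimator(x, y, degree=3):
    """One choice of learning algorithm: least-squares polynomial fit.

    Returns a callable playing the role of f_hat_D.
    """
    return np.poly1d(np.polyfit(x, y, degree))

x_train, y_train = sample_training_set()
f_hat = fit_estimator(x_train, y_train)
print(f_hat(0.5))  # this particular estimator's prediction at x = 0.5
```

Each call to `sample_training_set` plays the role of drawing a new $D$, and `fit_estimator` maps $D$ to $\hat{f}_D$; different draws yield different estimators, which is exactly the randomness the decomposition below averages over.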
The quantity we want to analyse
We want to analyse the performance of a model at a fixed test point $x$. The total expected squared error at $x$ is:
$$\mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big]$$
This expectation is taken with respect to:
- The training set $D$: randomly drawn from the unknown true distribution.
- The noise $\varepsilon$: variability in $y$ for a fixed $x$.
Bias
The bias is defined as the difference between the true function and the expected model prediction over all possible training sets $D$:
$$\text{Bias}(x) = f(x) - \mathbb{E}_D[\hat{f}_D(x)]$$
where:
- $f(x)$: the true function that generates the data.
- $\hat{f}_D(x)$: the model's prediction at $x$ given training set $D$.
- $\mathbb{E}_D[\hat{f}_D(x)]$: the expected prediction over different training sets.
Variance
The variance measures how much the model's predictions at $x$ vary across different training sets $D$:
$$\text{Var}(x) = \mathbb{E}_D\Big[\big(\hat{f}_D(x) - \mathbb{E}_D[\hat{f}_D(x)]\big)^2\Big]$$
This captures the sensitivity of the model to the particular training set it receives.
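The two definitions above can be estimated by Monte Carlo: train the model on many independently drawn training sets and look at the spread of its predictions at one fixed test point. The setup (true function, noise level, degree-1 fit, test point $x_0 = 0.25$) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed toy setup: f(x) = sin(2*pi*x), Gaussian noise, and a
# deliberately underfitting degree-1 (straight-line) estimator.
def f(x):
    return np.sin(2 * np.pi * x)

def train_once(n=30, sigma=0.3, degree=1):
    """Draw one training set D and return the fitted estimator f_hat_D."""
    x = rng.uniform(0, 1, size=n)
    y = f(x) + rng.normal(0, sigma, size=n)
    return np.poly1d(np.polyfit(x, y, degree))

x0 = 0.25  # fixed test point
# Predictions f_hat_D(x0) over many independently drawn training sets D
preds = np.array([train_once()(x0) for _ in range(2000)])

bias = f(x0) - preds.mean()   # Bias(x0) = f(x0) - E_D[f_hat_D(x0)]
variance = preds.var()        # Var(x0)  = E_D[(f_hat_D(x0) - E_D[f_hat_D(x0)])^2]
print(bias, variance)
```

Because a straight line cannot represent a sine, the estimated bias at $x_0$ is large while the variance stays small; replacing `degree=1` with a high-degree fit reverses the picture.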
Noise
The irreducible noise, $\sigma^2 = \mathrm{Var}(\varepsilon)$, is the variance in $y$ that remains even when $f(x)$ is known exactly.
This is the inherent randomness in the data, which cannot be reduced by any model.
Decomposing the Error while Ignoring the Noise
Start with the error relative to the true function:
$$\mathbb{E}_D\big[(f(x) - \hat{f}_D(x))^2\big]$$
Add/subtract the mean prediction $\bar{f}(x) = \mathbb{E}_D[\hat{f}_D(x)]$:
$$\mathbb{E}_D\big[(f(x) - \bar{f}(x) + \bar{f}(x) - \hat{f}_D(x))^2\big]$$
Expand the square:
$$(f(x) - \bar{f}(x))^2 + \mathbb{E}_D\big[(\bar{f}(x) - \hat{f}_D(x))^2\big] + 2\,(f(x) - \bar{f}(x))\,\mathbb{E}_D\big[\bar{f}(x) - \hat{f}_D(x)\big]$$
Eliminate the cross-term: since $\mathbb{E}_D[\bar{f}(x) - \hat{f}_D(x)] = \bar{f}(x) - \bar{f}(x) = 0$, the last term vanishes, leaving
$$\mathbb{E}_D\big[(f(x) - \hat{f}_D(x))^2\big] = \underbrace{(f(x) - \bar{f}(x))^2}_{\text{Bias}^2(x)} + \underbrace{\mathbb{E}_D\big[(\hat{f}_D(x) - \bar{f}(x))^2\big]}_{\text{Var}(x)}$$
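The noise-free identity, including the vanishing cross-term, can be checked numerically. Under the same illustrative toy setup as before (sine true function, degree-1 fit; all assumptions, not part of the notes), the sample versions of the three terms satisfy the identity exactly, because the empirical mean prediction plays the role of $\bar{f}(x)$.

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed toy setup: f(x) = sin(2*pi*x), Gaussian training noise, degree-1 fit.
def f(x):
    return np.sin(2 * np.pi * x)

def train_once(n=30, sigma=0.3, degree=1):
    x = rng.uniform(0, 1, size=n)
    y = f(x) + rng.normal(0, sigma, size=n)
    return np.poly1d(np.polyfit(x, y, degree))

x0 = 0.25
preds = np.array([train_once()(x0) for _ in range(5000)])
f_bar = preds.mean()  # empirical stand-in for E_D[f_hat_D(x0)]

lhs = np.mean((f(x0) - preds) ** 2)       # E_D[(f - f_hat)^2]
bias_sq = (f(x0) - f_bar) ** 2            # Bias^2(x0)
variance = np.mean((preds - f_bar) ** 2)  # Var(x0)
# Cross-term 2 (f - f_bar) E_D[f_bar - f_hat]; the inner mean is 0 by construction
cross = 2 * (f(x0) - f_bar) * np.mean(f_bar - preds)

print(lhs, bias_sq + variance, cross)  # lhs == bias_sq + variance, cross ~ 0
```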
Including the Noise Term
- Recall that the observed output is $y = f(x) + \varepsilon$, where $\varepsilon$ has variance $\sigma^2$ and is independent of $D$.
- Therefore, the overall expected squared error is:
$$\mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big] = \mathbb{E}_D\big[(f(x) - \hat{f}_D(x))^2\big] + \sigma^2$$
- This gives us the full decomposition:
$$\mathbb{E}_{D,\varepsilon}\big[(y - \hat{f}_D(x))^2\big] = \text{Bias}^2(x) + \text{Var}(x) + \sigma^2$$
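The full decomposition can likewise be verified by simulation: draw many training sets, draw fresh noisy targets $y = f(x_0) + \varepsilon$ at the test point, and compare the measured total error against $\text{Bias}^2 + \text{Var} + \sigma^2$. The concrete setup (sine true function, $\sigma = 0.3$, degree-1 fit, $x_0 = 0.25$) is again an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed toy setup: f(x) = sin(2*pi*x), eps ~ N(0, SIGMA^2), degree-1 fit.
def f(x):
    return np.sin(2 * np.pi * x)

SIGMA = 0.3

def train_once(n=30, degree=1):
    x = rng.uniform(0, 1, size=n)
    y = f(x) + rng.normal(0, SIGMA, size=n)
    return np.poly1d(np.polyfit(x, y, degree))

x0 = 0.25
reps = 5000
preds = np.array([train_once()(x0) for _ in range(reps)])   # f_hat_D(x0) over many D
y_test = f(x0) + rng.normal(0, SIGMA, size=reps)            # fresh noisy targets at x0

total = np.mean((y_test - preds) ** 2)      # E_{D,eps}[(y - f_hat_D(x0))^2]
bias_sq = (f(x0) - preds.mean()) ** 2
variance = preds.var()
print(total, bias_sq + variance + SIGMA ** 2)  # the two should agree closely
```

The two printed numbers differ only by Monte Carlo error, illustrating that no model choice can push the total error below the $\sigma^2$ floor.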