Suppose we have many training sets of size $N$ generated from the same unknown distribution. From each we estimate a separate $\hat{\theta}$.
Bias of an Estimator:
- Definition: Bias of $\hat{\theta}$ is $\operatorname{bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$.
- An estimator is unbiased if $\mathbb{E}[\hat{\theta}] = \theta$ for all $\theta$.
Variance of an Estimator:
- Definition: Variance of $\hat{\theta}$ is $\operatorname{Var}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]$ (if $\theta$ is a scalar) or $\mathbb{E}\big[\|\hat{\theta} - \mathbb{E}[\hat{\theta}]\|^2\big]$ (if $\theta$ is a vector).
- Unlike the bias, the variance does not directly depend on the true parameter $\theta$.
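The two definitions above can be checked empirically. The sketch below (all values assumed for illustration) draws many training sets from a Bernoulli distribution, estimates $\hat{\theta}$ from each by the sample mean, and computes the empirical bias and variance across training sets:

```python
import random

# Sketch: many training sets of size N from Bernoulli(theta); for each,
# theta_hat is the sample mean. Bias and variance are taken over training sets.
random.seed(0)
theta = 0.3      # true parameter (assumed for this sketch)
N = 50           # training-set size
trials = 10_000  # number of independent training sets

estimates = []
for _ in range(trials):
    sample = [1 if random.random() < theta else 0 for _ in range(N)]
    estimates.append(sum(sample) / N)

mean_est = sum(estimates) / trials
bias = mean_est - theta  # empirical E[theta_hat] - theta
variance = sum((e - mean_est) ** 2 for e in estimates) / trials

print(f"bias = {bias:.4f}, variance = {variance:.4f}")
```

The empirical bias should be near zero (the sample mean is unbiased here), and the empirical variance should be near $\theta(1-\theta)/N$, anticipating the results derived later.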
Bias-Variance Decomposition of the Mean Squared Error
We look at the expected squared error (expectation over the distribution that generated the training sets) from the true parameter value $\theta$:
$$\operatorname{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big].$$
Add and subtract $\mathbb{E}[\hat{\theta}]$ to complete the square:
$$\mathbb{E}\big[(\hat{\theta} - \theta)^2\big] = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}] + \mathbb{E}[\hat{\theta}] - \theta)^2\big] = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big] + \big(\mathbb{E}[\hat{\theta}] - \theta\big)^2,$$
where the cross term vanishes because $\mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big] = 0$. Rearrange to conclude:
$$\operatorname{MSE}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \operatorname{bias}(\hat{\theta})^2.$$
MLE and MAP Estimators
The Maximum Likelihood Estimator for $\theta$ is:
$$\hat{\theta}_{\mathrm{MLE}} = \frac{N_1}{N} = \frac{1}{N}\sum_{i=1}^{N} x_i.$$
The MLE is the sample mean of $N$ Bernoulli trials $x_1, \dots, x_N \in \{0, 1\}$, with $N_1$ the number of successes. The Maximum a Posteriori (MAP) Estimator for $\theta$ is:
$$\hat{\theta}_{\mathrm{MAP}} = \frac{N_1 + \alpha - 1}{N + \alpha + \beta - 2}.$$
The MAP is the maximiser of the posterior distribution of $\theta$ when the prior distribution is $\mathrm{Beta}(\alpha, \beta)$. Note: bias-variance is a frequentist concept. While the MAP (and the Bayesian posterior mean) derive from the Bayesian framework, all estimators can be analysed in the frequentist framework.
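The two point estimates are one-liners. A minimal sketch, with example counts and prior hyperparameters chosen purely for illustration:

```python
# Sketch of the two point estimates for a Bernoulli parameter theta,
# assuming a Beta(alpha, beta) prior for the MAP estimator.
def mle(n1, n):
    """Maximum likelihood estimate: the sample mean N1 / N."""
    return n1 / n

def map_estimate(n1, n, alpha, beta):
    """Posterior mode under a Beta(alpha, beta) prior (alpha, beta > 1)."""
    return (n1 + alpha - 1) / (n + alpha + beta - 2)

# e.g. 7 successes in 10 trials, with a Beta(2, 2) prior
print(mle(7, 10))                  # 0.7
print(map_estimate(7, 10, 2, 2))   # (7 + 1) / (10 + 2) = 0.666...
```

Note how the MAP estimate is pulled from the raw frequency $0.7$ toward $0.5$, the mode of the symmetric $\mathrm{Beta}(2, 2)$ prior.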
Bias of the Estimators:
Bias is defined as:
$$\operatorname{bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta.$$
Bias of the MLE:
$$\mathbb{E}[\hat{\theta}_{\mathrm{MLE}}] = \frac{1}{N}\sum_{i=1}^{N}\mathbb{E}[x_i] = \frac{N\theta}{N} = \theta.$$
Thus, the MLE is unbiased. Bias of the MAP:
$$\mathbb{E}[\hat{\theta}_{\mathrm{MAP}}] = \frac{N\theta + \alpha - 1}{N + \alpha + \beta - 2}, \qquad \operatorname{bias}(\hat{\theta}_{\mathrm{MAP}}) = \frac{(\alpha - 1) - \theta(\alpha + \beta - 2)}{N + \alpha + \beta - 2}.$$
Thus, the MAP estimator is biased towards the prior mode $\frac{\alpha - 1}{\alpha + \beta - 2}$, especially for small $N$, but becomes unbiased as $N$ grows large.
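The closed-form bias above makes the $N$-dependence easy to inspect directly. A small sketch (the particular $\theta$, $\alpha$, $\beta$ values are assumed for illustration) showing the MAP bias shrinking as $N$ grows:

```python
# Sketch: exact bias of the MAP estimator under a Beta(alpha, beta) prior,
#   bias(N) = (N*theta + alpha - 1) / (N + alpha + beta - 2) - theta,
# which shrinks toward 0 as N grows.
def map_bias(theta, n, alpha, beta):
    return (n * theta + alpha - 1) / (n + alpha + beta - 2) - theta

theta, alpha, beta = 0.9, 2.0, 2.0  # true parameter far from the prior mode 0.5
for n in (10, 100, 1000):
    print(n, map_bias(theta, n, alpha, beta))
```

For these values the bias is $-0.8/(N + 2)$: noticeable at $N = 10$, negligible at $N = 1000$, consistent with the claim that the MAP estimator is asymptotically unbiased.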
Variance of the Estimators:
Variance is given by:
$$\operatorname{Var}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big].$$
Variance of the MLE:
$$\operatorname{Var}(\hat{\theta}_{\mathrm{MLE}}) = \frac{1}{N^2}\sum_{i=1}^{N}\operatorname{Var}(x_i) = \frac{\theta(1-\theta)}{N}.$$
Variance of the MAP Estimator:
$$\operatorname{Var}(\hat{\theta}_{\mathrm{MAP}}) = \frac{N\theta(1-\theta)}{(N + \alpha + \beta - 2)^2}.$$
This can be made smaller than the variance of the MLE by an informative prior ($\alpha, \beta$ away from 1). Notice the trade-off with the bias. For large $N$, this becomes approximately $\frac{\theta(1-\theta)}{N + \alpha + \beta - 2}$: similar to the MLE variance, but with a denominator that includes prior information ($\alpha + \beta - 2$).
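Both variance formulas are closed-form, so the comparison can be sketched directly (the example values of $\theta$, $N$, $\alpha$, $\beta$ are assumed for illustration):

```python
# Sketch: closed-form variances of the MLE and of the MAP estimator
# under an assumed Beta(alpha, beta) prior.
def var_mle(theta, n):
    return theta * (1 - theta) / n

def var_map(theta, n, alpha, beta):
    return n * theta * (1 - theta) / (n + alpha + beta - 2) ** 2

theta, n = 0.5, 20
print(var_mle(theta, n))        # 0.0125
print(var_map(theta, n, 5, 5))  # 20 * 0.25 / 28**2, noticeably smaller
```

With the informative $\mathrm{Beta}(5, 5)$ prior the MAP variance is roughly half the MLE variance at $N = 20$, while for very large $N$ the two variances converge, matching the approximation above.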
Implications of Bias-Variance Analysis
- We want estimators that have low bias and low variance, but both are generally not achievable simultaneously with a finite sample; there is a trade-off.
- Bias-variance properties of estimators can guide the choice of estimator to use. The MLE has low bias if $N$ is sufficiently large, but it has high variance.
- High bias, low variance estimators (e.g., regularization methods - also interpretable as MAP estimators) improve stability and generalization.
- Bayesian estimators incorporate prior information, introducing bias but reducing variance. The Bayesian posterior mean is often biased but can achieve lower MSE than unbiased frequentist estimators.
- The best choice depends on sample size, prior knowledge, and application needs.