Suppose we have many training sets, each of size $n$, generated from the same unknown distribution. From each we estimate a separate $\hat{\theta}$.

Bias of an Estimator $\hat{\theta}$ :

  • Definition : Bias of $\hat{\theta}$ is $\operatorname{bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$.
  • An estimator is unbiased if $\mathbb{E}[\hat{\theta}] = \theta$ for all $\theta$.

Variance of an Estimator $\hat{\theta}$ :

  • Definition : Variance of $\hat{\theta}$ is $\operatorname{Var}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]$ (if $\hat{\theta}$ is a scalar variable) or $\mathbb{E}\big[\|\hat{\theta} - \mathbb{E}[\hat{\theta}]\|^2\big]$ (if $\hat{\theta}$ is a vector variable).
  • Unlike bias, the variance does not directly depend on the true parameter $\theta$.
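The definitions above can be sketched with a small Monte Carlo experiment: draw many training sets, compute one estimate per set, and average. The true $\theta$, sample size, and trial count below are assumed purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3              # true parameter (assumed for illustration)
n, trials = 50, 100_000  # size of each training set, number of training sets

# Draw many training sets; compute the sample-mean estimator on each.
samples = rng.binomial(1, theta, size=(trials, n))
theta_hat = samples.mean(axis=1)

bias = theta_hat.mean() - theta    # approximates E[theta_hat] - theta
variance = theta_hat.var()         # approximates E[(theta_hat - E[theta_hat])^2]
print(f"bias={bias:.4f}, variance={variance:.5f}")
```

The variance should come out near $\theta(1-\theta)/n = 0.0042$, while the bias is near zero, since the sample mean is unbiased.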

Bias-Variance decomposition of the Mean Squared Error

We look at the expected squared error (expectation over the distribution that generated the training sets) from the true parameter value $\theta$ :

$$\operatorname{MSE}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]$$

Add & subtract $\mathbb{E}[\hat{\theta}]$ to complete the square :

$$\mathbb{E}\big[(\hat{\theta} - \theta)^2\big] = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big] + \big(\mathbb{E}[\hat{\theta}] - \theta\big)^2$$

(the cross term vanishes because $\mathbb{E}\big[\hat{\theta} - \mathbb{E}[\hat{\theta}]\big] = 0$). Rearrange to conclude :

$$\operatorname{MSE}(\hat{\theta}) = \operatorname{Var}(\hat{\theta}) + \operatorname{bias}(\hat{\theta})^2$$
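The decomposition can be checked numerically. A minimal sketch, using an add-one-smoothed (and therefore deliberately biased) estimator with assumed parameter values, verifying that the empirical MSE equals empirical variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(1)
theta, n, trials = 0.3, 20, 200_000

# A deliberately biased estimator: add-one smoothing of the sample mean.
samples = rng.binomial(1, theta, size=(trials, n))
theta_hat = (samples.sum(axis=1) + 1) / (n + 2)

mse = np.mean((theta_hat - theta) ** 2)
var = theta_hat.var()
bias_sq = (theta_hat.mean() - theta) ** 2
print(mse, var + bias_sq)  # the two agree: the identity holds exactly for empirical moments
```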

MLE and MAP estimators

The Maximum Likelihood Estimator for $\theta$ is :

$$\hat{\theta}_{\text{MLE}} = \frac{1}{n}\sum_{i=1}^{n} x_i = \frac{n_1}{n}$$

The MLE is a sample mean of Bernoulli trials, where $n_1$ denotes the number of successes. The Maximum a Posteriori (MAP) Estimator for $\theta$ is :

$$\hat{\theta}_{\text{MAP}} = \frac{n_1 + \alpha - 1}{n + \alpha + \beta - 2}$$

The MAP is the maximiser of the posterior distribution of $\theta$ when the prior distribution is $\operatorname{Beta}(\alpha, \beta)$. Note : Bias-variance is a frequentist concept. While MAP (and the Bayesian posterior mean) derive from the Bayesian framework, all estimators can be analysed in the frequentist framework.
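A short sketch computing both estimators on one simulated training set. The true $\theta$, sample size, and Beta hyperparameters below are assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n = 0.7, 10
x = rng.binomial(1, theta, size=n)
n1 = int(x.sum())              # number of successes

alpha, beta = 5.0, 5.0         # Beta(alpha, beta) prior (assumed values)

theta_mle = n1 / n
theta_map = (n1 + alpha - 1) / (n + alpha + beta - 2)
print(theta_mle, theta_map)    # MAP is pulled from the MLE toward 0.5
```

With this symmetric Beta(5, 5) prior, the MAP estimate always lies between the MLE and 0.5, whatever the observed $n_1$.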

Bias of the Estimators :

Bias is defined as :

$$\operatorname{bias}(\hat{\theta}) = \mathbb{E}[\hat{\theta}] - \theta$$

Bias of MLE :

$$\mathbb{E}[\hat{\theta}_{\text{MLE}}] = \frac{\mathbb{E}[n_1]}{n} = \frac{n\theta}{n} = \theta \quad\Rightarrow\quad \operatorname{bias}(\hat{\theta}_{\text{MLE}}) = 0$$

Thus, the MLE is unbiased. Bias of MAP :

$$\operatorname{bias}(\hat{\theta}_{\text{MAP}}) = \frac{n\theta + \alpha - 1}{n + \alpha + \beta - 2} - \theta = \frac{(\alpha - 1) - \theta(\alpha + \beta - 2)}{n + \alpha + \beta - 2}$$

Thus, the MAP estimator is biased towards the prior, especially for small $n$, but becomes unbiased as $n$ grows large.
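The vanishing of the MAP bias as $n$ grows can be seen in simulation. A minimal sketch, with the true $\theta$ and prior hyperparameters assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
theta, alpha, beta = 0.8, 3.0, 3.0   # assumed for illustration
trials = 200_000

for n in (5, 50, 500):
    # n1 ~ Binomial(n, theta): the sufficient statistic of each training set.
    n1 = rng.binomial(n, theta, size=trials)
    bias_mle = (n1 / n).mean() - theta
    bias_map = ((n1 + alpha - 1) / (n + alpha + beta - 2)).mean() - theta
    print(n, round(bias_mle, 4), round(bias_map, 4))
# The MAP bias shrinks in magnitude as n grows; the MLE bias stays near zero.
```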

Variance of the Estimators :

Variance is given by:

$$\operatorname{Var}(\hat{\theta}) = \mathbb{E}\big[(\hat{\theta} - \mathbb{E}[\hat{\theta}])^2\big]$$

Variance of MLE :

$$\operatorname{Var}(\hat{\theta}_{\text{MLE}}) = \frac{\operatorname{Var}(n_1)}{n^2} = \frac{n\theta(1-\theta)}{n^2} = \frac{\theta(1-\theta)}{n}$$

Variance of MAP Estimator :

$$\operatorname{Var}(\hat{\theta}_{\text{MAP}}) = \frac{\operatorname{Var}(n_1)}{(n + \alpha + \beta - 2)^2} = \frac{n\theta(1-\theta)}{(n + \alpha + \beta - 2)^2}$$

This can be made smaller than the MLE's variance by an informative prior ($\alpha, \beta$ away from 1). Notice the trade-off with the bias. For large $n$, this becomes approximately $\theta(1-\theta)/n$, similar to the MLE, but with a denominator that includes the prior information ($\alpha + \beta - 2$).
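The variance reduction from an informative prior shows up directly in simulation. A sketch with assumed values, chosen so the prior is fairly strong relative to $n$:

```python
import numpy as np

rng = np.random.default_rng(4)
theta, alpha, beta = 0.5, 10.0, 10.0   # fairly informative prior (assumed values)
n, trials = 20, 200_000

n1 = rng.binomial(n, theta, size=trials)
var_mle = (n1 / n).var()
var_map = ((n1 + alpha - 1) / (n + alpha + beta - 2)).var()
print(var_mle, var_map)  # MAP variance is smaller: same Var(n1), larger denominator
```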

Implications of Bias-Variance Analysis

  • We want estimators that have low bias and low variance, but both are not achievable simultaneously with a finite sample, and there are trade-offs.
  • Bias-variance properties of estimators can guide the choice of estimator to use. The MLE has low bias if $n$ is sufficiently large, but it has high variance.
  • High bias, low variance estimators (e.g., regularization methods - also interpretable as MAP estimators) improve stability and generalization.
  • Bayesian estimators incorporate prior information, introducing bias but reducing variance. The Bayesian posterior mean is often biased but achieves lower MSE than frequentist estimators.
  • The best choice depends on sample size, prior knowledge, and application needs.
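The points above can be seen end to end by comparing MSEs. A sketch (prior and $\theta$ values assumed) in which the prior happens to be centred on the truth, so the MAP's extra bias is free and its lower variance wins at small $n$:

```python
import numpy as np

rng = np.random.default_rng(5)
theta, alpha, beta = 0.5, 5.0, 5.0   # prior centred on the true value (assumed)
trials = 200_000

for n in (5, 500):
    n1 = rng.binomial(n, theta, size=trials)
    mse_mle = np.mean((n1 / n - theta) ** 2)
    mse_map = np.mean(((n1 + alpha - 1) / (n + alpha + beta - 2) - theta) ** 2)
    print(n, round(mse_mle, 5), round(mse_map, 5))
# The MAP's MSE advantage is large at n=5 and nearly gone at n=500.
```

A prior centred far from the true $\theta$ would instead add bias that dominates at small $n$, which is the sample-size and prior-knowledge dependence the last bullet describes.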