• Regularisation is the first line of defence against overfitting, i.e. it reduces the variance of the model.
  • It amounts to adding a penalty term to the cost function, e.g. the squared L2 norm of the model’s weights.
  • In likelihood-based learning, the cost function is $-\log p(\mathcal{D} \mid \boldsymbol{\theta})$, where $\mathcal{D}$ is the observed data and $\boldsymbol{\theta}$ are the model’s parameters or weights.
  • The added regularisation term may then be interpreted as the negative log of a Gaussian prior, in the Bayesian sense.
  • Working backwards from the modified cost function

    $$J(\boldsymbol{\theta}) = -\log p(\mathcal{D} \mid \boldsymbol{\theta}) + \lambda \,\|\boldsymbol{\theta}\|_2^2$$

back to the probabilistic model (by negating and exponentiating), this amounts to finding $\boldsymbol{\theta}$ that maximises $p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})$:

    $$\exp\!\big(-J(\boldsymbol{\theta})\big) = p(\mathcal{D} \mid \boldsymbol{\theta})\, \exp\!\big(-\lambda \,\|\boldsymbol{\theta}\|_2^2\big) \propto p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}) \propto p(\boldsymbol{\theta} \mid \mathcal{D})$$
  • Hence, in likelihood-based learning, regularisation is equivalent to MAP estimation of the parameters.
  • It can therefore reduce variance at the expense of introducing a bias.
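The equivalence above can be checked numerically. The sketch below (a hypothetical example, not from the source) uses linear regression with Gaussian noise: minimising the L2-regularised squared-error cost gives exactly the same weights as the MAP estimate under a zero-mean Gaussian prior whose precision matches the regularisation strength.

```python
import numpy as np

# Hypothetical setup: linear regression y = X w + noise, noise ~ N(0, sigma2).
# Regularised cost:  J(w) = ||y - X w||^2 / (2*sigma2) + lam * ||w||^2
# Gaussian prior:    w ~ N(0, (1 / (2*lam)) * I), so -log p(w) = lam*||w||^2 + const.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
sigma2 = 0.25
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

lam = 2.0  # regularisation strength

# Minimiser of J(w): set the gradient to zero,
#   (X^T X / sigma2 + 2*lam*I) w = X^T y / sigma2  (ridge regression, closed form).
w_ridge = np.linalg.solve(X.T @ X / sigma2 + 2 * lam * np.eye(d),
                          X.T @ y / sigma2)

# MAP estimate: mode of the Gaussian posterior p(w | D) with prior precision 2*lam*I.
prior_precision = 2 * lam * np.eye(d)
w_map = np.linalg.solve(X.T @ X / sigma2 + prior_precision,
                        X.T @ y / sigma2)

print(np.allclose(w_ridge, w_map))  # the two estimators coincide
```

The two linear systems are identical term by term, which is just the statement that the squared L2 penalty $\lambda \|\boldsymbol{\theta}\|_2^2$ is the negative log of a Gaussian prior with covariance $(1 / 2\lambda)\, I$.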