- Regularisation is the first line of defence against overfitting, i.e. it reduces the variance of the fitted model.
- It amounts to adding a penalty term to the cost function, e.g. the squared L2 norm of the model’s weights scaled by a strength $\lambda$: $\frac{\lambda}{2}\lVert\theta\rVert_2^2$.
- In likelihood-based learning, the cost function is the negative log-likelihood $-\log p(\mathcal{D} \mid \theta)$, where $\mathcal{D}$ is the observed data and $\theta$ are the model’s parameters or weights.
- The added regularisation term may then be interpreted, in the Bayesian sense, as the negative log of a Gaussian prior: $-\log \mathcal{N}(\theta; 0, \lambda^{-1} I) = \frac{\lambda}{2}\lVert\theta\rVert_2^2 + \text{const}$.
- Working backwards from the modified cost function $-\log p(\mathcal{D} \mid \theta) + \frac{\lambda}{2}\lVert\theta\rVert_2^2$ back to the probabilistic model (by negating and exponentiating), this amounts to finding the $\theta$ that maximises the unnormalised posterior $p(\mathcal{D} \mid \theta)\, p(\theta)$:

  $$\hat\theta = \arg\max_\theta \exp\!\left(\log p(\mathcal{D} \mid \theta) - \tfrac{\lambda}{2}\lVert\theta\rVert_2^2\right) = \arg\max_\theta p(\mathcal{D} \mid \theta)\, p(\theta) = \arg\max_\theta p(\theta \mid \mathcal{D}).$$
- Hence, in likelihood-based learning, regularisation is equivalent to MAP estimation of the parameters (see the numerical sketch after this list).
- As such, it can reduce variance at the expense of introducing bias.
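
The equivalence can be checked numerically. Below is a minimal sketch, assuming a linear-Gaussian model; the synthetic data and names like `sigma2` and `lam` are illustrative choices for this example, not taken from the text above. It minimises the L2-penalised negative log-likelihood by gradient descent and compares the result with the closed-form MAP estimate under the matching Gaussian prior.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical synthetic linear-regression data; sizes and
# hyperparameters are illustrative only.
n, d = 100, 3
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.5, size=n)

sigma2 = 0.25  # assumed noise variance of the Gaussian likelihood
lam = 1.5      # L2 penalty strength, i.e. the prior precision

def grad(w):
    # Gradient of the regularised cost
    #   (1 / (2 * sigma2)) * ||y - X w||^2 + (lam / 2) * ||w||^2,
    # i.e. the negative log-likelihood plus the squared-L2 penalty.
    return -X.T @ (y - X @ w) / sigma2 + lam * w

# Route 1: minimise the regularised cost by plain gradient descent.
w = np.zeros(d)
for _ in range(5000):
    w -= 1e-3 * grad(w)

# Route 2: MAP estimate under y | w ~ N(X w, sigma2 I) with the
# prior w ~ N(0, (1 / lam) I); the posterior mode has this closed
# form (ridge regression).
w_map = np.linalg.solve(X.T @ X / sigma2 + lam * np.eye(d),
                        X.T @ y / sigma2)

print(np.allclose(w, w_map, atol=1e-5))  # True: the two estimates coincide
```

Note how $\lambda$ plays the role of the prior precision: a larger penalty corresponds to a tighter prior around zero, pulling the estimate towards the origin and trading variance for bias, as the last bullet says.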