• Regularisation is the first line of defence against overfitting, i.e. it reduces the variance of the model.
  • It amounts to adding a penalty term to the cost function, e.g. the squared L2 norm of the model’s weights.
  • In likelihood-based learning, the cost function is $-\log p(\mathcal{D} \mid \boldsymbol{\theta})$, where $\mathcal{D}$ is the observed data and $\boldsymbol{\theta}$ are the model’s parameters or weights.
  • The added regularisation term may then be interpreted as the negative log of a Gaussian prior, in the Bayesian sense.
  • Working backwards from the modified cost function

    $$J(\boldsymbol{\theta}) = -\log p(\mathcal{D} \mid \boldsymbol{\theta}) + \lambda \,\|\boldsymbol{\theta}\|_2^2$$

back to the probabilistic model (by negating and exponentiating), this amounts to finding $\boldsymbol{\theta}$ that maximises $p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta})$:

    $$\exp\!\big(-J(\boldsymbol{\theta})\big) = p(\mathcal{D} \mid \boldsymbol{\theta})\, \exp\!\big(-\lambda \,\|\boldsymbol{\theta}\|_2^2\big) \propto p(\mathcal{D} \mid \boldsymbol{\theta})\, p(\boldsymbol{\theta}) \propto p(\boldsymbol{\theta} \mid \mathcal{D})$$
  • Hence, in likelihood-based learning, regularisation is equivalent to MAP estimation of the parameters.
  • It can therefore reduce variance at the expense of introducing a bias.
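The equivalence above can be checked numerically. The sketch below (a hypothetical example, not from the source) uses linear regression with Gaussian noise: minimising the L2-regularised squared-error cost gives exactly the same weights as the MAP estimate under a zero-mean Gaussian prior whose precision matches the regularisation strength.

```python
import numpy as np

# Hypothetical setup: linear regression y = X w + noise, noise ~ N(0, sigma2).
# Regularised cost:  J(w) = ||y - X w||^2 / (2*sigma2) + lam * ||w||^2
# Gaussian prior:    w ~ N(0, (1 / (2*lam)) * I), so -log p(w) = lam*||w||^2 + const.
rng = np.random.default_rng(0)
n, d = 50, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.0, -2.0, 0.5])
sigma2 = 0.25
y = X @ w_true + rng.normal(scale=np.sqrt(sigma2), size=n)

lam = 2.0  # regularisation strength

# Minimiser of J(w): set the gradient to zero,
#   (X^T X / sigma2 + 2*lam*I) w = X^T y / sigma2  (ridge regression, closed form).
w_ridge = np.linalg.solve(X.T @ X / sigma2 + 2 * lam * np.eye(d),
                          X.T @ y / sigma2)

# MAP estimate: mode of the Gaussian posterior p(w | D) with prior precision 2*lam*I.
prior_precision = 2 * lam * np.eye(d)
w_map = np.linalg.solve(X.T @ X / sigma2 + prior_precision,
                        X.T @ y / sigma2)

print(np.allclose(w_ridge, w_map))  # the two estimators coincide
```

The two linear systems are identical term by term, which is just the statement that the squared L2 penalty $\lambda \|\boldsymbol{\theta}\|_2^2$ is the negative log of a Gaussian prior with covariance $(1 / 2\lambda)\, I$.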