Formally, let $X_1, X_2, \ldots, X_n$ be a random sample from a distribution with parameter $\theta$. Suppose we have observed the values of $X_1, X_2, \ldots, X_n$ as $x_1, x_2, \ldots, x_n$. A maximum likelihood estimate (MLE) of $\theta$, denoted $\hat{\theta}_{ML}$, is a value of $\theta$ that maximises the likelihood function:

$$L(x_1, x_2, \ldots, x_n; \theta).$$
Maximum Likelihood Estimator:
A maximum likelihood estimator of the parameter $\theta$, denoted $\hat{\Theta}_{ML}$, is a random variable
whose value is given by $\hat{\Theta}_{ML} = \hat{\theta}_{ML}$ when $X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n$.
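As a concrete sketch of this definition, the code below assumes a small Bernoulli($\theta$) sample (the data and variable names are hypothetical) and recovers the MLE by maximising the likelihood over a grid of candidate $\theta$ values:

```python
import numpy as np

# Hypothetical observed sample, assumed drawn from a Bernoulli(theta) distribution.
x = np.array([1, 0, 1, 1, 0, 1, 1, 0])

def likelihood(theta, x):
    # L(x_1, ..., x_n; theta) = prod_i theta^{x_i} (1 - theta)^{1 - x_i}
    return np.prod(theta ** x * (1 - theta) ** (1 - x))

# Evaluate the likelihood on a grid and take the maximiser.
grid = np.linspace(0.001, 0.999, 999)
theta_hat = grid[np.argmax([likelihood(t, x) for t in grid])]
print(theta_hat)  # for Bernoulli data the MLE is the sample mean, 5/8
```

For the Bernoulli case the MLE has the closed form $\hat{\theta}_{ML} = \bar{x}$, so the grid search should land on the sample mean.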
Cost Function for computing the MLE
Cost Function: a function that maps a set of events to a number representing the "cost" of that event occurring; it is also called the loss function or objective function. For computing the MLE, there is a one-to-one mapping between the likelihood function and the cost function: given some data $x_1, \ldots, x_n$, the cost is taken to be the negative log-likelihood,

$$C(\theta) = -\log L(x_1, \ldots, x_n; \theta),$$

so maximising the likelihood is equivalent to minimising the cost.
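To illustrate this one-to-one mapping, the sketch below (assuming a small hypothetical exponential sample) evaluates both the likelihood and the negative log-likelihood cost on the same parameter grid and checks that the maximiser of one is the minimiser of the other:

```python
import numpy as np

# Hypothetical data, assumed drawn from an Exponential(lam) distribution
# with density f(x; lam) = lam * exp(-lam * x).
x = np.array([0.5, 1.2, 0.3, 2.1, 0.8])

def likelihood(lam, x):
    return np.prod(lam * np.exp(-lam * x))

def cost(lam, x):
    # C(lam) = -log L(x; lam) = -(n log lam - lam * sum(x))
    return -(len(x) * np.log(lam) - lam * np.sum(x))

grid = np.linspace(0.01, 5.0, 5000)
lam_max_lik = grid[np.argmax([likelihood(l, x) for l in grid])]
lam_min_cost = grid[np.argmin([cost(l, x) for l in grid])]
print(lam_max_lik, lam_min_cost)  # both should pick the same grid point
```

Because the logarithm is strictly increasing, negating it flips the ordering, so the $\lambda$ that maximises $L$ is exactly the $\lambda$ that minimises $C$.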
Why the Negative Logarithm?
- Convention - optimisation software is typically written to solve minimisation problems, so we minimise the negative log-likelihood rather than maximise the likelihood.
- Convenience - the logarithm turns the product over independent observations into a sum, which makes differentiation easier.
- Numerical stability - a product of many small probabilities can underflow to zero due to machine-precision limits; summing log-probabilities avoids this.
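A minimal sketch of the stability point, using 1000 hypothetical per-sample probabilities of 0.1: their direct product is $10^{-1000}$, far below the smallest positive float64 (roughly $10^{-308}$), so it underflows to zero, while the sum of logs stays perfectly representable.

```python
import numpy as np

# 1000 hypothetical per-sample probabilities, each equal to 0.1.
p = np.full(1000, 0.1)

naive = np.prod(p)           # 0.1**1000 underflows to exactly 0.0 in float64
stable = np.sum(np.log(p))   # 1000 * log(0.1), a finite, well-behaved number

print(naive, stable)
```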