Formally, let $X_1, X_2, \dots, X_n$ be a random sample from a distribution with parameter $\theta$. Suppose we have observed the values of $X_1, X_2, \dots, X_n$ as $x_1, x_2, \dots, x_n$. A maximum likelihood estimate (MLE) of $\theta$, denoted $\hat{\theta}_{ML}$, is a value of $\theta$ that maximises the likelihood function:

$$L(x_1, x_2, \dots, x_n; \theta).$$

Maximum Likelihood Estimator:

A maximum likelihood estimator of the parameter $\theta$, denoted $\hat{\Theta}_{ML}$, is a random variable

$$\hat{\Theta}_{ML} = \hat{\Theta}_{ML}(X_1, X_2, \dots, X_n)$$

whose value is given by $\hat{\theta}_{ML}$ when $X_1 = x_1, X_2 = x_2, \dots, X_n = x_n$.
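As a concrete illustration (a standard textbook example, not taken from the original text), suppose the sample is Bernoulli($\theta$). The likelihood is

$$L(x_1, \dots, x_n; \theta) = \prod_{i=1}^{n} \theta^{x_i} (1-\theta)^{1-x_i} = \theta^{\sum_i x_i} (1-\theta)^{\,n - \sum_i x_i},$$

the value that maximises it is the estimate $\hat{\theta}_{ML} = \frac{1}{n}\sum_{i=1}^{n} x_i$ (the sample mean of the observations), and the corresponding estimator is the random variable $\hat{\Theta}_{ML} = \frac{1}{n}\sum_{i=1}^{n} X_i$.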

Cost Function for computing the MLE

Cost Function: A function that maps a set of events to a number representing the “cost” of that event occurring; it is also called the loss function or the objective function. For computing the MLE, there is a one-to-one mapping between the likelihood function and the cost function: given some data $x_1, x_2, \dots, x_n$, the cost is the negative log-likelihood,

$$C(\theta) = -\ln L(x_1, x_2, \dots, x_n; \theta),$$

so maximising the likelihood is equivalent to minimising the cost.
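This mapping can be sketched in code. The following minimal example (with made-up coin-flip data, assuming the Bernoulli model) computes the MLE of the success probability by minimising the negative log-likelihood over a grid of candidate parameter values:

```python
import math

# Hypothetical data: 10 Bernoulli trials (coin flips), 7 successes.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]

def neg_log_likelihood(p, xs):
    """Cost function C(p) = -ln L(x_1, ..., x_n; p) for Bernoulli trials."""
    return -sum(math.log(p if x == 1 else 1.0 - p) for x in xs)

# Minimise the cost over a grid of candidate parameter values in (0, 1).
grid = [i / 1000 for i in range(1, 1000)]
p_hat = min(grid, key=lambda p: neg_log_likelihood(p, data))

print(p_hat)  # matches the sample mean, 7/10 = 0.7
```

In practice a numerical optimiser (or a closed-form solution, where one exists) would replace the grid search; the point here is only that maximising the likelihood and minimising the negative log-likelihood pick out the same parameter value.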

Why the Negative Logarithm?

  • Convention - Most numerical optimisation software is written to minimise, so the log-likelihood is negated and minimised.
  • Convenience - Logarithms turn products into sums, which makes differentiation much easier.
  • Numerical stability - The product of many small probabilities can underflow to zero because of machine-precision limits; summing logarithms avoids this.
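The stability point can be demonstrated directly: multiplying many small probabilities underflows to exactly zero in floating point, while the equivalent sum of logarithms stays in a representable range.

```python
import math

# 2000 independent events, each with probability 1e-3 (illustrative values).
probs = [1e-3] * 2000

# Direct product underflows: the true value, 1e-6000, is far below the
# smallest positive double (~1e-308), so the float rounds to 0.0.
product = 1.0
for p in probs:
    product *= p
print(product)      # 0.0 -- all information lost

# Summing logarithms represents the same quantity without underflow.
log_product = sum(math.log(p) for p in probs)
print(log_product)  # 2000 * ln(1e-3), roughly -13815.5
```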