Mutual Information measures the information that two random variables $X$ and $Y$ share.

Intuition: It quantifies how much knowing one variable reduces uncertainty about the other. For two discrete random variables $X$ and $Y$, the mutual information is defined as:

$$I(X;Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$$
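The definition can be computed directly by summing over the joint distribution. A minimal sketch, assuming a hypothetical 2×2 joint table for two binary variables (the numbers are illustrative, not from the text):

```python
import numpy as np

# Hypothetical 2x2 joint distribution p(x, y) for binary X and Y.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)  # marginal p(x)
p_y = p_xy.sum(axis=0)  # marginal p(y)

# I(X;Y) = sum_{x,y} p(x,y) * log( p(x,y) / (p(x) p(y)) ), in nats.
mi = 0.0
for i in range(p_xy.shape[0]):
    for j in range(p_xy.shape[1]):
        if p_xy[i, j] > 0:  # skip zero-probability cells (0 log 0 := 0)
            mi += p_xy[i, j] * np.log(p_xy[i, j] / (p_x[i] * p_y[j]))

print(mi)  # positive, since this X and Y are dependent
```

Using `np.log` gives the result in nats; swap in `np.log2` for bits.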

Mutual Information as KL Divergence

The mutual information can be expressed as the Kullback-Leibler (KL) divergence between the joint distribution and the product of the marginal distributions:

$$I(X;Y) = D_{\mathrm{KL}}\big(p(x,y) \,\|\, p(x)\,p(y)\big)$$

Interpretation: Mutual information measures the divergence ('error') incurred by modelling the joint probability distribution $p(x,y)$ as the product of marginals $p(x)\,p(y)$.

Special Case: When $X$ and $Y$ are independent, $p(x,y) = p(x)\,p(y)$, so every log term vanishes and $I(X;Y) = 0$.
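Both the KL form and the independence special case can be checked numerically. This sketch assumes a hypothetical 2×2 joint table for the dependent case, and builds the independent case from its own marginals:

```python
import numpy as np

# Hypothetical 2x2 joint distribution for binary X and Y (dependent case).
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)   # marginal p(x)
p_y = p_xy.sum(axis=0)   # marginal p(y)
q = np.outer(p_x, p_y)   # product of marginals p(x) p(y)

# I(X;Y) = KL( p(x,y) || p(x) p(y) ), in nats.
kl = np.sum(p_xy * np.log(p_xy / q))

# Independent case: the joint IS the product of marginals, so KL (and MI) is 0.
kl_indep = np.sum(q * np.log(q / q))

print(kl, kl_indep)
```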


Relationship/identities

$$I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y)$$

These identities reveal how mutual information quantifies the reduction in uncertainty about one random variable given knowledge of another: for example, $H(X) - H(X \mid Y)$ is the drop in the entropy of $X$ once $Y$ is observed.
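The entropy identities can be verified numerically. A sketch assuming a hypothetical 2×2 joint table, using $H(X \mid Y) = H(X,Y) - H(Y)$ to express the conditional entropy:

```python
import numpy as np

def H(p):
    """Shannon entropy in nats of a probability array (zeros ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# Hypothetical 2x2 joint distribution.
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])
p_x = p_xy.sum(axis=1)
p_y = p_xy.sum(axis=0)

mi_def = np.sum(p_xy * np.log(p_xy / np.outer(p_x, p_y)))  # definition
mi_joint = H(p_x) + H(p_y) - H(p_xy.ravel())               # H(X)+H(Y)-H(X,Y)
mi_cond = H(p_x) - (H(p_xy.ravel()) - H(p_y))              # H(X)-H(X|Y)

print(mi_def, mi_joint, mi_cond)  # all three agree
```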

Properties of Mutual Information

Key Properties:

  • Non-negative: $I(X;Y) \ge 0$
  • Symmetric: $I(X;Y) = I(Y;X)$
  • Measures statistical dependence:
    • $I(X;Y) = 0$ if and only if $X$ and $Y$ are independent.
    • $I(X;Y)$ increases with the dependence between $X$ and $Y$ and with their individual entropies $H(X)$ and $H(Y)$.
  • In practice, we use Mutual Information (MI) to decide which features of the data to use for classification: the features with the highest MI with the class label are the informative features.
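The feature-selection idea above can be sketched with a sample-based MI estimate for discrete features. The data here is synthetic and the helper `mi_discrete` is a hypothetical name, not from the text: one feature copies the class label with 10% noise, the other is pure noise, and ranking by MI separates them:

```python
import numpy as np

def mi_discrete(x, y):
    """Estimate I(X;Y) in nats from paired samples of discrete variables."""
    mi = 0.0
    n = len(x)
    for xv in np.unique(x):
        for yv in np.unique(y):
            pxy = np.mean((x == xv) & (y == yv))  # empirical joint
            px, py = np.mean(x == xv), np.mean(y == yv)  # empirical marginals
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

rng = np.random.default_rng(0)
label = rng.integers(0, 2, size=1000)

# Synthetic features: one agrees with the label 90% of the time, one is noise.
informative = np.where(rng.random(1000) < 0.9, label, 1 - label)
noise = rng.integers(0, 2, size=1000)

scores = {"informative": mi_discrete(informative, label),
          "noise": mi_discrete(noise, label)}
print(scores)  # the informative feature scores much higher
```

Libraries such as scikit-learn provide ready-made estimators for this (e.g. `mutual_info_classif`), including versions that handle continuous features.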