Mutual Information measures the information that two random variables $X$ and $Y$ share.
Intuition: It quantifies how much knowing one variable reduces uncertainty about the other. For two discrete random variables $X$ and $Y$, the mutual information is defined as:

$$I(X; Y) = \sum_{x \in \mathcal{X}} \sum_{y \in \mathcal{Y}} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)}$$
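As a sanity check on the definition, here is a minimal Python sketch that evaluates this double sum directly from a joint probability table (the example tables are hypothetical, chosen to show the two extremes):

```python
import math

def mutual_information(joint):
    """Compute I(X; Y) in bits from a joint probability table p(x, y),
    given as a list of rows (one row per value of X)."""
    # Marginals p(x) and p(y) by summing rows and columns
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # terms with p(x, y) = 0 contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi

# Perfectly correlated binary variables: knowing X fully determines Y,
# so I(X; Y) = H(X) = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))  # → 1.0

# Uniform independent binary variables: I(X; Y) = 0
print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # → 0.0
```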
Mutual Information as KL Divergence
The mutual information can be expressed as the Kullback-Leibler (KL) divergence between the joint distribution and the product of the marginal distributions:

$$I(X; Y) = D_{\mathrm{KL}}\big(p(x, y) \,\|\, p(x)\, p(y)\big)$$
Interpretation: Mutual information measures the divergence ("error") of modelling the joint probability distribution as the product of marginals.
Special case: When $X$ and $Y$ are independent, $p(x, y) = p(x)\, p(y)$, so $I(X; Y) = 0$.
Relationships and Identities

$$I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y)$$

These identities reveal how mutual information quantifies the reduction in uncertainty about one random variable given knowledge of another.
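These identities can be verified numerically. The sketch below computes the entropies of a small (hypothetical) joint distribution and checks that the two expansions of $I(X; Y)$ agree:

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical joint distribution of two dependent binary variables
joint = [[0.4, 0.1],
         [0.1, 0.4]]
px = [sum(row) for row in joint]          # marginal p(x)
py = [sum(col) for col in zip(*joint)]    # marginal p(y)

h_x = entropy(px)
h_y = entropy(py)
h_xy = entropy([p for row in joint for p in row])   # joint entropy H(X, Y)
h_x_given_y = h_xy - h_y                            # chain rule: H(X|Y) = H(X,Y) - H(Y)

mi_identity_1 = h_x - h_x_given_y        # I(X; Y) = H(X) - H(X|Y)
mi_identity_2 = h_x + h_y - h_xy         # I(X; Y) = H(X) + H(Y) - H(X, Y)
print(abs(mi_identity_1 - mi_identity_2) < 1e-12)  # → True
```

Both expressions reduce algebraically to the same quantity once $H(X \mid Y)$ is expanded via the chain rule, which is exactly what the check confirms.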
Properties of Mutual Information
Key Properties:
- Non-negative: $I(X; Y) \geq 0$.
- Symmetric: $I(X; Y) = I(Y; X)$.
- Measures statistical dependence:
  - $I(X; Y) = 0$ if and only if $X$ and $Y$ are independent.
  - $I(X; Y)$ increases with the dependence between $X$ and $Y$ and with their individual entropies $H(X)$ and $H(Y)$.
- In practice, we use mutual information (MI) to decide which features of the data to use for classification: the features with the highest MI with the class label are the most informative features.
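This feature-selection idea can be sketched in a few lines. The snippet below estimates MI from empirical counts of discrete samples and ranks two hypothetical features against a class label (all data values and names here are illustrative, not from the text):

```python
import math
from collections import Counter

def empirical_mi(xs, ys):
    """Estimate I(X; Y) in bits from paired samples of discrete values."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))   # empirical joint counts
    px = Counter(xs)             # empirical marginal counts
    py = Counter(ys)
    return sum((c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in pxy.items())

# Toy dataset (hypothetical): feature_a mostly tracks the label,
# feature_b is independent noise.
labels    = [0, 0, 0, 0, 1, 1, 1, 1]
feature_a = [0, 0, 0, 1, 1, 1, 1, 0]   # agrees with the label 6/8 of the time
feature_b = [0, 1, 0, 1, 0, 1, 0, 1]   # independent of the label

scores = {name: empirical_mi(f, labels)
          for name, f in [("feature_a", feature_a), ("feature_b", feature_b)]}
# Rank features: the one with the highest MI with the label is most informative.
print(sorted(scores, key=scores.get, reverse=True))  # → ['feature_a', 'feature_b']
```

Libraries such as scikit-learn provide estimators for this purpose (e.g. `mutual_info_classif` in `sklearn.feature_selection`), which also handle continuous features; the hand-rolled version above is only meant to make the ranking criterion concrete.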