The Kullback-Leibler (KL) divergence quantifies how much one probability distribution differs from another (it is not a true distance, since it is not symmetric):
Let P and Q be two probability distributions of a discrete random variable X:
$$D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{x \in R_X} P(x) \log \frac{P(x)}{Q(x)} = \mathbb{E}\!\left[\log \frac{P(X)}{Q(X)}\right]$$
By convention:
- $0 \log \frac{0}{Q} = 0$
- $P \log \frac{P}{0} = \infty$
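As a minimal sketch, the definition and conventions above translate directly into code (the function name `kl_divergence` and the list-of-probabilities interface are my own choices, not from the source):

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) for discrete distributions given as aligned lists of probabilities."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue          # convention: 0 * log(0/Q) = 0
        if qi == 0:
            return math.inf   # convention: P * log(P/0) = infinity
        total += pi * math.log(pi / qi)
    return total

# If P and Q are identical, the divergence is 0;
# if Q assigns zero mass where P does not, it is infinite.
kl_divergence([0.5, 0.5], [0.5, 0.5])   # 0.0
kl_divergence([0.5, 0.5], [1.0, 0.0])   # inf
```

Natural log is used here; base 2 (bits) is equally common and only rescales the result.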
Properties:
- $D_{\mathrm{KL}}(P \,\|\, Q) \geq 0$
- $D_{\mathrm{KL}}(P \,\|\, Q) = 0$ if and only if $P(x) = Q(x)$ for all $x$
- Not symmetric: $D_{\mathrm{KL}}(P \,\|\, Q) \neq D_{\mathrm{KL}}(Q \,\|\, P)$ in general
Kullback-Leibler Divergence: Examples
For a binary random variable X with range $R_X = \{0, 1\}$, assume two distributions P and Q are defined as:
$$P(0) = 1 - r, \quad P(1) = r, \qquad Q(0) = 1 - s, \quad Q(1) = s$$
KL divergences:
$$D_{\mathrm{KL}}(P \,\|\, Q) = (1 - r) \log\frac{1 - r}{1 - s} + r \log\frac{r}{s}$$
$$D_{\mathrm{KL}}(Q \,\|\, P) = (1 - s) \log\frac{1 - s}{1 - r} + s \log\frac{s}{r}$$
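The two formulas above can be checked numerically; this sketch (the helper name `binary_kl` and the example values $r = 0.5$, $s = 0.1$ are my own) also makes the asymmetry concrete:

```python
import math

def binary_kl(r, s):
    """D_KL(P || Q) where P = Bernoulli(r) and Q = Bernoulli(s), 0 < r, s < 1."""
    return (1 - r) * math.log((1 - r) / (1 - s)) + r * math.log(r / s)

# Illustrative values (assumed, not from the source): r = 0.5, s = 0.1
d_pq = binary_kl(0.5, 0.1)  # D_KL(P || Q) ≈ 0.5108
d_qp = binary_kl(0.1, 0.5)  # D_KL(Q || P) ≈ 0.3681
# Both are non-negative, and they differ: KL is not symmetric.
```

Note that `binary_kl(0.1, 0.5)` computes $D_{\mathrm{KL}}(Q \,\|\, P)$ simply by swapping the roles of the two parameters.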