Let D denote the observed data and θ the parameters of our model.

  • Bayesians represent uncertainty by treating θ as a random variable with a prior distribution. Parameters and data are often continuous-valued.
  • Let p(θ) be the prior PDF (probability density function).
  • In simple terms: once the data is observed, we want to update our belief about θ given the new evidence. This updated belief is represented by the posterior distribution, denoted p(θ | D).
  • Bayes' rule returns the posterior distribution of θ given data D: p(θ | D) = p(D | θ) p(θ) / p(D).
  • Bayesian estimation does not operate through optimisation to obtain a single best parameter value; it maintains a full distribution over θ.
  • The denominator p(D) = ∫ p(D | θ) p(θ) dθ is called the marginal likelihood, or evidence.
  • Predictive distributions for a new example average the model over the posterior:
    • in supervised learning: p(y | x, D) = ∫ p(y | x, θ) p(θ | D) dθ
    • in unsupervised learning: p(x | D) = ∫ p(x | θ) p(θ | D) dθ
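
The ideas above can be made concrete with a minimal sketch in Python, using a Bernoulli likelihood with a conjugate Beta prior (an assumed example, not one from the notes): the posterior is again a Beta distribution, so the update, the evidence, and the posterior predictive all have closed forms.

```python
import math

# Sketch of Bayesian estimation for a Bernoulli parameter theta.
# Prior:      p(theta) = Beta(a, b)
# Likelihood: p(D | theta) = theta^k (1 - theta)^(n - k), with k ones in n observations
# Posterior:  p(theta | D) = Beta(a + k, b + n - k) -- a full distribution,
#             not a single optimised parameter value.

def posterior_params(a, b, data):
    """Update a Beta(a, b) prior with a list of 0/1 observations."""
    k = sum(data)
    n = len(data)
    return a + k, b + n - k

def posterior_predictive(a_post, b_post):
    """p(x_new = 1 | D) = integral of theta * p(theta | D) dtheta,
    which for a Beta posterior is its mean a / (a + b)."""
    return a_post / (a_post + b_post)

def log_evidence(a, b, data):
    """Marginal likelihood p(D) = B(a + k, b + n - k) / B(a, b), in log space,
    where B is the Beta function (computed via lgamma for stability)."""
    k, n = sum(data), len(data)
    log_beta = lambda x, y: math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)
    return log_beta(a + k, b + n - k) - log_beta(a, b)

data = [1, 1, 0, 1, 1, 0, 1, 1]                     # 6 ones out of 8 observations
a_post, b_post = posterior_params(1.0, 1.0, data)   # uniform Beta(1, 1) prior
print(a_post, b_post)                               # -> 7.0 3.0
print(posterior_predictive(a_post, b_post))         # -> 0.7
```

Conjugacy is what keeps this example closed-form; for non-conjugate models the evidence integral generally has no analytic solution and is approximated numerically (e.g. by sampling).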