- $\mathcal{X}$: input space (input domain).
- $\mathcal{Y}$: output space.
- $h : \mathcal{X} \to \mathcal{Y}$: classifier.
Basic Assumption
All samples $(X, Y)$ are drawn independently and identically distributed (i.i.d.) from some unknown distribution $D$ over $\mathcal{X} \times \mathcal{Y}$.
- $S = \{(X_1, Y_1), \dots, (X_n, Y_n)\}$: a random i.i.d. sample set (training sample or test sample, depending on the context).
True Error & Observed Error
A classifier $h$ is good if the following quantity is small:
- True Error: $L(h) = \mathbb{P}_{(X, Y) \sim D}\left(h(X) \neq Y\right)$.
Problem
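This quantity is unknown because the true distribution $D$ is unknown.
Instead we have a sample $S$ from this unknown distribution. The following quantity is an empirical proxy based on the sample:
- Observed Error: $\hat{L}(h, S) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left[h(X_i) \neq Y_i\right]$, where $n$ is the number of pairs in $S$ and $\mathbb{1}[\cdot]$ denotes the indicator function.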
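The observed error is just the fraction of sample points that the classifier gets wrong. The following is a minimal sketch, not part of the original notes: the toy distribution (uniform inputs on $[0, 1]$, threshold labels flipped with probability 0.1), the helper names `draw_sample` and `observed_error`, and the fixed classifier `h` are all assumptions made for illustration.

```python
import random

def draw_sample(m, rng):
    """Draw m i.i.d. pairs (x, y) from an assumed toy distribution:
    x ~ Uniform[0, 1], y = 1[x > 0.5], with the label flipped w.p. 0.1."""
    sample = []
    for _ in range(m):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        sample.append((x, y))
    return sample

def observed_error(h, sample):
    """Average of the indicator 1[h(x) != y] over the sample."""
    return sum(1 for x, y in sample if h(x) != y) / len(sample)

def h(x):
    """A fixed classifier, chosen without looking at any sample."""
    return int(x > 0.5)

rng = random.Random(0)
S = draw_sample(1000, rng)
print(observed_error(h, S))
```

Under this toy distribution the true error of `h` is exactly 0.1 (it errs only when the label is flipped), so the printed observed error should land near 0.1, with the gap shrinking as the sample grows.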
Question
Is the true error $L(h)$ a deterministic quantity (a number), or a random variable?
Answer: It depends. If $h$ is a fixed function, then $L(h)$ is deterministic (a number in $[0, 1]$). If $h = h_S$ is the classifier trained on the sample $S$, then $L(h_S)$ is a random variable, as a function of $S$.
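To make the distinction concrete, here is a small simulation sketch under assumptions that are not in the original notes: the same toy distribution (uniform inputs on $[0, 1]$, labels $\mathbb{1}[x > 0.5]$ flipped with probability `NOISE`), threshold classifiers $x \mapsto \mathbb{1}[x > t]$, and a hypothetical learner `train_threshold` that minimizes the observed error over a grid of thresholds. For this toy setup the true error of a threshold $t$ has the closed form `NOISE + (1 - 2 * NOISE) * |t - 0.5|`.

```python
import random

NOISE = 0.1  # assumed label-flip probability of the toy distribution

def draw_sample(m, rng):
    """Draw m i.i.d. pairs: x ~ Uniform[0, 1], y = 1[x > 0.5], flipped w.p. NOISE."""
    sample = []
    for _ in range(m):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < NOISE:
            y = 1 - y
        sample.append((x, y))
    return sample

def true_error(t):
    """Closed-form true error of x -> 1[x > t] under the toy distribution."""
    return NOISE + (1 - 2 * NOISE) * abs(t - 0.5)

def train_threshold(sample):
    """Pick the grid threshold with the smallest observed error on the sample."""
    grid = [i / 20 for i in range(21)]

    def emp_err(t):
        return sum(int(x > t) != y for x, y in sample) / len(sample)

    return min(grid, key=emp_err)

rng = random.Random(0)
fixed_t = 0.3  # fixed classifier: chosen before seeing any data

trained_errors = []
for _ in range(5):
    S = draw_sample(20, rng)      # a fresh random sample
    t_S = train_threshold(S)      # the classifier depends on S
    trained_errors.append(true_error(t_S))

print("fixed h:    ", true_error(fixed_t))
print("trained h_S:", trained_errors)
```

The true error of the fixed threshold is the same number no matter how many samples are drawn, while the true error of the trained threshold is recomputed from each new sample and will typically vary from one draw to the next.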