  • $\mathcal{X}$: input space (input domain).
  • $\mathcal{Y}$: output space.
  • $h : \mathcal{X} \to \mathcal{Y}$: classifier.

Basic Assumption

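All samples are drawn independently and identically distributed (i.i.d.) from some unknown distribution $\mathcal{D}$ over $\mathcal{X} \times \mathcal{Y}$.

  • $S = \{(x_1, y_1), \dots, (x_n, y_n)\} \sim \mathcal{D}^n$: a random i.i.d. sample set (a training sample or a test sample, depending on the context).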

True Error & Observed Error

A classifier $h$ is good if the following quantity is small:

  • True Error: $L(h) = \Pr_{(x, y) \sim \mathcal{D}}\left[ h(x) \neq y \right]$.

    Problem

    This quantity is unknown because the true distribution $\mathcal{D}$ is unknown.

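Instead, we have a sample $S = \{(x_i, y_i)\}_{i=1}^{n}$ from this unknown distribution. The following quantity is an empirical proxy based on the sample:

  • Observed Error: $\hat{L}(h, S) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\left[ h(x_i) \neq y_i \right]$, where $\mathbb{1}[\cdot]$ denotes the indicator function.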
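The observed error is simply the fraction of sample points that the classifier gets wrong. A minimal sketch, assuming NumPy is available; the threshold classifier `h` and the four data points are hypothetical and chosen only for illustration:

```python
import numpy as np

def observed_error(h, xs, ys):
    """Fraction of sample points (x_i, y_i) on which h(x_i) != y_i."""
    preds = np.array([h(x) for x in xs])
    return float(np.mean(preds != np.asarray(ys)))

# Hypothetical fixed classifier: predict 1 when x >= 0, otherwise 0.
def h(x):
    return int(x >= 0)

# Hypothetical sample of n = 4 labelled points.
xs = [-2.0, -0.5, 0.3, 1.7]
ys = [0, 1, 1, 1]

print(observed_error(h, xs, ys))  # 0.25 (one mistake out of four, at x = -0.5)
```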

Question

Is the true error $L(h)$ a deterministic quantity (a number), or a random variable?

Answer: Depends. If $h$ is a fixed function, then $L(h)$ is deterministic (a number in $[0, 1]$). If $h$ is the classifier trained on the sample $S$, then $L(h)$ is a random variable, as a function of $S$.
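The distinction can be checked numerically on a toy problem where the true error has a closed form. The sketch below is an illustration under stated assumptions, not part of the original notes: the distribution (inputs uniform on $[0, 1]$, clean label 1 when $x \geq 0.5$, labels flipped with probability 0.1), the threshold classifiers, and the helper names `draw_sample`, `true_error`, and `train_threshold` are all hypothetical.

```python
import numpy as np

NOISE = 0.1  # assumed label-flip probability of the toy distribution

def draw_sample(n, rng):
    """Draw n i.i.d. points: x ~ Uniform[0, 1], y = 1[x >= 0.5] with noisy flips."""
    x = rng.uniform(0.0, 1.0, size=n)
    clean = (x >= 0.5).astype(int)
    flip = rng.random(n) < NOISE
    y = np.where(flip, 1 - clean, clean)
    return x, y

def true_error(t):
    """Closed-form true error of h_t(x) = 1[x >= t] under the toy distribution."""
    return NOISE + (1 - 2 * NOISE) * abs(t - 0.5)

def train_threshold(x, y):
    """Pick the threshold (0 or a sample point) with the smallest observed error."""
    candidates = np.concatenate(([0.0], np.sort(x)))
    errors = [np.mean((x >= t).astype(int) != y) for t in candidates]
    return float(candidates[int(np.argmin(errors))])

rng = np.random.default_rng(0)

# Fixed classifier (t = 0.3): its true error does not depend on any sample.
fixed = [true_error(0.3) for _ in range(5)]

# Trained classifier: the threshold, and hence its true error, changes with S.
learned = [true_error(train_threshold(*draw_sample(20, rng))) for _ in range(5)]

print("fixed h:  ", fixed)
print("trained h:", learned)
```

The first list repeats a single number, while the second varies from one draw of the sample to the next, which is exactly the sense in which the true error of a trained classifier is a random variable.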