Supervised learning involves finding a final hypothesis g that approximates an unknown target function f: X → Y.

Core Data Structures

  • Input Space (X): A d-dimensional space containing the features. Inputs can be Numeric (age), Ordinal (low/medium/high), or Categorical (car brands).
  • Output Space (Y): The target values (e.g., house prices for regression or categories for classification).
  • Training Set (D): A collection of N input-output pairs: D = {(x⁽¹⁾, y⁽¹⁾), …, (x⁽ᴺ⁾, y⁽ᴺ⁾)}.
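These structures can be sketched directly in Python; the feature names and values below are purely illustrative.

```python
# A toy training set D: each element pairs an input vector x with a label y.
# Hypothetical features: [age, income]; labels: 1 = bought, 0 = did not buy.
training_set = [
    ([25, 40_000], 0),
    ([47, 85_000], 1),
    ([35, 62_000], 1),
]

X = [x for x, _ in training_set]      # samples from the input space
y = [label for _, label in training_set]  # samples from the output space
print(len(training_set), y)
```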

The Design Matrix

To process data efficiently, all input vectors from the training set are often organised into a Design Matrix.

  • Each row represents one training example x⁽ⁱ⁾.
  • Each column represents a specific independent variable (feature).
  • A “bias” column of 1s is often added as the first column to account for the intercept term w₀.
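Building a design matrix is a one-liner with NumPy; the feature values here are made up for illustration.

```python
import numpy as np

# Stack the raw input vectors row-wise, then prepend a bias column of 1s.
X_raw = np.array([
    [25.0, 40_000.0],
    [47.0, 85_000.0],
    [35.0, 62_000.0],
])

bias = np.ones((X_raw.shape[0], 1))          # column of 1s for the intercept
design_matrix = np.hstack([bias, X_raw])     # shape (N, d + 1)
print(design_matrix.shape)  # (3, 3)
```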

Logistic Regression: Hypothesis Set

Despite its name, Logistic Regression is a model for classification, specifically binary classification (y ∈ {0, 1}).
From Linear Scores to Probabilities

  1. The Score: We calculate a linear combination of the inputs: z = wᵀx.
  2. The Problem: z is unbounded (−∞ to +∞), but probabilities must lie in [0, 1].
  3. The Solution (Logit): We model the logit (log-odds) as the linear combination: ln(p / (1 − p)) = wᵀx.
  4. The Activation (Sigmoid): Solving for p gives us the Sigmoid function, which squashes the score into a probability: p = σ(z) = 1 / (1 + e⁻ᶻ).
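The score-to-probability pipeline can be sketched in a few lines; the weight and input values are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Squash an unbounded score z into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights (first entry is the bias w0) and one input
# vector whose first entry is the bias feature 1.
w = np.array([-1.0, 0.5, 0.25])
x = np.array([1.0, 2.0, 4.0])

z = w @ x        # linear score w^T x = -1 + 1 + 1 = 1
p = sigmoid(z)   # P(y = 1 | x)
print(round(p, 3))  # 0.731
```

Note that sigmoid(0) = 0.5, which is what makes the 0.5 probability threshold coincide with the sign of the score.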

Decision Boundary

  • The model predicts Class 1 if σ(z) ≥ 0.5 (which occurs when wᵀx ≥ 0).
  • The model predicts Class 0 if σ(z) < 0.5 (which occurs when wᵀx < 0).
  • Decision Boundary: The hyperplane is defined by wᵀx = 0.
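This decision rule never needs the sigmoid at prediction time: thresholding the probability at 0.5 is equivalent to thresholding the raw score at 0. A sketch with made-up weights:

```python
import numpy as np

def predict(w, X_design):
    """Classify each row of a design matrix via the sign of the score w^T x."""
    scores = X_design @ w
    # sigmoid(score) >= 0.5 exactly when score >= 0, so comparing the raw
    # score against 0 gives the same labels as comparing p against 0.5.
    return (scores >= 0).astype(int)

w = np.array([-3.0, 1.0])          # hypothetical weights: bias -3, slope 1
X_design = np.array([[1.0, 2.0],   # score -1 -> Class 0
                     [1.0, 3.0],   # score  0 -> Class 1 (on the boundary)
                     [1.0, 5.0]])  # score  2 -> Class 1
print(predict(w, X_design))  # [0 1 1]
```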

Tip

Distance and Confidence: The larger the absolute value of the score |wᵀx|, the further the point lies from the decision boundary, and the higher the model’s confidence in its prediction.
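The geometric reading of this tip: for a score z = wᵀx + w₀, the Euclidean distance of x from the hyperplane is |z| / ‖w‖ (with the bias excluded from the norm). A minimal sketch with hypothetical weights:

```python
import numpy as np

# Distance from a point to the decision boundary w^T x + w0 = 0.
w = np.array([3.0, 4.0])   # hypothetical feature weights (no bias), ||w|| = 5
w0 = -10.0                 # hypothetical intercept term

def distance_to_boundary(x):
    score = w @ x + w0
    return abs(score) / np.linalg.norm(w)

near = distance_to_boundary(np.array([2.0, 1.0]))  # score 0: on the boundary
far = distance_to_boundary(np.array([6.0, 4.0]))   # score 24: far from it
print(near, far)  # 0.0 4.8
```

A larger |score| thus translates directly into a larger geometric margin, which is why it can be read as confidence.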