Supervised learning involves finding a final hypothesis $g$ that approximates an unknown target function $f: \mathcal{X} \to \mathcal{Y}$.
Core Data Structures
- Input Space ($\mathcal{X}$): A $d$-dimensional space containing features. Inputs can be Numeric (age), Ordinal (low/medium/high), or Categorical (car brands).
- Output Space ($\mathcal{Y}$): The target values (e.g., house prices for regression or categories for classification).
- Training Set ($\mathcal{D}$): A collection of input-output pairs: $\mathcal{D} = \{(x_1, y_1), \dots, (x_N, y_N)\}$.
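As a minimal sketch of these structures, the training set can be held as a NumPy feature matrix paired with a target vector; the numbers below are made-up illustration data, not from the notes:

```python
import numpy as np

# Hypothetical training set D = {(x_n, y_n)}: N = 4 examples, d = 2 features.
X = np.array([[25.0, 1.0],   # e.g. a numeric feature (age) and an encoded ordinal level
              [40.0, 2.0],
              [31.0, 0.0],
              [52.0, 2.0]])
y = np.array([180_000.0, 320_000.0, 210_000.0, 400_000.0])  # regression targets (house prices)

assert X.shape[0] == y.shape[0]  # one target per input vector
```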
The Design Matrix
To process data efficiently, all input vectors from the training set are often organised into a Design Matrix.
- Each row represents one training example $x_n^T$.
- Each column represents a specific independent variable (feature).
- A “bias” column of 1s is often added as the first column to account for the intercept term $w_0$.
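The construction above can be sketched directly: stack the raw feature vectors as rows, then prepend the column of 1s (the data values are illustrative):

```python
import numpy as np

# Raw feature vectors, one row per training example.
X = np.array([[25.0, 1.0],
              [40.0, 2.0],
              [31.0, 0.0]])

# Prepend a column of 1s so the intercept w_0 folds into the weight vector.
ones = np.ones((X.shape[0], 1))
design = np.hstack([ones, X])
# design[n] is the n-th training example with a leading 1;
# each remaining column is one independent variable (feature).
```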
Logistic Regression: Hypothesis Set
Despite its name, Logistic Regression is used for classification, specifically binary classification ($y \in \{0, 1\}$).
From Linear Scores to Probabilities
- The Score: We calculate a linear combination of inputs: $s = w^T x$.
- The Problem: $s$ is unbounded ($-\infty$ to $+\infty$), but probabilities must lie in $[0, 1]$.
- The Solution (Logit): We model the logit (log-odds) as the linear combination: $\ln \frac{p}{1 - p} = w^T x$.
- The Activation (Sigmoid): Solving for $p$ gives us the Sigmoid function, which squashes the score into a probability: $p = \sigma(s) = \frac{1}{1 + e^{-s}}$.
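The sigmoid step above is a one-liner; this sketch shows its squashing behaviour on a few scores:

```python
import numpy as np

def sigmoid(s):
    """Squash an unbounded score s into a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-s))

# Large positive scores approach 1, large negative scores approach 0,
# and a score of exactly 0 maps to probability 0.5.
for s in (-10.0, 0.0, 10.0):
    print(f"s = {s:+.1f}  ->  sigma(s) = {sigmoid(s):.4f}")
```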
Decision Boundary
- The model predicts Class 1 if $\sigma(s) \geq 0.5$ (which occurs when $s = w^T x \geq 0$).
- The model predicts Class 0 if $\sigma(s) < 0.5$ (which occurs when $s < 0$).
- Decision Boundary: The hyperplane is defined by $w^T x = 0$.
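The decision rule above reduces to checking the sign of the score; the weights and input here are hypothetical examples:

```python
import numpy as np

def predict(w, x):
    """Predict Class 1 iff the score w^T x is non-negative, i.e. sigma(s) >= 0.5."""
    s = w @ x
    return 1 if s >= 0 else 0

w = np.array([-1.0, 0.5, 0.5])   # hypothetical weights; w[0] is the bias term w_0
x = np.array([1.0, 2.0, 1.0])    # input with a leading 1 for the bias
# score = -1.0 + 1.0 + 0.5 = 0.5 >= 0, so the model predicts Class 1
print(predict(w, x))
```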
Tip
Distance and Confidence: The larger the absolute value of the score $|w^T x|$, the further the point is from the decision boundary, and the higher the model’s confidence in its prediction.
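This link between score magnitude, distance, and confidence can be checked numerically; the geometric distance from a point to the hyperplane $w^T x = 0$ is $|w^T x| / \lVert w \rVert$. The weights and points below are made up for illustration:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

w = np.array([1.0, -2.0])        # hypothetical weights (bias omitted for simplicity)
near = np.array([0.1, 0.0])      # score 0.1: just above the boundary
far = np.array([3.0, 0.0])       # score 3.0: well away from the boundary

# Distance to the hyperplane w^T x = 0 is |w^T x| / ||w||, so a larger
# |score| means both a larger distance and a probability further from 0.5.
for x in (near, far):
    s = w @ x
    dist = abs(s) / np.linalg.norm(w)
    print(f"score={s:+.1f}  distance={dist:.3f}  confidence={sigmoid(s):.3f}")
```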