The Need for Nonlinear Transformations

Standard linear models, like basic Logistic Regression, assume that a straight line (a linear decision boundary where $\mathbf{w}^\top \mathbf{x} + b = 0$) can separate classes.
In many real-world scenarios, classes are not linearly separable. Forcing a straight line on such data leads to underfitting, where the model is too simple to capture the underlying structure. By transforming the data, we can use the power of linear algorithms to solve nonlinear problems.

Feature Mapping & Basis Functions

To solve nonlinear problems, we apply a transformation function $\phi$ to the input vector $\mathbf{x}$, producing $\phi(\mathbf{x})$. This creates a new feature space.

Polynomial Basis Expansion

A common transformation is to create polynomials of degree $d$. This expansion includes all possible terms up to that degree based on the original variables.

  • Example: Degree 2 Expansion (1 Input Variable)
    • Original: $x$
    • Transformed: $\phi(x) = (1,\ x,\ x^2)$
  • Example: Degree 2 Expansion (2 Input Variables)
    • Original: $(x_1,\ x_2)$
    • Transformed: $\phi(\mathbf{x}) = (1,\ x_1,\ x_2,\ x_1^2,\ x_1 x_2,\ x_2^2)$
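The degree-2 expansion above can be sketched in plain Python. The helper name `poly_expand` and its list-based interface are assumptions made for this sketch, not something defined in the text:

```python
from itertools import combinations_with_replacement

def poly_expand(x, degree=2):
    """Map an input vector x to all monomials of total degree <= degree,
    including the constant (bias) term 1."""
    features = [1.0]  # degree-0 term
    for d in range(1, degree + 1):
        # combinations_with_replacement enumerates each monomial exactly once
        for combo in combinations_with_replacement(range(len(x)), d):
            term = 1.0
            for i in combo:
                term *= x[i]
            features.append(term)
    return features

# Degree-2 expansion of (x1, x2) = (2, 3):
# terms 1, x1, x2, x1^2, x1*x2, x2^2
print(poly_expand([2.0, 3.0]))  # [1.0, 2.0, 3.0, 4.0, 6.0, 9.0]
```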

Tip

Key Insight: A linear decision boundary in the high-dimensional feature space translates back to a nonlinear decision boundary (like a parabola or circle) in the original input space.

Non-Polynomial Transformations

Transformations are not limited to polynomials. You can use any nonlinear function that helps separate the data, such as exponential functions, for example the Gaussian basis $\phi_j(x) = \exp\!\left(-\frac{(x - \mu_j)^2}{2s^2}\right)$.
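As one concrete non-polynomial choice, a Gaussian (exponential) basis can be sketched as follows; the centres $\mu_j$ and width $s$ are illustrative parameters chosen for this example, not values given in the text:

```python
import math

def gaussian_basis(x, centres, s=1.0):
    """One Gaussian feature per centre mu_j, plus a bias term.
    phi_j(x) = exp(-(x - mu_j)^2 / (2 s^2))."""
    return [1.0] + [math.exp(-(x - mu) ** 2 / (2 * s ** 2)) for mu in centres]

# Feature vector for x = 0 with three centres
phi = gaussian_basis(0.0, centres=[-1.0, 0.0, 1.0])
```

The feature for the centre at 0 equals 1 (the input sits exactly on it), while the off-centre features decay smoothly with distance.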

Implementation in Logistic Regression

When using a transformation, the core logic of Logistic Regression remains the same, but every instance of $\mathbf{x}$ is replaced by $\phi(\mathbf{x})$.

Mathematical Adaptation:

  • Transformed Logit: $z = \mathbf{w}^\top \phi(\mathbf{x})$
  • Probability Model: $P(y = 1 \mid \mathbf{x}) = \sigma\!\left(\mathbf{w}^\top \phi(\mathbf{x})\right) = \dfrac{1}{1 + e^{-\mathbf{w}^\top \phi(\mathbf{x})}}$
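The probability model amounts to a dot product followed by a sigmoid. A minimal sketch, with `w` and the transformed features passed as plain lists (names chosen for this example):

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, phi_x):
    """P(y = 1 | x) = sigma(w . phi(x))."""
    z = sum(wi * fi for wi, fi in zip(w, phi_x))
    return sigmoid(z)

# z = 0*1 + 1*2 + (-1)*2 = 0, so the model outputs 0.5
p = predict_proba([0.0, 1.0, -1.0], [1.0, 2.0, 2.0])
```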

Optimisation Formulas

To find the optimal weights $\mathbf{w}$, we minimise the error function $E(\mathbf{w})$ (the cross-entropy) using the transformed training set $\{(\phi(\mathbf{x}_n), y_n)\}_{n=1}^{N}$.

  • Gradient: $\nabla E(\mathbf{w}) = \sum_{n=1}^{N} \left(\sigma(\mathbf{w}^\top \phi(\mathbf{x}_n)) - y_n\right) \phi(\mathbf{x}_n)$
  • Hessian: $\mathbf{H} = \sum_{n=1}^{N} \sigma_n (1 - \sigma_n)\, \phi(\mathbf{x}_n)\, \phi(\mathbf{x}_n)^\top$, where $\sigma_n = \sigma(\mathbf{w}^\top \phi(\mathbf{x}_n))$
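The gradient sum can be sketched directly in pure Python; `Phi` here is an assumed name for the list of transformed samples $\phi(\mathbf{x}_n)$:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient(w, Phi, y):
    """Gradient of the cross-entropy error:
    sum_n (sigma(w . phi_n) - y_n) * phi_n."""
    g = [0.0] * len(w)
    for phi_n, y_n in zip(Phi, y):
        # residual (prediction minus label) scales this sample's features
        err = sigmoid(sum(wi * fi for wi, fi in zip(w, phi_n))) - y_n
        for j, fj in enumerate(phi_n):
            g[j] += err * fj
    return g

# With w = 0 the model predicts 0.5 everywhere, so residuals are +-0.5
g = gradient([0.0, 0.0], [[1.0, 2.0], [1.0, -2.0]], [1, 0])
```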

The Linearity Paradox

Is a model with a nonlinear transformation still a "linear model"?
Yes. In machine learning, the "linearity" of a model typically refers to its parameters $\mathbf{w}$, not the input variables $\mathbf{x}$.

  • The model is nonlinear in the input space (it draws curves).
  • The model is linear in the feature space (it draws a straight line in the embedding).
  • The model is linear in its parameters (weights are not squared or used in exponents).

Advantages and Risks

Advantages

  • Efficiency: We can continue using well-understood and fast linear learning algorithms.
  • Robustness: Linear models often have strong generalisation properties.

Caveats & Common Mistakes

  • Overfitting: As the polynomial degree increases, the model gains more "power" and might start fitting to noise in the training data rather than the actual trend.
  • Dimensionality Explosion: The number of features in $\phi(\mathbf{x})$ can grow very quickly as you increase the number of input variables and the degree of expansion, making calculations slower.
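The dimensionality explosion is easy to quantify: the number of monomials of total degree at most $d$ in $n$ variables (bias included) is $\binom{n + d}{d}$, a standard combinatorial fact:

```python
from math import comb

def num_poly_features(n_inputs, degree):
    """Count of monomials of total degree <= degree in n_inputs variables,
    including the bias term: C(n_inputs + degree, degree)."""
    return comb(n_inputs + degree, degree)

print(num_poly_features(2, 2))    # 6, matching the degree-2, 2-variable example
print(num_poly_features(100, 3))  # 176851 features from only 100 inputs
```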

Step-by-Step Adoption Process

  1. Define $\phi$: Choose the basis functions (e.g., quadratic expansion).
  2. Data Transformation: Apply $\phi$ to all training samples to create a new dataset of $(\phi(\mathbf{x}_n), y_n)$ pairs.
  3. Train Linear Model: Use standard algorithms (like Gradient Descent) to find the weights $\mathbf{w}$ for the transformed data.
  4. Predict: To classify a new point $\mathbf{x}$, first transform it to $\phi(\mathbf{x})$, then apply the linear model $\sigma(\mathbf{w}^\top \phi(\mathbf{x}))$.
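The four steps above can be sketched end-to-end. The toy dataset, labelling rule ($y = 1$ when $|x| > 1$, a nonlinearly separable rule in one dimension), learning rate, and iteration count are all illustrative choices for this sketch:

```python
import math

def phi(x):
    # Step 1: quadratic basis for a single input
    return [1.0, x, x * x]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 2: transform a toy dataset; the label depends nonlinearly on x
xs = [i / 10.0 for i in range(-30, 31)]
data = [(phi(x), 1.0 if abs(x) > 1.0 else 0.0) for x in xs]

# Step 3: train with plain (batch) gradient descent
w = [0.0, 0.0, 0.0]
lr = 0.1
for _ in range(5000):
    g = [0.0, 0.0, 0.0]
    for f, y in data:
        err = sigmoid(sum(wi * fi for wi, fi in zip(w, f))) - y
        for j in range(3):
            g[j] += err * f[j]
    for j in range(3):
        w[j] -= lr * g[j] / len(data)

# Step 4: predict a new point by transforming it first
def predict(x):
    return sigmoid(sum(wi * fi for wi, fi in zip(w, phi(x))))
```

A straight line in $(1, x, x^2)$-space becomes the two-sided boundary $|x| \approx 1$ in the original space: points far from zero on either side get high probability.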