Multi-Layer Perceptrons (MLPs), also known as Feed-Forward Networks, function as “soft perceptrons”: inputs are multiplied by weights, summed with a bias, and passed through an activation function.
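As a minimal sketch of that weighted-sum-plus-activation step, here is one layer of an MLP in NumPy; the weights, bias, and input values below are made up for illustration:

```python
import numpy as np

def mlp_layer(x, W, b):
    """One MLP layer: weighted sum of inputs plus bias, then a sigmoid activation."""
    z = W @ x + b                     # weighted sum with bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation squashes to (0, 1)

# Toy example: 3 inputs feeding 2 hidden units (arbitrary values)
x = np.array([0.5, -1.0, 2.0])
W = np.array([[0.1, 0.2, -0.3],
              [0.4, -0.5, 0.6]])
b = np.array([0.0, 0.1])
h = mlp_layer(x, W, b)               # 2 activations, each strictly between 0 and 1
```

Stacking several such layers, each feeding the next, gives the full feed-forward network.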

Key Activation Functions

Activation functions introduce non-linearity, allowing networks to learn complex patterns.

  • Sigmoid: σ(x) = 1 / (1 + e^(−x)). Normalises values to the range (0, 1).
    • Cons: Saturated neurons “kill” gradients; the exponential function is computationally expensive.
  • Tanh: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)). Normalises values to the range (−1, 1).
    • Cons: Like sigmoid, saturated neurons kill gradients.
  • ReLU (Rectified Linear Unit): f(x) = max(0, x).
    • Pros: Does not saturate in the positive region and converges considerably faster than sigmoid/tanh in practice.
    • Cons: “Dead ReLU” units can occur: neurons that always output zero receive no gradient and stop learning.
  • Leaky ReLU: f(x) = max(0.01x, x).
    • Pros: Does not saturate in either the negative or positive region, so units “will not die”.
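The four activation functions above can be written in a few lines of NumPy; the comments note the output behaviour each bullet describes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes to (0, 1); saturates at both ends

def tanh(x):
    return np.tanh(x)                     # squashes to (-1, 1); also saturates

def relu(x):
    return np.maximum(0.0, x)             # zero for negative inputs ("dead" region)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small negative slope keeps gradients alive
```

Note how leaky_relu(-3.0) returns a small negative value where relu(-3.0) returns exactly zero; that nonzero slope is what prevents units from dying.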