Multi-Layer Perceptrons (MLPs), also known as feed-forward networks, function as “soft perceptrons”: inputs are multiplied by weights, summed with a bias, and passed through an activation function.
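A minimal NumPy sketch of that weights–bias–activation pattern; the layer sizes and random weights here are illustrative, not from the notes:

```python
import numpy as np

def mlp_layer(x, W, b, activation):
    """One MLP layer: weighted sum of inputs plus bias, then a nonlinearity."""
    return activation(W @ x + b)

# Hypothetical 2-layer MLP: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1, b1 = rng.standard_normal((4, 3)), np.zeros(4)
W2, b2 = rng.standard_normal((1, 4)), np.zeros(1)

x = np.array([1.0, -0.5, 2.0])
hidden = mlp_layer(x, W1, b1, lambda z: np.maximum(0.0, z))          # ReLU hidden layer
output = mlp_layer(hidden, W2, b2, lambda z: 1.0 / (1.0 + np.exp(-z)))  # sigmoid output
```

Stacking such layers, each with a non-linear activation, is what lets the network represent functions a single perceptron cannot.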
Key Activation Functions
Activation functions introduce non-linearity, allowing networks to learn complex patterns.
- Sigmoid: σ(x) = 1 / (1 + e^(-x)). Normalises values to (0, 1).
- Cons: Saturated neurons “kill” gradients; the exponential function is computationally expensive.
- Tanh: tanh(x) = (e^x − e^(-x)) / (e^x + e^(-x)). Normalises values to (−1, 1).
- Cons: Like sigmoid, saturated neurons kill gradients.
- ReLU (Rectified Linear Unit): f(x) = max(0, x).
- Pros: Does not saturate in the positive region and converges considerably faster than sigmoid/tanh in practice.
- Cons: “Dead ReLU” units can occur: a neuron that outputs zero for every input receives no gradient and stops updating.
- Leaky ReLU: f(x) = max(0.01x, x).
- Pros: Does not saturate in negative or positive regions; it “will not die”.
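The four activations above can be sketched in a few lines of NumPy; the slope 0.01 for Leaky ReLU matches the formula in the notes, though in libraries it is usually a tunable parameter:

```python
import numpy as np

def sigmoid(x):
    # Saturates toward 0 and 1 for large |x|, so gradients vanish there.
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centred version of sigmoid; still saturates at -1 and 1.
    return np.tanh(x)

def relu(x):
    # No saturation for x > 0, but zero gradient for all x < 0.
    return np.maximum(0.0, x)

def leaky_relu(x, alpha=0.01):
    # Small negative slope keeps a gradient flowing even for x < 0.
    return np.where(x > 0, x, alpha * x)

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
# sigmoid(x) hugs 0/1 at the extremes, relu zeroes all negatives,
# and leaky_relu preserves a scaled-down negative signal.
```

Evaluating each function on a range like `x` above makes the saturation and “dying” behaviour easy to see at a glance.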