Multi-Layer Perceptrons (MLPs), also known as Feed-Forward Networks, function as “soft perceptrons”: inputs are multiplied by weights, summed with a bias, and passed through an activation function.
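As a minimal sketch of that weighted-sum-plus-activation step, here is one layer of an MLP in NumPy; the weights, bias, and input values below are made up for illustration:

```python
import numpy as np

def mlp_layer(x, W, b):
    """One MLP layer: weighted sum of inputs plus bias, then a sigmoid activation."""
    z = W @ x + b                     # weighted sum with bias
    return 1.0 / (1.0 + np.exp(-z))   # sigmoid activation squashes to (0, 1)

# Toy example: 3 inputs feeding 2 hidden units (arbitrary values)
x = np.array([0.5, -1.0, 2.0])
W = np.array([[0.1, 0.2, -0.3],
              [0.4, -0.5, 0.6]])
b = np.array([0.0, 0.1])
h = mlp_layer(x, W, b)               # 2 activations, each strictly between 0 and 1
```

Stacking several such layers, each feeding the next, gives the full feed-forward network.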

Key Activation Functions

Activation functions introduce non-linearity, allowing networks to learn complex patterns.

  • Sigmoid: σ(x) = 1 / (1 + e^(−x)). Normalises values to the range (0, 1).
    • Cons: Saturated neurons “kill” gradients; the exponential function is computationally expensive.
  • Tanh: tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x)). Normalises values to the range (−1, 1).
    • Cons: Like sigmoid, saturated neurons kill gradients.
  • ReLU (Rectified Linear Unit): f(x) = max(0, x).
    • Pros: Does not saturate in the positive region and converges considerably faster than sigmoid/tanh in practice.
    • Cons: “Dead ReLU” units can occur: neurons that always output zero receive no gradient and stop learning.
  • Leaky ReLU: f(x) = max(0.01x, x).
    • Pros: Does not saturate in either the negative or positive region, so units “will not die”.
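The four activation functions above can be written in a few lines of NumPy; the comments note the output behaviour each bullet describes:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))       # squashes to (0, 1); saturates at both ends

def tanh(x):
    return np.tanh(x)                     # squashes to (-1, 1); also saturates

def relu(x):
    return np.maximum(0.0, x)             # zero for negative inputs ("dead" region)

def leaky_relu(x, alpha=0.01):
    return np.where(x > 0, x, alpha * x)  # small negative slope keeps gradients alive
```

Note how leaky_relu(-3.0) returns a small negative value where relu(-3.0) returns exactly zero; that nonzero slope is what prevents units from dying.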