08 Multi-Layer Perceptron (MLP) Architecture

The MLP, also known as a Feed-Forward Net, is described as a “soft perceptron” that utilises gradient descent to learn its parameters.

Mathematical Notation

$L$ : Total number of layers. Layer $1$ is the input layer, and layer $L$ is the output layer; a superscript on any quantity below gives its layer index.
$m$ : The width (number of units) of a layer. This can vary between layers.
$w_{jk}^{l}$ : The weight of the connection from the $k$-th unit in layer $l - 1$ to the $j$-th unit in layer $l$.
$b_{j}^{l}$ : The bias of the $j$-th unit in layer $l$.
$z_{j}^{l}$ : The weighted input to unit $j$ in layer $l$ : $z_{j}^{l} = \sum_{k} w_{jk}^{l} a_{k}^{l - 1} + b_{j}^{l}$
$a_{j}^{l}$ : The activation of unit $j$ in layer $l$ : $a_{j}^{l} = σ(z_{j}^{l})$, where $σ$ is an activation function.
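
The definitions of $z_{j}^{l}$ and $a_{j}^{l}$ above can be sketched for a single layer in plain Python. This is a minimal illustration, not a reference implementation: the function names `sigmoid` and `layer_forward` are hypothetical, the nested list `weights[j][k]` is assumed to stand for $w_{jk}^{l}$, and the sigmoid is assumed as the choice of $σ$.

```python
import math


def sigmoid(x):
    """Assumed activation function: sigma(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_forward(weights, biases, prev_activations):
    """Compute z^l and a^l for one layer.

    weights[j][k]       stands for w_{jk}^l (unit k in layer l-1 -> unit j in layer l)
    biases[j]           stands for b_j^l
    prev_activations[k] stands for a_k^{l-1}
    """
    # z_j^l = sum_k w_{jk}^l * a_k^{l-1} + b_j^l
    z = [
        sum(w_jk * a_k for w_jk, a_k in zip(row, prev_activations)) + b_j
        for row, b_j in zip(weights, biases)
    ]
    # a_j^l = sigma(z_j^l)
    a = [sigmoid(z_j) for z_j in z]
    return z, a


# Two units fed by two inputs; values chosen so that both weighted inputs are 0.
z, a = layer_forward(
    weights=[[0.0, 0.0], [1.0, -1.0]],
    biases=[0.0, 0.0],
    prev_activations=[1.0, 1.0],
)
```

Stacking calls to `layer_forward`, each taking the previous call's activations as input, would give the forward pass of the whole network under these assumptions.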

Parameter Calculation

For a layer of width $m$ connected to a previous layer of the same width, the total number of parameters is $m^{2} + m$ : $m^{2}$ weights (one per connection) plus $m$ biases (one per unit).
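
As a quick check of this count, the sketch below tallies weights and biases for one layer and for a whole network. The function names and the `widths` list are hypothetical, and the unequal-width case (`fan_in * fan_out + fan_out`) is a generalisation of the equal-width formula, not something stated in these notes.

```python
def layer_param_count(fan_in, fan_out):
    """Parameters of one fully connected layer.

    fan_in  : width of the previous layer
    fan_out : width of this layer
    One weight per connection plus one bias per unit in this layer.
    """
    return fan_in * fan_out + fan_out


def mlp_param_count(widths):
    """Total parameters for an MLP whose layer widths are given in order.

    widths[0] is the input layer, which has no parameters of its own.
    """
    return sum(
        layer_param_count(n_in, n_out)
        for n_in, n_out in zip(widths, widths[1:])
    )


m = 4
same_width = layer_param_count(m, m)   # m^2 + m = 20
small_net = mlp_param_count([3, 4, 2]) # (3*4 + 4) + (4*2 + 2) = 26
```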

Activation Functions

The goal is to replace the standard non-differentiable sign function with a differentiable non-linear function to enable gradient-based learning.

Sigmoid Function: $σ(x) = \frac{1}{1 + \exp(-x)}$
Mapping: It maps values from $(-\infty, +\infty)$ to the range $(0, 1)$.
Derivative: $σ'(x) = σ(x)(1 - σ(x))$
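
The sigmoid and its derivative identity can be checked numerically. The sketch below is illustrative only: the function names are hypothetical, and the central-difference helper with step `h` is an assumption added purely to compare against the closed-form derivative.

```python
import math


def sigmoid(x):
    """sigma(x) = 1 / (1 + exp(-x)); output lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_prime(x):
    """Closed-form derivative: sigma'(x) = sigma(x) * (1 - sigma(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


def numerical_derivative(f, x, h=1e-6):
    """Central-difference estimate of f'(x), used here only as a sanity check."""
    return (f(x + h) - f(x - h)) / (2.0 * h)


# Largest gap between the closed form and the numerical estimate at a few points.
points = [-3.0, -1.0, 0.0, 0.5, 2.0]
max_gap = max(
    abs(sigmoid_prime(x) - numerical_derivative(sigmoid, x)) for x in points
)
```

This direct form is fine for moderate inputs; for very large negative `x`, `math.exp(-x)` can overflow, so a practical implementation would likely need a numerically safer formulation.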

Ayush Acharjya's Notes
