16 Convolutional Neural Network

Mechanics of Convolution

A convolutional layer slides a kernel (a small grid of weights) over an image and computes a dot product at each position; a bias is then added and the result is passed through an activation function.

Kernel Effect: Depending on its values, a kernel can perform identity, edge detection, sharpening, or blurring (box blur, Gaussian blur).
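Geometric Interpretation: The dot product of an input patch $x$ with a kernel $w$ , plus the bias $b$ , is proportional to the signed distance from $x$ to the hyperplane $w \cdot x + b = 0$ ; dividing by $\|w\|$ gives the distance itself.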
Spatial Dimensions: Convolution typically makes the output matrix smaller than the input unless padding is used.
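The sliding dot product described above can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the names `conv2d`, `relu`, and the toy `image` are made up for this example, the kernel is assumed to be square, and there is no padding and a stride of 1. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d(image, kernel, bias=0.0):
    """Slide a square kernel over a 2D image (no padding, stride 1)."""
    height, width = len(image), len(image[0])
    f = len(kernel)  # assumption: kernel is f x f
    out = []
    for i in range(height - f + 1):
        row = []
        for j in range(width - f + 1):
            # Dot product between the kernel and the patch it covers.
            s = sum(kernel[a][b] * image[i + a][j + b]
                    for a in range(f) for b in range(f))
            row.append(s + bias)
        out.append(row)
    return out


def relu(fmap):
    """Activation applied elementwise after the bias."""
    return [[max(0.0, v) for v in row] for row in fmap]


# Toy 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 1, 1] for _ in range(5)]

identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

same = conv2d(image, identity)              # copies the centre pixels
edges = relu(conv2d(image, vertical_edge))  # responds where dark meets bright

print(same)   # [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
print(edges)  # [[0.0, 3.0, 3.0], [0.0, 3.0, 3.0], [0.0, 3.0, 3.0]]
```

Both outputs are 3x3 even though the input is 5x5, which is the shrinkage that padding is meant to counteract.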

Padding, Stride and Output Size

Stride: The step size the filter takes as it slides across the image. A stride of $2$ roughly halves the output width and height.
Padding: Zero-padding adds zeros around the border. “Same” convolution chooses the padding so that the output size equals the input size; for stride $1$ and odd $F$ this means $P = (F - 1)/2$ .
Magic Formula: To calculate the output width ( $W_{2}$ ) for an input $W_{1}$ , kernel size $F$ , padding $P$ , and stride $S$ :

$$W_{2} = \frac{W_{1} - F + 2P}{S} + 1$$
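A few worked cases make the formula concrete. The helper below is a hypothetical sketch (the name `conv_output_size` is not from any library). It assumes that when $W_{1} - F + 2P$ is not divisible by $S$ the result is rounded down, which is a common convention but not something the formula itself specifies.

```python
def conv_output_size(w1, f, p=0, s=1):
    """Output width W2 = (W1 - F + 2P) / S + 1 for one spatial dimension."""
    span = w1 - f + 2 * p
    if span < 0:
        raise ValueError("kernel is larger than the padded input")
    # Assumption: partial windows are dropped (floor division).
    return span // s + 1


print(conv_output_size(32, 5))            # 28: no padding shrinks the output
print(conv_output_size(32, 5, p=2))       # 32: "same" padding, P = (F - 1) / 2
print(conv_output_size(32, 3, p=1, s=2))  # 16: stride 2 halves the width
print(conv_output_size(7, 3, s=2))        # 3
```

The same function covers pooling by setting `p=0`, for example `conv_output_size(32, 2, s=2)` gives 16.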

Pooling and Model Properties

Pooling Layers

Pooling downsamples activation maps to make representations smaller and more manageable. It operates over each channel independently.

Max Pooling: Takes the highest value from the area covered by the kernel (common setting: $F = 2, S = 2$ ).
Average Pooling: Calculates the average value from the area.
Output Size: $W_{2} = \frac{W _{1} - F}{S} + 1$ . Note: the depth ( $D_{2}$ ) remains equal to the input depth ( $D_{1}$ ).
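The two pooling variants can be sketched as follows. The names `pool2d`, `pool_volume`, and `mean` are illustrative, and the example assumes square windows and a feature map stored as nested lists, one list of rows per channel.

```python
def mean(values):
    return sum(values) / len(values)


def pool2d(fmap, f=2, s=2, agg=max):
    """Pool one channel with an f x f window and stride s."""
    height, width = len(fmap), len(fmap[0])
    out = []
    for i in range(0, height - f + 1, s):
        row = []
        for j in range(0, width - f + 1, s):
            window = [fmap[i + a][j + b] for a in range(f) for b in range(f)]
            row.append(agg(window))  # max pooling or average pooling
        out.append(row)
    return out


def pool_volume(volume, f=2, s=2, agg=max):
    """Pool each channel independently, so the depth is unchanged."""
    return [pool2d(channel, f, s, agg) for channel in volume]


channel = [[1, 3, 2, 4],
           [5, 6, 1, 0],
           [7, 2, 9, 8],
           [0, 1, 3, 4]]

max_pooled = pool2d(channel)            # [[6, 4], [7, 9]]
avg_pooled = pool2d(channel, agg=mean)  # [[3.75, 1.75], [2.5, 6.0]]

# A volume with depth 3 keeps depth 3; only width and height shrink.
pooled = pool_volume([channel, channel, channel])
print(len(pooled), len(pooled[0]), len(pooled[0][0]))  # 3 2 2
```

With $W_{1} = 4$, $F = 2$, $S = 2$ the output width is $(4 - 2)/2 + 1 = 2$, matching the 2x2 results.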

Equivariance vs Invariance

Equivariance: $f (S (x)) = S (f (x))$ , where $S$ is a shift. Convolutions are shift-equivariant: shifting the input shifts the feature map by the same amount, so a pattern is detected wherever it appears.
Invariance: $f (S (x)) = f (x)$ . Pooling and fully connected layers help achieve approximate shift-invariance, allowing the network to generalise (e.g., a cat is still a cat regardless of position).
Rotation: Standard CNNs are neither invariant nor equivariant with respect to orientation/rotation.
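Both definitions can be checked numerically on a small 1D signal. This is a sketch under stated assumptions: `conv1d` and `shift` are hypothetical helpers, the shift fills with zeros, and the signal has enough zeros at both ends that nothing falls off the border. Taking the maximum over the whole feature map (global max pooling) is used here as one simple way to turn an equivariant map into an invariant one.

```python
def conv1d(x, w):
    """Sliding dot product in 1D (no padding, stride 1)."""
    f = len(w)
    return [sum(w[k] * x[i + k] for k in range(f))
            for i in range(len(x) - f + 1)]


def shift(seq, k):
    """Shift right by k positions, filling with zeros on the left."""
    return [0] * k + seq[:len(seq) - k]


x = [0, 0, 0, 1, 2, 1, 0, 0, 0, 0]  # a small bump surrounded by zeros
w = [1, 0, -1]                      # simple edge-detecting kernel

y = conv1d(x, w)                    # [0, -1, -2, 0, 2, 1, 0, 0]
y_of_shifted = conv1d(shift(x, 2), w)
shifted_y = shift(y, 2)

# Equivariance: f(S(x)) == S(f(x))
print(y_of_shifted == shifted_y)    # True

# Invariance after global max pooling: max(f(S(x))) == max(f(x))
print(max(y_of_shifted) == max(y))  # True
```

The response to the bump moves with the input, while its maximum stays the same wherever the bump sits.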

Ayush Acharjya's Notes
