Mechanics of Convolutional

A convolutional layer slides a kernel (a mix of weights) over an image to compute dot product, which are then followed by a bias and an activation function.

  • Kernel Effect: Depending on its values, a kernel can perform identity, edge detection, sharpening or blurring (Box, blur, Gaussian blur).
  • Geometric Interpretation: A dot product with a kernel gives the signed distance to a hyperplane defined by .
  • Spatial Dimensions: Convolution typically makes the output matrix smaller than the input unless padding is used.

Padding, Stride and Output Size

  • Stride: The step size the filter takes as it slides across the image. A stride of reduces the output size significantly.
  • Padding: Zero-padding (β€œSame” convolution) adds zeros around the border to keep the output size the same as input.
  • Magic Formula: To calculate the output width () for an input , kernel size , padding , and stride :

Pooling and Model Properties

Pooling Layers

Pooling downsamples activation maps to make representations smaller and more manageable. It operates over each channel independently.

  • Max Pooling: Takes the highest value from the area covered by the kernel (common setting: ).
  • Average Pooling: Calculates the average value from the area.
  • Output Size: . Note: the depth () remains equal to the input depth ().

Equivariance vs Invariance

  • Equivariance: . Convolutions are shift-equivariant, meaning they detect patterns, regardless of their location.
  • Invariance: . Pooling and fully connected layers help achieve shift-invariance, allowing the network to generalise (e.g., a cat is still a cat regardless of position).
  • Rotation: Standard CNN are not invariant or equivalent with respect to orientation/rotation.