Image Representation

Images are matrices of pixel values, stored either as discrete integers or as floating-point approximations of real numbers.

  • Grayscale 8-bit: Every pixel is 1 byte, allowing for 2^8 = 256 possible values (0 to 255).
  • Grayscale 16-bit: Used in scientific imaging; 2 bytes per pixel (2^16 = 65,536 values).
  • Grayscale 32-bit Floating Point: 4 bytes per pixel; the gap between adjacent representable values grows with magnitude, so precision is better for small values and lower for large values.
  • RGB Colour: Based on trichromatic vision (S, M, and L cones in the eye). 8-bit RGB images use three stacked channels of 1 byte each, resulting in 256^3 = 2^24 (16,777,216) possible colour values.
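
The value counts and the floating-point precision claim above can be checked with a short sketch. It uses only the Python standard library; the helper names `distinct_values` and `float32_gap` are illustrative and not part of the original notes, and the two sample magnitudes (1.0 and 1,000,000.0) are arbitrary choices.

```python
import struct

def distinct_values(bits: int) -> int:
    """Number of distinct values an unsigned integer with `bits` bits can hold."""
    return 2 ** bits

def float32_gap(x: float) -> float:
    """Gap between x (rounded to float32) and the next larger float32.

    Assumes x is positive and finite.
    """
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    # For positive finite floats, incrementing the bit pattern gives the next float.
    (current,) = struct.unpack("<f", struct.pack("<I", bits))
    (following,) = struct.unpack("<f", struct.pack("<I", bits + 1))
    return following - current

gray8 = distinct_values(8)        # 8-bit grayscale
gray16 = distinct_values(16)      # 16-bit grayscale
rgb24 = distinct_values(8) ** 3   # three 8-bit channels

small_gap = float32_gap(1.0)          # spacing of float32 values near 1
large_gap = float32_gap(1_000_000.0)  # spacing of float32 values near one million

print(gray8, gray16, rgb24)
print(small_gap, large_gap)
```

Near 1.0 the spacing between neighbouring float32 values is about 1.2e-07, while near 1,000,000 it is 0.0625, which is the sense in which precision drops for large values.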

MLP Scaling Problem

MLPs do not scale well for image processing.

  • Input Size: A 1000x1000 pixel grayscale image results in 1,000,000 inputs (three times that for RGB).
  • Weight Explosion: In a standard MLP, every input is connected to every neuron in the next layer. For this image, a single neuron already needs 1,000,000 weights, and a layer of 1,000 neurons needs 1,000,000,000, which is computationally unsustainable.
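  • Solution: CNNs use weight sharing: the same small set of weights (a filter) is applied at every image location, so a weight depends only on the position of an input pixel relative to the neuron, not on its absolute position in the image.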
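
As a rough illustration of the difference, the sketch below counts weights for one fully connected layer and one convolutional layer. The layer sizes (1,000 hidden neurons, 3x3 filters, 64 filters) are assumptions chosen for the example, not values from the notes, and bias terms are ignored.

```python
def mlp_layer_weights(n_inputs: int, n_neurons: int) -> int:
    """Fully connected layer: one weight per (input, neuron) pair."""
    return n_inputs * n_neurons

def conv_layer_weights(kernel_h: int, kernel_w: int,
                       in_channels: int, n_filters: int) -> int:
    """Convolutional layer: each filter's weights are shared across all
    image positions, so the count does not depend on image size."""
    return kernel_h * kernel_w * in_channels * n_filters

# 1000x1000 grayscale image, flattened for the MLP.
n_inputs = 1000 * 1000

# Assumed sizes for illustration only.
mlp_weights = mlp_layer_weights(n_inputs, n_neurons=1000)
conv_weights = conv_layer_weights(3, 3, in_channels=1, n_filters=64)

print(f"MLP layer:  {mlp_weights:,} weights")
print(f"Conv layer: {conv_weights:,} weights")
```

With these assumed sizes the fully connected layer needs 1,000,000,000 weights, while the convolutional layer needs 576, and the convolutional count stays the same if the image gets larger.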