15 Digital Image Representation and MLP Limitations

Image Representation

Images are stored as matrices of pixel values, either discrete integers or floating-point approximations of real values.

Grayscale 8-bit: Every pixel is 1 byte, allowing for $2^{8} = 256$ possible values.
Grayscale 16-bit: Used in scientific imaging; 2 bytes per pixel ($2^{16} = 65{,}536$ values).
Grayscale 32-bit Floating Point: 4 bytes per pixel; the gap between representable values grows with magnitude, so precision is finer for small values and coarser for large values.
RGB Colour: Based on trichromatic vision (S, M, and L cones in the eye). 8-bit RGB images stack three 8-bit channels (24 bits per pixel), resulting in $2^{24}$ (16,777,216) possible colour values.
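
The figures above can be checked with a few lines of arithmetic. The following is a minimal sketch using only the Python standard library; the helper names (`values_per_pixel`, `bytes_per_image`, `to_float32`) are illustrative and not part of the original notes. The last two lines illustrate the floating-point trade-off: a step of $2^{-23}$ is preserved near 1.0, but a step of 1 is lost at $2^{24}$.

```python
import struct


def values_per_pixel(bits_per_channel, channels=1):
    """Number of distinct values one pixel can take."""
    return 2 ** (bits_per_channel * channels)


def bytes_per_image(width, height, bits_per_channel, channels=1):
    """Uncompressed storage size of an image in bytes."""
    return width * height * channels * bits_per_channel // 8


def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]


gray8 = values_per_pixel(8)       # 2**8
gray16 = values_per_pixel(16)     # 2**16
rgb8 = values_per_pixel(8, 3)     # 2**24

# float32 has a 24-bit significand, so the spacing between
# neighbouring values is 2**-23 near 1.0 but 2 near 2**24.
small_step_kept = to_float32(1.0 + 2.0 ** -23) != 1.0
large_step_lost = to_float32(2.0 ** 24 + 1.0) == 2.0 ** 24

print(gray8, gray16, rgb8)
print(small_step_kept, large_step_lost)
```

For instance, `bytes_per_image(1000, 1000, 8, 3)` gives the uncompressed size of a 1000x1000 8-bit RGB image.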

MLPs do not scale well for image processing.

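Input Size: A 1000x1000 pixel grayscale image results in 1,000,000 inputs (3,000,000 for RGB).
Weight Explosion: In a standard MLP, every input is connected to every neuron in the next layer. With 1,000,000 inputs, a hidden layer of only 1,000 neurons would already need $10^{9}$ weights, which is computationally impractical.
Solution: CNNs use weight sharing. The same small set of weights (a kernel) is applied at every image position, so each weight depends only on an input's position relative to the neuron, not on its absolute location in the image.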
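
To make the contrast concrete, here is a rough sketch in plain Python comparing weight counts and showing one shared kernel sliding over an image. The hidden-layer size (1,000), kernel size (3x3) and filter count (64) are assumptions chosen for illustration, biases are ignored, and `conv2d_valid` is a hypothetical helper rather than any library's API.

```python
def mlp_layer_weights(n_inputs, n_neurons):
    """Fully connected layer: one weight per (input, neuron) pair."""
    return n_inputs * n_neurons


def conv_layer_weights(kernel_h, kernel_w, in_channels, out_channels):
    """Convolutional layer: one kernel per (input, output) channel pair,
    reused at every image position."""
    return kernel_h * kernel_w * in_channels * out_channels


def conv2d_valid(image, kernel):
    """Slide one kernel over a 2D image without padding.
    Every output value is computed with the same kernel weights."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


n_inputs = 1000 * 1000                       # 1000x1000 grayscale image
dense = mlp_layer_weights(n_inputs, 1000)    # assumed 1,000 hidden neurons
conv = conv_layer_weights(3, 3, 1, 64)       # assumed 64 filters of size 3x3

tiny_image = [[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]
tiny_kernel = [[1, 0],
               [0, 1]]
feature_map = conv2d_valid(tiny_image, tiny_kernel)

print(dense, conv)
print(feature_map)
```

Note that the convolutional weight count does not depend on the image size at all, which is why weight sharing scales to large images where a fully connected layer does not.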