Image Representation
Images are stored as matrices of pixel values, either discrete integers or floating-point approximations of real numbers.
- Grayscale 8-bit: Every pixel is 1 byte, allowing for $2^8$ = 256 possible values (0 to 255).
- Grayscale 16-bit: Used in scientific imaging; 2 bytes per pixel ($2^{16}$ = 65,536 values).
- Grayscale 32-bit Floating Point: 4 bytes per pixel. The gap between neighbouring representable values grows with magnitude, so values near zero are stored with fine absolute precision while large values are stored more coarsely.
- RGB Colour: Based on trichromatic vision (S, M, and L cones in the eye). 8-bit RGB images stack three 8-bit channels (24 bits per pixel), giving $2^{24}$ = 16,777,216 possible colour values.
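The value counts and the float32 precision behaviour above can be checked with a small sketch. This uses only the Python standard library; the function names `num_values` and `float32_spacing` are hypothetical helpers, and `float32_spacing` assumes its argument is a positive, finite value that is exactly representable in float32.

```python
import struct

def num_values(bits_per_pixel: int) -> int:
    """Number of distinct values a pixel with the given bit depth can take."""
    return 2 ** bits_per_pixel

def float32_spacing(x: float) -> float:
    """Gap between the positive float32 value x and the next larger float32.

    Assumption: x is positive, finite and exactly representable in float32.
    Reinterprets the float32 bit pattern as an unsigned int, adds 1 to get the
    next representable float32, and subtracts.
    """
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (next_up,) = struct.unpack("<f", struct.pack("<I", bits + 1))
    return next_up - x

print(num_values(8))          # grayscale 8-bit: 256
print(num_values(16))         # grayscale 16-bit: 65536
print(num_values(3 * 8))      # RGB, three 8-bit channels: 16777216
print(float32_spacing(1.0))   # 2**-23, about 1.19e-07
print(float32_spacing(1e6))   # 0.0625: much coarser for a large value
```

The spacing at 1e6 is 2^19 times larger than at 1.0, which is the "fine near zero, coarse for large values" behaviour described for 32-bit floating-point images.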
MLP Scaling Problem
MLPs do not scale well for image processing.
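- Input Size: A 1000x1000 grayscale image, flattened into a vector, gives 1,000,000 inputs (3,000,000 for RGB).
- Weight Explosion: In a standard (fully connected) MLP, every input is connected to every neuron in the next layer, so each first-layer neuron needs 1,000,000 weights. A hidden layer of only 1,000 neurons would already need $10^9$ weights, which is computationally unsustainable.
- Solution: CNNs use local connectivity and weight sharing. Each neuron connects only to a small neighbourhood of pixels, and a weight depends only on the relative position of the input within that neighbourhood, so the same small kernel is reused across the whole image and the number of weights no longer grows with image size.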
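A rough sketch of the parameter-count gap, plus a naive convolution that makes the weight sharing explicit. The layer sizes (1,000 hidden neurons, 64 filters of size 3x3) are hypothetical choices for illustration, and the helper names are not from any library. Like most CNN libraries, `conv2d_valid` actually computes cross-correlation, with no padding or stride.

```python
def dense_params(n_inputs: int, n_neurons: int, bias: bool = True) -> int:
    """Weights (plus optional biases) of a fully connected layer."""
    return n_inputs * n_neurons + (n_neurons if bias else 0)

def conv_params(kernel_h: int, kernel_w: int, in_channels: int,
                out_channels: int, bias: bool = True) -> int:
    """Weights (plus optional biases) of a 2D conv layer; independent of image size."""
    return (kernel_h * kernel_w * in_channels * out_channels
            + (out_channels if bias else 0))

def conv2d_valid(image, kernel):
    """Naive 'valid' 2D cross-correlation on nested lists.

    The same kernel is applied at every output position (i, j); which weight
    multiplies a pixel depends only on its offset (di, dj) from (i, j).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            out[i][j] = sum(kernel[di][dj] * image[i + di][j + dj]
                            for di in range(kh) for dj in range(kw))
    return out

# Hypothetical: 1000x1000 grayscale image flattened into 1,000 hidden neurons
print(dense_params(1000 * 1000, 1000))   # 1000001000
# Hypothetical: 64 filters of size 3x3 on a single-channel image
print(conv_params(3, 3, 1, 64))          # 640, for any image size

image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 0], [0, -1]]               # one shared 2x2 kernel
print(conv2d_valid(image, kernel))
```

The dense count scales with the number of input pixels, while the convolutional count depends only on kernel size and channel counts, which is why weight sharing keeps CNNs tractable on large images.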