When a CNN processes an image, it uses a small sliding window to find patterns like edges. What is this sliding window called?
The small sliding window that a Convolutional Neural Network (CNN) uses to find patterns in an image is called a kernel, or filter. A kernel is a small matrix of numerical weights that encodes the pattern, or feature, it responds to.

During processing, the kernel slides across the input image (or, in deeper layers, across a feature map produced by an earlier layer), moving by a step size called the stride. At each position, it multiplies its weights element-wise with the input values inside the window and sums the products into a single output value. That value is recorded in a new matrix called a feature map, or activation map. Repeating this across the whole input produces a complete feature map that shows where the kernel's pattern appears in the input and how strongly. For example, one kernel might respond to horizontal edges, another to vertical edges, and others to more complex textures or shapes.

The weights in each kernel are not programmed by hand. The CNN learns them during training: backpropagation computes how much each weight contributed to the error, and an optimizer such as gradient descent adjusts the weights accordingly. The same kernel is applied at every spatial location of the input. This property, known as weight sharing, greatly reduces the number of learnable parameters, makes the CNN efficient, and lets it detect the same pattern wherever it appears in the image.
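
The multiply-and-sum step described above can be sketched in a few lines of plain Python. This is only an illustration, not how a real framework implements it: the function name `slide_kernel`, the toy 5×6 image, and the hand-written `vertical_edge` weights are all assumptions made for the example (a trained CNN would learn its weights), and the sketch uses no padding.

```python
def slide_kernel(image, kernel, stride=1):
    """Slide `kernel` over `image` (no padding) and return the feature map.

    Hypothetical helper for illustration; both arguments are lists of rows.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1

    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Element-wise multiply the window by the kernel weights, then sum.
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k_h)
                for b in range(k_w)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# Toy 5x6 image: dark (0) on the left half, bright (1) on the right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-written vertical-edge kernel (assumed for illustration only).
vertical_edge = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

feature_map = slide_kernel(image, vertical_edge)
for row in feature_map:
    print(row)  # each of the 3 rows prints [0, 3, 3, 0]
```

The nonzero entries appear only where the window straddles the dark-to-bright boundary, which is the sense in which the feature map marks where the pattern occurs. The same 9 weights are reused at all 12 window positions, a small-scale picture of weight sharing.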