A network predicts if an image shows one of 10 different animals. Which activation function should be used for the very last layer to get probabilities for each animal?

The last layer should use the Softmax activation function. The problem is a multi-class classification task: each image belongs to exactly one of 10 mutually exclusive animal categories, and the goal is a probability distribution over those categories.

Softmax takes a vector of arbitrary real numbers as input. These are typically the raw outputs of the network's final layer before activation, known as logits. This network would have 10 logits, one per animal class. Softmax exponentiates each logit and divides it by the sum of all the exponentials. As a result, every output lies between 0 and 1, and the 10 outputs always sum to 1. A higher logit always produces a higher probability, so Softmax preserves the ranking of the classes.

Consider a simplified three-class example. Suppose the network's logits for a cat, a dog, and a bird are [2.0, 1.0, 0.5]. Softmax converts these into approximately [0.629, 0.231, 0.140], which sum to 1 (up to rounding). The output is therefore an easily interpretable set of confidences, one per animal. The predicted animal is the class with the highest probability, which is the cat in this example.
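The following is a minimal sketch in plain Python of the Softmax computation described above, softmax(z)_i = exp(z_i) / (exp(z_1) + ... + exp(z_K)) for K classes. It assumes the logits arrive as an ordinary Python list rather than as a tensor from a particular framework, and the function name `softmax` is chosen here for illustration. Subtracting the largest logit before exponentiating is a common numerical-stability step. It does not change the result, because the shift cancels out in the ratio. The sketch reuses the three-class cat/dog/bird logits [2.0, 1.0, 0.5].

```python
import math


def softmax(logits):
    """Convert a list of raw scores (logits) into probabilities that sum to 1."""
    # Shifting every logit by the maximum leaves the ratios unchanged
    # but keeps math.exp from overflowing on large inputs.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Simplified three-class example: cat, dog, bird.
labels = ["cat", "dog", "bird"]
probs = softmax([2.0, 1.0, 0.5])
for label, p in zip(labels, probs):
    print(f"{label}: {p:.3f}")
print(f"sum: {sum(probs):.3f}")

# The predicted class is the one with the highest probability.
print("prediction:", labels[probs.index(max(probs))])
```

Running it prints cat: 0.629, dog: 0.231, bird: 0.140, a sum of 1.000, and a prediction of cat. For the 10-animal network, the same function would simply receive a list of 10 logits.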