The sigmoid activation function is most susceptible to the vanishing gradient problem. The sigmoid function outputs values between 0 and 1. During backpropagation, the chain rule multiplies the per-layer gradients together as the error signal is passed backward through the layers of the neural network. The derivative of the sigmoid function has a maximum value of 0.25, reached at an input of 0, so each sigmoid layer scales the gradient by at most a quarter and the product shrinks quickly as layers are added. When the input to a sigmoid function is very large or v....
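To make the 0.25 bound concrete, here is a minimal Python sketch. The function names (`sigmoid`, `sigmoid_derivative`, `gradient_upper_bound`) are illustrative and not from the original answer. The bound also assumes the weight terms in the chain rule are ignored, so it only shows the contribution of the activation derivatives.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_derivative(x: float) -> float:
    """Derivative of the sigmoid, s(x) * (1 - s(x)); it peaks at x = 0."""
    s = sigmoid(x)
    return s * (1.0 - s)


def gradient_upper_bound(num_layers: int) -> float:
    """Upper bound on the product of sigmoid derivatives across num_layers.

    Assumption: weights are ignored, so this is only the activation part
    of the chain-rule product, with every layer at its best case of 0.25.
    """
    return 0.25 ** num_layers


# The derivative is largest at 0 and falls off as |x| grows.
for x in (0.0, 2.0, 5.0, 10.0):
    print(f"x = {x:5.1f}  sigmoid'(x) = {sigmoid_derivative(x):.6f}")

# Even in the best case, the product shrinks geometrically with depth.
for depth in (1, 5, 10, 20):
    print(f"layers = {depth:2d}  bound = {gradient_upper_bound(depth):.3e}")
```

Under these assumptions, the second loop shows why early layers in a deep sigmoid network receive very small updates: the best-case factor from the activations alone is a power of 0.25.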