When training a CNN, why does adding a Batch Normalization layer before the activation function help reduce the internal covariate shift?



Internal covariate shift refers to the change in the distribution of layer inputs as the parameters of previous layers change during training. As a neural network learns, the weights in earlier layers are updated, which causes the output values (activations) fed into subsequent layers to shift in range and distribution. This forces later layers to constantly adapt to new input statistics...
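
To make the idea concrete, here is a minimal NumPy sketch of per-channel batch normalization applied to convolutional pre-activations before a ReLU. It is an illustration under stated assumptions, not a reference implementation: the helper `batch_norm_2d`, the tensor shapes, and the "scale by 5, shift by 2" stand-in for an upstream weight update are all hypothetical choices made for this example. It uses training-mode batch statistics only and omits the running averages a real layer would keep for inference.

```python
import numpy as np

def batch_norm_2d(x, gamma, beta, eps=1e-5):
    """Normalize a (N, C, H, W) batch per channel, then scale and shift.

    Hypothetical helper for illustration; uses batch statistics only.
    """
    # Statistics are computed per channel over batch and spatial axes.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma and beta are the learnable per-channel scale and offset.
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)

rng = np.random.default_rng(0)

# Pretend these are conv outputs (pre-activations) for 8 images, 3 channels.
pre_act = rng.normal(size=(8, 3, 4, 4))

# Assumption: an update to earlier layers rescales and shifts those outputs.
shifted = 5.0 * pre_act + 2.0

gamma = np.ones(3)
beta = np.zeros(3)

out_before = batch_norm_2d(pre_act, gamma, beta)
out_after = batch_norm_2d(shifted, gamma, beta)

# The activation function sees the normalized values in both cases.
relu_before = np.maximum(out_before, 0.0)
relu_after = np.maximum(out_after, 0.0)

print("raw channel means after the shift:  ", shifted.mean(axis=(0, 2, 3)))
print("normalized channel means after shift:", out_after.mean(axis=(0, 2, 3)))
print("normalized channel stds after shift: ", out_after.std(axis=(0, 2, 3)))
```

In this sketch the raw pre-activations change substantially in mean and scale, yet the values reaching the ReLU have roughly zero mean and unit variance per channel in both cases, so the activation and the following layer receive inputs with stable statistics. The learnable `gamma` and `beta` let the network choose a different scale and offset if that proves useful.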
