
How does layer normalization contribute to the stability and performance of deep Transformer networks?



Layer normalization contributes to the stability and performance of deep Transformer networks by normalizing the activations within each layer. This helps to address the problem of internal covariate shift, which is the change in the distribution of network activations as the parameters of the network change during training. Internal covariate shift can slow down training and make it difficult for the network to converge. Layer normalization normalizes the activations across the f....
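To make the normalization step concrete, here is a minimal sketch in plain Python. It assumes the standard formulation of layer normalization, in which each token's activation vector is shifted to zero mean and scaled to unit variance, then passed through a learned gain and bias. The names `layer_norm`, `gamma`, `beta`, and `eps` are illustrative choices, not taken from the answer above, and a real Transformer would use a tensor library rather than Python lists.

```python
import math


def layer_norm(x, gamma=None, beta=None, eps=1e-5):
    """Normalize one activation vector to zero mean and unit variance.

    gamma (gain) and beta (bias) stand in for the learned per-feature
    parameters; they default to 1 and 0, i.e. plain normalization.
    """
    n = len(x)
    gamma = gamma if gamma is not None else [1.0] * n
    beta = beta if beta is not None else [0.0] * n

    mean = sum(x) / n
    # Biased (population) variance, computed over this single vector only.
    var = sum((v - mean) ** 2 for v in x) / n
    # eps keeps the division stable when the variance is close to zero.
    inv_std = 1.0 / math.sqrt(var + eps)

    return [g * (v - mean) * inv_std + b for v, g, b in zip(x, gamma, beta)]


# Two hypothetical token vectors whose activations differ in scale by 100x.
tokens = [[1.0, 2.0, 3.0, 4.0], [100.0, 200.0, 300.0, 400.0]]
normalized = [layer_norm(t) for t in tokens]

for original, result in zip(tokens, normalized):
    print(original, "->", [round(v, 4) for v in result])
```

Both vectors come out nearly identical after normalization even though their raw scales differ by a factor of 100, which illustrates why the next layer sees a more consistent input distribution. Because the statistics are computed per vector, the result does not depend on the batch size or on other examples in the batch.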



