
Describe a learning rate scheduling technique commonly used in Transformer training, and explain its benefits.



A learning rate scheduling technique commonly used in Transformer training is the inverse square root schedule, also known as the "Noam learning rate schedule" or the "warmup and decay" schedule. This schedule adjusts the learning rate at every training step: during a warm-up phase of warmup_steps steps the learning rate increases linearly, and during the decay phase that follows it decreases in proportion to the inverse square root of the step number. The learning rate at each step is calculated as: lr = d_model^(-0.5) * min(step_num^(-0.5), step_num * warmup_steps^(-1.5)), ....
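The formula above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name noam_lr is hypothetical, and the default values d_model=512 and warmup_steps=4000 are assumptions chosen for the example, since the answer text does not specify them.

```python
def noam_lr(step_num: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
    """Inverse square root ("Noam") learning rate for a given training step.

    d_model and warmup_steps defaults are illustrative assumptions.
    """
    if step_num < 1:
        # step_num ** -0.5 is undefined at 0, so steps are counted from 1.
        raise ValueError("step_num must be >= 1")
    # Linear warm-up term: grows proportionally to step_num.
    warmup_term = step_num * warmup_steps ** -1.5
    # Decay term: shrinks as the inverse square root of step_num.
    decay_term = step_num ** -0.5
    # min() selects the warm-up term before warmup_steps and the decay term after.
    return d_model ** -0.5 * min(warmup_term, decay_term)


if __name__ == "__main__":
    for step in (1, 1000, 4000, 16000, 64000):
        print(f"step {step:>6}: lr = {noam_lr(step):.6e}")
```

The two arguments of min are equal when step_num equals warmup_steps, so under these assumptions the rate peaks at that step, having risen linearly before it, and falls off as the inverse square root of the step afterward.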



