Label smoothing improves the generalization of a neural machine translation model by keeping it from becoming overconfident in its predictions during training. In standard training with cross-entropy loss against one-hot targets, the model is pushed to assign probability 1 to the correct token and probability 0 to every other token. This can make the model overly certain of its predictions, even when the training data is noisy or the correct translation is ambiguous. Overconf....
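To make the idea concrete, here is a minimal pure-Python sketch of a label-smoothed cross-entropy for a single prediction. It assumes the common uniform-smoothing convention, which the answer above does not specify: each of the K classes gets ε/K of the probability mass and the correct class additionally keeps 1 − ε. The function names (`smoothed_targets`, `log_softmax`, `cross_entropy`) and the value ε = 0.1 are illustrative choices, not taken from the source. In practice a framework option would normally be used instead (for example, PyTorch's `torch.nn.CrossEntropyLoss` accepts a `label_smoothing` argument).

```python
import math
from typing import List


def smoothed_targets(num_classes: int, target: int, epsilon: float) -> List[float]:
    """Build a label-smoothed target distribution (hypothetical helper).

    Assumes uniform smoothing: every class gets epsilon / K, and the
    correct class additionally keeps 1 - epsilon. epsilon = 0 gives one-hot.
    """
    if not 0.0 <= epsilon < 1.0:
        raise ValueError("epsilon must be in [0, 1)")
    base = epsilon / num_classes
    dist = [base] * num_classes
    dist[target] += 1.0 - epsilon
    return dist


def log_softmax(logits: List[float]) -> List[float]:
    """Numerically stable log-softmax over a single vector of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def cross_entropy(logits: List[float], target_dist: List[float]) -> float:
    """Cross-entropy H(q, p) between target distribution q and softmax(logits)."""
    log_p = log_softmax(logits)
    return -sum(q * lp for q, lp in zip(target_dist, log_p))


if __name__ == "__main__":
    vocab_size, target = 4, 2
    hard = smoothed_targets(vocab_size, target, epsilon=0.0)  # ordinary one-hot
    soft = smoothed_targets(vocab_size, target, epsilon=0.1)  # smoothed target
    # Raise the correct token's logit and compare the two losses.
    for logit in (2.0, 5.0, 10.0):
        logits = [0.0, 0.0, logit, 0.0]
        print(logit, cross_entropy(logits, hard), cross_entropy(logits, soft))
```

With ε = 0 the loss keeps shrinking toward zero as the correct token's logit grows, so the model is always rewarded for more confidence. With ε = 0.1 the loss is minimized at a finite logit, where the softmax output matches the smoothed target (probability 0.925 on the correct token in this 4-class example), and pushing the logit higher makes the loss go back up.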