Govur University Logo
--> --> --> -->
...

What is the purpose of future (causal) masking in the decoder during training?



The purpose of future masking, also known as causal masking, in the decoder during training is to prevent the decoder from "cheating" by attending to future tokens in the output sequence when predicting the current token. During training, the decoder receives the entire target sequence as input, but it should only use the tokens that precede the current token to predict the current token. This mimics the autoregressive nature of sequence generation, where the model gener....

Log in to view the answer



Redundant Elements