
Which specific architectural design choice allows a Decoder-only model to generate text sequentially while preventing the model from 'seeing' future tokens during the training phase?



The specific architectural design choice is the causal attention mask, which is applied within the self-attention mechanism of the Transformer architecture. Left unmasked, self-attention allows each word, or token, in a sequence to calculate its relationship with every other token in that same sequence, including tokens that come later. The causal mask restricts this so that each position can attend only to itself and to the positions before it. During training, the model receives....
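The masking idea described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from any particular library: the helper names `causal_mask` and `masked_softmax` and the all-zero `scores` matrix are assumptions made up for the example, and a real model would compute the scores from query and key vectors.

```python
import math

def causal_mask(n):
    """Return an n x n mask: True where query position i may attend to key position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def masked_softmax(scores, mask):
    """Row-wise softmax, with disallowed positions set to -inf before normalizing."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Future positions get -inf, so exp() turns them into exactly 0.
        masked = [s if ok else -math.inf for s, ok in zip(row, allowed)]
        peak = max(masked)  # finite, because the diagonal is always allowed
        exps = [math.exp(s - peak) for s in masked]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Hypothetical raw attention scores for a 4-token sequence (all equal, for clarity).
scores = [[0.0] * 4 for _ in range(4)]
weights = masked_softmax(scores, causal_mask(4))

for row in weights:
    print([round(w, 2) for w in row])
```

With equal scores, row i spreads its weight evenly over positions 0 through i and assigns zero to every later position, which is the "cannot see future tokens" property in attention-weight form.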



