What is the primary architectural difference between a standard Transformer encoder and a decoder regarding the constraints placed on the self-attention mask?



The primary architectural difference is that the encoder uses a bidirectional self-attention mask, while the decoder uses a causal (masked) self-attention mask. In a Transformer encoder, every token in a sequence can attend to every other token, meaning the model can look at both the words appearing before and after a s....
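The two mask shapes described above can be illustrated with a minimal sketch. The helper names (`bidirectional_mask`, `causal_mask`, `show`) are hypothetical and chosen only for this illustration, and the sketch assumes a single unpadded sequence, so padding masks are ignored. An entry of `True` at row `i`, column `j` means position `i` may attend to position `j`.

```python
def bidirectional_mask(n):
    """Encoder-style mask: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style mask: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def show(mask):
    """Render a mask as rows of 1 (may attend) and 0 (blocked)."""
    return "\n".join(
        " ".join("1" if allowed else "0" for allowed in row) for row in mask
    )


if __name__ == "__main__":
    print("Encoder (bidirectional):")
    print(show(bidirectional_mask(4)))
    print()
    print("Decoder (causal):")
    print(show(causal_mask(4)))
```

In the causal case the allowed entries form a lower triangle, so no position can see tokens that come after it. In practice the blocked entries are typically applied by setting the corresponding attention scores to a very large negative value before the softmax.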
