
What are the main sources of computational bottlenecks when training large Transformer models?



The main sources of computational bottlenecks when training large Transformer models are the self-attention mechanism, the feed-forward networks, and the large vocabulary size. Self-attention has a computational complexity of O(n^2) in the sequence length n. More precisely, it takes O(n^2 · d) time for model dimension d, plus O(n^2) memory to hold the attention-score matrix. Its cost therefore grows quadratically with sequence length: doubling n roughly quadruples the work, which makes self-attention a major bottleneck for long sequences. Computing attention weights between every pair of tokens becomes very expensive for long documents or paragraphs. The feed-forward networks, ....
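To make the quadratic growth concrete, here is a minimal pure-Python sketch of single-head scaled dot-product attention, assuming no batching, no masking, and no multi-head projections. It counts the multiply-adds spent building the n × n score matrix. The helper names `naive_attention` and `random_matrix` are hypothetical illustrations, not part of any library.

```python
import math
import random


def naive_attention(q, k, v):
    """Single-head scaled dot-product attention on plain lists (hypothetical helper).

    q, k, v: lists of n vectors, each of length d.
    Returns (outputs, score_ops), where score_ops counts the multiply-adds
    used to build the n x n score matrix (n * n * d in total).
    """
    n, d = len(q), len(q[0])
    scale = 1.0 / math.sqrt(d)
    score_ops = 0
    outputs = []
    for i in range(n):
        # One score per (query i, key j) pair: n scores per query, n * n overall.
        scores = []
        for j in range(n):
            scores.append(sum(q[i][t] * k[j][t] for t in range(d)) * scale)
            score_ops += d
        # Numerically stable softmax over this query's n scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output row i is the attention-weighted sum of the value vectors.
        outputs.append([sum(weights[j] * v[j][t] for j in range(n)) for t in range(d)])
    return outputs, score_ops


def random_matrix(n, d, rng):
    """n random vectors of length d (hypothetical helper for the demo)."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    d = 16
    for n in (32, 64, 128):
        x = random_matrix(n, d, rng)
        _, ops = naive_attention(x, x, x)
        print(f"n={n:4d}  score multiply-adds={ops}")
```

Each doubling of n multiplies the score count by four, since it equals n * n * d. Production implementations use batched matrix multiplies on accelerators rather than Python loops, but the n × n scaling shown here is the same.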



