
What specific limitation of standard positional encodings in Transformers makes them unable to naturally capture the relative distance between tokens without explicit geometric mapping?



Standard positional encodings in Transformers, such as the fixed sinusoidal functions used in the original architecture, suffer from the limitation of being absolute rather than relative. These encodings assign a unique, static vector to every position index in a sequence, effectively labeling tokens as being at position one, position two, and so on. Because these vectors are added directly to token embeddings before the self-attention mechanism, the model lear....
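As a rough illustration of the "absolute" behavior described above, the following minimal sketch builds a fixed sinusoidal encoding table and adds it to a token embedding. It assumes an even model width, and the names `sinusoidal_positional_encoding`, `seq_len`, `d_model`, and `token_embedding` are illustrative choices rather than anything taken from a specific library. The random embedding is a stand-in for a learned one.

```python
import numpy as np


def sinusoidal_positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Return a (seq_len, d_model) table of fixed sinusoidal encodings.

    Assumes d_model is even. Row p is the static vector assigned to
    absolute position index p.
    """
    positions = np.arange(seq_len)[:, None]        # shape (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]       # shape (1, d_model // 2)
    angles = positions / np.power(10000.0, dims / d_model)

    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                   # even dimensions use sine
    pe[:, 1::2] = np.cos(angles)                   # odd dimensions use cosine
    return pe


seq_len, d_model = 8, 16                           # illustrative sizes
pe = sinusoidal_positional_encoding(seq_len, d_model)

# Hypothetical embedding for a single token (random stand-in for a learned vector).
rng = np.random.default_rng(0)
token_embedding = rng.normal(size=d_model)

# The same token placed at two different absolute positions.
x_at_2 = token_embedding + pe[2]
x_at_5 = token_embedding + pe[5]

# Each input carries only its own position label. The offset between the two
# positions (here 3) is not stored anywhere in x_at_2 or x_at_5 individually.
print(np.allclose(x_at_2, x_at_5))
```

Each row of `pe` depends only on its own index, so the vector for a position is identical no matter which tokens surround it. Any notion of "three tokens apart" has to be recovered later from a pair of such summed vectors, since it is not supplied directly to the attention computation.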



