Positional Encoding

The attention mechanism is order-invariant. Positional Encodings (PE) are used to inject information about each token's position within the context into the Transformer architecture. Typically, positional encodings are vectors of the same dimension as the embedding vectors, and they are added to them.
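
As a minimal sketch of the "add a vector per position" idea (the toy sequence length, dimension, and random embeddings below are illustrative, not from the original), the classic sinusoidal PE produces one vector per position with the same dimension as the embeddings, which is then added element-wise:

```python
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Deterministic sinusoidal positional encodings of shape (seq_len, d_model)."""
    positions = np.arange(seq_len)[:, None]                   # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                  # (1, d_model / 2)
    angles = positions / np.power(10000.0, dims / d_model)    # (seq_len, d_model / 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)   # even dimensions: sine
    pe[:, 1::2] = np.cos(angles)   # odd dimensions: cosine
    return pe

# Toy example: 4 tokens with 8-dimensional embeddings.
token_embeddings = np.random.randn(4, 8)
pe = sinusoidal_pe(seq_len=4, d_model=8)
inputs = token_embeddings + pe   # PE has the same shape, so it is added directly
```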

PE can be absolute or relative, encoding either the absolute position of a token within the context or its position relative to other tokens. PE can also be deterministic or learned: either a fixed, user-specified function, or vectors initialized randomly and then learned during training. The table below classifies some well-known approaches along these two axes.

|               | Absolute                                     | Relative                                                                     |
| ------------- | -------------------------------------------- | ---------------------------------------------------------------------------- |
| Deterministic | Sinusoidal PE                                |                                                                              |
| Learned       | Convolutional Sequence to Sequence Learning | Self-Attention with Relative Position Representations and Music Transformer |
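
To illustrate the "Learned / Absolute" cell of the table, here is a minimal PyTorch sketch (the class name `LearnedAbsolutePE` and all parameter names are my own, not taken from the referenced papers): a trainable embedding table holds one vector per position, and the looked-up vectors are added to the token embeddings.

```python
import torch
import torch.nn as nn

class LearnedAbsolutePE(nn.Module):
    """Learned absolute PE: one trainable vector per position, initialized
    randomly and updated during training."""

    def __init__(self, max_len: int, d_model: int):
        super().__init__()
        self.pos_emb = nn.Embedding(max_len, d_model)

    def forward(self, token_embeddings: torch.Tensor) -> torch.Tensor:
        # token_embeddings: (batch, seq_len, d_model)
        seq_len = token_embeddings.size(1)
        positions = torch.arange(seq_len, device=token_embeddings.device)
        return token_embeddings + self.pos_emb(positions)  # broadcast over batch

# Usage: 2 sequences of 5 tokens, embedding dimension 16.
x = torch.randn(2, 5, 16)
pe_layer = LearnedAbsolutePE(max_len=512, d_model=16)
out = pe_layer(x)  # same shape as x
```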

Additionally, CoPE (Contextual Position Encoding) uses context-dependent relative PE: a token's position is computed from the context itself rather than from token counts alone.
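
For intuition, a rough sketch of the CoPE idea follows (all tensor and function names are illustrative, and this is a simplified reading of the method, not a faithful reimplementation): each query computes soft, context-dependent "counts" of the preceding tokens via sigmoid gates on query-key scores; the resulting fractional positions select learned position embeddings by interpolation, and the corresponding term is added to the attention logits.

```python
import torch

def cope_logits(q, k, pos_emb):
    """Sketch of context-dependent relative positions (CoPE-style).

    q, k:     (seq_len, d) query/key vectors for one head
    pos_emb:  (max_pos, d) learned position embeddings
    Returns attention logits with a context-dependent positional term added
    (a causal mask would still be applied before the softmax).
    """
    scores = q @ k.T                                  # (seq_len, seq_len) content logits
    gates = torch.sigmoid(scores)                     # which previous tokens "count"
    mask = torch.tril(torch.ones_like(gates))         # causal: only j <= i contribute
    gates = gates * mask
    # Position of key j w.r.t. query i = number of "counted" tokens between them.
    pos = gates.flip(-1).cumsum(-1).flip(-1)          # p[i, j] = sum_{t=j..i} g[i, t]
    pos = pos.clamp(max=pos_emb.size(0) - 1)
    # Positions are fractional, so interpolate between neighbouring embeddings.
    low, high = pos.floor().long(), pos.ceil().long()
    w = (pos - low.float()).unsqueeze(-1)             # (seq_len, seq_len, 1)
    e = (1 - w) * pos_emb[low] + w * pos_emb[high]    # (seq_len, seq_len, d)
    pos_logits = torch.einsum("id,ijd->ij", q, e)     # q_i . e[p_ij]
    return scores + pos_logits * mask
```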
