Positional Encoding
The attention mechanism is order-invariant. Positional Encodings (PE) inject information about each token's position within the context into the Transformer architecture. Typically, positional encodings are vectors of the same dimension as the embedding vectors, and they are added to them.
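For concreteness, here is a minimal NumPy sketch of the sinusoidal (deterministic absolute) scheme from the original Transformer. The sequence length, model dimension, and random embeddings are arbitrary illustrative choices, and `d_model` is assumed to be even.

```python
import numpy as np

def sinusoidal_pe(seq_len: int, d_model: int) -> np.ndarray:
    """Deterministic absolute PE from the original Transformer paper.

    PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    """
    positions = np.arange(seq_len)[:, None]                  # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]                 # (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Toy token embeddings: the PE has the same shape and is simply added.
seq_len, d_model = 16, 64
token_embeddings = np.random.randn(seq_len, d_model)
x = token_embeddings + sinusoidal_pe(seq_len, d_model)
```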
PE can be absolute or relative, encoding either the absolute position of a token within the context or its position relative to other tokens. PE can also be deterministic or learned: either a user-specified function, or vectors initialized randomly and learned during training.
| | Absolute | Relative |
|---|---|---|
| Deterministic | Sinusoidal PE | |
| Learned | Convolutional Sequence to Sequence Learning | Self-Attention with Relative Position Representations; Music Transformer |
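To make the "learned relative" cell more concrete, here is a rough NumPy sketch in the spirit of Self-Attention with Relative Position Representations: a learned embedding per clipped relative distance is added on the key side when computing the attention logits. The function and variable names, the clipping window, and the toy shapes are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def relative_attention_logits(q, k, rel_k, max_dist):
    """Attention logits with learned relative position representations
    (key-side only), in the spirit of Shaw et al. 2018.

    q, k:   (seq_len, d) query and key matrices
    rel_k:  (2 * max_dist + 1, d) learned embeddings, one per clipped
            relative distance in [-max_dist, max_dist]
    """
    seq_len, d = q.shape
    # Clipped relative distance j - i for every query/key pair.
    rel = np.arange(seq_len)[None, :] - np.arange(seq_len)[:, None]
    rel = np.clip(rel, -max_dist, max_dist) + max_dist     # indices into rel_k
    content = q @ k.T                                       # q_i . k_j
    position = np.einsum("id,ijd->ij", q, rel_k[rel])       # q_i . a^K_{ij}
    return (content + position) / np.sqrt(d)

# Toy usage with random weights (illustrative only).
seq_len, d, max_dist = 8, 16, 4
q = np.random.randn(seq_len, d)
k = np.random.randn(seq_len, d)
rel_k = 0.02 * np.random.randn(2 * max_dist + 1, d)
logits = relative_attention_logits(q, k, rel_k, max_dist)
```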
Additionally, CoPE (Contextual Position Encoding) uses a context-dependent relative PE, computing positions from the content of the tokens rather than from token counts alone.
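As a rough sketch of the CoPE idea: positions are obtained by summing sigmoid gates over the context, so they are fractional and content-dependent, and their embeddings are linearly interpolated between the two nearest integer positions before being added to the attention logits. Shapes, names, and the exact scaling below are assumptions for illustration, not the paper's reference code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cope_logits(q, k, pos_emb):
    """Context-dependent relative positions in the spirit of CoPE.

    q, k:     (seq_len, d) queries and keys
    pos_emb:  (max_pos + 1, d) learned embeddings for integer positions

    The position of key j relative to query i is the sum of the gates
    sigmoid(q_i . k_m) for m in [j, i]; it is fractional, so its embedding
    is interpolated between the two nearest integer positions.
    """
    seq_len, d = q.shape
    max_pos = pos_emb.shape[0] - 1
    scores = q @ k.T                                    # (seq_len, seq_len)
    gates = sigmoid(scores)
    # Causal mask: token i only counts (and attends to) positions j <= i.
    gates = gates * np.tril(np.ones((seq_len, seq_len)))
    # p[i, j] = sum_{m=j}^{i} gates[i, m]  (cumulative sum from the right).
    p = np.cumsum(gates[:, ::-1], axis=1)[:, ::-1]
    p = np.clip(p, 0, max_pos)
    lo = np.floor(p).astype(int)
    hi = np.minimum(lo + 1, max_pos)
    frac = p - lo
    # Interpolated positional logit q_i . e[p_ij].
    logit_lo = np.einsum("id,ijd->ij", q, pos_emb[lo])
    logit_hi = np.einsum("id,ijd->ij", q, pos_emb[hi])
    pos_logits = (1 - frac) * logit_lo + frac * logit_hi
    return (scores + pos_logits) / np.sqrt(d)
```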