BBC BERTopic Modeling

Unsupervised Topic Modeling on BBC articles.

Dante GPT

LLM trained on Dante Alighieri's Divina Commedia using PyTorch.

Paper Summary: An Introduction To Transformers - Turner (2023)

Input to the Transformer

The input to the transformer is a sequence \(X^{(0)}\in\mathbb{R}^{D\times N}\), where \(N\) is the length of the sequence and \(D\) is the dimensionality of each item. The items are known as tokens and are denoted \(\mathbf{x}_n^{(0)}\in\mathbb{R}^{D\times 1}\): \[ X^{(0)} = \left[\mathbf{x}_1^{(0)}, \ldots, \mathbf{x}_N^{(0)}\right] \] The tokens are representations of objects of interest. For instance, in language tasks a token is usually a unique vector representation of a word, whereas for an image it would be a vector representation of a patch.
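The construction above can be sketched in a few lines of NumPy: a toy embedding table maps token ids to \(D\)-dimensional columns, which are stacked into the \(D\times N\) input matrix \(X^{(0)}\). All names and sizes here are illustrative, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

vocab_size = 10  # hypothetical vocabulary size
D = 4            # dimensionality of each token
N = 3            # sequence length

# Embedding table: one D-dimensional column vector per vocabulary entry.
embedding = rng.standard_normal((D, vocab_size))

# A toy sequence of N token ids (e.g. word or patch indices).
token_ids = [2, 7, 5]

# X^(0): columns are the token vectors x_1^(0), ..., x_N^(0).
X0 = embedding[:, token_ids]

assert X0.shape == (D, N)
```

Each column of `X0` is one token vector \(\mathbf{x}_n^{(0)}\); the same layout applies whether the ids index words in a vocabulary or patches of an image.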