attention

An Introduction to Transformers - Summary

Input to the Transformer

The input to the transformer is a sequence \(X^{(0)}\in\mathbb{R}^{D\times N}\), where \(N\) is the length of the sequence and \(D\) is the dimensionality of each item in the sequence. The items are known as tokens, and the \(n\)-th token is denoted \(\mathbf{x}_n^{(0)}\in\mathbb{R}^{D\times 1}\). The tokens form the columns of the input matrix: \[ X^{(0)} = \left[\mathbf{x}_1^{(0)}, \ldots, \mathbf{x}_N^{(0)}\right] \] Each token is a representation of an object of interest. For instance, in language tasks a token is usually a unique vector representation of a word, whereas for an image it would be a vector representation of a patch.
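As a rough illustration of the shapes involved, the sketch below builds a \(D\times N\) input matrix for a short sentence with NumPy. The toy vocabulary, the random embedding table `E`, and the helper `embed` are all hypothetical and are not taken from the paper; in practice the word vectors would be learned rather than drawn at random.

```python
import numpy as np

D = 4  # token dimensionality (illustrative value)

# Hypothetical toy vocabulary mapping each word to a column index.
vocab = {"the": 0, "cat": 1, "sat": 2}

# Assumed embedding table: one D-dimensional column per word.
# Random values stand in for learned embeddings.
rng = np.random.default_rng(0)
E = rng.normal(size=(D, len(vocab)))

def embed(words):
    """Return the D x N matrix whose n-th column is the vector for words[n]."""
    ids = [vocab[w] for w in words]
    return E[:, ids]

X0 = embed(["the", "cat", "sat", "the"])
N = X0.shape[1]   # sequence length
print(X0.shape)   # (4, 4), i.e. (D, N)
```

Note that the code indexes columns from 0, while the text indexes tokens from 1. Because each word has a unique vector in this sketch, both occurrences of "the" map to identical columns of `X0`.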

Paper Summary: An Introduction To Transformers - Turner (2023)
