Learned

This is the simplest method and, to my knowledge, was introduced in Convolutional Sequence to Sequence Learning. It is basically identical to token embedding, except that here it is not the token index that “plucks out” a column of our matrix, but rather the “context index”, i.e. the position within the context. Suppose $x$ are our tokens; then the standard embedding is a matrix $W_E$ of shape $(n_{\text{embedding}}, n_{\text{vocab}})$, i.e. it contains a column for each vocabulary element, and each of these columns has $n_{\text{embedding}}$ elements. Positional embedding, or learned positional encoding, similarly initializes a matrix $W_P$ which now has shape $(n_{\text{embedding}}, n_{\text{context}})$, where $n_{\text{context}}$ is the context length, meaning that we pluck out a column for each position within the context. Then, as usual, we simply sum the two: if $w_e$ and $w_p$ are the columns of these two matrices corresponding to the same token, we work with $w_p + w_e$. This is also what is done in Andrej Karpathy’s Makemore series videos.

TL;DR We simply add a vector to the embedding depending on the position within the context. These vectors are parameters that are learned during training.
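
As a rough sketch, here is how the lookup-and-sum described above might look in PyTorch (the sizes and variable names are made up for illustration, and `nn.Embedding` stores the matrix row-wise rather than column-wise, but the idea is the same):

```python
import torch
import torch.nn as nn

# Illustrative sizes, not taken from the post.
n_vocab, n_embedding, context_length = 65, 32, 8

# W_E: one learned vector per vocabulary element.
token_embedding = nn.Embedding(n_vocab, n_embedding)
# W_P: one learned vector per position within the context.
position_embedding = nn.Embedding(context_length, n_embedding)

# A batch of token indices of shape (batch, T), with T <= context_length.
x = torch.randint(0, n_vocab, (4, context_length))
T = x.shape[1]

w_e = token_embedding(x)                       # (batch, T, n_embedding)
w_p = position_embedding(torch.arange(T))      # (T, n_embedding), broadcast over the batch
h = w_e + w_p                                  # w_p + w_e, fed to the rest of the model
```

Because `position_embedding` is an ordinary `nn.Embedding`, its vectors are updated by backpropagation just like the token embeddings.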
