Interpolation in Positional Encodings and Using YaRN for Larger Context Window
This post is divided into three parts; they are:

• Interpolation and Extrapolation in Sinusoidal Encodings and RoPE
• Interpolation in Learned Encodings
• YaRN for Larger Context Window

Sinusoidal encodings excel at extrapolation because they are built from continuous functions of the position $p$:

$$
\begin{aligned}
PE(p, 2i) &= \sin\left(\frac{p}{10000^{2i/d}}\right) \\
PE(p, 2i+1) &= \cos\left(\frac{p}{10000^{2i/d}}\right)
\end{aligned}
$$

Here $p$ is the token position, $d$ is the embedding dimension, and $i$ indexes the sine/cosine pairs of dimensions. Because the formula is defined for any value of $p$, you can simply substitute a larger $p$ to obtain the positional encoding for a longer sequence.
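As a minimal sketch in plain Python, the formula can be evaluated directly at any position. The helper name `sinusoidal_encoding`, the small dimension `d=8`, and the "training length" of 512 mentioned in the comments are illustrative assumptions, not part of any particular library. The point is that a position far beyond that length is computed by exactly the same expression as a position inside it.

```python
import math


def sinusoidal_encoding(p, d, base=10000.0):
    """Return the d-dimensional sinusoidal encoding of position p.

    Even indices 2i hold sin(p / base**(2i/d)); odd indices 2i+1 hold the
    matching cos term. Nothing restricts p to the range seen in training,
    so longer sequences are encoded with the same formula.
    """
    if d % 2 != 0:
        raise ValueError("d must be even so sin/cos terms come in pairs")
    pe = [0.0] * d
    for i in range(d // 2):
        angle = p / base ** (2 * i / d)  # frequency shrinks as i grows
        pe[2 * i] = math.sin(angle)
        pe[2 * i + 1] = math.cos(angle)
    return pe


# A position inside a hypothetical training length of 512 ...
pe_short = sinusoidal_encoding(100, d=8)
# ... and one well beyond it, computed the same way.
pe_long = sinusoidal_encoding(2048, d=8)
print([round(x, 4) for x in pe_long])
```

Each sine/cosine pair always lies on the unit circle, so the values stay bounded no matter how large $p$ becomes; only the angle keeps advancing.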
