ID: 2008.07027

Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

August 17, 2020
