ID: 2207.06881

Recurrent Memory Transformer

July 14, 2022


Similar papers 2

Addressing Some Limitations of Transformers with Feedback Memory

February 21, 2020

89% Match
Angela Fan, Thibaut Lavril, Edouard Grave, ... , Sainbayar Sukhbaatar
Machine Learning
Computation and Language

Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks. Unlike recurrent neural networks, Transformers use attention to capture temporal relations while processing input tokens in parallel. While this parallelization makes them computationally efficient, it restricts the model from fully exploiting the sequential nature of the input. The representation at a given layer can only access representations from lower laye...
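
The snippet above cuts off at the limitation; per the paper's title, the remedy is a feedback memory that lets every layer at the current step attend to states merged from all layers of previous steps. Below is a minimal sketch of that idea, assuming a learned convex combination merges the per-layer states into one memory vector per step; the class and parameter names are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeedbackMemoryStep(nn.Module):
    """One decoding step that attends to a memory of merged past-layer states.

    Sketch of the feedback-memory idea: instead of layer l at step t seeing only
    layer l-1 outputs of past steps, every layer attends to a single memory
    vector per past step that mixes all layers of that step.
    """
    def __init__(self, d_model: int, n_layers: int, n_heads: int = 4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        # Learned weights that merge the per-layer states into one memory vector.
        self.mix = nn.Parameter(torch.zeros(n_layers + 1))

    def forward(self, x_t, memory):
        # x_t: (batch, 1, d_model) current token embedding
        # memory: (batch, t, d_model) merged states of previous steps
        states = [x_t]
        h = x_t
        for layer in self.layers:
            h = layer(h, memory)          # every layer sees the same feedback memory
            states.append(h)
        # Merge all layer states of this step into one new memory slot.
        w = F.softmax(self.mix, dim=0)
        merged = sum(w_i * s for w_i, s in zip(w, states))
        return h, torch.cat([memory, merged], dim=1)

# Usage: keep one zero slot in the initial memory so cross-attention is well-defined.
step = FeedbackMemoryStep(d_model=64, n_layers=3)
memory = torch.zeros(2, 1, 64)
out, memory = step(torch.randn(2, 1, 64), memory)
```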


Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

May 3, 2022

89% Match
Yukun Feng, Feng Li, Ziang Song, ... , Philipp Koehn
Artificial Intelligence

The Transformer architecture has led to significant gains in machine translation. However, most studies focus on only sentence-level translation without considering the context dependency within documents, leading to inadequate document-level coherence. Some recent research has tried to mitigate this issue by introducing an additional context encoder or translating with multiple sentences or even the entire document. Such methods may lose the information on the target side...


Transformer with Memory Replay

May 19, 2022

89% Match
Rui Liu, Barzan Mozafari
Machine Learning

Transformers achieve state-of-the-art performance for natural language processing tasks by pre-training on large-scale text corpora. They are extremely compute-intensive and have very high sample complexity. Memory replay is a mechanism that remembers and reuses past examples by saving to and replaying from a memory buffer. It has been successfully used in reinforcement learning and GANs due to better sample efficiency. In this paper, we propose \emph{Transformer with Memory ...
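
The mechanism named here, saving past training examples to a buffer and replaying them during later updates, can be sketched generically. The snippet below is a minimal sketch assuming uniform sampling from a bounded buffer and a fixed replay ratio; the paper's actual replay policy is not reproduced, and all names are illustrative.

```python
import random
from collections import deque

class ExampleReplayBuffer:
    """Bounded buffer that remembers past training examples for reuse."""
    def __init__(self, capacity: int = 10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, batch):
        self.buffer.extend(batch)

    def sample(self, k: int):
        k = min(k, len(self.buffer))
        return random.sample(list(self.buffer), k)

def training_step(model_update, fresh_batch, replay: ExampleReplayBuffer,
                  replay_ratio: float = 0.5):
    """Mix fresh examples with replayed ones, then store the fresh ones."""
    n_replay = int(len(fresh_batch) * replay_ratio)
    mixed = list(fresh_batch) + replay.sample(n_replay)
    random.shuffle(mixed)
    loss = model_update(mixed)      # caller-supplied forward/backward/optimizer step
    replay.add(fresh_batch)
    return loss
```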


Large Memory Layers with Product Keys

July 10, 2019

89% Match
Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, ... , Hervé Jégou
Computation and Language
Machine Learning

This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern are based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the over...
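
The product-key access pattern is concrete enough to sketch: the query is split into two halves, each half is scored against a small set of sub-keys, and the Cartesian product of the two top-k lists gives candidate full keys, so exact search over |K|² keys costs roughly 2·|K| comparisons plus a k×k re-ranking. A minimal PyTorch sketch under those assumptions follows; dimensions and names are illustrative, not the paper's code.

```python
import torch
import torch.nn as nn

class ProductKeyMemory(nn.Module):
    """Memory layer whose keys are the Cartesian product of two sub-key sets."""
    def __init__(self, d_query: int = 256, n_sub_keys: int = 512,
                 d_value: int = 256, topk: int = 32):
        super().__init__()
        half = d_query // 2
        self.sub_keys1 = nn.Parameter(torch.randn(n_sub_keys, half))
        self.sub_keys2 = nn.Parameter(torch.randn(n_sub_keys, half))
        # n_sub_keys**2 addressable values, i.e. a very large parameter table.
        self.values = nn.EmbeddingBag(n_sub_keys ** 2, d_value, mode="sum")
        self.topk = topk
        self.n = n_sub_keys

    def forward(self, q):                      # q: (batch, d_query)
        q1, q2 = q.chunk(2, dim=-1)
        s1, i1 = (q1 @ self.sub_keys1.t()).topk(self.topk, dim=-1)   # (b, k)
        s2, i2 = (q2 @ self.sub_keys2.t()).topk(self.topk, dim=-1)   # (b, k)
        # All k*k combinations; a full key's score is the sum of its half scores.
        scores = s1.unsqueeze(-1) + s2.unsqueeze(-2)                 # (b, k, k)
        idx = i1.unsqueeze(-1) * self.n + i2.unsqueeze(-2)           # flat key ids
        best, pos = scores.flatten(1).topk(self.topk, dim=-1)
        chosen = idx.flatten(1).gather(1, pos)                       # (b, k)
        w = torch.softmax(best, dim=-1)
        return self.values(chosen, per_sample_weights=w)             # (b, d_value)
```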


Augmenting Language Models with Long-Term Memory

June 12, 2023

89% Match
Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, ... , Furu Wei
Computation and Language

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memo...
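
The decoupled design described here can be sketched at the level of its memory bank: the frozen backbone emits key/value states that are cached, and a retrieval step later pulls the most similar cached entries back as extra context for the trainable side-network. The snippet below is a minimal sketch of that caching-and-retrieval step only; LongMem's residual side-network and joint-attention fusion are not reproduced, and the names are illustrative.

```python
import torch

class KeyValueMemoryBank:
    """Cache of (key, value) states emitted by a frozen backbone LM."""
    def __init__(self, d_head: int, capacity: int = 65_536):
        self.keys = torch.empty(0, d_head)
        self.values = torch.empty(0, d_head)
        self.capacity = capacity

    @torch.no_grad()
    def write(self, k, v):                      # k, v: (n_tokens, d_head)
        self.keys = torch.cat([self.keys, k])[-self.capacity:]
        self.values = torch.cat([self.values, v])[-self.capacity:]

    @torch.no_grad()
    def retrieve(self, q, topk: int = 64):      # q: (n_query, d_head)
        if self.keys.numel() == 0:
            return None
        sims = q @ self.keys.t()                # (n_query, n_cached)
        _, idx = sims.topk(min(topk, sims.size(-1)), dim=-1)
        return self.keys[idx], self.values[idx] # each (n_query, topk, d_head)
```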


Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

May 24, 2023

89% Match
Yinghan Long, Sayeed Shafayet Chowdhury, Kaushik Roy
Computation and Language
Artificial Intelligence
Machine Learning

Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments and apply attention to the individual segments. We propose a segmented recurrent transformer (SRformer) that combines segmented (local) attention with r...
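
The stated recipe, attend locally within fixed-length segments and rely on recurrence to carry information across them, can be sketched directly; attention cost then grows with sequence length times segment length rather than quadratically. The snippet below is a minimal sketch that uses a GRU cell as a stand-in for the recurrent component; SRformer's specific recurrent attention is not reproduced, and the names are illustrative.

```python
import torch
import torch.nn as nn

class SegmentedRecurrentEncoder(nn.Module):
    """Local attention inside each segment plus a recurrent state across segments."""
    def __init__(self, d_model: int = 256, n_heads: int = 4, seg_len: int = 64):
        super().__init__()
        self.seg_len = seg_len
        self.local = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.carry = nn.GRUCell(d_model, d_model)   # stand-in recurrent unit

    def forward(self, x):                            # x: (batch, seq_len, d_model)
        b, t, d = x.shape
        state = x.new_zeros(b, d)
        outputs = []
        for start in range(0, t, self.seg_len):
            seg = x[:, start:start + self.seg_len]
            # Prepend the carried state so local attention can read it.
            seg = torch.cat([state.unsqueeze(1), seg], dim=1)
            seg = self.local(seg)
            state = self.carry(seg[:, 1:].mean(dim=1), state)
            outputs.append(seg[:, 1:])               # drop the state slot
        return torch.cat(outputs, dim=1)             # (batch, seq_len, d_model)
```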


Document-level Neural Machine Translation with Associated Memory Network

October 31, 2019

89% Match
Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, ... , Bao-Liang Lu
Computation and Language

Standard neural machine translation (NMT) assumes that the document-level context is independent. Most existing document-level NMT approaches capture only a coarse sense of global document-level information, while this work focuses on exploiting detailed document-level context in terms of a memory network. The capacity of the memory network to detect the part of memory most relevant to the current sentence renders a natural solution to model the ...


Recurrent Memory Decision Transformer

June 15, 2023

89% Match
Arkadii Bessonov, Alexey Staroverov, Huzhenyu Zhang, Alexey K. Kovalev, ... , Aleksandr I. Panov
Machine Learning
Artificial Intelligence

Originally developed for natural language problems, transformer models have recently been widely used in offline reinforcement learning tasks. This is because the agent's history can be represented as a sequence, and the whole task can be reduced to the sequence modeling task. However, the quadratic complexity of the transformer operation limits the potential increase in context. Therefore, different versions of the memory mechanism are used to work with long sequences in a n...


UniMem: Towards a Unified View of Long-Context Large Language Models

February 5, 2024

89% Match
Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, ... , Maosong Sun
Computation and Language
Artificial Intelligence

Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce UniMem, a unified framework that reformulates existing long-context methods from ...


Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

October 4, 2023

89% Match
Sangjun Park, JinYeong Bak
Machine Learning
Artificial Intelligence
Neural and Evolutionary Computing

Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian...
