Recurrent Memory Transformer

Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling

September 15, 2022

89% Match

Qingyang Wu, Zhou Yu

Computation and Language

Transformer encoder-decoder models have achieved great performance in dialogue generation tasks; however, their inability to process long dialogue history often leads to truncation of the context. To address this problem, we propose a novel memory-augmented transformer that is compatible with existing pre-trained encoder-decoder models and enables efficient preservation of the dialogue history information. By incorporating a separate memory module alongside the pre-trained tra...

Extended Mind Transformers

June 4, 2024

89% Match

Phoebe Klett, Thomas Ahle

Machine Learning

Computation and Language

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should ...

Enhancing Length Extrapolation in Sequential Models with Pointer-Augmented Neural Memory

April 18, 2024

89% Match

Hung Le, Dung Nguyen, Kien Do, ..., Truyen Tran

Machine Learning

Computation and Language

We propose Pointer-Augmented Neural Memory (PANM) to help neural networks understand and apply symbol processing to new, longer sequences of data. PANM integrates an external neural memory that uses novel physical addresses and pointer manipulation techniques to mimic human and computer symbol processing abilities. PANM facilitates pointer assignment, dereference, and arithmetic by explicitly using physical pointers to access memory content. Remarkably, it can learn to perfor...

Addressing Some Limitations of Transformers with Feedback Memory

February 21, 2020

89% Match

Angela Fan, Thibaut Lavril, Edouard Grave, ..., Sainbayar Sukhbaatar

Machine Learning

Computation and Language

Machine Learning

Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks. Unlike recurrent neural networks, Transformers use attention to capture temporal relations while processing input tokens in parallel. While this parallelization makes them computationally efficient, it restricts the model from fully exploiting the sequential nature of the input. The representation at a given layer can only access representations from lower laye...

R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression

February 21, 2025

89% Match

Xiaoqiang Wang, Suyuchen Wang, ..., Bang Liu

Computation and Language

Artificial Intelligence

Memory plays a key role in enhancing LLMs' performance when deployed to real-world applications. Existing solutions face trade-offs: explicit memory designs based on external storage require complex management and incur storage overhead, while implicit memory designs that store information via parameters struggle with reliable retrieval. In this paper, we propose R$^3$Mem, a memory network that optimizes both information Retention and Retrieval through Reversible context comp...

Learn To Remember: Transformer with Recurrent Memory for Document-Level Machine Translation

May 3, 2022

89% Match

Yukun Feng, Feng Li, Ziang Song, ..., Philipp Koehn

Artificial Intelligence

The Transformer architecture has led to significant gains in machine translation. However, most studies focus only on sentence-level translation without considering the context dependency within documents, leading to inadequate document-level coherence. Some recent research has tried to mitigate this issue by introducing an additional context encoder or by translating multiple sentences or even the entire document at once. Such methods may lose the information on the target side...

Transformer with Memory Replay

May 19, 2022

89% Match

Rui Liu, Barzan Mozafari

Machine Learning

Transformers achieve state-of-the-art performance for natural language processing tasks by pre-training on large-scale text corpora. They are extremely compute-intensive and have very high sample complexity. Memory replay is a mechanism that remembers and reuses past examples by saving to and replaying from a memory buffer. It has been successfully used in reinforcement learning and GANs due to better sample efficiency. In this paper, we propose Transformer with Memory ...

Large Memory Layers with Product Keys

July 10, 2019

89% Match

Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, ..., Hervé Jégou

Computation and Language

Machine Learning

This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern are based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the over...

Augmenting Language Models with Long-Term Memory

June 12, 2023

89% Match

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, ..., Furu Wei

Computation and Language

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memo...

Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

May 24, 2023

89% Match

Yinghan Long, Sayeed Shafayet Chowdhury, Kaushik Roy

Computation and Language

Artificial Intelligence

Machine Learning

Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments and apply attention to the individual segments. We propose a segmented recurrent transformer (SRformer) that combines segmented (local) attention with r...
