ID: 2207.06881

Recurrent Memory Transformer

July 14, 2022


Similar papers (page 3)

Document-level Neural Machine Translation with Associated Memory Network

October 31, 2019

89% Match
Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, ... , Bao-Liang Lu
Computation and Language

Standard neural machine translation (NMT) assumes that sentences can be translated independently of their document-level context. Most existing document-level NMT approaches capture only a coarse sense of global document-level information, whereas this work focuses on exploiting detailed document-level context through a memory network. The memory network's capacity to detect the parts of memory most relevant to the current sentence offers a natural solution for modeling the ...
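As a rough illustration of the memory attention described in this abstract, the sketch below lets a current-sentence encoding query a bank of stored context-sentence encodings and return a relevance-weighted document context; all names and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def memory_read(query, memory):
    """Attend over a document-level memory with the current-sentence query.

    query:  (d,)   encoding of the current sentence
    memory: (m, d) encodings of context sentences held in memory
    Returns a context vector weighted toward the most relevant memory slots.
    """
    scores = memory @ query / np.sqrt(query.shape[-1])  # relevance of each slot
    weights = softmax(scores)                            # largest weight = most relevant part
    return weights @ memory                              # (d,) retrieved document context

rng = np.random.default_rng(0)
memory = rng.normal(size=(8, 16))   # 8 context sentences, 16-dim encodings
query = rng.normal(size=16)
context = memory_read(query, memory)
```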


Logarithmic Memory Networks (LMNs): Efficient Long-Range Sequence Modeling for Resource-Constrained Environments

January 14, 2025

89% Match
Mohamed A. Taha
Artificial Intelligence
Machine Learning

Long-range sequence modeling is a crucial aspect of natural language processing and time series analysis. However, traditional models like Recurrent Neural Networks (RNNs) and Transformers suffer from computational and memory inefficiencies, especially when dealing with long sequences. This paper introduces Logarithmic Memory Networks (LMNs), a novel architecture that leverages a hierarchical logarithmic tree structure to efficiently store and retrieve past information. LMNs ...
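The abstract names a hierarchical logarithmic tree but is truncated before the details, so the sketch below only illustrates the general idea of keeping O(log n) summaries at exponentially coarser resolutions via a binary-counter-style merge; the merge rule and class name are assumptions, not LMN's actual construction.

```python
import numpy as np

class LogMemory:
    """Illustrative log-sized memory: slot k summarizes a block of ~2**k recent
    items, so n writes leave at most about log2(n) + 1 stored vectors."""

    def __init__(self, max_levels=32):
        self.levels = [None] * max_levels

    def write(self, x):
        carry = np.asarray(x, dtype=float)
        for k in range(len(self.levels)):
            if self.levels[k] is None:        # free slot: store the summary and stop
                self.levels[k] = carry
                return
            # slot occupied: merge (here, a simple mean) and carry up one level
            carry = 0.5 * (self.levels[k] + carry)
            self.levels[k] = None

    def read(self):
        return np.stack([v for v in self.levels if v is not None])

mem = LogMemory()
rng = np.random.default_rng(0)
for _ in range(1000):
    mem.write(rng.normal(size=16))
print(mem.read().shape)  # (6, 16): six summaries cover all 1000 inputs
```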


MeMo: Towards Language Models with Associative Memory Mechanisms

February 18, 2025

89% Match
Fabio Massimo Zanzotto, Elena Sofia Ruzzetti, Giancarlo A. Xompero, Leonardo Ranaldi, Davide Venditti, Federico Ranaldi, Cristina Giannone, ... , Raniero Romagnoli
Computation and Language
Artificial Intelligence

Memorization is a fundamental ability of Transformer-based Large Language Models, achieved through learning. In this paper, we propose a paradigm shift by designing an architecture to memorize text directly, bearing in mind the principle that memorization precedes learning. We introduce MeMo, a novel architecture for language modeling that explicitly memorizes sequences of tokens in layered associative memories. By design, MeMo offers transparency and the possibility of model...
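To make "layered associative memories" concrete, here is a textbook correlation-matrix associative memory that writes key-value pairs as outer products and recalls them with a single matrix multiply; this is a generic stand-in sketch, not MeMo's architecture.

```python
import numpy as np

class AssociativeMemory:
    """Correlation-matrix associative memory: store key->value pairs as a sum
    of outer products, retrieve by one matrix multiply (no gradient descent)."""

    def __init__(self, dim):
        self.W = np.zeros((dim, dim))

    def memorize(self, key, value):
        self.W += np.outer(value, key)   # one-shot write

    def recall(self, key):
        return self.W @ key              # approximate value for the cued key

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 64))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)
values = rng.normal(size=(5, 64))

mem = AssociativeMemory(64)
for k, v in zip(keys, values):
    mem.memorize(k, v)

# recall is close to the stored value when keys are near-orthogonal
print(np.corrcoef(mem.recall(keys[0]), values[0])[0, 1])
```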


Recurrent Memory Decision Transformer

June 15, 2023

89% Match
Arkadii Bessonov, Alexey Staroverov, Huzhenyu Zhang, Alexey K. Kovalev, ... , Aleksandr I. Panov
Machine Learning
Artificial Intelligence

Originally developed for natural language problems, transformer models have recently been widely used in offline reinforcement learning tasks. This is because the agent's history can be represented as a sequence, and the whole task can be reduced to the sequence modeling task. However, the quadratic complexity of the transformer operation limits the potential increase in context. Therefore, different versions of the memory mechanism are used to work with long sequences in a n...


MemLong: Memory-Augmented Retrieval for Long Text Modeling

August 30, 2024

89% Match
Weijie Liu, Zecheng Tang, Juntao Li, ... , Min Zhang
Computation and Language
Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of attention mechanisms and the growing memory consumption of the key-value cache during generation. This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance the capabilities of long-context languag...
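A minimal sketch of memory-augmented retrieval in the spirit described above: older (key, value) states are offloaded to an external cache, and generation retrieves the top-k most similar entries to attend over alongside the local context. The class and function names here are assumptions for illustration, not MemLong's code.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

class RetrievalKVCache:
    """External cache of past (key, value) states; retrieval returns the
    top-k entries whose keys are most similar to the current query."""

    def __init__(self):
        self.keys, self.values = [], []

    def store(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def retrieve(self, query, top_k=4):
        K, V = np.stack(self.keys), np.stack(self.values)
        idx = np.argsort(K @ query)[-top_k:]          # nearest keys by dot product
        return K[idx], V[idx]

def attend_with_memory(query, local_K, local_V, cache, top_k=4):
    mem_K, mem_V = cache.retrieve(query, top_k)
    K = np.concatenate([mem_K, local_K])              # retrieved memory + local context
    V = np.concatenate([mem_V, local_V])
    w = softmax(K @ query / np.sqrt(query.shape[-1]))
    return w @ V

rng = np.random.default_rng(0)
cache = RetrievalKVCache()
for _ in range(100):                                  # offload older states
    cache.store(rng.normal(size=32), rng.normal(size=32))
out = attend_with_memory(rng.normal(size=32),
                         rng.normal(size=(8, 32)), rng.normal(size=(8, 32)), cache)
```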


Attention is All You Need Until You Need Retention

January 15, 2025

89% Match
M. Murat Yaslioglu
Machine Learning
Artificial Intelligence

This work introduces a novel Retention Layer mechanism for Transformer-based architectures, addressing their inherent lack of intrinsic retention capabilities. Unlike human cognition, which can encode and dynamically recall symbolic templates, Generative Pretrained Transformers rely solely on fixed pretrained weights and ephemeral context windows, limiting their adaptability. The proposed Retention Layer incorporates a persistent memory module capable of real-time data popula...
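As an illustrative sketch of a persistent memory populated at inference time, the snippet below writes only sufficiently novel embeddings into a store (with no weight updates) and recalls entries by similarity; the novelty threshold and all names are assumptions, not the paper's Retention Layer.

```python
import numpy as np

class RetentionStore:
    """Persistent inference-time memory: novel embeddings are retained without
    any weight update, and later queries recall the closest stored entries."""

    def __init__(self, dim, novelty_threshold=0.9):
        self.bank = np.empty((0, dim))
        self.threshold = novelty_threshold

    def maybe_write(self, x):
        x = x / np.linalg.norm(x)
        if len(self.bank) == 0 or (self.bank @ x).max() < self.threshold:
            self.bank = np.vstack([self.bank, x])     # keep only novel content

    def recall(self, query, top_k=1):
        query = query / np.linalg.norm(query)
        idx = np.argsort(self.bank @ query)[-top_k:]
        return self.bank[idx]

store = RetentionStore(dim=8)
store.maybe_write(np.arange(8, dtype=float))
store.maybe_write(np.arange(8, dtype=float) + 0.01)   # near-duplicate: not retained
print(len(store.bank))  # 1
```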


UniMem: Towards a Unified View of Long-Context Large Language Models

February 5, 2024

89% Match
Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, ... , Maosong Sun
Computation and Language
Artificial Intelligence

Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce UniMem, a unified framework that reformulates existing long-context methods from ...


Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

October 4, 2023

89% Match
Sangjun Park, JinYeong Bak
Machine Learning
Artificial Intelligence
Neural and Evolutionary Computing

Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian...
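A minimal Hebbian sketch in the spirit of this abstract: co-activated memory engrams strengthen their connections while unused links decay, and a cue recalls its most strongly associated engrams. The update rule and parameters are generic Hebbian learning, not Memoria's full multi-level design.

```python
import numpy as np

class HebbianMemory:
    """Engrams that fire together wire together; unused links slowly decay."""

    def __init__(self, n_engrams, lr=0.1, decay=0.01):
        self.W = np.zeros((n_engrams, n_engrams))
        self.lr, self.decay = lr, decay

    def update(self, active):
        a = np.asarray(active, dtype=float)       # 1.0 where an engram fired
        self.W += self.lr * np.outer(a, a)        # strengthen co-activated pairs
        self.W *= (1.0 - self.decay)              # gradual forgetting
        np.fill_diagonal(self.W, 0.0)

    def recall(self, cue):
        return self.W @ np.asarray(cue, dtype=float)  # association strength per engram

mem = HebbianMemory(n_engrams=6)
mem.update([1, 1, 0, 0, 0, 0])    # engrams 0 and 1 co-occur
mem.update([1, 1, 0, 0, 0, 0])
print(mem.recall([1, 0, 0, 0, 0, 0]))  # engram 1 is recalled most strongly
```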


M+: Extending MemoryLLM with Scalable Long-Term Memory

February 1, 2025

89% Match
Yu Wang, Dmitry Krotov, Yuanzhe Hu, Yifan Gao, Wangchunshu Zhou, Julian McAuley, Dan Gutfreund, ... , Zexue He
Computation and Language

Equipping large language models (LLMs) with latent-space memory has attracted increasing attention as they can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), as a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths u...
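To illustrate a fixed-size latent-space memory pool of the kind described, the sketch below drops a random subset of old slots on each update and appends newly compressed hidden states, keeping the pool size constant; this update policy is an assumption for illustration, not the exact MemoryLLM/M+ mechanism.

```python
import numpy as np

class LatentMemoryPool:
    """Fixed-size latent memory: each update evicts random old slots and
    appends new compressed states, so the pool never grows."""

    def __init__(self, n_slots, dim, seed=0):
        self.rng = np.random.default_rng(seed)
        self.pool = self.rng.normal(size=(n_slots, dim))

    def update(self, new_states):
        k = len(new_states)
        keep = self.rng.choice(len(self.pool), size=len(self.pool) - k, replace=False)
        self.pool = np.concatenate([self.pool[keep], new_states])  # size unchanged

pool = LatentMemoryPool(n_slots=256, dim=64)
pool.update(np.zeros((16, 64)))   # inject 16 freshly compressed hidden states
print(pool.pool.shape)            # (256, 64)
```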


Memorizing Transformers

March 16, 2022

89% Match
Yuhuai Wu, Markus N. Rabe, ... , Christian Szegedy
Machine Learning
Artificial Intelligence
Computation and Language

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key,...
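As a rough sketch of the mechanism in this abstract, the function below performs a top-k lookup into an external, non-differentiable (key, value) memory and blends the result with ordinary local attention through a gate; exact search stands in for approximate kNN, and the gating scheme and names are assumptions rather than the paper's implementation.

```python
import numpy as np

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def knn_memory_attention(q, mem_keys, mem_values, local_keys, local_values,
                         top_k=32, gate=0.5):
    """Attend over the top-k retrieved entries of an external (key, value)
    memory and mix the result with standard local attention."""
    scale = np.sqrt(q.shape[-1])
    idx = np.argsort(mem_keys @ q)[-top_k:]                           # nearest stored keys
    mem_out = softmax(mem_keys[idx] @ q / scale) @ mem_values[idx]    # attention over retrieved pairs
    local_out = softmax(local_keys @ q / scale) @ local_values        # ordinary local attention
    return gate * mem_out + (1.0 - gate) * local_out

rng = np.random.default_rng(0)
d = 64
out = knn_memory_attention(
    rng.normal(size=d),
    rng.normal(size=(4096, d)), rng.normal(size=(4096, d)),  # external memory of past states
    rng.normal(size=(512, d)), rng.normal(size=(512, d)),    # local context window
)
```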
