Memory Transformer

Augmenting Language Models with Long-Term Memory

June 12, 2023

89% Match

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, ... , Wei Furu

Computation and Language

Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memo...

Find SimilarView on arXiv

Extended Mind Transformers

June 4, 2024

89% Match

Phoebe Klett, Thomas Ahle

Machine Learning

Computation and Language

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should ...

Find SimilarView on arXiv

Large Memory Layers with Product Keys

July 10, 2019

89% Match

Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, ... , Jégou Hervé

Computation and Language

Machine Learning

This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the over...

Find SimilarView on arXiv

Memorizing Transformers

March 16, 2022

89% Match

Yuhuai Wu, Markus N. Rabe, ... , Szegedy Christian

Machine Learning

Artificial Intelligence

Computation and Language

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key,...

Find SimilarView on arXiv

Global memory transformer for processing long documents

December 3, 2022

89% Match

Arij Al Adel

Computation and Language

Machine Learning

Transformer variants dominate the state-of-the-art in different natural language processing tasks such as translation, reading comprehension and summarization. Our paper is more directed to use general memory slots added to the inputs and studying the results of adding these slots. This paper is a go on study of general memory slots rule that were added to the input of the proposed model in previous work. We have two main tasks;1) pretraining task using masked language modeli...

Find SimilarView on arXiv

Document-level Neural Machine Translation with Associated Memory Network

October 31, 2019

89% Match

Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, ... , Lu Bao-liang

Computation and Language

Standard neural machine translation (NMT) is on the assumption that the document-level context is independent. Most existing document-level NMT approaches are satisfied with a smattering sense of global document-level information, while this work focuses on exploiting detailed document-level context in terms of a memory network. The capacity of the memory network that detecting the most relevant part of the current sentence from memory renders a natural solution to model the ...

Find SimilarView on arXiv

Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

October 4, 2023

89% Match

Sangjun Park, JinYeong Bak

Machine Learning

Artificial Intelligence

Neural and Evolutionary Comp...

Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian...

Find SimilarView on arXiv

HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 9, 2024

89% Match

Zifan He, Zongyue Qin, Neha Prakriya, ... , Cong Jason

Computation and Language

Machine Learning

Transformer-based large language models (LLM) have been widely used in language processing applications. However, most of them restrict the context window that permits the model to attend to every token in the inputs. Previous works in recurrent models can memorize past tokens to enable unlimited context and maintain effectiveness. However, they have "flat" memory architectures, which have limitations in selecting and filtering information. Since humans are good at learning a...

Find SimilarView on arXiv

Addressing Some Limitations of Transformers with Feedback Memory

February 21, 2020

89% Match

Angela Fan, Thibaut Lavril, Edouard Grave, ... , Sukhbaatar Sainbayar

Machine Learning

Computation and Language

Machine Learning

Transformers have been successfully applied to sequential, auto-regressive tasks despite being feedforward networks. Unlike recurrent neural networks, Transformers use attention to capture temporal relations while processing input tokens in parallel. While this parallelization makes them computationally efficient, it restricts the model from fully exploiting the sequential nature of the input. The representation at a given layer can only access representations from lower laye...

Find SimilarView on arXiv

Can Active Memory Replace Attention?

October 27, 2016

89% Match

Łukasz Kaiser, Samy Bengio

Machine Learning

Computation and Language

Several mechanisms to focus attention of a neural network on selected parts of its input or memory have been used successfully in deep learning models in recent years. Attention has improved image classification, image captioning, speech recognition, generative models, and learning algorithmic tasks, but it had probably the largest impact on neural machine translation. Recently, similar improvements have been obtained using alternative mechanisms that do not focus on a sing...

Find SimilarView on arXiv