Associative Recurrent Memory Transformer

Memformer: A Memory-Augmented Transformer for Sequence Modeling

October 14, 2020

89% Match

Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, ... , Zhou Yu

Computation and Language

Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network for sequence modeling, that utilizes an external dynamic memory to encode and retrieve past information. Our model achieves linear time complexity and constant memory space complexity when processing long sequences. We also propose a new opt...

Neurocache: Efficient Vector Retrieval for Long-range Language Modeling

July 2, 2024

88% Match

Ali Safaya, Deniz Yuret

Computation and Language

Artificial Intelligence

Machine Learning

This paper introduces Neurocache, an approach to extend the effective context size of large language models (LLMs) using an external vector cache to store its past states. Like recent vector retrieval approaches, Neurocache uses an efficient k-nearest-neighbor (kNN) algorithm to retrieve relevant past states and incorporate them into the attention process. Neurocache improves upon previous methods by (1) storing compressed states, which reduces cache size; (2) performing a si...

TCNCA: Temporal Convolution Network with Chunked Attention for Scalable Sequence Processing

December 9, 2023

88% Match

Aleksandar Terzic, Michael Hersche, Geethan Karunaratne, Luca Benini, ... , Abbas Rahimi

Machine Learning

Computer Vision and Pattern ...

MEGA is a recent transformer-based architecture, which utilizes a linear recurrent operator whose parallel computation, based on the FFT, scales as $O(L \log L)$, with $L$ being the sequence length. We build upon their approach by replacing the linear recurrence with a special temporal convolutional network which permits larger receptive field size with shallower networks, and reduces the computational complexity to $O(L)$. The resulting model is called TCNCA, a Temporal Convolu...

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

January 9, 2019

88% Match

Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, ... , Ruslan Salakhutdinov

Machine Learning

Computation and Language

Transformers have a potential of learning longer-term dependency, but are limited by a fixed-length context in the setting of language modeling. We propose a novel neural architecture Transformer-XL that enables learning dependency beyond a fixed length without disrupting temporal coherence. It consists of a segment-level recurrence mechanism and a novel positional encoding scheme. Our method not only enables capturing longer-term dependency, but also resolves the context fra...

An Efficient Memory-Augmented Transformer for Knowledge-Intensive NLP Tasks

October 30, 2022

88% Match

Yuxiang Wu, Yu Zhao, Baotian Hu, Pasquale Minervini, ... , Sebastian Riedel

Computation and Language

Artificial Intelligence

Machine Learning

Access to external knowledge is essential for many natural language processing tasks, such as question answering and dialogue. Existing methods often rely on a parametric model that stores knowledge in its parameters, or use a retrieval-augmented model that has access to an external knowledge source. Parametric and retrieval-augmented models have complementary strengths in terms of computational efficiency and predictive accuracy. To combine the strength of both approaches, w...

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

April 17, 2024

88% Match

Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, ... , Hinrich Schütze

Computation and Language

While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval Augment...

Improving language models by retrieving from trillions of tokens

December 8, 2021

88% Match

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, ... , Laurent Sifre

Computation and Language

Machine Learning

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a...

Learning to Rehearse in Long Sequence Memorization

June 2, 2021

88% Match

Zhu Zhang, Chang Zhou, Jianxin Ma, Zhijie Lin, Jingren Zhou, ... , Zhou Zhao

Machine Learning

Existing reasoning tasks often have an important assumption that the input contents can be always accessed while reasoning, requiring unlimited storage resources and suffering from severe time delay on long sequences. To achieve efficient reasoning on long sequences with limited storage resources, memory augmented neural networks introduce a human-like write-read memory to compress and memorize the long input sequence in one pass, trying to answer subsequent queries only base...

Autoregressive Search Engines: Generating Substrings as Document Identifiers

April 22, 2022

88% Match

Michele Bevilacqua, Giuseppe Ottaviano, Patrick Lewis, Wen-tau Yih, ... , Fabio Petroni

Computation and Language

Information Retrieval

Knowledge-intensive language tasks require NLP systems to both provide the correct answer and retrieve supporting evidence for it in a given corpus. Autoregressive language models are emerging as the de-facto standard for generating answers, with newer and more powerful systems emerging at an astonishing pace. In this paper we argue that all this (and future) progress can be directly applied to the retrieval problem with minimal intervention to the models' architecture. Previ...

Augmenting Language Models with Long-Term Memory

June 12, 2023

88% Match

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, ... , Furu Wei

Computation and Language

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memo...
