Associative Recurrent Memory Transformer

$\text{Memory}^3$: Language Modeling with Explicit Memory

July 1, 2024

88% Match

Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, ... , Weinan E

Computation and Language

Artificial Intelligence

Machine Learning

The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size...

Parallel Context Windows for Large Language Models

December 21, 2022

88% Match

Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri Abend, Ehud Karpas, Amnon Shashua, ... , Yoav Shoham

Computation and Language

When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks ("windows"), restric...

Benchmarking and Building Long-Context Retrieval Models with LoCo and M2-BERT

February 12, 2024

88% Match

Jon Saad-Falcon, Daniel Y. Fu, Simran Arora, ... , Christopher Ré

Information Retrieval

Machine Learning

Retrieval pipelines, an integral component of many machine learning systems, perform poorly in domains where documents are long (e.g., 10K tokens or more) and where identifying the relevant document requires synthesizing information across the entire text. Developing long-context retrieval encoders suitable for these domains raises three challenges: (1) how to evaluate long-context retrieval performance, (2) how to pretrain a base language model to represent both short contexts...

RNNs are not Transformers (Yet): The Key Bottleneck on In-context Retrieval

February 28, 2024

88% Match

Kaiyue Wen, Xingyu Dang, Kaifeng Lyu

Machine Learning

Computation and Language

This paper investigates the gap in representation powers of Recurrent Neural Networks (RNNs) and Transformers in the context of solving algorithmic problems. We focus on understanding whether RNNs, known for their memory efficiency in handling long sequences, can match the performance of Transformers, particularly when enhanced with Chain-of-Thought (CoT) prompting. Our theoretical analysis reveals that CoT improves RNNs but is insufficient to close the gap with Transformers....

Memory-efficient Transformers via Top-$k$ Attention

June 13, 2021

88% Match

Ankit Gupta, Guy Dar, Shaya Goodman, ... , Jonathan Berant

Computation and Language

Machine Learning

Following the success of dot-product attention in Transformers, numerous approximations have been recently proposed to address its quadratic complexity with respect to the input length. While these variants are memory and compute efficient, it is not possible to directly use them with popular pre-trained language models trained using vanilla attention, without an expensive corrective pre-training stage. In this work, we propose a simple yet highly accurate approximation for v...

RET-LLM: Towards a General Read-Write Memory for Large Language Models

May 23, 2023

88% Match

Ali Modarressi, Ayyoob Imani, ... , Hinrich Schütze

Computation and Language

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP) through their extensive parameters and comprehensive data utilization. However, existing LLMs lack a dedicated memory unit, limiting their ability to explicitly store and retrieve knowledge for various tasks. In this paper, we propose RET-LLM, a novel framework that equips LLMs with a general write-read memory unit, allowing them to extract, store, and recall knowledge from ...

TransformerFAM: Feedback attention is working memory

April 14, 2024

88% Match

Dongseong Hwang, Weiran Wang, Zhuoyuan Huo, ... , Pedro Moreno Mengibar

Machine Learning

Artificial Intelligence

Computation and Language

While Transformers have revolutionized deep learning, their quadratic attention complexity hinders their ability to process infinitely long inputs. We propose Feedback Attention Memory (FAM), a novel Transformer architecture that leverages a feedback loop to enable the network to attend to its own latent representations. This design fosters the emergence of working memory within the Transformer, allowing it to process indefinitely long sequences. TransformerFAM requires no ad...

Data-Efficient Autoregressive Document Retrieval for Fact Verification

November 17, 2022

88% Match

James Thorne

Computation and Language

Artificial Intelligence

Information Retrieval

Machine Learning

Document retrieval is a core component of many knowledge-intensive natural language processing task formulations such as fact verification and question answering. Sources of textual knowledge, such as Wikipedia articles, condition the generation of answers from the models. Recent advances in retrieval use sequence-to-sequence models to incrementally predict the title of the appropriate Wikipedia page given a query. However, this method requires supervision in the form of huma...

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

December 31, 2020

88% Match

Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, ... , Haifeng Wang

Computation and Language

Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying the sparse attention mechanism will incur the context fragmentation problem or lead to an inferior modeling capability against comparable model sizes. In this paper, we propose ERNIE-Doc, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, namely the r...

LaMemo: Language Modeling with Look-Ahead Memory

April 15, 2022

88% Match

Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, ... , Minlie Huang

Computation and Language

Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode contexts in a uni-directional way. As a result, this prevents the memory from dynamically interacting with the current conte...
