Scaling Transformer to 1M tokens and bey...

Large Memory Layers with Product Keys

July 10, 2019

91% Match

Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, ... , Jégou Hervé

Computation and Language

Machine Learning

This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern is based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the over...

Find SimilarView on arXiv

$\text{Memory}^3$: Language Modeling with Explicit Memory

July 1, 2024

91% Match

Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, ... , E Weinan

Computation and Language

Artificial Intelligence

Machine Learning

The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size...

Find SimilarView on arXiv

Improving language models by retrieving from trillions of tokens

December 8, 2021

91% Match

Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, Diego de Las Casas, Aurelia Guy, Jacob Menick, Roman Ring, Tom Hennigan, Saffron Huang, Loren Maggiore, Chris Jones, Albin Cassirer, Andy Brock, Michela Paganini, Geoffrey Irving, Oriol Vinyals, Simon Osindero, Karen Simonyan, Jack W. Rae, ... , Sifre Laurent

Computation and Language

Machine Learning

We enhance auto-regressive language models by conditioning on document chunks retrieved from a large corpus, based on local similarity with preceding tokens. With a $2$ trillion token database, our Retrieval-Enhanced Transformer (RETRO) obtains comparable performance to GPT-3 and Jurassic-1 on the Pile, despite using 25$\times$ fewer parameters. After fine-tuning, RETRO performance translates to downstream knowledge-intensive tasks such as question answering. RETRO combines a...

Find SimilarView on arXiv

Augmenting Language Models with Long-Term Memory

June 12, 2023

91% Match

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, ... , Wei Furu

Computation and Language

Existing large language models (LLMs) can only afford fix-sized inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memo...

Find SimilarView on arXiv

LaMemo: Language Modeling with Look-Ahead Memory

April 15, 2022

90% Match

Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, ... , Huang Minlie

Computation and Language

Although Transformers with fully connected self-attentions are powerful to model long-term dependencies, they are struggling to scale to long texts with thousands of words in language modeling. One of the solutions is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment that encodes contexts in a uni-directional way. As a result, this prohibits the memory to dynamically interact with the current conte...

Find SimilarView on arXiv

Retentive Network: A Successor to Transformer for Large Language Models

July 17, 2023

90% Match

Yutao Sun, Li Dong, Shaohan Huang, Shuming Ma, Yuqing Xia, Jilong Xue, ... , Wei Furu

Computation and Language

Machine Learning

In this work, we propose Retentive Network (RetNet) as a foundation architecture for large language models, simultaneously achieving training parallelism, low-cost inference, and good performance. We theoretically derive the connection between recurrence and attention. Then we propose the retention mechanism for sequence modeling, which supports three computation paradigms, i.e., parallel, recurrent, and chunkwise recurrent. Specifically, the parallel representation allows fo...

Find SimilarView on arXiv

R-Transformer: Recurrent Neural Network Enhanced Transformer

July 12, 2019

90% Match

Zhiwei Wang, Yao Ma, ... , Tang Jiliang

Machine Learning

Computation and Language

Computer Vision and Pattern ...

Audio and Speech Processing

Recurrent Neural Networks have long been the dominating choice for sequence modeling. However, it severely suffers from two issues: impotent in capturing very long-term dependencies and unable to parallelize the sequential computation procedure. Therefore, many non-recurrent sequence models that are built on convolution and attention operations have been proposed recently. Notably, models with multi-head attention such as Transformer have demonstrated extreme effectiveness in...

Find SimilarView on arXiv

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

April 17, 2024

90% Match

Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, ... , Schütze Hinrich

Computation and Language

While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval Augment...

Find SimilarView on arXiv

Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

October 4, 2023

90% Match

Sangjun Park, JinYeong Bak

Machine Learning

Artificial Intelligence

Neural and Evolutionary Comp...

Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian...

Find SimilarView on arXiv

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

December 31, 2020

90% Match

Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, ... , Wang Haifeng

Computation and Language

Transformers are not suited for processing long documents, due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying the sparse attention mechanism will incur the context fragmentation problem or lead to an inferior modeling capability against comparable model sizes. In this paper, we propose ERNIE-Doc, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, namely the r...

Find SimilarView on arXiv

Scaling Transformer to 1M tokens and beyond with RMT

Large Memory Layers with Product Keys

$\text{Memory}^3$: Language Modeling with Explicit Memory

Improving language models by retrieving from trillions of tokens

Augmenting Language Models with Long-Term Memory

LaMemo: Language Modeling with Look-Ahead Memory

Retentive Network: A Successor to Transformer for Large Language Models

R-Transformer: Recurrent Neural Network Enhanced Transformer

MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer