Memformer: A Memory-Augmented Transformer for Sequence Modeling

Large Memory Layers with Product Keys

July 10, 2019

90% Match

Guillaume Lample, Alexandre Sablayrolles, Marc'Aurelio Ranzato, ... , Hervé Jégou

Computation and Language

Machine Learning

This paper introduces a structured memory which can be easily integrated into a neural network. The memory is very large by design and significantly increases the capacity of the architecture, by up to a billion parameters with a negligible computational overhead. Its design and access pattern are based on product keys, which enable fast and exact nearest neighbor search. The ability to increase the number of parameters while keeping the same computational budget lets the over...

Efficient Transformers: A Survey

September 14, 2020

90% Match

Yi Tay, Mostafa Dehghani, ... , Donald Metzler

Machine Learning

Artificial Intelligence

Computation and Language

Computer Vision and Pattern ...

Information Retrieval

Transformer model architectures have garnered immense interest lately due to their effectiveness across a range of domains like language, vision and reinforcement learning. In the field of natural language processing for example, Transformers have become an indispensable staple in the modern deep learning stack. Recently, a dizzying number of "X-former" models have been proposed - Reformer, Linformer, Performer, Longformer, to name a few - which improve upon the original Tran...

Memorizing Transformers

March 16, 2022

90% Match

Yuhuai Wu, Markus N. Rabe, ... , Christian Szegedy

Machine Learning

Artificial Intelligence

Computation and Language

Language models typically need to be trained or finetuned in order to acquire new knowledge, which involves updating their weights. We instead envision language models that can simply read and memorize new data at inference time, thus acquiring new knowledge immediately. In this work, we extend language models with the ability to memorize the internal representations of past inputs. We demonstrate that an approximate kNN lookup into a non-differentiable memory of recent (key,...

Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

October 4, 2023

90% Match

Sangjun Park, JinYeong Bak

Machine Learning

Artificial Intelligence

Neural and Evolutionary Comp...

Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian...

Augmenting Self-attention with Persistent Memory

July 2, 2019

90% Match

Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, ... , Armand Joulin

Machine Learning

Computation and Language

Machine Learning

Transformer networks have led to important progress in language modeling and machine translation. These models include two consecutive modules, a feed-forward layer and a self-attention layer. The latter allows the network to capture long-term dependencies and is often regarded as the key ingredient in the success of Transformers. Building upon this intuition, we propose a new model that solely consists of attention layers. More precisely, we augment the self-attention laye...

On Difficulties of Attention Factorization through Shared Memory

March 31, 2024

90% Match

Uladzislau Yorsh, Martin Holeňa, ... , David Herel

Machine Learning

Transformers have revolutionized deep learning in numerous fields, including natural language processing, computer vision, and audio processing. Their strength lies in their attention mechanism, which allows the discovery of complex input relationships. However, this mechanism's quadratic time and memory complexity poses challenges for larger inputs. Researchers are now investigating models like Linear Unified Nested Attention (Luna) or Memory Augmented Transformer, whic...

Augmenting Language Models with Long-Term Memory

June 12, 2023

90% Match

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, ... , Furu Wei

Computation and Language

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memo...

GMAT: Global Memory Augmentation for Transformers

June 5, 2020

90% Match

Ankit Gupta, Jonathan Berant

Machine Learning

Computation and Language

Machine Learning

Transformer-based models have become ubiquitous in natural language processing thanks to their large capacity, innate parallelism and high performance. The contextualizing component of a Transformer block is the $\textit{pairwise dot-product}$ attention that has a large $\Omega(L^2)$ memory requirement for length $L$ sequences, limiting its ability to process long documents. This has been the subject of substantial interest recently, where multiple approximations were propose...

AttMEMO : Accelerating Transformers with Memoization on Big Memory Systems

January 23, 2023

90% Match

Yuan Feng, Hyeran Jeon, Filip Blagojevic, Cyril Guyot, ... , Dong Li

Performance

Artificial Intelligence

Machine Learning

Transformer models have gained popularity because of their superior inference accuracy and inference throughput. However, the transformer is computation-intensive, causing a long inference time. Existing work on transformer inference acceleration has limitations caused by either the modification of transformer architectures or the need for specialized hardware. In this paper, we identify the opportunities of using memoization to accelerate the self-attention mechanism in transf...

Not All Memories are Created Equal: Learning to Forget by Expiring

May 13, 2021

90% Match

Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, ... , Angela Fan

Machine Learning

Artificial Intelligence

Attention mechanisms have shown promising results in sequence modeling tasks that require long-term memory. Recent work investigated mechanisms to reduce the computational cost of preserving and storing memories. However, not all content in the past is equally important to remember. We propose Expire-Span, a method that learns to retain the most important information and expire the irrelevant information. This forgetting of memories enables Transformers to scale to attend ove...
