HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 9, 2024

HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning

June 14, 2024

90% Match

Heejun Lee, Geon Park, Youngwan Lee, Jina Kim, Wonyoung Jeong, ... , Sung Ju Hwang

Computation and Language

Computer Vision and Pattern ...

Distributed, Parallel, and C...

Machine Learning

In modern large language models (LLMs), increasing sequence lengths is a crucial challenge for enhancing their comprehension and coherence in handling complex tasks such as multi-modal question answering. However, handling long context sequences with LLMs is prohibitively costly due to the conventional attention mechanism's quadratic time and space complexity, and the context window size is limited by the GPU memory. Although recent works have proposed linear and sparse atten...

Anchor-based Large Language Models

February 12, 2024

90% Match

Jianhui Pang, Fanghua Ye, ... , Longyue Wang

Computation and Language

Artificial Intelligence

Large language models (LLMs) predominantly employ decoder-only transformer architectures, necessitating the retention of keys/values information for historical tokens to provide contextual information and avoid redundant computation. However, the substantial size and parameter volume of these LLMs require massive GPU memory. This memory demand increases with the length of the input text, leading to an urgent need for more efficient methods of information storage and processin...

Document-level Neural Machine Translation with Associated Memory Network

October 31, 2019

90% Match

Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, ... , Bao-Liang Lu

Computation and Language

Standard neural machine translation (NMT) assumes that document-level context is independent. Most existing document-level NMT approaches are satisfied with a smattering sense of global document-level information, while this work focuses on exploiting detailed document-level context in terms of a memory network. The capacity of the memory network to detect the most relevant part of the current sentence from memory renders a natural solution to model the ...

Global memory transformer for processing long documents

December 3, 2022

89% Match

Arij Al Adel

Computation and Language

Machine Learning

Transformer variants dominate the state of the art in natural language processing tasks such as translation, reading comprehension, and summarization. Our paper focuses on adding general memory slots to the inputs and studying the results of adding these slots. This paper continues the study of the general memory slots that were added to the input of the model proposed in previous work. We have two main tasks: 1) pretraining task using masked language modeli...

Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading

October 8, 2023

89% Match

Howard Chen, Ramakanth Pasunuru, ... , Asli Celikyilmaz

Computation and Language

Large language models (LLMs) have advanced in large strides due to the effectiveness of the self-attention mechanism that processes and compares all tokens at once. However, this mechanism comes with a fundamental issue -- the predetermined context window is bound to be limited. Despite attempts to extend the context window through methods like extrapolating the positional embedding, using recurrence, or selectively retrieving essential parts of the long sequence, long-text u...

Memformer: A Memory-Augmented Transformer for Sequence Modeling

October 14, 2020

89% Match

Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, ... , Zhou Yu

Computation and Language

Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network for sequence modeling, that utilizes an external dynamic memory to encode and retrieve past information. Our model achieves linear time complexity and constant memory space complexity when processing long sequences. We also propose a new opt...

In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

February 16, 2024

89% Match

Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, ... , Mikhail Burtsev

Computation and Language

Artificial Intelligence

Machine Learning

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to $10^4$ elements. In contrast, fine-tuning GPT-2 with recurrent memory...

MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models

February 23, 2024

89% Match

Nathanaël Carraz Rakotonirina, Marco Baroni

Computation and Language

Artificial Intelligence

Machine Learning

Transformer-based language models (LMs) track contextual information through large, hard-coded input windows. We introduce MemoryPrompt, a leaner approach in which the LM is complemented by a small auxiliary recurrent network that passes information to the LM by prefixing its regular input with a sequence of vectors, akin to soft prompts, without requiring LM finetuning. Tested on a task designed to probe a LM's ability to keep track of multiple fact updates, a MemoryPrompt-a...

Extending Memory for Language Modelling

May 19, 2023

89% Match

Anupiya Nugaliyadde

Computation and Language

Breakthroughs in deep learning and memory networks have made major advances in natural language understanding. Language is sequential and information carried through the sequence can be captured through memory networks. Learning the sequence is one of the key aspects in learning the language. However, memory networks are not capable of holding infinitely long sequences in their memories and are limited by various constraints such as the vanishing or exploding gradient problem...

HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

May 23, 2024

89% Match

Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, ... , Yu Su

Computation and Language

Artificial Intelligence

In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integrate a large amount of new experiences after pre-training. In this work, we introd...
