ID: 2405.06067

HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 9, 2024


Similar papers 3

HiP Attention: Sparse Sub-Quadratic Attention with Hierarchical Attention Pruning

June 14, 2024

90% Match
Heejun Lee, Geon Park, Youngwan Lee, Jina Kim, Wonyoung Jeong, ... , Sung Ju Hwang
Computation and Language
Computer Vision and Pattern Recognition
Distributed, Parallel, and Cluster Computing
Machine Learning

In modern large language models (LLMs), increasing sequence lengths is a crucial challenge for enhancing their comprehension and coherence in handling complex tasks such as multi-modal question answering. However, handling long context sequences with LLMs is prohibitively costly due to the conventional attention mechanism's quadratic time and space complexity, and the context window size is limited by the GPU memory. Although recent works have proposed linear and sparse atten...


Anchor-based Large Language Models

February 12, 2024

90% Match
Jianhui Pang, Fanghua Ye, ... , Longyue Wang
Computation and Language
Artificial Intelligence

Large language models (LLMs) predominantly employ decoder-only transformer architectures, necessitating the retention of keys/values information for historical tokens to provide contextual information and avoid redundant computation. However, the substantial size and parameter volume of these LLMs require massive GPU memory. This memory demand increases with the length of the input text, leading to an urgent need for more efficient methods of information storage and processin...


Document-level Neural Machine Translation with Associated Memory Network

October 31, 2019

90% Match
Shu Jiang, Rui Wang, Zuchao Li, Masao Utiyama, Kehai Chen, Eiichiro Sumita, ... , Bao-liang Lu
Computation and Language

Standard neural machine translation (NMT) is built on the assumption that sentences can be translated independently of their document-level context. Most existing document-level NMT approaches settle for a smattering of global document-level information, while this work focuses on exploiting detailed document-level context in terms of a memory network. The capacity of the memory network to detect the parts of memory most relevant to the current sentence renders a natural solution to model the ...


Global memory transformer for processing long documents

December 3, 2022

89% Match
Arij Al Adel
Computation and Language
Machine Learning

Transformer variants dominate the state-of-the-art in different natural language processing tasks such as translation, reading comprehension and summarization. Our paper is directed at adding general memory slots to the inputs and studying the results of adding these slots. This paper is a follow-up study of the general memory slots that were added to the input of the proposed model in previous work. We have two main tasks: 1) a pretraining task using masked language modeling...


Walking Down the Memory Maze: Beyond Context Limit through Interactive Reading

October 8, 2023

89% Match
Howard Chen, Ramakanth Pasunuru, ... , Asli Celikyilmaz
Computation and Language

Large language models (LLMs) have advanced in large strides due to the effectiveness of the self-attention mechanism that processes and compares all tokens at once. However, this mechanism comes with a fundamental issue -- the predetermined context window is bound to be limited. Despite attempts to extend the context window through methods like extrapolating the positional embedding, using recurrence, or selectively retrieving essential parts of the long sequence, long-text u...


Memformer: A Memory-Augmented Transformer for Sequence Modeling

October 14, 2020

89% Match
Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, ... , Zhou Yu
Computation and Language

Transformers have reached remarkable success in sequence modeling. However, these models have efficiency issues as they need to store all the history token-level representations as memory. We present Memformer, an efficient neural network for sequence modeling, that utilizes an external dynamic memory to encode and retrieve past information. Our model achieves linear time complexity and constant memory space complexity when processing long sequences. We also propose a new opt...
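
The segment-recurrent pattern this abstract describes can be illustrated with a short sketch. The code below is a minimal PyTorch illustration of processing a long sequence in fixed-size segments with a small external memory that is read from and written to at each step; the module names, slot count, and update rule are illustrative assumptions, not the Memformer architecture itself.

```python
# Minimal sketch of a memory-augmented segment loop in PyTorch, in the spirit
# of external-dynamic-memory models such as Memformer. NOT the Memformer
# architecture itself: names, sizes, and the update rule are assumptions.
import torch
import torch.nn as nn

class MemoryAugmentedBlock(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_mem_slots=8):
        super().__init__()
        # Fixed-size external memory: constant space regardless of sequence length.
        self.memory_init = nn.Parameter(torch.randn(n_mem_slots, d_model))
        self.read_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.write_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def init_memory(self, batch_size):
        return self.memory_init.unsqueeze(0).expand(batch_size, -1, -1)

    def forward(self, segment, memory):
        # Read: tokens of the current segment attend to the memory slots.
        read, _ = self.read_attn(segment, memory, memory)
        hidden = self.ffn(segment + read)
        # Write: memory slots attend to the processed segment and are updated.
        written, _ = self.write_attn(memory, hidden, hidden)
        return hidden, memory + written

# Process a long sequence segment by segment.
block = MemoryAugmentedBlock()
x = torch.randn(2, 1024, 64)              # (batch, long sequence, d_model)
memory = block.init_memory(batch_size=2)
outputs = []
for segment in x.split(128, dim=1):       # 128-token segments
    hidden, memory = block(segment, memory)
    outputs.append(hidden)
y = torch.cat(outputs, dim=1)
```

Because the memory holds a fixed number of slots, the per-segment cost is constant, so the total cost grows linearly with sequence length and the recurrent state occupies constant space.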


In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

February 16, 2024

89% Match
Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, ... , Mikhail Burtsev
Computation and Language
Artificial Intelligence
Machine Learning

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to $10^4$ elements. In contrast, fine-tuning GPT-2 with recurrent memory...


MemoryPrompt: A Light Wrapper to Improve Context Tracking in Pre-trained Language Models

February 23, 2024

89% Match
Nathanaël Carraz Rakotonirina, Marco Baroni
Computation and Language
Artificial Intelligence
Machine Learning

Transformer-based language models (LMs) track contextual information through large, hard-coded input windows. We introduce MemoryPrompt, a leaner approach in which the LM is complemented by a small auxiliary recurrent network that passes information to the LM by prefixing its regular input with a sequence of vectors, akin to soft prompts, without requiring LM finetuning. Tested on a task designed to probe a LM's ability to keep track of multiple fact updates, a MemoryPrompt-a...
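
As a rough illustration of the soft-prompt mechanism sketched in this abstract, the snippet below uses a small GRU to summarize earlier context into a handful of prefix vectors that are prepended to the input embeddings of a frozen stand-in model. All names, sizes, and the frozen-LM stub are assumptions for illustration, not the MemoryPrompt implementation.

```python
# Minimal sketch: a small recurrent "memory" network emits soft-prompt-style
# prefix vectors for a frozen language model. Illustrative assumptions only.
import torch
import torch.nn as nn

class PrefixMemory(nn.Module):
    def __init__(self, d_model=64, n_prefix=4):
        super().__init__()
        self.rnn = nn.GRU(d_model, d_model, batch_first=True)
        # Project the recurrent state into n_prefix soft-prompt vectors.
        self.to_prefix = nn.Linear(d_model, n_prefix * d_model)
        self.n_prefix, self.d_model = n_prefix, d_model

    def forward(self, past_embeds, state=None):
        _, state = self.rnn(past_embeds, state)            # (1, batch, d_model)
        prefix = self.to_prefix(state[-1])                  # (batch, n_prefix * d)
        return prefix.view(-1, self.n_prefix, self.d_model), state

# Stand-in for a frozen pre-trained LM that accepts input embeddings directly
# (e.g. via an `inputs_embeds`-style argument); only the memory module trains.
frozen_lm = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True),
    num_layers=2)
for p in frozen_lm.parameters():
    p.requires_grad_(False)

memory = PrefixMemory()
past = torch.randn(2, 32, 64)     # embeddings of earlier context
current = torch.randn(2, 16, 64)  # embeddings of the current input window
prefix, state = memory(past)
# Prefix the soft-prompt vectors to the regular input, then run the frozen LM.
out = frozen_lm(torch.cat([prefix, current], dim=1))
```

The design point the abstract emphasizes is that the base LM stays frozen: only the small auxiliary network is trained, and it communicates with the LM purely through the prepended vectors.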


Extending Memory for Language Modelling

May 19, 2023

89% Match
Anupiya Nugaliyadde
Computation and Language

Breakthroughs in deep learning and memory networks have made major advances in natural language understanding. Language is sequential and information carried through the sequence can be captured through memory networks. Learning the sequence is one of the key aspects in learning the language. However, memory networks are not capable of holding infinitely long sequences in their memories and are limited by various constraints such as the vanishing or exploding gradient problem...


HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models

May 23, 2024

89% Match
Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, ... , Yu Su
Computation and Language
Artificial Intelligence

In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integrate a large amount of new experiences after pre-training. In this work, we introd...
