ID: 2402.10790

In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

February 16, 2024

Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, Artyom Sorokin, Mikhail Burtsev
Computer Science
Computation and Language
Artificial Intelligence
Machine Learning

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to $10^4$ elements. In contrast, fine-tuning GPT-2 with recurrent memory augmentations enables it to handle tasks involving up to $11\times 10^6$ elements. This achievement marks a substantial leap, as it is by far the longest input processed by any neural network model to date, demonstrating a significant improvement in the processing capabilities for long sequences.
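
To make the recurrent memory idea concrete, the sketch below (an illustration under assumed sizes and module names, not the paper's implementation) splits a long input into segments, prepends a fixed-size set of memory vectors to each segment, and carries the updated memory forward, so the model's state stays the same size no matter how long the input grows.

```python
# Illustrative sketch of segment-level recurrence with memory tokens.
# Sizes, layer choices, and names are assumptions for this example only.
import torch
import torch.nn as nn

d_model, n_mem, seg_len = 128, 16, 256

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=2,
)
memory = torch.zeros(1, n_mem, d_model)              # fixed-size memory state
long_input = torch.randn(1, 8 * seg_len, d_model)    # stand-in for an embedded long document

for start in range(0, long_input.size(1), seg_len):
    segment = long_input[:, start:start + seg_len]
    # memory tokens are prepended so the segment can read from and write to them
    hidden = encoder(torch.cat([memory, segment], dim=1))
    memory = hidden[:, :n_mem]                       # carry the updated memory to the next segment

print(memory.shape)  # stays (1, 16, 128) regardless of total input length
```

Because each step attends only to the current segment plus the memory tokens, the per-step cost does not grow with the total input length.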

Similar papers

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

June 14, 2024

95% Match
Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, ... , Mikhail Burtsev
Computation and Language
Artificial Intelligence

In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods have not kept pace, failing to comprehensively assess the efficiency of models in handling long contexts. To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely long documents. BABILong includes a diverse set of 20 reasoning tasks, including fact ch...
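
The needle-in-a-haystack construction can be illustrated with a toy generator that hides a few supporting facts at random positions inside a long filler document; the facts, filler text, and question below are invented for this sketch and are not drawn from the benchmark itself.

```python
# Toy "reasoning-in-a-haystack" sample: a question whose supporting facts
# are scattered through a very long, mostly irrelevant context.
import random

facts = [
    "Mary travelled to the office.",
    "Mary picked up the apple.",
]
question = "Where is the apple?"
answer = "office"

filler = ["An unrelated sentence of background text."] * 10_000
for position, fact in zip(sorted(random.sample(range(len(filler)), len(facts))), facts):
    filler.insert(position, fact)                    # hide each fact somewhere in the haystack

sample = {"context": " ".join(filler), "question": question, "answer": answer}
print(f"{len(sample['context'].split())} words of context, {len(facts)} needles")
```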


$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens

February 21, 2024

91% Match
Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, ... , Maosong Sun
Computation and Language

Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction. Despite recent strides in making LLMs process contexts with more than 100K tokens, there is currently a lack of a standardized benchmark to evaluate this long-context capability. Existing public benchmarks typically focus on contexts around 10K tokens, limiting the assessment and comparison of LLMs in pr...

Scaling Transformer to 1M tokens and beyond with RMT

Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev
Computation and Language
Artificial Intelligence
Machine Learning

This technical report presents the application of a recurrent memory to extend the context length of BERT, one of the most effective Transformer-based models in natural language processing. By leveraging the Recurrent Memory Transformer architecture, we have successfully increased the model's effective context length to an unprecedented two million tokens, while maintaining high memory retrieval accuracy. Our method allows for the storage and processing of both local and glob...

XL$^2$Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies

April 8, 2024

91% Match
Xuanfan Ni, Hengyi Cai, Xiaochi Wei, Shuaiqiang Wang, ... , Piji Li
Computation and Language

Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks but are constrained by their small context window sizes. Various efforts have been proposed to expand the context window to accommodate even up to 200K input tokens. Meanwhile, building high-quality benchmarks with much longer text lengths and more demanding tasks to provide comprehensive evaluations is of immense practical interest to facilitate long context understanding research of L...


Augmenting Language Models with Long-Term Memory

June 12, 2023

91% Match
Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, ... , Furu Wei
Computation and Language

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memo...
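
A rough sketch of the decoupled idea described here, with all module names and sizes assumed rather than taken from LongMem: a frozen backbone encodes incoming chunks into a growing memory bank, while a small trainable side module retrieves from that bank and adds the result as a residual to the current hidden states.

```python
# Illustrative decoupled memory: frozen encoder + trainable side reader.
import torch
import torch.nn as nn

d = 64
backbone = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
for p in backbone.parameters():
    p.requires_grad = False                          # frozen backbone acts as the memory encoder

side_reader = nn.MultiheadAttention(d, num_heads=4, batch_first=True)  # trainable retriever/reader
memory_bank = []                                     # cached representations of past inputs

def process_chunk(chunk):
    with torch.no_grad():
        encoded = backbone(chunk)                    # encode with the frozen model
    if memory_bank:
        past = torch.cat(memory_bank, dim=1)
        retrieved, _ = side_reader(encoded, past, past)
        encoded = encoded + retrieved                # residual fusion of retrieved long-term memory
    memory_bank.append(encoded.detach())             # archive the chunk for future retrieval
    return encoded

out = None
for _ in range(5):
    out = process_chunk(torch.randn(1, 32, d))
print(out.shape, "hidden states;", len(memory_bank), "chunks cached")
```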


MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

April 17, 2024

90% Match
Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, ... , Hinrich Schütze
Computation and Language

While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval Augment...
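
One way to picture an explicit read-write memory of the kind described above is a small structured store that the model addresses through read and write calls; the interface and triple layout below are hypothetical and only meant to illustrate the idea.

```python
# Hypothetical read-write memory interface: the surrounding system would
# execute write calls the model emits while reading, and read calls before
# it answers a question.
from collections import defaultdict

class TripleMemory:
    """A toy structured store the model can read from and write to."""

    def __init__(self):
        self.store = defaultdict(set)                # subject -> {(relation, object)}

    def write(self, subj, rel, obj):
        self.store[subj].add((rel, obj))

    def read(self, subj, rel=None):
        return sorted(o for r, o in self.store.get(subj, set()) if rel is None or r == rel)

memory = TripleMemory()
memory.write("Marie Curie", "awarded", "Nobel Prize in Physics")    # emitted while reading documents
memory.write("Marie Curie", "awarded", "Nobel Prize in Chemistry")
print(memory.read("Marie Curie", "awarded"))                        # consulted before answering
```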


A Survey on Long Text Modeling with Transformers

February 28, 2023

90% Match
Zican Dong, Tianyi Tang, ... , Wayne Xin Zhao
Computation and Language

Modeling long texts has been an essential technique in the field of natural language processing (NLP). With the ever-growing number of long documents, it is important to develop effective modeling methods that can process and analyze such texts. However, long texts pose important research challenges for existing text models, with more complex semantics and special characteristics. In this paper, we provide an overview of the recent advances on long texts modeling based on Tra...


InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

February 7, 2024

90% Match
Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, ... , Maosong Sun
Computation and Language
Artificial Intelligence
Machine Learning

Large language models (LLMs) have emerged as a cornerstone in real-world applications with lengthy streaming inputs, such as LLM-driven agents. However, existing LLMs, pre-trained on sequences with restricted maximum length, cannot generalize to longer sequences due to the out-of-domain and distraction issues. To alleviate these issues, existing efforts employ sliding attention windows and discard distant tokens to achieve the processing of extremely long sequences. Unfortuna...
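
A simplified sketch of training-free block memory in this spirit (the block pooling and dot-product scoring below are illustrative choices, not the InfLLM algorithm verbatim): distant keys are grouped into fixed-size units, and for each query only a local window plus the most relevant units are handed to attention.

```python
# Illustrative block-level memory lookup combined with a sliding local window.
import torch

d, block_size, window, top_k = 64, 32, 128, 2

keys = torch.randn(4096, d)                          # keys of a long, already-processed context
blocks = keys.split(block_size)                      # distant context kept as fixed-size memory units
block_reps = torch.stack([b.mean(dim=0) for b in blocks])  # one representative vector per unit

query = torch.randn(d)                               # current query token
scores = block_reps @ query                          # relevance of each memory unit to the query
chosen = scores.topk(top_k).indices.tolist()

local = keys[-window:]                               # sliding local attention window
retrieved = torch.cat([blocks[i] for i in chosen])   # only the most relevant distant units
effective_keys = torch.cat([retrieved, local])
print(effective_keys.shape)                          # far fewer keys than the full 4096
```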


LooGLE: Can Long-Context Language Models Understand Long Contexts?

November 8, 2023

90% Match
Jiaqi Li, Mengmeng Wang, ... , Muhan Zhang
Computation and Language
Artificial Intelligence

Large language models (LLMs), despite their impressive performance in various language tasks, are typically limited to processing texts within context-window size. This limitation has spurred significant research efforts to enhance LLMs' long-context understanding with high-quality long-sequence benchmarks. However, prior datasets in this regard suffer from shortcomings, such as short context length compared to the context window of modern LLMs; outdated documents that have d...


Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System

April 26, 2023

90% Match
Xinnian Liang, Bing Wang, Hui Huang, Shuangzhi Wu, Peihao Wu, Lu Lu, ... , Zhoujun Li
Computation and Language

Large-scale Language Models (LLMs) are constrained by their inability to process lengthy inputs. To address this limitation, we propose the Self-Controlled Memory (SCM) system to unleash infinite-length input capacity for large-scale language models. Our SCM system is composed of three key modules: the language model agent, the memory stream, and the memory controller. The language model agent iteratively processes ultra-long inputs and stores all historical information in th...
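
The three-module loop can be sketched as follows; the keyword-overlap retrieval heuristic and the chunk granularity are assumptions made for this example, not details of the SCM system itself.

```python
# Toy loop: memory stream archives every chunk, a controller selects what to
# recall, and the agent sees the current chunk plus the recalled memories.
memory_stream = []                                   # every processed chunk is archived here

def memory_controller(current_chunk, k=2):
    """Pick the k stored memories that share the most words with the current chunk."""
    words = set(current_chunk.lower().split())
    ranked = sorted(memory_stream,
                    key=lambda m: len(words & set(m.lower().split())),
                    reverse=True)
    return ranked[:k]

def agent_step(chunk):
    relevant = memory_controller(chunk)              # controller selects what to recall
    prompt = "\n".join(relevant + [chunk])           # what the language model agent would be given
    memory_stream.append(chunk)                      # the stream keeps all historical information
    return prompt

for chunk in ["Alice met Bob in Paris in May.",
              "They agreed to sign the contract in June.",
              "Where did Alice meet Bob?"]:
    prompt = agent_step(chunk)

print(prompt)                                        # the question plus the most relevant memories
```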
