In Search of Needles in a 11M Haystack: ...

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

June 14, 2024

95% Match

Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, ... , Mikhail Burtsev

Computation and Language

Artificial Intelligence

In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods have not kept pace, failing to comprehensively assess the efficiency of models in handling long contexts. To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely long documents. BABILong includes a diverse set of 20 reasoning tasks, including fact ch...

M+: Extending MemoryLLM with Scalable Long-Term Memory

February 1, 2025

92% Match

Yu Wang, Dmitry Krotov, Yuanzhe Hu, Yifan Gao, Wangchunshu Zhou, Julian McAuley, Dan Gutfreund, ... , Zexue He

Computation and Language

Equipping large language models (LLMs) with latent-space memory has attracted increasing attention as they can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), as a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths u...

MemLong: Memory-Augmented Retrieval for Long Text Modeling

August 30, 2024

92% Match

Weijie Liu, Zecheng Tang, Juntao Li, ... , Min Zhang

Computation and Language

Artificial Intelligence

Recent advancements in Large Language Models (LLMs) have yielded remarkable success across diverse fields. However, handling long contexts remains a significant challenge for LLMs due to the quadratic time and space complexity of attention mechanisms and the growing memory consumption of the key-value cache during generation. This work introduces MemLong: Memory-Augmented Retrieval for Long Text Generation, a method designed to enhance the capabilities of long-context languag...

Enhancing Long Context Performance in LLMs Through Inner Loop Query Mechanism

October 11, 2024

91% Match

Yimin Tang, Yurong Xu, ... , Masood Mortazavi

Computation and Language

Artificial Intelligence

Information Retrieval

Transformers have a quadratic scaling of computational complexity with input size, which limits the input context window size of large language models (LLMs) in both training and inference. Meanwhile, retrieval-augmented generation (RAG) based models can better handle longer contexts by using a retrieval system to filter out unnecessary information. However, most RAG methods only perform retrieval based on the initial query, which may not work well with complex questions that...

LM2: Large Memory Models

February 9, 2025

91% Match

Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, ... , Andy Toulis

Computation and Language

Artificial Intelligence

This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts. The proposed LM2 incorporates a memory module that acts as a contextual representation repository, interacting with input tokens via cross attention and updating through gat...

Associative Recurrent Memory Transformer

July 5, 2024

91% Match

Ivan Rodkin, Yuri Kuratov, ... , Mikhail Burtsev

Computation and Language

Artificial Intelligence

Machine Learning

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tas...

$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens

February 21, 2024

91% Match

Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, ... , Maosong Sun

Computation and Language

Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction. Despite recent strides in making LLMs process contexts with more than 100K tokens, there is currently a lack of a standardized benchmark to evaluate this long-context capability. Existing public benchmarks typically focus on contexts around 10K tokens, limiting the assessment and comparison of LLMs in pr...

Scaling Transformer to 1M tokens and beyond with RMT

April 19, 2023

91% Match

Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev

Computation and Language

Artificial Intelligence

Machine Learning

This technical report presents the application of a recurrent memory to extend the context length of BERT, one of the most effective Transformer-based models in natural language processing. By leveraging the Recurrent Memory Transformer architecture, we have successfully increased the model's effective context length to an unprecedented two million tokens, while maintaining high memory retrieval accuracy. Our method allows for the storage and processing of both local and glob...

Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing

February 18, 2025

91% Match

Xiaoju Ye, Zhichun Wang, Jingyuan Wang

Computation and Language

Limited by the context window size of Large Language Models (LLMs), handling various tasks with input tokens exceeding the upper limit has been challenging, whether it is a simple direct retrieval task or a complex multi-hop reasoning task. Although various methods have been proposed to enhance the long-context processing capabilities of LLMs, they either incur substantial post-training costs, or require additional tool modules (e.g., RAG), or have not shown significant improvem...

NoLiMa: Long-Context Evaluation Beyond Literal Matching

February 7, 2025

91% Match

Ali Modarressi, Hanieh Deilamsalehy, Franck Dernoncourt, Trung Bui, Ryan A. Rossi, ... , Hinrich Schütze

Computation and Language

Recent large language models (LLMs) support long contexts ranging from 128K to 1M tokens. A popular method for evaluating these capabilities is the needle-in-a-haystack (NIAH) test, which involves retrieving a "needle" (relevant information) from a "haystack" (long irrelevant context). Extensions of this approach include increasing distractors, fact chaining, and in-context reasoning. However, in these benchmarks, models can exploit existing literal matches between the needle...

In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
