In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

February 16, 2024

R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression

February 21, 2025

90% Match

Xiaoqiang Wang, Suyuchen Wang, ... , Bang Liu

Computation and Language

Artificial Intelligence

Memory plays a key role in enhancing LLMs' performance when deployed to real-world applications. Existing solutions face trade-offs: explicit memory designs based on external storage require complex management and incur storage overhead, while implicit memory designs that store information via parameters struggle with reliable retrieval. In this paper, we propose R$^3$Mem, a memory network that optimizes both information Retention and Retrieval through Reversible context comp...

LooGLE: Can Long-Context Language Models Understand Long Contexts?

November 8, 2023

90% Match

Jiaqi Li, Mengmeng Wang, ... , Muhan Zhang

Computation and Language

Artificial Intelligence

Large language models (LLMs), despite their impressive performance in various language tasks, are typically limited to processing texts within context-window size. This limitation has spurred significant research efforts to enhance LLMs' long-context understanding with high-quality long-sequence benchmarks. However, prior datasets in this regard suffer from shortcomings, such as short context length compared to the context window of modern LLMs; outdated documents that have d...

Unleashing Infinite-Length Input Capacity for Large-scale Language Models with Self-Controlled Memory System

April 26, 2023

90% Match

Xinnian Liang, Bing Wang, Hui Huang, Shuangzhi Wu, Peihao Wu, Lu Lu, ... , Zhoujun Li

Computation and Language

Large-scale Language Models (LLMs) are constrained by their inability to process lengthy inputs. To address this limitation, we propose the Self-Controlled Memory (SCM) system to unleash infinite-length input capacity for large-scale language models. Our SCM system is composed of three key modules: the language model agent, the memory stream, and the memory controller. The language model agent iteratively processes ultra-long inputs and stores all historical information in th...

MuLD: The Multitask Long Document Benchmark

February 15, 2022

90% Match

G Thomas Hudson, Noura Al Moubayed

Computation and Language

Artificial Intelligence

The impressive progress in NLP techniques has been driven by the development of multi-task benchmarks such as GLUE and SuperGLUE. While these benchmarks focus on tasks for one or two input sentences, there has been exciting work in designing efficient techniques for processing much longer inputs. In this paper, we present MuLD: a new long document benchmark consisting of only documents over 10,000 tokens. By modifying existing NLP tasks, we create a diverse benchmark which re...

Extended Mind Transformers

June 4, 2024

90% Match

Phoebe Klett, Thomas Ahle

Machine Learning

Computation and Language

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should ...

LLM In-Context Recall is Prompt Dependent

April 13, 2024

90% Match

Daniel Machlab, Rick Battle

Computation and Language

Machine Learning

The proliferation of Large Language Models (LLMs) highlights the critical importance of conducting thorough evaluations to discern their comparative advantages, limitations, and optimal use cases. Particularly important is assessing their capacity to accurately retrieve information included in a given prompt. A model's ability to do this significantly influences how effectively it can utilize contextual details, thus impacting its practical efficacy and dependability in real-...

NeedleBench: Can LLMs Do Retrieval and Reasoning in 1 Million Context Window?

July 16, 2024

90% Match

Mo Li, Songyang Zhang, ... , Kai Chen

Computation and Language

In evaluating the long-context capabilities of large language models (LLMs), identifying content relevant to a user's query from original long documents is a crucial prerequisite for any LLM to answer questions based on long text. We present NeedleBench, a framework consisting of a series of progressively more challenging tasks for assessing bilingual long-context capabilities, spanning multiple length intervals (4k, 8k, 32k, 128k, 200k, 1000k, and beyond) and different depth...

SEGMENT+: Long Text Processing with Short-Context Language Models

October 9, 2024

90% Match

Wei Shi, Shuang Li, Kerun Yu, Jinglei Chen, Zujie Liang, Xinhui Wu, Yuxi Qian, Feng Wei, Bo Zheng, Jiaqing Liang, ... , Yanghua Xiao

Computation and Language

There is a growing interest in expanding the input capacity of language models (LMs) across various domains. However, simply increasing the context window does not guarantee robust performance across diverse long-input processing tasks, such as understanding extensive documents and extracting detailed information from lengthy and noisy data. In response, we introduce SEGMENT+, a general framework that enables LMs to handle extended inputs within limited context windows effici...

Memorizing Documents with Guidance in Large Language Models

June 23, 2024

90% Match

Bumjin Park, Jaesik Choi

Computation and Language

Artificial Intelligence

Training data plays a pivotal role in AI models. Large language models (LLMs) are trained with massive amounts of documents, and their parameters hold document-related contents. Recently, several studies identified content-specific locations in LLMs by examining the parameters. Instead of the post hoc interpretation, we propose another approach. We propose document-wise memory architecture to track document memories in training. The proposed architecture maps document represe...

ACER: Automatic Language Model Context Extension via Retrieval

October 11, 2024

90% Match

Luyu Gao, Yunyi Zhang, Jamie Callan

Computation and Language

Artificial Intelligence

Information Retrieval

Machine Learning

Long-context modeling is one of the critical capabilities of language AI for digesting and reasoning over complex information pieces. In practice, long-context capabilities are typically built into a pre-trained language model (LM) through a carefully designed context extension stage, with the goal of producing generalist long-context capabilities. In our preliminary experiments, however, we discovered that the current open-weight generalist long-context models are still lack...
