In Search of Needles in a 11M Haystack: ...

Automatic Summarization of Long Documents

October 8, 2024

89% Match

Naman Chhibbar, Jugal Kalita

Computation and Language

Artificial Intelligence

A vast amount of textual data is added to the internet daily, making utilization and interpretation of such data difficult and cumbersome. As a result, automatic text summarization is crucial for extracting relevant information, saving precious reading time. Although many transformer-based models excel in summarization, they are constrained by their input size, preventing them from processing texts longer than their context size. This study introduces three novel algorithms t...

Leave No Document Behind: Benchmarking Long-Context LLMs with Extended Multi-Doc QA

June 25, 2024

89% Match

Minzheng Wang, Longze Chen, Cheng Fu, Shengyi Liao, Xinghua Zhang, Bingli Wu, Haiyang Yu, Nan Xu, Lei Zhang, Run Luo, Yunshui Li, Min Yang, ... , Yongbin Li

Computation and Language

Artificial Intelligence

Long-context modeling capabilities have garnered widespread attention, leading to the emergence of Large Language Models (LLMs) with ultra-context windows. Meanwhile, benchmarks for evaluating long-context LLMs are gradually catching up. However, existing benchmarks employ irrelevant noise texts to artificially extend the length of test cases, diverging from the real-world scenarios of long-context applications. To bridge this gap, we propose a novel long-context benchmark, L...

EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts

February 20, 2025

89% Match

Subhajit Chaudhury, Payel Das, Sarathkrishna Swaminathan, Georgios Kollias, Elliot Nelson, Khushbu Pahwa, Tejaswini Pedapati, ... , Matthew Riemer

Computation and Language

Artificial Intelligence

Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce EpMAN, a method for processing long contexts in an episodic memory module while holistically attending to semantically relevant context chunks. The output of episodic attention is then used to reweigh the decoder's self-attention to...

LLM×MapReduce: Simplified Long-Sequence Processing using Large Language Models

October 12, 2024

89% Match

Zihan Zhou, Chong Li, Xinyi Chen, Shuo Wang, Yu Chao, Zhili Li, Haoyu Wang, Rongqiao An, Qi Shi, Zhixing Tan, Xu Han, Xiaodong Shi, ... , Maosong Sun

Computation and Language

Enlarging the context window of large language models (LLMs) has become a crucial research area, particularly for applications involving extremely long texts. In this work, we propose a novel training-free framework for processing long texts, utilizing a divide-and-conquer strategy to achieve comprehensive document understanding. The proposed LLM×MapReduce framework splits the entire document into several chunks for LLMs to read and then aggregates the intermediate ans...

MELODI: Exploring Memory Compression for Long Contexts

October 4, 2024

89% Match

Yinpeng Chen, DeLesley Hutchins, Aren Jansen, Andrey Zhmoginov, ... , Jesper Andersen

Machine Learning

Artificial Intelligence

We present MELODI, a novel memory architecture designed to efficiently process long documents using short context windows. The key principle behind MELODI is to represent short-term and long-term memory as a hierarchical compression scheme across both network layers and context windows. Specifically, the short-term memory is achieved through recurrent compression of context windows across multiple layers, ensuring smooth transitions between windows. In contrast, the long-term...

Retrieval Head Mechanistically Explains Long-Context Factuality

April 24, 2024

89% Match

Wenhao Wu, Yizhong Wang, Guangxuan Xiao, ... , Yao Fu

Computation and Language

Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context. This paper aims to address this question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention heads are largely responsible for retrieving information, which we dub retrieval heads. We identify intriguing propertie...

HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 9, 2024

89% Match

Zifan He, Zongyue Qin, Neha Prakriya, ... , Jason Cong

Computation and Language

Machine Learning

Transformer-based large language models (LLMs) have been widely used in language processing applications. However, most of them restrict the context window that permits the model to attend to every token in the inputs. Previous works in recurrent models can memorize past tokens to enable unlimited context and maintain effectiveness. However, they have "flat" memory architectures, which have limitations in selecting and filtering information. Since humans are good at learning a...

Structured Token Retention and Computational Memory Paths in Large Language Models

February 5, 2025

89% Match

Jonathan Delena, Augustin Moreau, ... , Frederick Chatterton

Computation and Language

Memory retention mechanisms play a central role in determining the efficiency of computational architectures designed for processing extended sequences. Conventional methods for token management often impose fixed retention thresholds or rely on uniform attention weight distributions, leading to inefficient memory utilization and premature information loss in extended sequence modeling. Structured Token Retention (STR) introduces a probabilistic selection framework that dynam...

Long Context vs. RAG for LLMs: An Evaluation and Revisits

December 27, 2024

89% Match

Xinze Li, Yixin Cao, ... , Aixin Sun

Computation and Language

Extending context windows (i.e., Long Context, LC) and using retrievers to selectively access relevant information (i.e., Retrieval-Augmented Generation, RAG) are the two main strategies to enable LLMs to incorporate extremely long external contexts. This paper revisits recent studies on this topic, highlighting their key insights and discrepancies. We then provide a more comprehensive evaluation by filtering out questions answerable without external context, identifying the ...

Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks

December 20, 2024

89% Match

Brian J Chan, Chao-Ting Chen, ... , Hen-Hsen Huang

Computation and Language

Retrieval-augmented generation (RAG) has gained traction as a powerful approach for enhancing language models by integrating external knowledge sources. However, RAG introduces challenges such as retrieval latency, potential errors in document selection, and increased system complexity. With the advent of large language models (LLMs) featuring significantly extended context windows, this paper proposes an alternative paradigm, cache-augmented generation (CAG) that bypasses re...

In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss
