M+: Extending MemoryLLM with Scalable Lo...

LIFT: Improving Long Context Understanding of Large Language Models through Long Input Fine-Tuning

February 20, 2025

92% Match

Yansheng Mao, Yufei Xu, Jiaqi Li, Fanxu Meng, Haotong Yang, Zilong Zheng, ... , Muhan Zhang

Computation and Language

Long context understanding remains challenging for large language models due to their limited context windows. This paper presents Long Input Fine-Tuning (LIFT), a novel framework for long-context modeling that can improve the long-context performance of arbitrary (short-context) LLMs by dynamically adapting model parameters based on the long input. Importantly, LIFT, rather than endlessly extending the context window size to accommodate increasingly longer inputs in context,...


Retrieval Head Mechanistically Explains Long-Context Factuality

April 24, 2024

92% Match

Wenhao Wu, Yizhong Wang, Guangxuan Xiao, ... , Yao Fu

Computation and Language

Despite the recent progress in long-context language models, it remains elusive how transformer-based models exhibit the capability to retrieve relevant information from arbitrary locations within the long context. This paper aims to address this question. Our systematic investigation across a wide spectrum of models reveals that a special type of attention heads are largely responsible for retrieving information, which we dub retrieval heads. We identify intriguing propertie...


Scaling Transformer to 1M tokens and beyond with RMT

April 19, 2023

92% Match

Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev

Computation and Language

Artificial Intelligence

Machine Learning

This technical report presents the application of a recurrent memory to extend the context length of BERT, one of the most effective Transformer-based models in natural language processing. By leveraging the Recurrent Memory Transformer architecture, we have successfully increased the model's effective context length to an unprecedented two million tokens, while maintaining high memory retrieval accuracy. Our method allows for the storage and processing of both local and glob...


In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

February 16, 2024

92% Match

Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Dmitry Sorokin, ... , Mikhail Burtsev

Computation and Language

Artificial Intelligence

Machine Learning

This paper addresses the challenge of processing long documents using generative transformer models. To evaluate different approaches, we introduce BABILong, a new benchmark designed to assess model capabilities in extracting and processing distributed facts within extensive texts. Our evaluation, which includes benchmarks for GPT-4 and RAG, reveals that common methods are effective only for sequences up to $10^4$ elements. In contrast, fine-tuning GPT-2 with recurrent memory...


UniMem: Towards a Unified View of Long-Context Large Language Models

February 5, 2024

92% Match

Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, ... , Maosong Sun

Computation and Language

Artificial Intelligence

Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce UniMem, a unified framework that reformulates existing long-context methods from ...


Extended Mind Transformers

June 4, 2024

92% Match

Phoebe Klett, Thomas Ahle

Machine Learning

Computation and Language

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should ...


Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

July 23, 2024

92% Match

Zhuowan Li, Cheng Li, Mingyang Zhang, ... , Michael Bendersky

Computation and Language

Artificial Intelligence

Machine Learning

Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that wh...


$\infty$Bench: Extending Long Context Evaluation Beyond 100K Tokens

February 21, 2024

92% Match

Xinrong Zhang, Yingfa Chen, Shengding Hu, Zihang Xu, Junhao Chen, Moo Khai Hao, Xu Han, Zhen Leng Thai, Shuo Wang, ... , Maosong Sun

Computation and Language

Processing and reasoning over long contexts is crucial for many practical applications of Large Language Models (LLMs), such as document comprehension and agent construction. Despite recent strides in making LLMs process contexts with more than 100K tokens, there is currently a lack of a standardized benchmark to evaluate this long-context capability. Existing public benchmarks typically focus on contexts around 10K tokens, limiting the assessment and comparison of LLMs in pr...


XL$^2$Bench: A Benchmark for Extremely Long Context Understanding with Long-range Dependencies

April 8, 2024

92% Match

Xuanfan Ni, Hengyi Cai, Xiaochi Wei, Shuaiqiang Wang, ... , Piji Li

Computation and Language

Large Language Models (LLMs) have demonstrated remarkable performance across diverse tasks but are constrained by their small context window sizes. Various efforts have been proposed to expand the context window to accommodate even up to 200K input tokens. Meanwhile, building high-quality benchmarks with much longer text lengths and more demanding tasks to provide comprehensive evaluations is of immense practical interest to facilitate long context understanding research of L...


LM2: Large Memory Models

February 9, 2025

92% Match

Jikun Kang, Wenqi Wu, Filippos Christianos, Alex J. Chan, Fraser Greenlee, George Thomas, ... , Andy Toulis

Computation and Language

Artificial Intelligence

This paper introduces the Large Memory Model (LM2), a decoder-only Transformer architecture enhanced with an auxiliary memory module that aims to address the limitations of standard Transformers in multi-step reasoning, relational argumentation, and synthesizing information distributed over long contexts. The proposed LM2 incorporates a memory module that acts as a contextual representation repository, interacting with input tokens via cross attention and updating through gat...


M+: Extending MemoryLLM with Scalable Long-Term Memory
