MemLong: Memory-Augmented Retrieval for Long Text Modeling

Efficient Long-range Language Modeling with Self-supervised Causal Retrieval

October 2, 2024

93% Match

Xiang Hu, Zhihao Teng, ..., Kewei Tu

Computation and Language; Artificial Intelligence

Recently, retrieval-based language models (RLMs) have received much attention. However, most of them leverage a pre-trained retriever with fixed parameters, which may not adapt well to causal language models. In this work, we propose Grouped Cross-Attention, a novel module enabling joint pre-training of the retriever and causal LM, and apply it to long-context modeling. For a given input sequence, we split it into chunks and use the current chunk to retrieve past chunks for s...

Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

July 23, 2024

93% Match

Zhuowan Li, Cheng Li, Mingyang Zhang, ..., Michael Bendersky

Computation and Language; Artificial Intelligence; Machine Learning

Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that wh...

Long Context RAG Performance of Large Language Models

November 5, 2024

93% Match

Quinn Leng, Jacob Portes, Sam Havens, ..., Michael Carbin

Machine Learning; Computation and Language

Retrieval Augmented Generation (RAG) has emerged as a crucial technique for enhancing the accuracy of Large Language Models (LLMs) by incorporating external information. With the advent of LLMs that support increasingly longer context lengths, there is a growing interest in understanding how these models perform in RAG scenarios. Can these new long context models improve RAG performance? This paper presents a comprehensive study of the impact of increased context length on RA...

Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing

February 18, 2025

92% Match

Xiaoju Ye, Zhichun Wang, Jingyuan Wang

Computation and Language

Limited by the context window size of Large Language Models (LLMs), handling various tasks with input tokens exceeding the upper limit has been challenging, whether it is a simple direct retrieval task or a complex multi-hop reasoning task. Although various methods have been proposed to enhance the long-context processing capabilities of LLMs, they either incur substantial post-training costs, or require additional tool modules (e.g., RAG), or have not shown significant improvem...

A Little Goes a Long Way: Efficient Long Context Training and Inference with Partial Contexts

October 2, 2024

92% Match

Suyu Ge, Xihui Lin, Yunan Zhang, ..., Hao Peng

Computation and Language

Training and serving long-context large language models (LLMs) incurs substantial overhead. To address this, two critical steps are often required: a pretrained LLM typically undergoes a separate stage for context length extension by training on long-context data, followed by architectural modifications to reduce the overhead of KV cache during serving. This paper argues that integrating length extension with a GPU-friendly KV cache reduction architecture not only reduces tra...

FlashBack: Efficient Retrieval-Augmented Language Modeling for Long Context Inference

May 7, 2024

92% Match

Runheng Liu, Xingchen Xiao, Heyan Huang, ..., Zhijing Wu

Computation and Language

Retrieval-Augmented Language Modeling (RALM) by integrating large language models (LLM) with relevant documents from an external corpus is a proven method for enabling the LLM to generate information beyond the scope of its pre-training corpus. Previous work utilizing retrieved content by simply prepending it to the input poses a high runtime issue, which degrades the inference efficiency of the LLMs because they fail to use the Key-Value (KV) cache efficiently. In this paper...

Contextual Memory Reweaving in Large Language Models Using Layered Latent State Reconstruction

February 4, 2025

92% Match

Frederick Dillon, Gregor Halvorsen, Simon Tattershall, ..., Gareth Vanderpool

Computation and Language

Memory retention challenges in deep neural architectures have ongoing limitations in the ability to process and recall extended contextual information. Token dependencies degrade as sequence length increases, leading to a decline in coherence and factual consistency across longer outputs. A structured approach is introduced to mitigate this issue through the reweaving of latent states captured at different processing layers, reinforcing token representations over extended seq...

FocusLLM: Scaling LLM's Context by Parallel Decoding

August 21, 2024

92% Match

Zhenyu Li, Yike Zhang, Tengyu Pan, Yutao Sun, Zhichao Duan, Junjie Fang, Rong Han, ..., Jianyong Wang

Computation and Language; Artificial Intelligence

Empowering LLMs with the ability to utilize useful information from a long context is crucial for many downstream applications. However, achieving long context lengths with the conventional transformer architecture requires substantial training and inference resources. In this paper, we present FocusLLM, a framework designed to extend the context length of any decoder-only LLM, enabling the model to focus on relevant information from very long sequences. FocusLLM processes lo...

Long Context vs. RAG for LLMs: An Evaluation and Revisits

December 27, 2024

92% Match

Xinze Li, Yixin Cao, ..., Aixin Sun

Computation and Language

Extending context windows (i.e., Long Context, LC) and using retrievers to selectively access relevant information (i.e., Retrieval-Augmented Generation, RAG) are the two main strategies to enable LLMs to incorporate extremely long external contexts. This paper revisits recent studies on this topic, highlighting their key insights and discrepancies. We then provide a more comprehensive evaluation by filtering out questions answerable without external context, identifying the ...

Long-Context Language Modeling with Parallel Context Encoding

February 26, 2024

92% Match

Howard Yen, Tianyu Gao, Danqi Chen

Computation and Language

Extending large language models (LLMs) to process longer inputs is crucial for numerous applications. However, the considerable computational cost of transformers, coupled with limited generalization of positional encoding, restricts the size of their context window. We introduce Context Expansion with Parallel Encoding (CEPE), a framework that can be applied to any existing decoder-only LLMs to extend their context window. CEPE adopts a small encoder to process long inputs c...
