M+: Extending MemoryLLM with Scalable Long-Term Memory

Retrieval meets Long Context Large Language Models

October 4, 2023

93% Match

Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, ... , Bryan Catanzaro

Computation and Language

Artificial Intelligence

Information Retrieval

Machine Learning

Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? ii) Can both methods be combined to get the best of both worlds? In this work, we answer these questions by studying both solutions using two state-of-the-art pretrained LLMs, i.e., a proprie...

Memorizing Documents with Guidance in Large Language Models

June 23, 2024

93% Match

Bumjin Park, Jaesik Choi

Computation and Language

Artificial Intelligence

Training data plays a pivotal role in AI models. Large language models (LLMs) are trained with massive amounts of documents, and their parameters hold document-related contents. Recently, several studies identified content-specific locations in LLMs by examining the parameters. Instead of the post hoc interpretation, we propose another approach. We propose document-wise memory architecture to track document memories in training. The proposed architecture maps document represe...

Flexibly Scaling Large Language Models Contexts Through Extensible Tokenization

January 15, 2024

93% Match

Ninglu Shao, Shitao Xiao, ... , Peitian Zhang

Computation and Language

Large language models (LLMs) need sufficient context to handle many critical applications, such as retrieval-augmented generation and few-shot learning. However, due to the constrained window size, LLMs can only access the information within a limited context. Although the size of the context window can be extended by fine-tuning, doing so incurs a substantial cost in both the training and inference stages. In this paper, we present Extensible Tokenization as an al...

Long Context RAG Performance of Large Language Models

November 5, 2024

93% Match

Quinn Leng, Jacob Portes, Sam Havens, ... , Michael Carbin

Machine Learning

Computation and Language

Retrieval Augmented Generation (RAG) has emerged as a crucial technique for enhancing the accuracy of Large Language Models (LLMs) by incorporating external information. With the advent of LLMs that support increasingly longer context lengths, there is a growing interest in understanding how these models perform in RAG scenarios. Can these new long context models improve RAG performance? This paper presents a comprehensive study of the impact of increased context length on RA...

Long Context vs. RAG for LLMs: An Evaluation and Revisits

December 27, 2024

93% Match

Xinze Li, Yixin Cao, ... , Aixin Sun

Computation and Language

Extending context windows (i.e., Long Context, LC) and using retrievers to selectively access relevant information (i.e., Retrieval-Augmented Generation, RAG) are the two main strategies to enable LLMs to incorporate extremely long external contexts. This paper revisits recent studies on this topic, highlighting their key insights and discrepancies. We then provide a more comprehensive evaluation by filtering out questions answerable without external context, identifying the ...

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

October 8, 2024

93% Match

Bowen Jin, Jinsung Yoon, ... , Sercan O. Arik

Computation and Language

Artificial Intelligence

Machine Learning

Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources. The increasing capacity of LLMs to process longer input sequences opens up avenues for providing more retrieved information, to potentially enhance the quality of generated outputs. It is plausible to assume that a larger retrieval set would contain more relevant information (higher recall), that might result in improved performance. However, our empirical finding...

Leveraging Memory Retrieval to Enhance LLM-based Generative Recommendation

December 23, 2024

92% Match

Chengbing Wang, Yang Zhang, Fengbin Zhu, Jizhi Zhang, ... , Fuli Feng

Information Retrieval

Leveraging Large Language Models (LLMs) to harness user-item interaction histories for item generation has emerged as a promising paradigm in generative recommendation. However, the limited context window of LLMs often restricts them to focusing on recent user interactions only, leading to the neglect of long-term interests involved in the longer histories. To address this challenge, we propose a novel Automatic Memory-Retrieval framework (AutoMR), which is capable of storing...

RetroLLM: Empowering Large Language Models to Retrieve Fine-grained Evidence within Generation

December 16, 2024

92% Match

Xiaoxi Li, Jiajie Jin, Yujia Zhou, Yongkang Wu, Zhonghua Li, ... , Zhicheng Dou

Computation and Language

Artificial Intelligence

Information Retrieval

Large language models (LLMs) exhibit remarkable generative capabilities but often suffer from hallucinations. Retrieval-augmented generation (RAG) offers an effective solution by incorporating external knowledge, but existing methods still face several limitations: additional deployment costs of separate retrievers, redundant input tokens from retrieved text chunks, and the lack of joint optimization of retrieval and generation. To address these issues, we propose Ret...

Does RAG Really Perform Bad For Long-Context Processing?

February 17, 2025

92% Match

Kun Luo, Zheng Liu, Peitian Zhang, Hongjin Qian, ... , Kang Liu

Computation and Language

The efficient processing of long context poses a serious challenge for large language models (LLMs). Recently, retrieval-augmented generation (RAG) has emerged as a promising strategy for this problem, as it enables LLMs to make selective use of the long context for efficient computation. However, existing RAG approaches lag behind other long-context processing methods due to inherent limitations on inaccurate retrieval and fragmented contexts. To address these challenges, we...

Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing

February 18, 2025

92% Match

Xiaoju Ye, Zhichun Wang, Jingyuan Wang

Computation and Language

Limited by the context window size of Large Language Models (LLMs), handling tasks whose input tokens exceed the upper limit has been challenging, whether the task is simple direct retrieval or complex multi-hop reasoning. Although various methods have been proposed to enhance the long-context processing capabilities of LLMs, they either incur substantial post-training costs, or require additional tool modules (e.g., RAG), or have not shown significant improvem...
