Contextual Memory Reweaving in Large Language Models Using Layered Latent State Reconstruction

February 4, 2025

LightThinker: Thinking Step-by-Step Compression

February 21, 2025

91% Match

Jintian Zhang, Yuqi Zhu, Mengshu Sun, Yujie Luo, Shuofei Qiao, Lun Du, Da Zheng, ..., Ningyu Zhang

Computation and Language

Artificial Intelligence

Information Retrieval

Machine Learning

Multimedia

Large language models (LLMs) have shown remarkable performance in complex reasoning tasks, but their efficiency is hindered by the substantial memory and computational costs associated with generating lengthy tokens. In this paper, we propose LightThinker, a novel method that enables LLMs to dynamically compress intermediate thoughts during reasoning. Inspired by human cognitive processes, LightThinker compresses verbose thought steps into compact representations and discards...

Writing in the Margins: Better Inference Pattern for Long Context Retrieval

August 27, 2024

91% Match

Melisa Russak, Umar Jamil, Christopher Bryant, Kiran Kamble, Axel Magnuson, ..., Waseem AlShikh

Computation and Language

Information Retrieval

In this paper, we introduce Writing in the Margins (WiM), a new inference pattern for Large Language Models designed to optimize the handling of long input sequences in retrieval-oriented tasks. This approach leverages the chunked prefill of the key-value cache to perform segment-wise inference, which enables efficient processing of extensive contexts along with the generation and classification of intermediate information ("margins") that guide the model towards specific tas...

Long Context RAG Performance of Large Language Models

November 5, 2024

91% Match

Quinn Leng, Jacob Portes, Sam Havens, ..., Michael Carbin

Machine Learning

Computation and Language

Retrieval Augmented Generation (RAG) has emerged as a crucial technique for enhancing the accuracy of Large Language Models (LLMs) by incorporating external information. With the advent of LLMs that support increasingly longer context lengths, there is a growing interest in understanding how these models perform in RAG scenarios. Can these new long context models improve RAG performance? This paper presents a comprehensive study of the impact of increased context length on RA...

Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell

June 20, 2024

91% Match

Taiming Lu, Muhan Gao, Kuai Yu, ..., Daniel Khashabi

Computation and Language

Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information retrieval and utilization, a "know but don't tell" phenomenon. We further analyze ...

Emulating Retrieval Augmented Generation via Prompt Engineering for Enhanced Long Context Comprehension in LLMs

February 18, 2025

91% Match

Joon Park, Kyohei Atarashi, ..., Hisashi Kashima

Computation and Language

This paper addresses the challenge of comprehending very long contexts in Large Language Models (LLMs) by proposing a method that emulates Retrieval Augmented Generation (RAG) through specialized prompt engineering and chain-of-thought (CoT) reasoning. While recent LLMs support over 100,000 tokens in a single prompt, simply enlarging context windows has not guaranteed robust multi-hop reasoning when key details are scattered across massive input. Our approach treats the model...

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

October 17, 2024

91% Match

Alireza Rezazadeh, Zichao Li, ..., Yujia Bao

Computation and Language

Artificial Intelligence

Machine Learning

Recent advancements in large language models have significantly improved their context windows, yet challenges in effective long-term memory management remain. We introduce MemTree, an algorithm that leverages a dynamic, tree-structured memory representation to optimize the organization, retrieval, and integration of information, akin to human cognitive schemas. MemTree organizes memory hierarchically, with each node encapsulating aggregated textual content, corresponding sem...

HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 9, 2024

91% Match

Zifan He, Zongyue Qin, Neha Prakriya, ..., Jason Cong

Computation and Language

Machine Learning

Transformer-based large language models (LLM) have been widely used in language processing applications. However, most of them restrict the context window that permits the model to attend to every token in the inputs. Previous works in recurrent models can memorize past tokens to enable unlimited context and maintain effectiveness. However, they have "flat" memory architectures, which have limitations in selecting and filtering information. Since humans are good at learning a...

Understanding Synthetic Context Extension via Retrieval Heads

October 29, 2024

91% Match

Xinyu Zhao, Fangcong Yin, Greg Durrett

Computation and Language

Long-context LLMs are increasingly in demand for applications such as retrieval-augmented generation. To defray the cost of pretraining LLMs over long contexts, recent work takes an approach of synthetic context extension: fine-tuning LLMs with synthetically generated long-context data in a post-training stage. However, it remains unclear how and why this synthetic context extension imparts abilities for downstream long-context tasks. In this paper, we investigate fine-tuning...

Rethinking with Retrieval: Faithful Large Language Model Inference

December 31, 2022

91% Match

Hangfeng He, Hongming Zhang, Dan Roth

Computation and Language

Artificial Intelligence

Despite the success of large language models (LLMs) in various natural language processing (NLP) tasks, the stored knowledge in these models may inevitably be incomplete, out-of-date, or incorrect. This motivates the need to utilize external knowledge to assist LLMs. Unfortunately, current methods for incorporating external knowledge often require additional training or fine-tuning, which can be costly and may not be feasible for LLMs. To address this issue, we propose a nove...

Long Context vs. RAG for LLMs: An Evaluation and Revisits

December 27, 2024

91% Match

Xinze Li, Yixin Cao, ..., Aixin Sun

Computation and Language

Extending context windows (i.e., Long Context, LC) and using retrievers to selectively access relevant information (i.e., Retrieval-Augmented Generation, RAG) are the two main strategies to enable LLMs to incorporate extremely long external contexts. This paper revisits recent studies on this topic, highlighting their key insights and discrepancies. We then provide a more comprehensive evaluation by filtering out questions answerable without external context, identifying the ...
