R$^3$Mem: Bridging Memory Retention and ...

Memorizing Documents with Guidance in Large Language Models

June 23, 2024

91% Match

Bumjin Park, Jaesik Choi

Computation and Language

Artificial Intelligence

Training data plays a pivotal role in AI models. Large language models (LLMs) are trained with massive amounts of documents, and their parameters hold document-related contents. Recently, several studies identified content-specific locations in LLMs by examining the parameters. Instead of post hoc interpretation, we propose a document-wise memory architecture to track document memories during training. The proposed architecture maps document represe...

An Evolved Universal Transformer Memory

October 17, 2024

91% Match

Edoardo Cetin, Qi Sun, ... , Yujin Tang

Machine Learning

Artificial Intelligence

Computation and Language

Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs atop pre-trained transformers to provide different latent contex...

From Isolated Conversations to Hierarchical Schemas: Dynamic Tree Memory Representation for LLMs

October 17, 2024

91% Match

Alireza Rezazadeh, Zichao Li, ... , Yujia Bao

Computation and Language

Artificial Intelligence

Machine Learning

Recent advancements in large language models have significantly improved their context windows, yet challenges in effective long-term memory management remain. We introduce MemTree, an algorithm that leverages a dynamic, tree-structured memory representation to optimize the organization, retrieval, and integration of information, akin to human cognitive schemas. MemTree organizes memory hierarchically, with each node encapsulating aggregated textual content, corresponding sem...

Accelerating Inference of Retrieval-Augmented Generation via Sparse Context Selection

May 25, 2024

91% Match

Yun Zhu, Jia-Chen Gu, Caitlin Sikora, Ho Ko, Yinxiao Liu, Chu-Cheng Lin, Lei Shu, Liangchen Luo, Lei Meng, ... , Jindong Chen

Computation and Language

Large language models (LLMs) augmented with retrieval exhibit robust performance and extensive versatility by incorporating external contexts. However, the input length grows linearly in the number of retrieved documents, causing a dramatic increase in latency. In this paper, we propose a novel paradigm named Sparse RAG, which seeks to cut computation costs through sparsity. Specifically, Sparse RAG encodes retrieved documents in parallel, which eliminates latency introduced ...

Stateful Memory-Augmented Transformers for Efficient Dialogue Modeling

September 15, 2022

91% Match

Qingyang Wu, Zhou Yu

Computation and Language

Transformer encoder-decoder models have achieved great performance in dialogue generation tasks; however, their inability to process long dialogue history often leads to truncation of the context. To address this problem, we propose a novel memory-augmented transformer that is compatible with existing pre-trained encoder-decoder models and enables efficient preservation of the dialogue history information. By incorporating a separate memory module alongside the pre-trained tra...

Semantic Compression With Large Language Models

April 25, 2023

91% Match

Henry Gilbert, Michael Sandborn, Douglas C. Schmidt, ... , Jules White

Artificial Intelligence

The rise of large language models (LLMs) is revolutionizing information retrieval, question answering, summarization, and code generation tasks. However, in addition to confidently presenting factually inaccurate information at times (known as "hallucinations"), LLMs are also inherently limited by the number of input and output tokens that can be processed at once, making them potentially less effective on tasks that require processing a large set or continuous stream of info...

In-context Autoencoder for Context Compression in a Large Language Model

July 13, 2023

91% Match

Tao Ge, Jing Hu, Lei Wang, Xun Wang, ... , Furu Wei

Computation and Language

Artificial Intelligence

Machine Learning

We propose the In-context Autoencoder (ICAE), leveraging the power of a large language model (LLM) to compress a long context into short compact memory slots that can be directly conditioned on by the LLM for various purposes. ICAE is first pretrained using both autoencoding and language modeling objectives on massive text data, enabling it to generate memory slots that accurately and comprehensively represent the original context. Then, it is fine-tuned on instruction data ...

Autonomous Structural Memory Manipulation for Large Language Models Using Hierarchical Embedding Augmentation

January 23, 2025

91% Match

Derek Yotheringhay, Alistair Kirkland, ... , Josiah Whitesteeple

Computation and Language

Artificial Intelligence

Transformative innovations in model architectures have introduced hierarchical embedding augmentation as a means to redefine the representation of tokens through multi-level semantic structures, offering enhanced adaptability to complex linguistic inputs. Autonomous structural memory manipulation further advances this paradigm through dynamic memory reallocation mechanisms that prioritize critical contextual features while suppressing less relevant information, enabling scala...

Does RAG Really Perform Bad For Long-Context Processing?

February 17, 2025

91% Match

Kun Luo, Zheng Liu, Peitian Zhang, Hongjin Qian, ... , Kang Liu

Computation and Language

The efficient processing of long context poses a serious challenge for large language models (LLMs). Recently, retrieval-augmented generation (RAG) has emerged as a promising strategy for this problem, as it enables LLMs to make selective use of the long context for efficient computation. However, existing RAG approaches lag behind other long-context processing methods due to inherent limitations on inaccurate retrieval and fragmented contexts. To address these challenges, we...

Beyond Words: A Latent Memory Approach to Internal Reasoning in LLMs

February 28, 2025

91% Match

José I. Orlicki

Computation and Language

Artificial Intelligence

Recent advances in large language models (LLMs) have popularized the chain-of-thought (CoT) paradigm, in which models produce explicit reasoning steps in natural language. Although this approach improves interpretability and facilitates external auditing, it may not represent the most computationally efficient method for internal reasoning. In contrast, human cognition relies on implicit mental representations that recall past sensory and episodic information without requirin...

R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression
