MemLong: Memory-Augmented Retrieval for ...

Augmenting Language Models with Long-Term Memory

June 12, 2023

95% Match

Weizhi Wang, Li Dong, Hao Cheng, Xiaodong Liu, Xifeng Yan, ... , Furu Wei

Computation and Language

Existing large language models (LLMs) can only afford fixed-size inputs due to the input length limit, preventing them from utilizing rich long-context information from past inputs. To address this, we propose a framework, Language Models Augmented with Long-Term Memory (LongMem), which enables LLMs to memorize long history. We design a novel decoupled network architecture with the original backbone LLM frozen as a memory encoder and an adaptive residual side-network as a memo...


M+: Extending MemoryLLM with Scalable Long-Term Memory

February 1, 2025

95% Match

Yu Wang, Dmitry Krotov, Yuanzhe Hu, Yifan Gao, Wangchunshu Zhou, Julian McAuley, Dan Gutfreund, ... , Zexue He

Computation and Language

Equipping large language models (LLMs) with latent-space memory has attracted increasing attention as they can extend the context window of existing language models. However, retaining information from the distant past remains a challenge. For example, MemoryLLM (Wang et al., 2024a), as a representative work with latent-space memory, compresses past information into hidden states across all layers, forming a memory pool of 1B parameters. While effective for sequence lengths u...


Long-range Language Modeling with Self-retrieval

June 23, 2023

94% Match

Ohad Rubin, Jonathan Berant

Computation and Language

Retrieval-augmented language models (LMs) have received much attention recently. However, typically the retriever is not trained jointly as a native component of the LM, but added to an already-pretrained LM, which limits the ability of the LM and the retriever to adapt to one another. In this work, we propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch for the task of modeling l...


Does RAG Really Perform Bad For Long-Context Processing?

February 17, 2025

93% Match

Kun Luo, Zheng Liu, Peitian Zhang, Hongjin Qian, ... , Kang Liu

Computation and Language

The efficient processing of long context poses a serious challenge for large language models (LLMs). Recently, retrieval-augmented generation (RAG) has emerged as a promising strategy for this problem, as it enables LLMs to make selective use of the long context for efficient computation. However, existing RAG approaches lag behind other long-context processing methods due to inherent limitations on inaccurate retrieval and fragmented contexts. To address these challenges, we...


RetrievalAttention: Accelerating Long-Context LLM Inference via Vector Retrieval

September 16, 2024

93% Match

Di Liu, Meng Chen, Baotong Lu, Huiqiang Jiang, Zhenhua Han, Qianxi Zhang, Qi Chen, Chengruidong Zhang, Bailu Ding, Kai Zhang, Chen Chen, Fan Yang, ... , Lili Qiu

Machine Learning

Computation and Language

Transformer-based Large Language Models (LLMs) have become increasingly important. However, due to the quadratic time complexity of attention computation, scaling LLMs to longer contexts incurs extremely slow inference latency and high GPU memory consumption for caching key-value (KV) vectors. This paper proposes RetrievalAttention, a training-free approach to both accelerate attention computation and reduce GPU memory consumption. By leveraging the dynamic sparsity of attent...


Retrieval meets Long Context Large Language Models

October 4, 2023

93% Match

Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, ... , Bryan Catanzaro

Computation and Language

Artificial Intelligence

Information Retrieval

Machine Learning

Extending the context window of large language models (LLMs) is getting popular recently, while the solution of augmenting LLMs with retrieval has existed for years. The natural questions are: i) Retrieval-augmentation versus long context window, which one is better for downstream tasks? ii) Can both methods be combined to get the best of both worlds? In this work, we answer these questions by studying both solutions using two state-of-the-art pretrained LLMs, i.e., a proprie...


R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression

February 21, 2025

93% Match

Xiaoqiang Wang, Suyuchen Wang, ... , Bang Liu

Computation and Language

Artificial Intelligence

Memory plays a key role in enhancing LLMs' performance when deployed to real-world applications. Existing solutions face trade-offs: explicit memory designs based on external storage require complex management and incur storage overhead, while implicit memory designs that store information via parameters struggle with reliable retrieval. In this paper, we propose R$^3$Mem, a memory network that optimizes both information Retention and Retrieval through Reversible context comp...


EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts

February 20, 2025

93% Match

Subhajit Chaudhury, Payel Das, Sarathkrishna Swaminathan, Georgios Kollias, Elliot Nelson, Khushbu Pahwa, Tejaswini Pedapati, ... , Matthew Riemer

Computation and Language

Artificial Intelligence

Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce EpMAN, a method for processing long contexts in an episodic memory module while holistically attending to semantically relevant context chunks. The output of episodic attention is then used to reweigh the decoder's self-attention to...


MemLLM: Finetuning LLMs to Use An Explicit Read-Write Memory

April 17, 2024

93% Match

Ali Modarressi, Abdullatif Köksal, Ayyoob Imani, ... , Hinrich Schütze

Computation and Language

While current large language models (LLMs) demonstrate some capabilities in knowledge-intensive tasks, they are limited by relying on their parameters as an implicit storage mechanism. As a result, they struggle with infrequent knowledge and temporal degradation. In addition, the uninterpretable nature of parametric memorization makes it challenging to understand and prevent hallucination. Parametric memory pools and model editing are only partial solutions. Retrieval Augment...


$\text{Memory}^3$: Language Modeling with Explicit Memory

July 1, 2024

93% Match

Hongkang Yang, Zehao Lin, Wenjin Wang, Hao Wu, Zhiyu Li, Bo Tang, Wenqiang Wei, Jinbo Wang, Zeyun Tang, Shichao Song, Chenyang Xi, Yu Yu, Kai Chen, Feiyu Xiong, ... , Weinan E

Computation and Language

Artificial Intelligence

Machine Learning

The training and inference of large language models (LLMs) are together a costly process that transports knowledge from raw data to meaningful computation. Inspired by the memory hierarchy of the human brain, we reduce this cost by equipping LLMs with explicit memory, a memory format cheaper than model parameters and text retrieval-augmented generation (RAG). Conceptually, with most of its knowledge externalized to explicit memories, the LLM can enjoy a smaller parameter size...


MemLong: Memory-Augmented Retrieval for Long Text Modeling
