ID: 2405.06067

HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 9, 2024


Similar papers

Autonomous Structural Memory Manipulation for Large Language Models Using Hierarchical Embedding Augmentation

January 23, 2025

91% Match
Derek Yotheringhay, Alistair Kirkland, ..., Josiah Whitesteeple
Computation and Language
Artificial Intelligence

Transformative innovations in model architectures have introduced hierarchical embedding augmentation as a means to redefine the representation of tokens through multi-level semantic structures, offering enhanced adaptability to complex linguistic inputs. Autonomous structural memory manipulation further advances this paradigm through dynamic memory reallocation mechanisms that prioritize critical contextual features while suppressing less relevant information, enabling scala...


EpMAN: Episodic Memory AttentioN for Generalizing to Longer Contexts

February 20, 2025

91% Match
Subhajit Chaudhury, Payel Das, Sarathkrishna Swaminathan, Georgios Kollias, Elliot Nelson, Khushbu Pahwa, Tejaswini Pedapati, ..., Matthew Riemer
Computation and Language
Artificial Intelligence

Recent advances in Large Language Models (LLMs) have yielded impressive successes on many language tasks. However, efficient processing of long contexts using LLMs remains a significant challenge. We introduce EpMAN -- a method for processing long contexts in an episodic memory module while holistically attending to semantically relevant context chunks. The output of episodic attention is then used to reweigh the decoder's self-attention to...
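A minimal NumPy sketch of the episodic-attention idea outlined above; the dot-product chunk scoring, mean-pooled chunk summaries, and the function name episodic_reweighted_attention are illustrative assumptions, not the authors' implementation. The chunk-level relevance scores are used to reweight token-level attention over the stored context.

```python
# Sketch: chunk-level episodic scores reweight token-level attention.
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def episodic_reweighted_attention(query, chunk_keys, chunk_values, chunk_summaries):
    """query: (d,), chunk_keys/chunk_values: (n_chunks, chunk_len, d),
    chunk_summaries: (n_chunks, d) -- e.g. mean-pooled chunk embeddings."""
    d = query.shape[-1]
    # Episode-level attention: how relevant is each stored chunk to the query?
    chunk_scores = softmax(chunk_summaries @ query / np.sqrt(d))      # (n_chunks,)
    # Token-level attention inside every chunk.
    token_scores = softmax(chunk_keys @ query / np.sqrt(d), axis=-1)  # (n_chunks, chunk_len)
    # Reweight each token's attention by its chunk's episodic relevance.
    weights = chunk_scores[:, None] * token_scores
    weights = weights / weights.sum()
    return (weights[..., None] * chunk_values).sum(axis=(0, 1))       # (d,)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    keys = rng.normal(size=(4, 8, 16))
    out = episodic_reweighted_attention(rng.normal(size=16), keys,
                                        rng.normal(size=(4, 8, 16)), keys.mean(axis=1))
    print(out.shape)  # (16,)
```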


Parallel Context Windows for Large Language Models

December 21, 2022

90% Match
Nir Ratner, Yoav Levine, Yonatan Belinkov, Ori Ram, Inbal Magar, Omri Abend, Ehud Karpas, Amnon Shashua, ..., Yoav Shoham
Computation and Language

When applied to processing long text, Large Language Models (LLMs) are limited by their context window. Existing efforts to address this limitation involve training specialized architectures, and cannot be easily applied to off-the-shelf LLMs. We present Parallel Context Windows (PCW), a method that alleviates the context window restriction for any off-the-shelf LLM without further training. The key to the approach is to carve a long context into chunks (``windows''), restric...
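The chunking-and-restricted-attention idea lends itself to a small sketch. The mask construction below is a hedged illustration, assuming a boolean mask convention (True = may attend) and that the final "task" tokens attend across all windows; it is not the paper's reference implementation.

```python
# Sketch: context tokens attend only within their own window,
# while trailing task tokens attend causally over everything.
import numpy as np

def pcw_attention_mask(window_lens, n_task_tokens):
    total = sum(window_lens) + n_task_tokens
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for w in window_lens:
        # Causal attention restricted to this window only.
        mask[start:start + w, start:start + w] = np.tril(np.ones((w, w), dtype=bool))
        start += w
    # Task tokens attend to all context windows and causally to themselves.
    task_start = start
    mask[task_start:, :task_start] = True
    mask[task_start:, task_start:] = np.tril(np.ones((n_task_tokens, n_task_tokens), dtype=bool))
    return mask

if __name__ == "__main__":
    print(pcw_attention_mask(window_lens=[3, 3], n_task_tokens=2).astype(int))
```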


MELODI: Exploring Memory Compression for Long Contexts

October 4, 2024

90% Match
Yinpeng Chen, DeLesley Hutchins, Aren Jansen, Andrey Zhmoginov, ..., Jesper Andersen
Machine Learning
Artificial Intelligence

We present MELODI, a novel memory architecture designed to efficiently process long documents using short context windows. The key principle behind MELODI is to represent short-term and long-term memory as a hierarchical compression scheme across both network layers and context windows. Specifically, the short-term memory is achieved through recurrent compression of context windows across multiple layers, ensuring smooth transitions between windows. In contrast, the long-term...
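A rough sketch of the hierarchical short/long-term compression scheme described above, assuming mean pooling as a stand-in for the learned compression layers and arbitrary compression ratios; MELODI's actual configuration differs.

```python
# Sketch: short-term memory is recurrently compressed across windows;
# a further-compressed summary is appended to long-term memory per window.
import numpy as np

def compress(tokens, ratio):
    """Compress an (n, d) window into (n // ratio, d) summary vectors."""
    n, d = tokens.shape
    usable = (n // ratio) * ratio
    return tokens[:usable].reshape(-1, ratio, d).mean(axis=1)

def process_document(windows, short_ratio=4, long_ratio=4):
    short_term, long_term = None, []
    for window in windows:                       # each window: (window_len, d)
        ctx = window if short_term is None else np.concatenate([short_term, window])
        # Short-term memory: recurrent compression carried to the next window.
        short_term = compress(ctx, short_ratio)
        # Long-term memory: a further-compressed summary appended per window.
        long_term.append(compress(short_term, long_ratio))
    return short_term, np.concatenate(long_term)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    st, lt = process_document([rng.normal(size=(64, 32)) for _ in range(5)])
    print(st.shape, lt.shape)
```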


R$^3$Mem: Bridging Memory Retention and Retrieval via Reversible Compression

February 21, 2025

90% Match
Xiaoqiang Wang, Suyuchen Wang, ..., Bang Liu
Computation and Language
Artificial Intelligence

Memory plays a key role in enhancing LLMs' performance when deployed to real-world applications. Existing solutions face trade-offs: explicit memory designs based on external storage require complex management and incur storage overhead, while implicit memory designs that store information via parameters struggle with reliable retrieval. In this paper, we propose R$^3$Mem, a memory network that optimizes both information Retention and Retrieval through Reversible context comp...


LaMemo: Language Modeling with Look-Ahead Memory

April 15, 2022

90% Match
Haozhe Ji, Rongsheng Zhang, Zhenyu Yang, ..., Minlie Huang
Computation and Language

Although Transformers with fully connected self-attention are powerful at modeling long-term dependencies, they struggle to scale to long texts with thousands of words in language modeling. One solution is to equip the model with a recurrence memory. However, existing approaches directly reuse hidden states from the previous segment, which encode context in a uni-directional way. As a result, this prevents the memory from dynamically interacting with the current conte...


Hierarchical Context Merging: Better Long Context Understanding for Pre-trained LLMs

April 16, 2024

90% Match
Woomin Song, Seunghyuk Oh, Sangwoo Mo, Jaehyung Kim, Sukmin Yun, ..., Jinwoo Shin
Machine Learning
Artificial Intelligence

Large language models (LLMs) have shown remarkable performance in various natural language processing tasks. However, a primary constraint they face is the context limit, i.e., the maximum number of tokens they can process. Previous works have explored architectural changes and modifications in positional encoding to relax the constraint, but they often require expensive training or do not address the computational demands of self-attention. In this paper, we present Hierarch...


Associative Recurrent Memory Transformer

July 5, 2024

90% Match
Ivan Rodkin, Yuri Kuratov, ..., Mikhail Burtsev
Computation and Language
Artificial Intelligence
Machine Learning

This paper addresses the challenge of creating a neural architecture for very long sequences that requires constant time for processing new information at each time step. Our approach, Associative Recurrent Memory Transformer (ARMT), is based on transformer self-attention for local context and segment-level recurrence for storage of task-specific information distributed over a long context. We demonstrate that ARMT outperforms existing alternatives in associative retrieval tas...
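A toy sketch of the combination described above, with a linear key-value associative memory standing in for ARMT's actual mechanism; the outer-product write rule, the mean-pooled segment summary, and the dimensions are assumptions chosen only to keep per-step cost and storage constant.

```python
# Sketch: local processing per segment plus a constant-size associative memory.
import numpy as np

class AssociativeMemory:
    def __init__(self, dim):
        self.M = np.zeros((dim, dim))          # constant-size storage

    def write(self, key, value):
        self.M += np.outer(value, key)         # Hebbian-style key-value binding

    def read(self, key):
        return self.M @ key                    # constant-time retrieval

def process_segments(segments, dim=32):
    memory = AssociativeMemory(dim)
    rng = np.random.default_rng(0)
    Wk, Wv = rng.normal(size=(dim, dim)), rng.normal(size=(dim, dim))
    for seg in segments:                       # seg: (seg_len, dim), local context
        summary = seg.mean(axis=0)             # stand-in for attention over the segment
        memory.write(Wk @ summary, Wv @ summary)
    return memory

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    mem = process_segments([rng.normal(size=(16, 32)) for _ in range(10)])
    print(mem.read(rng.normal(size=32)).shape)   # (32,)
```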


MEMORYLLM: Towards Self-Updatable Large Language Models

February 7, 2024

90% Match
Yu Wang, Xiusi Chen, ..., Julian McAuley
Computation and Language

Existing Large Language Models (LLMs) usually remain static after deployment, which might make it hard to inject new knowledge into the model. We aim to build models containing a considerable portion of self-updatable parameters, enabling the model to integrate new knowledge effectively and efficiently. To this end, we introduce MEMORYLLM, a model that comprises a transformer and a fixed-size memory pool within the latent space of the transformer. MEMORYLLM can self-update wi...
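A hedged sketch of a fixed-size, self-updatable latent memory pool in the spirit of the description above; the slot-replacement update rule and all sizes are illustrative assumptions rather than MEMORYLLM's exact procedure.

```python
# Sketch: inject new knowledge by replacing a fraction of memory slots,
# keeping the total memory size fixed.
import numpy as np

class LatentMemoryPool:
    def __init__(self, n_slots=1024, dim=64, update_fraction=0.05, seed=0):
        self.rng = np.random.default_rng(seed)
        self.slots = self.rng.normal(size=(n_slots, dim))
        self.n_new = max(1, int(n_slots * update_fraction))

    def self_update(self, new_latents):
        """Drop a random subset of old slots and append latents encoding
        the new text, so the pool size never grows."""
        n_slots = self.slots.shape[0]
        keep = self.rng.choice(n_slots, size=n_slots - self.n_new, replace=False)
        injected = new_latents[: self.n_new]
        self.slots = np.concatenate([self.slots[keep], injected])

if __name__ == "__main__":
    pool = LatentMemoryPool()
    pool.self_update(np.random.default_rng(1).normal(size=(64, 64)))
    print(pool.slots.shape)   # stays (1024, 64)
```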


Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

October 4, 2023

90% Match
Sangjun Park, JinYeong Bak
Machine Learning
Artificial Intelligence
Neural and Evolutionary Computing

Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian...
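A small illustration of a Hebbian-style memory in the spirit of the description above, where co-activation strengthens links between stored engrams and only the most strongly recalled ones are returned; the learning rate, decay, and top-k retrieval are assumed values, not Memoria's actual design.

```python
# Sketch: Hebbian weight updates between engrams, top-k recall.
import numpy as np

class HebbianMemory:
    def __init__(self, n_engrams, dim, lr=0.1, decay=0.01, seed=0):
        rng = np.random.default_rng(seed)
        self.engrams = rng.normal(size=(n_engrams, dim))
        self.weights = np.zeros((n_engrams, n_engrams))  # engram-engram links
        self.lr, self.decay = lr, decay

    def step(self, x, top_k=3):
        # Activation of each engram by the current input.
        act = self.engrams @ x
        # Hebbian update: co-activated engrams wire together, with decay.
        self.weights += self.lr * np.outer(act, act)
        self.weights *= (1.0 - self.decay)
        # Recall: spread activation through learned links, keep only top-k engrams.
        recall = self.weights @ act
        top = np.argsort(recall)[-top_k:]
        return self.engrams[top].mean(axis=0)

if __name__ == "__main__":
    mem = HebbianMemory(n_engrams=32, dim=16)
    out = mem.step(np.random.default_rng(1).normal(size=16))
    print(out.shape)  # (16,)
```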
