ID: 2405.06067

HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 9, 2024


Similar papers 4

Online Adaptation of Language Models with a Memory of Amortized Contexts

March 7, 2024

89% Match
Jihoon Tack, Jaehyung Kim, Eric Mitchell, Jinwoo Shin, ..., Jonathan Richard Schwarz
Machine Learning
Computation and Language

Due to the rapid generation and dissemination of information, large language models (LLMs) quickly run out of date despite enormous development costs. Because of this crucial need to keep models updated, online learning has emerged as a critical necessity for deploying LLMs in real-world applications. However, given the ever-expanding corpus of unseen documents and the large parameter space of modern LLMs, efficient adaptation is essential. To address these challenges, we propo...


ReadTwice: Reading Very Large Documents with Memories

May 10, 2021

89% Match
Yury Zemlyanskiy, Joshua Ainslie, Michiel de Jong, Philip Pham, ..., Fei Sha
Computation and Language
Machine Learning

Knowledge-intensive tasks such as question answering often require assimilating information from different sections of large inputs such as books or article collections. We propose ReadTwice, a simple and effective technique that combines several strengths of prior approaches to model long-range dependencies with Transformers. The main idea is to read text in small segments, in parallel, summarizing each segment into a memory table to be used in a second read of the text. We ...

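The two-pass idea in this snippet lends itself to a toy illustration. The sketch below is my own simplification, assuming mean-pooled summaries and plain dot-product attention in place of the paper's encoder; it is not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def summarize(segment_emb):
    # Stand-in for the first-pass encoder: mean-pool token embeddings
    # into a single summary vector per segment.
    return segment_emb.mean(axis=0)

d_model, seg_len, n_segments = 64, 16, 8
tokens = rng.normal(size=(n_segments * seg_len, d_model))      # token embeddings
segments = tokens.reshape(n_segments, seg_len, d_model)

# First read: summarize every segment (in parallel) into a shared memory table.
memory_table = np.stack([summarize(seg) for seg in segments])  # (n_segments, d_model)

# Second read: tokens in each segment attend over the memory table, so local
# processing can use information from anywhere in the document.
def second_read(segment_emb, memory):
    scores = segment_emb @ memory.T / np.sqrt(d_model)          # (seg_len, n_segments)
    retrieved = softmax(scores) @ memory                        # (seg_len, d_model)
    return segment_emb + retrieved                              # residual merge

enriched = np.stack([second_read(seg, memory_table) for seg in segments])
print(enriched.shape)                                           # (8, 16, 64)
```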

BABILong: Testing the Limits of LLMs with Long Context Reasoning-in-a-Haystack

June 14, 2024

89% Match
Yuri Kuratov, Aydar Bulatov, Petr Anokhin, Ivan Rodkin, Dmitry Sorokin, ..., Mikhail Burtsev
Computation and Language
Artificial Intelligence

In recent years, the input context sizes of large language models (LLMs) have increased dramatically. However, existing evaluation methods have not kept pace, failing to comprehensively assess the efficiency of models in handling long contexts. To bridge this gap, we introduce the BABILong benchmark, designed to test language models' ability to reason across facts distributed in extremely long documents. BABILong includes a diverse set of 20 reasoning tasks, including fact ch...

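The haystack construction can be illustrated with a toy sample generator. The facts, question, and filler below are invented for illustration; the real benchmark scatters bAbI-derived facts through long natural documents.

```python
import random

random.seed(0)

# Task-relevant facts and the question that depends on them (invented example).
facts = ["Mary travelled to the kitchen.", "Mary picked up the apple."]
question = "Where is the apple?"
answer = "kitchen"

# Filler sentences standing in for book-length distractor text; scaling this
# list is what pushes the sample to extreme context lengths.
filler = [f"Unrelated background sentence number {i}." for i in range(500)]

# Scatter the facts at random positions inside the filler.
haystack = list(filler)
for fact in facts:
    haystack.insert(random.randrange(len(haystack) + 1), fact)

sample = {"context": " ".join(haystack), "question": question, "answer": answer}
print(len(sample["context"].split()), "words of context")
```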

Recursively Summarizing Enables Long-Term Dialogue Memory in Large Language Models

August 29, 2023

89% Match
Qingyue Wang, Liang Ding, Yanan Cao, Zhiliang Tian, Shi Wang, ..., Li Guo
Computation and Language
Artificial Intelligence

Most open-domain dialogue systems suffer from forgetting important information, especially in long-term conversations. Existing works usually train a dedicated retriever or summarizer to obtain key information from the past, which is time-consuming and highly dependent on the quality of labeled data. To alleviate this problem, we propose recursively generating summaries/memory with large language models (LLMs) to enhance long-term memory ability. Specifically, our method f...

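A minimal sketch of the summary-as-memory loop, assuming a generic chat-completion callable (`llm`) and invented prompts; the paper's actual prompting and memory management are more involved.

```python
from typing import Callable, List

def update_memory(llm: Callable[[str], str], summary: str, new_turns: List[str]) -> str:
    """Fold the latest dialogue turns into the running summary."""
    prompt = (
        "Previous summary of the conversation:\n"
        f"{summary or '(empty)'}\n\n"
        "New dialogue turns:\n" + "\n".join(new_turns) + "\n\n"
        "Rewrite the summary, keeping every fact worth remembering long term."
    )
    return llm(prompt)

def respond(llm: Callable[[str], str], summary: str, user_input: str) -> str:
    """Answer the current input conditioned on the recursively built memory."""
    prompt = f"Conversation memory:\n{summary}\n\nUser: {user_input}\nAssistant:"
    return llm(prompt)

# Toy stand-in LLM so the sketch runs without any API; swap in a real
# chat-completion call in practice.
fake_llm = lambda prompt: prompt[-120:]

memory = ""
sessions = [
    ["User: I live in Oslo.", "Assistant: Noted!"],
    ["User: My cat is called Miso.", "Assistant: Lovely name."],
]
for turns in sessions:
    memory = update_memory(fake_llm, memory, turns)

print(respond(fake_llm, memory, "Where do I live?"))
```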

UIO-LLMs: Unbiased Incremental Optimization for Long-Context LLMs

June 26, 2024

89% Match
Wenhao Li, Mingbao Lin, Yunshan Zhong, ..., Rongrong Ji
Computation and Language

Managing long texts is challenging for large language models (LLMs) due to limited context window sizes. This study introduces UIO-LLMs, an unbiased incremental optimization approach for memory-enhanced transformers under long-context settings. We initially conceptualize the process as a streamlined encoder-decoder framework where the weights-shared encoder and decoder respectively encapsulate a context segment into memories and leverage these memories to predict outputs of t...

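One way to picture the encoder-decoder loop described above is the sketch below; the cross-attention pooling is a stand-in of mine, and the real method trains the weights-shared encoder and decoder end to end with the unbiased incremental optimization the title refers to.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, n_mem_tokens, seg_len = 64, 4, 32

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def encode_to_memory(segment_emb, mem_queries):
    # Cross-attention stand-in: learned memory queries pool a whole segment
    # into n_mem_tokens compressed vectors.
    scores = mem_queries @ segment_emb.T / np.sqrt(d_model)   # (n_mem, seg_len)
    return softmax(scores) @ segment_emb                      # (n_mem, d_model)

mem_queries = rng.normal(size=(n_mem_tokens, d_model))
segments = [rng.normal(size=(seg_len, d_model)) for _ in range(5)]

memories = []
for segment in segments:
    # The decoder would predict this segment's tokens conditioned on all
    # previously accumulated memories plus the segment itself.
    context = np.concatenate(memories + [segment]) if memories else segment
    memories.append(encode_to_memory(segment, mem_queries))

print(len(memories), memories[0].shape, context.shape)
```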

RET-LLM: Towards a General Read-Write Memory for Large Language Models

May 23, 2023

89% Match
Ali Modarressi, Ayyoob Imani, ..., Hinrich Schütze
Computation and Language

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP) through their extensive parameters and comprehensive data utilization. However, existing LLMs lack a dedicated memory unit, limiting their ability to explicitly store and retrieve knowledge for various tasks. In this paper, we propose RET-LLM, a novel framework that equips LLMs with a general write-read memory unit, allowing them to extract, store, and recall knowledge from ...

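A hedged sketch of what a general write-read memory unit could look like, using (subject, relation, object) triples; the exact API, the matching scheme, and how the LLM decides when to call it are simplifications of mine rather than the paper's interface.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

class TripleMemory:
    """Toy write-read memory storing (subject, relation, object) facts."""

    def __init__(self) -> None:
        self._index: Dict[str, List[Triple]] = defaultdict(list)

    def write(self, subject: str, relation: str, obj: str) -> None:
        # Index the fact under both arguments so either can be queried later.
        triple = (subject, relation, obj)
        self._index[subject.lower()].append(triple)
        self._index[obj.lower()].append(triple)

    def read(self, query: str) -> List[Triple]:
        # Exact keyword lookup; a real system would use fuzzier matching.
        hits: List[Triple] = []
        for word in query.lower().split():
            hits.extend(self._index.get(word, []))
        return hits

memory = TripleMemory()
memory.write("Alice", "works_at", "Acme")
memory.write("Acme", "located_in", "Berlin")
print(memory.read("Where does Alice work?"))   # [('Alice', 'works_at', 'Acme')]
```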

Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size

August 17, 2020

89% Match
Davis Yoshida, Allyson Ettinger, Kevin Gimpel
Computation and Language

Fine-tuning a pretrained transformer for a downstream task has become a standard method in NLP in the last few years. While the results from these models are impressive, applying them can be extremely computationally expensive, as is pretraining new models with the latest architectures. We present a novel method for applying pretrained transformer language models which lowers their memory requirement both at training and inference time. An additional benefit is that our metho...


Extended Mind Transformers

June 4, 2024

89% Match
Phoebe Klett, Thomas Ahle
Machine Learning
Computation and Language

Pre-trained language models demonstrate general intelligence and common sense, but long inputs quickly become a bottleneck for memorizing information at inference time. We resurface a simple method, Memorizing Transformers (Wu et al., 2022), that gives the model access to a bank of pre-computed memories. We show that it is possible to fix many of the shortcomings of the original method, such as the need for fine-tuning, by critically assessing how positional encodings should ...

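The memory-bank attention from Memorizing Transformers that this work builds on can be sketched as below; the sketch deliberately ignores the positional-encoding handling the paper focuses on and uses exact rather than approximate nearest-neighbour search.

```python
import numpy as np

rng = np.random.default_rng(0)
d_head, n_local, n_memories, top_k = 32, 8, 1024, 4

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

# Current segment plus a bank of pre-computed key/value memories from
# earlier context.
q = rng.normal(size=(n_local, d_head))
k_local = rng.normal(size=(n_local, d_head))
v_local = rng.normal(size=(n_local, d_head))
k_mem = rng.normal(size=(n_memories, d_head))
v_mem = rng.normal(size=(n_memories, d_head))

outputs = []
for i in range(n_local):
    # Retrieve the top-k memories for this query (exact search here; an ANN
    # index would be used at scale).
    mem_scores = k_mem @ q[i]
    idx = np.argpartition(mem_scores, -top_k)[-top_k:]
    # Attend jointly over local keys and the retrieved memories.
    keys = np.concatenate([k_local, k_mem[idx]])
    values = np.concatenate([v_local, v_mem[idx]])
    attn = softmax(keys @ q[i] / np.sqrt(d_head))
    outputs.append(attn @ values)

print(np.stack(outputs).shape)   # (8, 32)
```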

ERNIE-Doc: A Retrospective Long-Document Modeling Transformer

December 31, 2020

89% Match
Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, ..., Haifeng Wang
Computation and Language

Transformers are not suited for processing long documents due to their quadratically increasing memory and time consumption. Simply truncating a long document or applying a sparse attention mechanism either incurs the context fragmentation problem or leads to inferior modeling capability relative to models of comparable size. In this paper, we propose ERNIE-Doc, a document-level language pretraining model based on Recurrence Transformers. Two well-designed techniques, namely the r...


RecallM: An Adaptable Memory Mechanism with Temporal Understanding for Large Language Models

July 6, 2023

89% Match
Brandon Kynoch, Hugo Latapie, Dwane van der Sluis
Artificial Intelligence
Computation and Language
Symbolic Computation

Large Language Models (LLMs) have made extraordinary progress in the field of Artificial Intelligence and have demonstrated remarkable capabilities across a large variety of tasks and domains. However, as we venture closer to creating Artificial General Intelligence (AGI) systems, we recognize the need to supplement LLMs with long-term memory to overcome the context window limitation and more importantly, to create a foundation for sustained reasoning, cumulative learning and...
