Associative Recurrent Memory Transformer

July 5, 2024

An All-MLP Sequence Modeling Architecture That Excels at Copying

June 23, 2024

88% Match

Chenwei Cui, Zehao Yan, ..., Hannah Kerner

Machine Learning

Recent work demonstrated Transformers' ability to efficiently copy strings of exponential sizes, distinguishing them from other architectures. We present the Causal Relation Network (CausalRN), an all-MLP sequence modeling architecture that can match Transformers on the copying task. Extending Relation Networks (RNs), we implemented key innovations to support autoregressive sequence modeling while maintaining computational feasibility. We discovered that exponentially-activat...

How Context Affects Language Models' Factual Predictions

May 10, 2020

88% Match

Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, ..., Sebastian Riedel

Computation and Language

When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for zero-shot cloze-style question answering. However, storing factual knowledge in a fixed number of weights of a language model clearly has limitations. Previous approaches have successfully provided access to information outside the model weights using supervised architectures that combine an inform...

Re2G: Retrieve, Rerank, Generate

July 13, 2022

88% Match

Michael Glass, Gaetano Rossiello, Md Faisal Mahbub Chowdhury, Ankita Rajaram Naik, ..., Alfio Gliozzo

Computation and Language

Artificial Intelligence

Information Retrieval

As demonstrated by GPT-3 and T5, transformers grow in capability as parameter spaces become larger and larger. However, for tasks that require a large amount of knowledge, non-parametric memory allows models to grow dramatically with a sub-linear increase in computational cost and GPU memory requirements. Recent models such as RAG and REALM have introduced retrieval into conditional generation. These models incorporate neural initial retrieval from a corpus of passages. We bu...

Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell

June 20, 2024

88% Match

Taiming Lu, Muhan Gao, Kuai Yu, ..., Daniel Khashabi

Computation and Language

Large Language Models (LLMs) exhibit positional bias, struggling to utilize information from the middle or end of long contexts. Our study explores LLMs' long-context reasoning by probing their hidden representations. We find that while LLMs encode the position of target information, they often fail to leverage this in generating accurate responses. This reveals a disconnect between information retrieval and utilization, a "know but don't tell" phenomenon. We further analyze ...

LLoCO: Learning Long Contexts Offline

April 11, 2024

88% Match

Sijun Tan, Xiuyu Li, Shishir Patil, Ziyang Wu, Tianjun Zhang, Kurt Keutzer, ..., Raluca Ada Popa

Computation and Language

Artificial Intelligence

Machine Learning

Processing long contexts remains a challenge for large language models (LLMs) due to the quadratic computational and memory overhead of the self-attention mechanism and the substantial KV cache sizes during generation. We propose a novel approach to address this problem by learning contexts offline through context compression and in-domain parameter-efficient finetuning. Our method enables an LLM to create a concise representation of the original context and efficiently retri...

RecallM: An Adaptable Memory Mechanism with Temporal Understanding for Large Language Models

July 6, 2023

88% Match

Brandon Kynoch, Hugo Latapie, Dwane van der Sluis

Artificial Intelligence

Computation and Language

Symbolic Computation

Large Language Models (LLMs) have made extraordinary progress in the field of Artificial Intelligence and have demonstrated remarkable capabilities across a large variety of tasks and domains. However, as we venture closer to creating Artificial General Intelligence (AGI) systems, we recognize the need to supplement LLMs with long-term memory to overcome the context window limitation and more importantly, to create a foundation for sustained reasoning, cumulative learning and...

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

January 31, 2024

88% Match

Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, ..., Christopher D. Manning

Computation and Language

Machine Learning

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic understanding of the overall document context. We introduce the novel approach of recursively embedding, clustering, and summarizing chunks of text, constructing a tree with differing levels of summarization from the bottom up. At inference time, our R...

Memoria: Hebbian Memory Architecture for Human-Like Sequential Processing

October 4, 2023

88% Match

Sangjun Park, JinYeong Bak

Machine Learning

Artificial Intelligence

Neural and Evolutionary Comp...

Transformers have demonstrated their success in various domains and tasks. However, Transformers struggle with long input sequences due to their limited capacity. While one solution is to increase input length, endlessly stretching the length is unrealistic. Furthermore, humans selectively remember and use only relevant information from inputs, unlike Transformers which process all raw data from start to end. We introduce Memoria, a general memory network that applies Hebbian...

Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

May 24, 2023

88% Match

Yinghan Long, Sayeed Shafayet Chowdhury, Kaushik Roy

Computation and Language

Artificial Intelligence

Machine Learning

Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments and apply attention to the individual segments. We propose a segmented recurrent transformer (SRformer) that combines segmented (local) attention with r...

Larimar: Large Language Models with Episodic Memory Control

March 18, 2024

88% Match

Payel Das, Subhajit Chaudhury, Elliot Nelson, Igor Melnyk, Sarath Swaminathan, Sihui Dai, Aurélie Lozano, Georgios Kollias, Vijil Chenthamarakshan, Jiří Navrátil, ..., Pin-Yu Chen

Machine Learning

Artificial Intelligence

Efficient and accurate updating of knowledge stored in Large Language Models (LLMs) is one of the most pressing research challenges today. This paper presents Larimar - a novel, brain-inspired architecture for enhancing LLMs with a distributed episodic memory. Larimar's memory allows for dynamic, one-shot updates of knowledge without the need for computationally expensive re-training or fine-tuning. Experimental results on multiple fact editing benchmarks demonstrate that Lar...
