M+: Extending MemoryLLM with Scalable Long-Term Memory

HMT: Hierarchical Memory Transformer for Long Context Language Processing

May 9, 2024

92% Match

Zifan He, Zongyue Qin, Neha Prakriya, ... , Jason Cong

Computation and Language

Machine Learning

Transformer-based large language models (LLMs) have been widely used in language processing applications. However, most of them restrict the context window that permits the model to attend to every token in the inputs. Previous works in recurrent models can memorize past tokens to enable unlimited context and maintain effectiveness. However, they have "flat" memory architectures, which have limitations in selecting and filtering information. Since humans are good at learning a...

Don't Do RAG: When Cache-Augmented Generation is All You Need for Knowledge Tasks

December 20, 2024

92% Match

Brian J Chan, Chao-Ting Chen, ... , Hen-Hsen Huang

Computation and Language

Retrieval-augmented generation (RAG) has gained traction as a powerful approach for enhancing language models by integrating external knowledge sources. However, RAG introduces challenges such as retrieval latency, potential errors in document selection, and increased system complexity. With the advent of large language models (LLMs) featuring significantly extended context windows, this paper proposes an alternative paradigm, cache-augmented generation (CAG), that bypasses re...

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

May 22, 2020

92% Match

Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, ... , Douwe Kiela

Computation and Language

Machine Learning

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks. However, their ability to access and precisely manipulate knowledge is still limited, and hence on knowledge-intensive tasks, their performance lags behind task-specific architectures. Additionally, providing provenance for their decisions and updating their world knowledge remain open research problems...

RET-LLM: Towards a General Read-Write Memory for Large Language Models

May 23, 2023

92% Match

Ali Modarressi, Ayyoob Imani, ... , Hinrich Schütze

Computation and Language

Large language models (LLMs) have significantly advanced the field of natural language processing (NLP) through their extensive parameters and comprehensive data utilization. However, existing LLMs lack a dedicated memory unit, limiting their ability to explicitly store and retrieve knowledge for various tasks. In this paper, we propose RET-LLM, a novel framework that equips LLMs with a general write-read memory unit, allowing them to extract, store, and recall knowledge from ...

Inference Scaling for Long-Context Retrieval Augmented Generation

October 6, 2024

92% Match

Zhenrui Yue, Honglei Zhuang, Aijun Bai, Kai Hui, Rolf Jagerman, Hansi Zeng, Zhen Qin, Dong Wang, ... , Michael Bendersky

Computation and Language

The scaling of inference computation has unlocked the potential of long-context large language models (LLMs) across diverse settings. For knowledge-intensive tasks, the increased compute is often allocated to incorporate more external knowledge. However, without effectively utilizing such knowledge, solely expanding context does not always enhance performance. In this work, we investigate inference scaling for retrieval augmented generation (RAG), exploring strategies beyond ...

ACER: Automatic Language Model Context Extension via Retrieval

October 11, 2024

92% Match

Luyu Gao, Yunyi Zhang, Jamie Callan

Computation and Language

Artificial Intelligence

Information Retrieval

Machine Learning

Long-context modeling is one of the critical capabilities of language AI for digesting and reasoning over complex information pieces. In practice, long-context capabilities are typically built into a pre-trained language model (LM) through a carefully designed context extension stage, with the goal of producing generalist long-context capabilities. In our preliminary experiments, however, we discovered that the current open-weight generalist long-context models are still lack...

CacheFocus: Dynamic Cache Re-Positioning for Efficient Retrieval-Augmented Generation

February 16, 2025

92% Match

Kun-Hui Lee, Eunhwan Park, ... , Seung-Hoon Na

Computation and Language

Artificial Intelligence

Large Language Models (LLMs) excel across a variety of language tasks yet are constrained by limited input lengths and high computational costs. Existing approaches, such as relative positional encodings (e.g., RoPE, ALiBi) and sliding window mechanisms, partially alleviate these issues but often require additional training or suffer from performance degradation with longer inputs. In this paper, we introduce CacheFocus, a method that enh...

A Controlled Study on Long Context Extension and Generalization in LLMs

September 18, 2024

92% Match

Yi Lu, Jing Nathan Yan, Songlin Yang, Justin T. Chiu, Siyu Ren, Fei Yuan, Wenting Zhao, ... , Alexander M. Rush

Computation and Language

Machine Learning

Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. However, owing to differences in data and model classes, it has been challenging to compare these approaches, leading to uncertainty as to how to evaluate long-context performance and whether it di...

Exploring the landscape of large language models: Foundations, techniques, and challenges

April 18, 2024

92% Match

Milad Moradi, Ke Yan, David Colwell, ... , Rhona Asgari

Artificial Intelligence

In this review paper, we delve into the realm of Large Language Models (LLMs), covering their foundational principles, diverse applications, and nuanced training processes. The article sheds light on the mechanics of in-context learning and a spectrum of fine-tuning approaches, with a special focus on methods that optimize efficiency in parameter usage. Additionally, it explores how LLMs can be more closely aligned with human preferences through innovative reinforcement learn...

LooGLE: Can Long-Context Language Models Understand Long Contexts?

November 8, 2023

92% Match

Jiaqi Li, Mengmeng Wang, ... , Muhan Zhang

Computation and Language

Artificial Intelligence

Large language models (LLMs), despite their impressive performance in various language tasks, are typically limited to processing texts within context-window size. This limitation has spurred significant research efforts to enhance LLMs' long-context understanding with high-quality long-sequence benchmarks. However, prior datasets in this regard suffer from shortcomings, such as short context length compared to the context window of modern LLMs; outdated documents that have d...
