Structured Token Retention and Computati...

Infinite Retrieval: Attention Enhanced LLMs in Long-Context Processing

February 18, 2025

90% Match

Xiaoju Ye, Zhichun Wang, Jingyuan Wang

Computation and Language

Limited by the context window size of Large Language Models (LLMs), handling various tasks with input tokens exceeding the upper limit has been challenging, whether it is a simple direct retrieval task or a complex multi-hop reasoning task. Although various methods have been proposed to enhance the long-context processing capabilities of LLMs, they either incur substantial post-training costs, or require additional tool modules (e.g., RAG), or have not shown significant improvem...

Current Limitations of Language Models: What You Need is Retrieval

September 15, 2020

90% Match

Aran Komatsuzaki

Computation and Language

Machine Learning

We classify and re-examine some of the current approaches to improve the performance-computes trade-off of language models, including (1) non-causal models (such as masked language models), (2) extension of batch length with efficient attention, (3) recurrence, (4) conditional computation and (5) retrieval. We identify some limitations (1) - (4) suffer from. For example, (1) currently struggles with open-ended text generation with the output loosely constrained by the input a...

An Evolved Universal Transformer Memory

October 17, 2024

90% Match

Edoardo Cetin, Qi Sun, ... , Tang Yujin

Machine Learning

Artificial Intelligence

Computation and Language

Prior methods propose to offset the escalating costs of modern foundation models by dropping specific parts of their contexts with hand-designed rules, while attempting to preserve their original performance. We overcome this trade-off with Neural Attention Memory Models (NAMMs), introducing a learned network for memory management that improves both the performance and efficiency of transformers. We evolve NAMMs atop pre-trained transformers to provide different latent contex...

Does RAG Really Perform Bad For Long-Context Processing?

February 17, 2025

90% Match

Kun Luo, Zheng Liu, Peitian Zhang, Hongjin Qian, ... , Liu Kang

Computation and Language

The efficient processing of long context poses a serious challenge for large language models (LLMs). Recently, retrieval-augmented generation (RAG) has emerged as a promising strategy for this problem, as it enables LLMs to make selective use of the long context for efficient computation. However, existing RAG approaches lag behind other long-context processing methods due to inherent limitations on inaccurate retrieval and fragmented contexts. To address these challenges, we...

Exploring the landscape of large language models: Foundations, techniques, and challenges

April 18, 2024

90% Match

Milad Moradi, Ke Yan, David Colwell, ... , Asgari Rhona

Artificial Intelligence

In this review paper, we delve into the realm of Large Language Models (LLMs), covering their foundational principles, diverse applications, and nuanced training processes. The article sheds light on the mechanics of in-context learning and a spectrum of fine-tuning approaches, with a special focus on methods that optimize efficiency in parameter usage. Additionally, it explores how LLMs can be more closely aligned with human preferences through innovative reinforcement learn...

Attention is All You Need Until You Need Retention

January 15, 2025

90% Match

M. Murat Yaslioglu

Machine Learning

Artificial Intelligence

This work introduces a novel Retention Layer mechanism for Transformer based architectures, addressing their inherent lack of intrinsic retention capabilities. Unlike human cognition, which can encode and dynamically recall symbolic templates, Generative Pretrained Transformers rely solely on fixed pretrained weights and ephemeral context windows, limiting their adaptability. The proposed Retention Layer incorporates a persistent memory module capable of real time data popula...

Contextual Compression Encoding for Large Language Models: A Novel Framework for Multi-Layered Parameter Space Pruning

February 12, 2025

90% Match

Barnaby Schmitt, Alistair Grosvenor, Matthias Cunningham, Clementine Walsh, ... , Teel Jonathan

Computation and Language

Context-aware compression techniques have gained increasing attention as model sizes continue to grow, introducing computational bottlenecks that hinder efficient deployment. A structured encoding approach was proposed to selectively eliminate redundant parameter groups while ensuring that representational fidelity was preserved across multiple layers. Contextual Compression Encoding (CCE) introduced a multi-stage encoding mechanism that dynamically restructured parameter dis...

Framework for Progressive Knowledge Fusion in Large Language Models Through Structured Conceptual Redundancy Analysis

January 23, 2025

90% Match

Joseph Sakau, Evander Kozlowski, ... , Steinberger Basil

Computation and Language

Artificial Intelligence

The organization of latent knowledge within large-scale models poses unique challenges when addressing overlapping representations and optimizing contextual accuracy. Conceptual redundancies embedded across layers often result in inefficiencies that affect both computational demands and task-specific outcomes. A framework was proposed to restructure these redundancies through advanced clustering techniques and dynamic thresholding, ensuring that critical semantic relationship...

Intrinsic Tensor Field Propagation in Large Language Models: A Novel Approach to Contextual Information Flow

January 31, 2025

90% Match

Alfred Bexley, Lukas Radcliffe, ... , Sakau Joseph

Computation and Language

Context propagation remains a central challenge in language model architectures, particularly in tasks requiring the retention of long-range dependencies. Conventional attention mechanisms, while effective in many applications, exhibit limitations in maintaining coherent contextual representations over extended sequences due to their reliance on discrete token interactions. A novel approach is introduced through the formulation of Intrinsic Tensor Field Propagation (ITFP), wh...

Long-Range Tasks Using Short-Context LLMs: Incremental Reasoning With Structured Memories

December 25, 2024

90% Match

Dulhan Jayalath, James Bradley Wendt, Nicholas Monath, ... , Gunel Beliz

Artificial Intelligence

Long-range tasks require reasoning over long inputs. Existing solutions either need large compute budgets, training data, access to model weights, or use complex, task-specific approaches. We present PRISM, which alleviates these concerns by processing information as a stream of chunks, maintaining a structured in-context memory specified by a typed hierarchy schema. This approach demonstrates superior performance to baselines on diverse tasks while using at least 4x smaller ...

Structured Token Retention and Computational Memory Paths in Large Language Models
