In Search of Needles in a 11M Haystack: Recurrent Memory Finds What LLMs Miss

LongBench: A Bilingual, Multitask Benchmark for Long Context Understanding

August 28, 2023

90% Match

Yushi Bai, Xin Lv, Jiajie Zhang, Hongchang Lyu, Jiankai Tang, Zhidian Huang, Zhengxiao Du, Xiao Liu, Aohan Zeng, Lei Hou, Yuxiao Dong, ... , Juanzi Li

Computation and Language

Although large language models (LLMs) demonstrate impressive performance for many language tasks, most of them can only handle texts a few thousand tokens long, limiting their applications on longer sequence inputs, such as books, reports, and codebases. Recent works have proposed methods to improve LLMs' long context capabilities by extending context windows and more sophisticated memory mechanisms. However, comprehensive benchmarks tailored for evaluating long context under...

UniMem: Towards a Unified View of Long-Context Large Language Models

February 5, 2024

90% Match

Junjie Fang, Likai Tang, Hongzhe Bi, Yujia Qin, Si Sun, Zhenyu Li, Haolun Li, Yongjian Li, Xin Cong, Yukun Yan, Xiaodong Shi, Sen Song, Yankai Lin, ... , Maosong Sun

Computation and Language

Artificial Intelligence

Long-context processing is a critical ability that constrains the applicability of large language models. Although there exist various methods devoted to enhancing the long-context processing ability of large language models (LLMs), they are developed in an isolated manner and lack systematic analysis and integration of their strengths, hindering further developments. In this paper, we introduce UniMem, a unified framework that reformulates existing long-context methods from ...

Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?

November 7, 2024

90% Match

Jonathan Roberts, Kai Han, Samuel Albanie

Computation and Language

As the context limits of Large Language Models (LLMs) increase, the range of possible applications and downstream functions broadens. In many real-world tasks, decisions depend on details scattered across collections of often disparate documents containing mostly irrelevant information. Long-context LLMs appear well-suited to this form of complex information retrieval and reasoning, which has traditionally proven costly and time-consuming. However, although the development of...

Can't Remember Details in Long Documents? You Need Some R&R

March 8, 2024

90% Match

Devanshu Agrawal, Shang Gao, Martin Gajek

Computation and Language

Artificial Intelligence

Information Retrieval

Machine Learning

Long-context large language models (LLMs) hold promise for tasks such as question-answering (QA) over long documents, but they tend to miss important information in the middle of context documents (arXiv:2307.03172v3). Here, we introduce $\textit{R&R}$ -- a combination of two novel prompt-based methods called $\textit{reprompting}$ and $\textit{in-context retrieval}$ (ICR) -- to alleviate this effect in document-based QA. In reprompting, we repeat the prompt instructions peri...

Human-like Episodic Memory for Infinite Context LLMs

July 12, 2024

90% Match

Zafeirios Fountas, Martin A Benfeghoul, Adnan Oomerjee, Fenia Christopoulou, Gerasimos Lampouras, ... , Jun Wang

Artificial Intelligence

Computation and Language

Machine Learning

Neurons and Cognition

Large language models (LLMs) have shown remarkable capabilities, but still struggle with processing extensive contexts, limiting their ability to maintain coherence and accuracy over long sequences. In contrast, the human brain excels at organising and retrieving episodic experiences across vast temporal scales, spanning a lifetime. In this work, we introduce EM-LLM, a novel approach that integrates key aspects of human episodic memory and event cognition into LLMs, enabling ...

M4LE: A Multi-Ability Multi-Range Multi-Task Multi-Domain Long-Context Evaluation Benchmark for Large Language Models

October 30, 2023

90% Match

Wai-Chung Kwan, Xingshan Zeng, Yufei Wang, Yusen Sun, Liangyou Li, Lifeng Shang, ... , Kam-Fai Wong

Computation and Language

Managing long sequences has become an important and necessary feature for large language models (LLMs). However, it is still an open question of how to comprehensively and systematically evaluate the long-sequence capability of LLMs. One of the reasons is that conventional and widely-used benchmarks mainly consist of short sequences. In this paper, we propose M4LE, a Multi-ability, Multi-range, Multi-task, Multi-domain benchmark for Long-context Evaluation. M4LE is based on a...

Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

October 8, 2024

90% Match

Bowen Jin, Jinsung Yoon, ... , Sercan O. Arik

Computation and Language

Artificial Intelligence

Machine Learning

Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources. The increasing capacity of LLMs to process longer input sequences opens up avenues for providing more retrieved information, to potentially enhance the quality of generated outputs. It is plausible to assume that a larger retrieval set would contain more relevant information (higher recall), that might result in improved performance. However, our empirical finding...

Long-range Language Modeling with Self-retrieval

June 23, 2023

90% Match

Ohad Rubin, Jonathan Berant

Computation and Language

Retrieval-augmented language models (LMs) have received much attention recently. However, typically the retriever is not trained jointly as a native component of the LM, but added to an already-pretrained LM, which limits the ability of the LM and the retriever to adapt to one another. In this work, we propose the Retrieval-Pretrained Transformer (RPT), an architecture and training procedure for jointly training a retrieval-augmented LM from scratch for the task of modeling l...

Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach

July 23, 2024

89% Match

Zhuowan Li, Cheng Li, Mingyang Zhang, ... , Michael Bendersky

Computation and Language

Artificial Intelligence

Machine Learning

Retrieval Augmented Generation (RAG) has been a powerful tool for Large Language Models (LLMs) to efficiently process overly lengthy contexts. However, recent LLMs like Gemini-1.5 and GPT-4 show exceptional capabilities to understand long contexts directly. We conduct a comprehensive comparison between RAG and long-context (LC) LLMs, aiming to leverage the strengths of both. We benchmark RAG and LC across various public datasets using three latest LLMs. Results reveal that wh...

From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data

June 27, 2024

89% Match

Zheyang Xiong, Vasilis Papageorgiou, ... , Dimitris Papailiopoulos

Machine Learning

Artificial Intelligence

Computation and Language

Recent studies have shown that Large Language Models (LLMs) struggle to accurately retrieve information and maintain reasoning capabilities when processing long-context inputs. To address these limitations, we propose a finetuning approach utilizing a carefully designed synthetic dataset comprising numerical key-value retrieval tasks. Our experiments on models like GPT-3.5 Turbo and Mistral 7B demonstrate that finetuning LLMs on this dataset significantly improves LLMs' infor...
