RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language Models

December 31, 2023

Yuanhao Wu, Juno Zhu, Siliang Xu, Kashun Shum, Cheng Niu, Randy Zhong, Juntong Song, Tong Zhang

Computer Science

Computation and Language

Retrieval-augmented generation (RAG) has become a main technique for alleviating hallucinations in large language models (LLMs). Even with RAG, LLMs may still make claims that are unsupported by, or contradict, the retrieved content. To develop effective hallucination prevention strategies under RAG, it is important to create benchmark datasets that can measure the extent of hallucination. This paper presents RAGTruth, a corpus tailored for analyzing word-level hallucinations in various domains and tasks within the standard RAG frameworks for LLM applications. RAGTruth comprises nearly 18,000 naturally generated responses from diverse LLMs using RAG. These responses have undergone meticulous manual annotation at both the individual case and word levels, incorporating evaluations of hallucination intensity. We not only benchmark hallucination frequencies across different LLMs, but also critically assess the effectiveness of several existing hallucination detection methodologies. Furthermore, we show that, using a high-quality dataset such as RAGTruth, it is possible to finetune a relatively small LLM and achieve a competitive level of performance in hallucination detection when compared to the existing prompt-based approaches using state-of-the-art large language models such as GPT-4.

Hallucination Detection and Hallucination Mitigation: An Investigation

January 16, 2024

95% Match

Junliang Luo, Tianyu Li, Di Wu, Michael Jenkin, ... , Gregory Dudek

Computation and Language

Artificial Intelligence

Large language models (LLMs), including ChatGPT, Bard, and Llama, have achieved remarkable successes over the last two years in a range of different applications. In spite of these successes, there exist concerns that limit the wide application of LLMs. A key problem is hallucination: in addition to correct responses, LLMs can also generate seemingly correct but factually incorrect responses. This report aims to present a c...


LRP4RAG: Detecting Hallucinations in Retrieval-Augmented Generation via Layer-wise Relevance Propagation

August 28, 2024

95% Match

Haichuan Hu, Yuhan Sun, Quanjun Zhang

Computation and Language

Artificial Intelligence

Retrieval-Augmented Generation (RAG) has become a primary technique for mitigating hallucinations in large language models (LLMs). However, incomplete knowledge extraction and insufficient understanding can still mislead LLMs to produce irrelevant or even contradictory responses, which means hallucinations persist in RAG. In this paper, we propose LRP4RAG, a method based on the Layer-wise Relevance Propagation (LRP) algorithm for detecting hallucinations in RAG. Specifically,...


HaluEval-Wild: Evaluating Hallucinations of Language Models in the Wild

March 7, 2024

95% Match

Zhiying Zhu, Zhiqing Sun, Yiming Yang

Computation and Language

Hallucinations pose a significant challenge to the reliability of large language models (LLMs) in critical domains. Recent benchmarks designed to assess LLM hallucinations within conventional NLP tasks, such as knowledge-intensive question answering (QA) and summarization, are insufficient for capturing the complexities of user-LLM interactions in dynamic, real-world settings. To address this gap, we introduce HaluEval-Wild, the first benchmark specifically designed to evalua...


Lynx: An Open Source Hallucination Evaluation Model

July 11, 2024

95% Match

Selvan Sunitha Ravi, Bartosz Mielczarek, Anand Kannappan, ... , Rebecca Qian

Artificial Intelligence

Computation and Language

Retrieval Augmented Generation (RAG) techniques aim to mitigate hallucinations in Large Language Models (LLMs). However, LLMs can still produce information that is unsupported or contradictory to the retrieved contexts. We introduce LYNX, a SOTA hallucination detection LLM that is capable of advanced reasoning on challenging real-world hallucination scenarios. To evaluate LYNX, we present HaluBench, a comprehensive hallucination evaluation benchmark, consisting of 15k samples...


Chainpoll: A high efficacy method for LLM hallucination detection

October 22, 2023

95% Match

Robert Friel, Atindriyo Sanyal

Computation and Language

Artificial Intelligence

Machine Learning

Large language models (LLMs) have experienced notable advancements in generating coherent and contextually relevant responses. However, hallucinations - incorrect or unfounded claims - are still prevalent, prompting the creation of automated metrics to detect these in LLM outputs. Our contributions include: introducing ChainPoll, an innovative hallucination detection method that excels compared to its counterparts, and unveiling RealHall, a refined collection of benchmark dat...


Luna: An Evaluation Foundation Model to Catch Language Model Hallucinations with High Accuracy and Low Cost

June 3, 2024

95% Match

Masha Belyi, Robert Friel, ... , Atindriyo Sanyal

Computation and Language

Artificial Intelligence

Retriever Augmented Generation (RAG) systems have become pivotal in enhancing the capabilities of language models by incorporating external knowledge retrieval mechanisms. However, a significant challenge in deploying these systems in industry applications is the detection and mitigation of hallucinations: instances where the model generates information that is not grounded in the retrieved context. Addressing this issue is crucial for ensuring the reliability and accuracy of...


RAGged Edges: The Double-Edged Sword of Retrieval-Augmented Chatbots

March 2, 2024

94% Match

Philip Feldman, James R. Foulds, Shimei Pan

Computation and Language

Artificial Intelligence

Large language models (LLMs) like ChatGPT demonstrate the remarkable progress of artificial intelligence. However, their tendency to hallucinate -- generate plausible but false information -- poses a significant challenge. This issue is critical, as seen in recent court cases where ChatGPT's use led to citations of non-existent legal rulings. This paper explores how Retrieval-Augmented Generation (RAG) can counter hallucinations by integrating external knowledge with prompts....


100% Hallucination Elimination Using Acurai

December 6, 2024

94% Match

Michael C. Wood, Adam A. Forbes

Computation and Language

The issue of hallucinations in large language models (LLMs) remains a critical barrier to the adoption of AI in enterprise and other high-stakes applications. Despite advancements in retrieval-augmented generation (RAG) systems, current state-of-the-art methods fail to achieve more than 80% accuracy in generating faithful and factually correct outputs, even when provided with relevant and accurate context. In this work, we introduce Acurai, a novel systematic approach that ac...


Benchmarking Large Language Models in Retrieval-Augmented Generation

September 4, 2023

94% Match

Jiawei Chen, Hongyu Lin, ... , Le Sun

Computation and Language

Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the hallucination of large language models (LLMs). However, existing research lacks rigorous evaluation of the impact of retrieval-augmented generation on different large language models, which makes it challenging to identify the potential bottlenecks in the capabilities of RAG for different LLMs. In this paper, we systematically investigate the impact of Retrieval-Augmented Generation on large langua...


In Search of Truth: An Interrogation Approach to Hallucination Detection

March 5, 2024

94% Match

Yakir Yehuda, Itzik Malkiel, Oren Barkan, Jonathan Weill, ... , Noam Koenigstein

Computation and Language

Machine Learning

Despite the many advances of Large Language Models (LLMs) and their unprecedented rapid evolution, their impact and integration into every facet of our daily lives is limited due to various reasons. One critical factor hindering their widespread adoption is the occurrence of hallucinations, where LLMs invent answers that sound realistic, yet drift away from factual truth. In this paper, we present a novel method for detecting hallucinations in large language models, which tac...
