ID: 2407.01393

POLygraph: Polish Fake News Dataset

July 1, 2024

Daniel Dzienisiewicz, Filip Graliński, Piotr Jabłoński, Marek Kubis, Paweł Skórzewski, Piotr Wierzchoń
Computer Science
Computation and Language

This paper presents the POLygraph dataset, a unique resource for fake news detection in Polish. The dataset, created by an interdisciplinary team, is composed of two parts: the "fake-or-not" dataset with 11,360 pairs of news articles (identified by their URLs) and corresponding labels, and the "fake-they-say" dataset with 5,082 news articles (identified by their URLs) and tweets commenting on them. Unlike existing datasets, POLygraph encompasses a variety of approaches from source literature, providing a comprehensive resource for fake news detection. The data was collected through manual annotation by expert and non-expert annotators. The project also developed a software tool that uses advanced machine learning techniques to analyze the data and determine content authenticity. The tool and dataset are expected to benefit various entities, from public sector institutions to publishers and fact-checking organizations. Further dataset exploration will foster fake news detection and potentially stimulate the implementation of similar models in other languages. The paper focuses on the creation and composition of the dataset, so it does not include a detailed evaluation of the software tool for content authenticity analysis, which is planned at a later stage of the project.

Similar papers

Dataset of Fake News Detection and Fact Verification: A Survey

November 5, 2021

94% Match
Taichi Murayama
Machine Learning
Computation and Language
Computers and Society

The rapid increase in fake news, which causes significant damage to society, triggers many fake news related studies, including the development of fake news detection and fact verification techniques. The resources for these studies are mainly available as public datasets taken from Web data. We surveyed 118 datasets related to fake news research on a large scale from three perspectives: (1) fake news detection, (2) fact verification, and (3) other tasks; for example, the ana...

Fake News Detection: It's All in the Data!

July 2, 2024

94% Match
Soveatin Kuntur, Anna Wróblewska, ... , Maria Ganzha
Computation and Language

This comprehensive survey serves as an indispensable resource for researchers embarking on the journey of fake news detection. By highlighting the pivotal role of dataset quality and diversity, it underscores the significance of these elements in the effectiveness and robustness of detection models. The survey meticulously outlines the key features of datasets, various labeling systems employed, and prevalent biases that can impact model performance. Additionally, it addresse...

Automatic Detection of Fake News

August 23, 2017

93% Match
Verónica Pérez-Rosas, Bennett Kleinberg, ... , Rada Mihalcea
Computation and Language

The proliferation of misleading information in everyday access media outlets such as social media feeds, news blogs, and online newspapers has made it challenging to identify trustworthy news sources, thus increasing the need for computational tools able to provide insights into the reliability of online content. In this paper, we focus on the automatic identification of fake content in online news. Our contribution is twofold. First, we introduce two novel datasets for the ...

A Survey on Natural Language Processing for Fake News Detection

November 2, 2018

93% Match
Ray Oshikawa, Jing Qian, William Yang Wang
Computation and Language
Artificial Intelligence

Fake news detection is a critical yet challenging problem in Natural Language Processing (NLP). The rapid rise of social networking platforms has not only yielded a vast increase in information accessibility but has also accelerated the spread of fake news. Thus, the effect of fake news has been growing, sometimes extending to the offline world and threatening public safety. Given the massive amount of Web content, automatic fake news detection is a practical NLP problem usef...

Combating Fake News: A Survey on Identification and Mitigation Techniques

January 18, 2019

92% Match
Karishma Sharma, Feng Qian, He Jiang, Natali Ruchansky, ... , Yan Liu
Machine Learning
Artificial Intelligence
Social and Information Netwo...
Machine Learning

The proliferation of fake news on social media has opened up new directions of research for timely identification and containment of fake news, and mitigation of its widespread impact on public opinion. While much of the earlier research was focused on identification of fake news based on its contents or by exploiting users' engagements with the news on social media, there has been a rising interest in proactive intervention strategies to counter the spread of misinformation ...

Fake News Detection: Experiments and Approaches beyond Linguistic Features

September 27, 2021

92% Match
Shaily Bhatt, Sakshi Kalra, ... , Yashvardhan Sharma
Computation and Language

Easier access to the internet and social media has made disseminating information through online sources very easy. Sources like Facebook, Twitter, online news sites and personal blogs of self-proclaimed journalists have become significant players in providing news content. The sheer amount of information and the speed at which it is generated online makes it practically beyond the scope of human verification. There is, hence, a pressing need to develop technologies that can ...

Fact-checking based fake news detection: a review

January 3, 2024

92% Match
Yuzhou Yang, Yangming Zhou, Qichao Ying, Zhenxing Qian, ... , Liang Liu
Computer Vision and Pattern ...

This paper reviews and summarizes the research results on fact-based fake news from the perspectives of tasks and problems, algorithm strategies, and datasets. First, the paper systematically explains the task definition and core problems of fact-based fake news detection. Second, the paper summarizes the existing detection methods based on the algorithm principles. Third, the paper analyzes the classic and newly proposed datasets in the field, and summarizes the experimental...

A Benchmark Study of Machine Learning Models for Online Fake News Detection

May 12, 2019

92% Match
Junaed Younus Khan, Md. Tawkat Islam Khondaker, Sadia Afroz, ... , Anindya Iqbal
Computation and Language
Information Retrieval
Machine Learning
Machine Learning

The proliferation of fake news and its propagation on social media has become a major concern due to its ability to create devastating impacts. Different machine learning approaches have been suggested to detect fake news. However, most of those focused on a specific type of news (such as political) which leads us to the question of dataset-bias of the models used. In this research, we conducted a benchmark study to assess the performance of different applicable machine learn...

Sieving Fake News From Genuine: A Synopsis

November 19, 2019

92% Match
Shahid Alam, Abdulaziz Ravshanbekov
Cryptography and Security
Computers and Society

With the rise of social media, it has become easier to disseminate fake news faster and cheaper, compared to traditional news media, such as television and newspapers. Recently this phenomenon has attracted a lot of public attention, because it is causing significant social and financial impacts on people's lives and businesses. Fake news is responsible for creating false, deceptive, misleading, and suspicious information that can greatly affect the outcome of an event. This pape...

FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media

September 5, 2018

92% Match
Kai Shu, Deepak Mahudeswaran, Suhang Wang, ... , Huan Liu
Social and Information Netwo...

Social media has become a popular means for people to consume news. Meanwhile, it also enables the wide dissemination of fake news, i.e., news with intentionally false information, which brings significant negative effects to the society. Thus, fake news detection is attracting increasing attention. However, fake news detection is a non-trivial task, which requires multi-source information such as news content, social context, and dynamic information. First, fake news is writ...
