Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches

April 2, 2024

Zero-shot Cross-lingual Stance Detection via Adversarial Language Adaptation

April 22, 2024

88% Match

Bharathi A, Arkaitz Zubiaga

Computation and Language

Stance detection has been widely studied as the task of determining if a social media post is positive, negative or neutral towards a specific issue, such as support towards vaccines. Research in stance detection has however often been limited to a single language and, where more than one language has been studied, research has focused on few-shot settings, overlooking the challenges of developing a zero-shot cross-lingual stance detection model. This paper makes the first su...

When a Language Question Is at Stake. A Revisited Approach to Label Sensitive Content

November 17, 2023

88% Match

Daria Stetsenko

Computation and Language

Many under-resourced languages require high-quality datasets for specific tasks such as offensive language detection, disinformation, or misinformation identification. However, the intricacies of the content may have a detrimental effect on the annotators. The article aims to revisit an approach of pseudo-labeling sensitive data on the example of Ukrainian tweets covering the Russian-Ukrainian war. Nowadays, this acute topic is in the spotlight of various language manipulatio...

Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

November 23, 2023

88% Match

Daryna Dementieva, Daniil Moskovskiy, ... , Alexander Panchenko

Computation and Language

Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in a monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detoxification -- given a parallel detoxification corpus for one language; the goal i...

Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment

June 14, 2024

88% Match

Joseph Liu, Mahesh Kumar Nandwana, Janne Pylkkönen, ... , Morgan McGuire

Computation and Language

Machine Learning

Audio and Speech Processing

Toxicity classification for voice heavily relies on the semantic content of speech. We propose a novel framework that utilizes cross-modal learning to integrate the semantic embedding of text into a multilabel speech toxicity classifier during training. This enables us to incorporate textual information during training while still requiring only audio during inference. We evaluate this classifier on large-scale datasets with real-world characteristics to validate the effectiv...

Automated multilingual detection of Pro-Kremlin propaganda in newspapers and Telegram posts

January 25, 2023

88% Match

Veronika Solopova, Oana-Iuliana Popescu, ... , Tim Landgraf

Computation and Language

Machine Learning

The full-scale conflict between the Russian Federation and Ukraine generated an unprecedented amount of news articles and social media data reflecting opposing ideologies and narratives. These polarized campaigns have led to mutual accusations of misinformation and fake news, shaping an atmosphere of confusion and mistrust for readers worldwide. This study analyses how the media affected and mirrored public opinion during the first month of the war using news articles and Tel...

Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification

June 13, 2023

88% Match

Dmitry Karpov, Mikhail Burtsev

Computation and Language

Artificial Intelligence

This article investigates knowledge transfer from the RuQTopics dataset. This Russian topical dataset combines a large number of samples (361,560 single-label, 170,930 multi-label) with extensive class coverage (76 classes). We have prepared this dataset from the "Yandex Que" raw data. By evaluating the RuQTopics-trained models on the six matching classes of the Russian MASSIVE subset, we have proved that the RuQTopics dataset is suitable for real-world conversational tasks...

Universal Language Model Fine-tuning for Text Classification

January 18, 2018

88% Match

Jeremy Howard, Sebastian Ruder

Computation and Language

Machine Learning

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the e...

Transfer Learning for Multi-lingual Tasks -- a Survey

August 28, 2021

88% Match

Amir Reza Jafari, Behnam Heidary, Reza Farahbakhsh, ... , Mahdi Jalili

Computation and Language

These days, platforms such as social media give their clients from different backgrounds and languages the possibility to connect and exchange information. It is no longer surprising to see comments in different languages on posts published by international celebrities or data providers. In this era, understanding cross-lingual content and multilingualism in natural language processing (NLP) are hot topics, and multiple efforts have tried to leverage existin...

Multilingual LLMs are Better Cross-lingual In-context Learners with Alignment

May 10, 2023

88% Match

Eshaan Tanwar, Subhabrata Dutta, ... , Tanmoy Chakraborty

Computation and Language

In-context learning (ICL) unfolds as large language models become capable of inferring test labels conditioned on a few labeled samples without any gradient update. ICL-enabled large language models provide a promising step forward toward bypassing recurrent annotation costs in a low-resource setting. Yet, only a handful of past studies have explored ICL in a cross-lingual setting, in which the need for transferring label-knowledge from a high-resource language to a low-resou...

A New Generation of Perspective API: Efficient Multilingual Character-level Transformers

February 22, 2022

88% Match

Alyssa Lees, Vinh Q. Tran, Yi Tay, Jeffrey Sorensen, Jai Gupta, ... , Lucy Vasserman

Computation and Language

Artificial Intelligence

Computers and Society

Machine Learning

On the world wide web, toxic content detectors are a crucial line of defense against potentially hateful and offensive messages. As such, building highly effective classifiers that enable a safer internet is an important research area. Moreover, the web is a highly multilingual, cross-cultural community that develops its own lingo over time. As such, it is crucial to develop models that are effective across a diverse range of languages, usages, and styles. In this paper, we p...
