ID: 2404.02043

Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches

April 2, 2024


Similar papers

Zero-shot Cross-lingual Stance Detection via Adversarial Language Adaptation

April 22, 2024

88% Match
Bharathi A, Arkaitz Zubiaga
Computation and Language

Stance detection has been widely studied as the task of determining whether a social media post is positive, negative, or neutral towards a specific issue, such as support for vaccines. Research in stance detection has, however, often been limited to a single language and, where more than one language has been studied, has focused on few-shot settings, overlooking the challenges of developing a zero-shot cross-lingual stance detection model. This paper makes the first su...

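As a rough illustration of what adversarial language adaptation can look like in this setting, the sketch below pairs a stance head with a language discriminator trained through gradient reversal, so the shared encoder is pushed toward language-invariant features. The encoder, hidden size, label set, and lambda weight are assumptions for illustration, not the paper's actual architecture.

```python
# Hedged sketch of language-adversarial training for zero-shot cross-lingual
# stance detection; all names and sizes here are illustrative assumptions.
import torch
import torch.nn as nn


class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; flips and scales gradients on the way back."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lambd * grad_output, None


class AdversarialStanceModel(nn.Module):
    def __init__(self, encoder: nn.Module, hidden=768, n_stances=3, n_langs=2, lambd=0.1):
        super().__init__()
        self.encoder = encoder                           # e.g. a multilingual sentence encoder
        self.stance_head = nn.Linear(hidden, n_stances)  # positive / negative / neutral
        self.lang_head = nn.Linear(hidden, n_langs)      # language discriminator
        self.lambd = lambd

    def forward(self, x):
        h = self.encoder(x)                              # expected shape: (batch, hidden)
        stance_logits = self.stance_head(h)
        # The reversed gradient pushes the encoder toward language-invariant features.
        lang_logits = self.lang_head(GradReverse.apply(h, self.lambd))
        return stance_logits, lang_logits
```

In such a setup, the stance loss would be computed only on labeled source-language posts, while the language loss can use unlabeled posts from both languages; at test time the stance head is applied directly to the unseen target language.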

When a Language Question Is at Stake. A Revisited Approach to Label Sensitive Content

November 17, 2023

88% Match
Daria Stetsenko
Computation and Language

Many under-resourced languages require high-quality datasets for specific tasks such as offensive language detection or disinformation and misinformation identification. However, the intricacies of the content may have a detrimental effect on the annotators. The article aims to revisit an approach to pseudo-labeling sensitive data, using Ukrainian tweets covering the Russian-Ukrainian war as an example. Nowadays, this acute topic is in the spotlight of various language manipulatio...

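For context, pseudo-labeling in its simplest form trains a model on the small labeled set, predicts labels for the unlabeled data, and keeps only the confident predictions as additional training examples. The sketch below is a minimal generic version with an assumed TF-IDF plus logistic regression pipeline and a 0.9 confidence threshold; it is not the annotation pipeline described in the article.

```python
# Minimal, generic pseudo-labeling sketch; features, model, and threshold are
# assumptions for illustration only.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression


def pseudo_label(labeled_texts, labels, unlabeled_texts, threshold=0.9):
    """Train on the small labeled set and keep only confident predictions."""
    vec = TfidfVectorizer(max_features=50_000)
    X_lab = vec.fit_transform(labeled_texts)
    X_unlab = vec.transform(unlabeled_texts)

    clf = LogisticRegression(max_iter=1000).fit(X_lab, labels)

    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) >= threshold
    pseudo_texts = np.asarray(unlabeled_texts, dtype=object)[confident]
    pseudo_labels = clf.classes_[proba.argmax(axis=1)][confident]
    return pseudo_texts, pseudo_labels
```

The pseudo-labeled pairs are then added to the training set and the model is retrained, usually for a few such rounds.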

Exploring Methods for Cross-lingual Text Style Transfer: The Case of Text Detoxification

November 23, 2023

88% Match
Daryna Dementieva, Daniil Moskovskiy, ..., Alexander Panchenko
Computation and Language

Text detoxification is the task of transferring the style of text from toxic to neutral. While there are approaches yielding promising results in a monolingual setup, e.g., (Dale et al., 2021; Hallinan et al., 2022), cross-lingual transfer for this task remains a challenging open problem (Moskovskiy et al., 2022). In this work, we present a large-scale study of strategies for cross-lingual text detoxification -- given a parallel detoxification corpus for one language, the goal i...

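One simple baseline for the setup described above is "translate-train": project the source-language parallel pairs into the target language and fine-tune a multilingual sequence-to-sequence model on the result. The sketch below assumes a hypothetical translate() helper, an mT5 checkpoint, and a single-example training step; it only illustrates the setting and is not necessarily one of the strategies compared in the paper.

```python
# Hedged "translate-train" sketch for cross-lingual detoxification; translate()
# is a hypothetical placeholder and the checkpoint and loop are assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("google/mt5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/mt5-small")
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-5)


def translate(text: str, target_lang: str) -> str:
    """Hypothetical machine-translation step; any MT system could be plugged in."""
    raise NotImplementedError


def detox_train_step(toxic_src: str, neutral_src: str, target_lang: str) -> float:
    # Translate both sides of the parallel pair, then train toxic -> neutral.
    toxic_tgt = translate(toxic_src, target_lang)
    neutral_tgt = translate(neutral_src, target_lang)

    inputs = tok(toxic_tgt, return_tensors="pt")
    labels = tok(neutral_tgt, return_tensors="pt").input_ids

    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```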

Enhancing Multilingual Voice Toxicity Detection with Speech-Text Alignment

June 14, 2024

88% Match
Joseph Liu, Mahesh Kumar Nandwana, Janne Pylkkönen, ..., Morgan McGuire
Computation and Language
Machine Learning
Audio and Speech Processing

Toxicity classification for voice heavily relies on the semantic content of speech. We propose a novel framework that utilizes cross-modal learning to integrate the semantic embedding of text into a multilabel speech toxicity classifier during training. This enables us to incorporate textual information during training while still requiring only audio during inference. We evaluate this classifier on large-scale datasets with real-world characteristics to validate the effectiv...

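A minimal way to realize the idea described above is to add an alignment term that pulls the pooled audio embedding toward a frozen text embedding of the transcript during training, alongside the multilabel toxicity loss, so that inference needs audio only. The encoders, embedding size, label count, and loss weight in the sketch below are illustrative assumptions rather than the paper's exact framework.

```python
# Hedged sketch of cross-modal training for audio toxicity classification;
# encoders, dimensions, and the loss weighting are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class AudioToxicityClassifier(nn.Module):
    def __init__(self, audio_encoder: nn.Module, dim=512, n_labels=6):
        super().__init__()
        self.audio_encoder = audio_encoder           # raw audio -> (batch, dim)
        self.classifier = nn.Linear(dim, n_labels)   # multilabel toxicity head

    def forward(self, audio):
        emb = self.audio_encoder(audio)
        return self.classifier(emb), emb


def training_loss(model, audio, labels, text_emb, alpha=0.5):
    logits, audio_emb = model(audio)
    # Standard multilabel objective on the toxicity labels.
    bce = F.binary_cross_entropy_with_logits(logits, labels.float())
    # Alignment term, used only at training time when transcripts are available;
    # at inference the classifier runs on audio alone.
    align = 1.0 - F.cosine_similarity(audio_emb, text_emb, dim=-1).mean()
    return bce + alpha * align
```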

Automated multilingual detection of Pro-Kremlin propaganda in newspapers and Telegram posts

January 25, 2023

88% Match
Veronika Solopova, Oana-Iuliana Popescu, ..., Tim Landgraf
Computation and Language
Machine Learning

The full-scale conflict between the Russian Federation and Ukraine generated an unprecedented volume of news articles and social media data reflecting opposing ideologies and narratives. These polarized campaigns have led to mutual accusations of misinformation and fake news, shaping an atmosphere of confusion and mistrust for readers worldwide. This study analyses how the media affected and mirrored public opinion during the first month of the war, using news articles and Tel...

Dmitry Karpov, Mikhail Burtsev
Computation and Language
Artificial Intelligence

This article investigates knowledge transfer from the RuQTopics dataset. This Russian topical dataset combines a large number of samples (361,560 single-label, 170,930 multi-label) with extensive class coverage (76 classes). We have prepared this dataset from the "Yandex Que" raw data. By evaluating the RuQTopics-trained models on the six matching classes of the Russian MASSIVE subset, we have shown that the RuQTopics dataset is suitable for real-world conversational tasks...
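The transfer recipe implied by this abstract is standard supervised fine-tuning of a multilingual encoder for topic classification on the Russian topical data, followed by evaluation of the same model on another dataset's matching classes. The sketch below is a generic version of that recipe; the mBERT checkpoint, learning rate, and single-step loop are assumptions, not the authors' setup.

```python
# Generic topic-classification fine-tuning sketch; checkpoint and
# hyperparameters are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=76)   # 76 topic classes
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)


def train_step(texts, topic_ids):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    out = model(**batch, labels=torch.tensor(topic_ids))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```

Evaluation on the matching classes of another dataset then reuses the trained model directly, mapping each predicted topic to its counterpart class.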

Universal Language Model Fine-tuning for Text Classification

January 18, 2018

88% Match
Jeremy Howard, Sebastian Ruder
Computation and Language
Machine Learning

Inductive transfer learning has greatly impacted computer vision, but existing approaches in NLP still require task-specific modifications and training from scratch. We propose Universal Language Model Fine-tuning (ULMFiT), an effective transfer learning method that can be applied to any task in NLP, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms the state-of-the-art on six text classification tasks, reducing the e...

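One of the fine-tuning techniques ULMFiT popularized is discriminative fine-tuning: each layer group gets its own learning rate, decayed by a factor of 2.6 from the last group toward the first, so earlier layers are updated more gently. The sketch below shows that idea with plain PyTorch parameter groups; the base learning rate and the way layers are grouped here are assumed placeholders rather than the paper's exact recipe.

```python
# Hedged sketch of discriminative fine-tuning via per-group learning rates;
# the base rate and layer grouping are assumptions.
import torch


def discriminative_param_groups(layer_groups, base_lr=2e-3, decay=2.6):
    """Return optimizer parameter groups with per-group learning rates."""
    groups, lr = [], base_lr
    for layer in reversed(list(layer_groups)):   # last (task-specific) layers first
        groups.append({"params": layer.parameters(), "lr": lr})
        lr /= decay                              # earlier layers get smaller rates
    return groups


# Usage with any ordered list of layer groups (embeddings, hidden layers, head):
# optimizer = torch.optim.Adam(discriminative_param_groups(model_layer_groups))
```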

Transfer Learning for Multi-lingual Tasks -- a Survey

August 28, 2021

88% Match
Amir Reza Jafari, Behnam Heidary, Reza Farahbakhsh, ..., Mahdi Jalili
Computation and Language

These days, platforms such as social media give users from different backgrounds and languages the possibility to connect and exchange information. It is no longer surprising to see comments in different languages in posts published by international celebrities or data providers. In this era, understanding cross-language content and multilingualism in natural language processing (NLP) are hot topics, and multiple efforts have tried to leverage existin...


Multilingual LLMs are Better Cross-lingual In-context Learners with Alignment

May 10, 2023

88% Match
Eshaan Tanwar, Subhabrata Dutta, ..., Tanmoy Chakraborty
Computation and Language

In-context learning (ICL) unfolds as large language models become capable of inferring test labels conditioned on a few labeled samples without any gradient update. ICL-enabled large language models provide a promising step forward toward bypassing recurrent annotation costs in a low-resource setting. Yet, only a handful of past studies have explored ICL in a cross-lingual setting, in which the need for transferring label-knowledge from a high-resource language to a low-resou...

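Concretely, cross-lingual in-context learning amounts to building a prompt whose labeled demonstrations come from a high-resource source language while the final query is in the low-resource target language, with no gradient updates. The sketch below shows one such prompt builder; the template and task name are illustrative assumptions, and the alignment strategy proposed in the paper is not reproduced here.

```python
# Plain cross-lingual in-context prompt builder; template and task name are
# assumptions for illustration.
def build_cross_lingual_prompt(demos, test_text, task="sentiment"):
    """demos: list of (source_language_text, label) pairs."""
    parts = [f"Task: classify the {task} of the final text."]
    for text, label in demos:
        parts.append(f"Text: {text}\nLabel: {label}")
    parts.append(f"Text: {test_text}\nLabel:")
    return "\n\n".join(parts)


# The prompt is sent to a multilingual LLM; whatever the model generates after
# the final "Label:" is taken as the prediction for the target-language input.
```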

A New Generation of Perspective API: Efficient Multilingual Character-level Transformers

February 22, 2022

88% Match
Alyssa Lees, Vinh Q. Tran, Yi Tay, Jeffrey Sorensen, Jai Gupta, ..., Lucy Vasserman
Computation and Language
Artificial Intelligence
Computers and Society
Machine Learning

On the world wide web, toxic content detectors are a crucial line of defense against potentially hateful and offensive messages. As such, building highly effective classifiers that enable a safer internet is an important research area. Moreover, the web is a highly multilingual, cross-cultural community that develops its own lingo over time. As such, it is crucial to develop models that are effective across a diverse range of languages, usages, and styles. In this paper, we p...
