Monolingual and Cross-Lingual Knowledge ...

Transfer Learning for Improving Results on Russian Sentiment Datasets

July 6, 2021

88% Match

Anton Golubev, Natalia Loukachevitch

Computation and Language

In this study, we test a transfer learning approach on Russian sentiment benchmark datasets using an additional training sample created with a distant supervision technique. We compare several variants of combining the additional data with the benchmark training samples. The best results were achieved using a three-step approach of sequential training on general, thematic and original training samples. For most datasets, the results improved by more than 3% over the current state-of-the-art methods...
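The idea of sequential training described in this abstract (first on a general sample, then on a thematic sample, then on the original benchmark sample, with each stage starting from the previous stage's weights) can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the authors' code: a pure-Python logistic regression stands in for the fine-tuned transformer, and `make_stage`, `train_stage`, `sequential_training`, the stage sizes and the label-noise rates are all hypothetical.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_stage(n: int, noise: float, rng: random.Random):
    """Hypothetical toy data: label is 1 when x > 0; `noise` flips labels to mimic distant supervision."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = 1 if x > 0 else 0
        if rng.random() < noise:
            y = 1 - y
        data.append(([1.0, x], y))  # feature 0 is a constant bias term
    return data


def train_stage(weights, data, epochs=20, lr=0.1):
    """Continue SGD on the logistic loss, starting from whatever weights the previous stage left."""
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
            for i, xi in enumerate(x):
                weights[i] -= lr * (p - y) * xi
    return weights


def sequential_training(stages, n_features=2):
    """Visit the stages in order; all stages update one shared set of weights."""
    weights = [0.0] * n_features
    for _name, data in stages:
        train_stage(weights, data)
    return weights


def accuracy(weights, data):
    correct = sum(
        (sigmoid(sum(w * xi for w, xi in zip(weights, x))) > 0.5) == (y == 1)
        for x, y in data
    )
    return correct / len(data)


rng = random.Random(0)
stages = [
    ("general", make_stage(200, noise=0.1, rng=rng)),    # large, noisy sample
    ("thematic", make_stage(100, noise=0.05, rng=rng)),  # smaller, closer to the target domain
    ("original", make_stage(50, noise=0.0, rng=rng)),    # benchmark training sample
]
weights = sequential_training(stages)
print(weights, accuracy(weights, stages[-1][1]))
```

The design point the sketch tries to capture is only the ordering: the cleanest, most task-specific data is seen last, so the final weights are shaped most directly by the benchmark sample.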

Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer

June 30, 2021

88% Match

Iulia Turc, Kenton Lee, Jacob Eisenstein, ..., Kristina Toutanova

Computation and Language

Despite their success, large pre-trained multilingual models have not completely alleviated the need for labeled data, which is cumbersome to collect for all target languages. Zero-shot cross-lingual transfer is emerging as a practical solution: pre-trained models later fine-tuned on one transfer language exhibit surprising performance when tested on many target languages. English is the dominant source language for transfer, as reinforced by popular zero-shot benchmarks. How...

Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

May 17, 2019

88% Match

Yuri Kuratov, Mikhail Arkhipov

Computation and Language

The paper introduces methods for adapting multilingual masked language models to a specific language. Pre-trained bidirectional language models show state-of-the-art performance on a wide range of tasks including reading comprehension, natural language inference, and sentiment analysis. At the moment there are two alternative approaches to train such models: monolingual and multilingual. While language specific models show superior performance, multilingual models allow ...

Universal Cross-Lingual Text Classification

June 16, 2024

88% Match

Riya Savant, Anushka Shelke, Sakshi Todmal, Sanskruti Kanphade, ..., Raviraj Joshi

Computation and Language

Machine Learning

Text classification, an integral task in natural language processing, involves the automatic categorization of text into predefined classes. Creating supervised labeled datasets for low-resource languages poses a considerable challenge. Unlocking the language potential of low-resource languages requires robust datasets with supervised labels. However, such datasets are scarce, and the label space is often limited. In our pursuit to address this gap, we aim to optimize existin...

Multilingual Few-Shot Learning via Language Model Retrieval

June 19, 2023

88% Match

Genta Indra Winata, Liang-Kang Huang, ..., Yash Chandarana

Computation and Language

Transformer-based language models have achieved remarkable success in few-shot in-context learning and drawn a lot of research interest. However, these models' performance greatly depends on the choice of the example prompts and also has high variability depending on how samples are chosen. In this paper, we conduct a comprehensive study of retrieving semantically similar few-shot samples and using them as the context, as it helps the model decide the correct label without an...
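Retrieving semantically similar labeled samples and placing them in the context, as this abstract describes, amounts to nearest-neighbour selection of demonstrations. The sketch below is a generic illustration, not the paper's implementation: the bag-of-words `embed` function is a toy stand-in for a real (multilingual) sentence encoder, and the prompt template in `build_prompt` and the example pool are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a sentence encoder instead."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_examples(query, pool, k=2):
    """Return the k labeled (text, label) pairs most similar to the query."""
    q = embed(query)
    ranked = sorted(pool, key=lambda ex: cosine(q, embed(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query, pool, k=2):
    """Hypothetical template: retrieved demonstrations first, then the unlabeled query."""
    parts = [f"Text: {text}\nLabel: {label}" for text, label in retrieve_examples(query, pool, k)]
    parts.append(f"Text: {query}\nLabel:")
    return "\n\n".join(parts)


pool = [
    ("the movie was great and fun", "positive"),
    ("the movie was boring and too long", "negative"),
    ("stock prices fell sharply today", "finance"),
    ("the central bank raised interest rates", "finance"),
]
print(build_prompt("a great fun film", pool, k=1))
```

Because the demonstrations are chosen per query rather than fixed, the context changes with each input; this is the property the abstract links to lower variability across sample choices.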

Zero-Shot Cross-Lingual Transfer in Legal Domain Using Transformer Models

November 28, 2021

88% Match

Zein Shaheen, Gerhard Wohlgenannt, Dmitry Mouromtsev

Computation and Language

Artificial Intelligence

Zero-shot cross-lingual transfer is an important feature of modern NLP models and architectures for supporting low-resource languages. In this work, we study zero-shot cross-lingual transfer from English to French and German in multi-label text classification: we train a classifier on the English training set and test it on the French and German test sets. We extend the EURLEX57K dataset, the English dataset for topic classification of legal documents, with French and German ...

Fine-tuning Encoders for Improved Monolingual and Zero-shot Polylingual Neural Topic Modeling

April 11, 2021

88% Match

Aaron Mueller, Mark Dredze

Computation and Language

Neural topic models can augment or replace bag-of-words inputs with the learned representations of deep pre-trained transformer-based word prediction models. One added benefit when using representations from multilingual models is that they facilitate zero-shot polylingual topic modeling. However, while it has been widely observed that pre-trained embeddings should be fine-tuned to a given task, it is not immediately clear what supervision should look like for an unsupervised...

Ukrainian Texts Classification: Exploration of Cross-lingual Knowledge Transfer Approaches

April 2, 2024

88% Match

Daryna Dementieva, Valeriia Khylenko, Georg Groh

Computation and Language

Artificial Intelligence

Despite the large number of labeled datasets in the NLP text classification field, the persistent imbalance in data availability across languages remains evident. Ukrainian, in particular, is a language that can still benefit from the continued refinement of cross-lingual methodologies. To our knowledge, there is a severe lack of Ukrainian corpora for typical text classification tasks. In this work, we leverage the state-of-the-art advances in NLP, ...

DRAFT: Dense Retrieval Augmented Few-shot Topic classifier Framework

December 5, 2023

88% Match

Keonwoo Kim, Younggun Lee

Information Retrieval

Computation and Language

With the growing volume of diverse information, the demand for classifying arbitrary topics has become increasingly critical. To address this challenge, we introduce DRAFT, a simple framework designed to train a classifier for few-shot topic classification. DRAFT uses a few examples of a specific topic as queries to construct a customized dataset with a dense retriever model. The multi-query retrieval (MQR) algorithm, which effectively handles multiple queries related to a specific...

RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

October 29, 2020

88% Match

Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, ..., Andrey Evlampiev

Computation and Language

Artificial Intelligence

In this paper, we introduce an advanced Russian general language understanding evaluation benchmark, RussianGLUE. Recent advances in the field of universal language models and transformers require the development of a methodology for their broad diagnostics and testing for general intellectual skills: detection of natural language inference, commonsense reasoning, and the ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benc...

Monolingual and Cross-Lingual Knowledge Transfer for Topic Classification