Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

Russian SuperGLUE 1.1: Revising the Lessons not Learned by Russian NLP models

February 15, 2022

90% Match

Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, Tatiana Shavrina, Anton Emelyanov, Denis Shevelev, Alexandr Kukushkin, ... , Ekaterina Artemova

Computation and Language

Artificial Intelligence

In the last year, new neural architectures and multilingual pre-trained models have been released for Russian, which led to performance evaluation problems across a range of language understanding tasks. This paper presents Russian SuperGLUE 1.1, an updated benchmark styled after GLUE for Russian NLP models. The new version includes a number of technical, user experience and methodological improvements, including fixes of the benchmark vulnerabilities unresolved in the prev...

Unsupervised Cross-lingual Representation Learning at Scale

November 5, 2019

89% Match

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, ... , Veselin Stoyanov

Computation and Language

This paper shows that pretraining multilingual language models at scale leads to significant performance gains for a wide range of cross-lingual transfer tasks. We train a Transformer-based masked language model on one hundred languages, using more than two terabytes of filtered CommonCrawl data. Our model, dubbed XLM-R, significantly outperforms multilingual BERT (mBERT) on a variety of cross-lingual benchmarks, including +14.6% average accuracy on XNLI, +13% average F1 scor...

mGPT: Few-Shot Learners Go Multilingual

April 15, 2022

89% Match

Oleh Shliazhko, Alena Fenogenova, Maria Tikhonova, Vladislav Mikhailov, ... , Tatiana Shavrina

Computation and Language

Artificial Intelligence

Recent studies report that autoregressive language models can successfully solve many NLP tasks via zero- and few-shot learning paradigms, which opens up new possibilities for using the pre-trained language models. This paper introduces two autoregressive GPT-like models with 1.3 billion and 13 billion parameters trained on 60 languages from 25 language families using Wikipedia and Colossal Clean Crawled Corpus. We reproduce the GPT-3 architecture using GPT-2 sources and the ...

Cross-Lingual Ability of Multilingual BERT: An Empirical Study

December 17, 2019

89% Match

Karthikeyan K, Zihan Wang, ... , Dan Roth

Computation and Language

Artificial Intelligence

Machine Learning

Recent work has exhibited the surprising cross-lingual abilities of multilingual BERT (M-BERT) -- surprising since it is trained without any cross-lingual objective and with no aligned data. In this work, we provide a comprehensive study of the contribution of different components in M-BERT to its cross-lingual ability. We study the impact of linguistic properties of the languages, the architecture of the model, and the learning objectives. The experimental study is done in t...

Transfer Learning for Improving Results on Russian Sentiment Datasets

July 6, 2021

89% Match

Anton Golubev, Natalia Loukachevitch

Computation and Language

In this study, we test a transfer learning approach on Russian sentiment benchmark datasets using an additional training sample created with the distant supervision technique. We compare several variants of combining the additional data with the benchmark training samples. The best results were achieved using a three-step approach of sequential training on general, thematic and original training samples. For most datasets, the results improved by more than 3% over the current state-of-the-art methods...

TAPE: Assessing Few-shot Russian Language Understanding

October 23, 2022

89% Match

Ekaterina Taktasheva, Tatiana Shavrina, Alena Fenogenova, Denis Shevelev, Nadezhda Katricheva, Maria Tikhonova, Albina Akhmetgareeva, Oleg Zinkevich, Anastasiia Bashmakova, Svetlana Iordanskaia, Alena Spiridonova, Valentina Kurenshchikova, ... , Vladislav Mikhailov

Computation and Language

Recent advances in zero-shot and few-shot learning have shown promise for a range of research and practical purposes. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this gap, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark that includes six more complex NLU tasks for Russian, covering multi-hop reasoning, ethical concepts...

Towards Fully Bilingual Deep Language Modeling

October 22, 2020

89% Match

Li-Hsin Chang, Sampo Pyysalo, ... , Filip Ginter

Computation and Language

Language models based on deep neural networks have facilitated great advances in natural language processing and understanding tasks in recent years. While models covering a large number of languages have been introduced, their multilinguality has come at a cost in terms of monolingual performance, and the best-performing models at most tasks not involving cross-lingual transfer remain monolingual. In this paper, we consider the question of whether it is possible to pre-train...

Long Input Benchmark for Russian Analysis

August 5, 2024

89% Match

Igor Churin, Murat Apishev, Maria Tikhonova, Denis Shevelev, Aydar Bulatov, Yuri Kuratov, ... , Alena Fenogenova

Computation and Language

Artificial Intelligence

Recent advancements in Natural Language Processing (NLP) have fostered the development of Large Language Models (LLMs) that can solve an immense variety of tasks. One of the key aspects of their application is their ability to work with long text documents and to process long sequences of tokens. This has created a demand for proper evaluation of long-context understanding. To address this need for the Russian language, we propose LIBRA (Long Input Benchmark for Russian Analy...

Romanization-based Large-scale Adaptation of Multilingual Language Models

April 18, 2023

89% Match

Sukannya Purkayastha, Sebastian Ruder, Jonas Pfeiffer, ... , Ivan Vulić

Computation and Language

Machine Learning

Large multilingual pretrained language models (mPLMs) have become the de facto state of the art for cross-lingual transfer in NLP. However, their large-scale deployment to many languages, besides pretraining data scarcity, is also hindered by the increase in vocabulary size and limitations in their parameter budget. In order to boost the capacity of mPLMs to deal with low-resource and unseen languages, we explore the potential of leveraging transliteration on a massive scale....

Pre-training via Paraphrasing

June 26, 2020

89% Match

Mike Lewis, Marjan Ghazvininejad, Gargi Ghosh, Armen Aghajanyan, ... , Luke Zettlemoyer

Computation and Language

Machine Learning

We introduce MARGE, a pre-trained sequence-to-sequence model learned with an unsupervised multi-lingual multi-document paraphrasing objective. MARGE provides an alternative to the dominant masked language modeling paradigm, where we self-supervise the reconstruction of target text by retrieving a set of related texts (in many languages) and conditioning on them to maximize the likelihood of generating the original. We show it is possible to jointly learn to do retrieval and r...
