Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language

A Family of Pretrained Transformer Language Models for Russian

September 19, 2023

93% Match

Dmitry Zmitrovich, Alexander Abramov, Andrey Kalmykov, Maria Tikhonova, Ekaterina Taktasheva, Danil Astafurov, Mark Baushenko, Artem Snegirev, Tatiana Shavrina, Sergey Markov, ... , Alena Fenogenova

Computation and Language

Nowadays, Transformer language models (LMs) represent a fundamental component of the NLP research methodologies and applications. However, the development of such models specifically for the Russian language has received little attention. This paper presents a collection of 13 Russian Transformer LMs based on the encoder (ruBERT, ruRoBERTa, ruELECTRA), decoder (ruGPT-3), and encoder-decoder (ruT5, FRED-T5) models in multiple sizes. Access to these models is readily available ...

Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian

May 22, 2024

92% Match

Aleksandr Nikolich, Konstantin Korolev, Artem Shelmanov

Computation and Language

Artificial Intelligence

There has been a surge in the development of various Large Language Models (LLMs). However, text generation for languages other than English often faces significant challenges, including poor generation quality and reduced computational performance due to the disproportionate representation of tokens in the model's vocabulary. In this work, we address these issues and introduce Vikhr, a new state-of-the-art open-source instruction-tuned LLM designed specifically for the Russi...

RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

October 29, 2020

91% Match

Tatiana Shavrina, Alena Fenogenova, Anton Emelyanov, Denis Shevelev, Ekaterina Artemova, Valentin Malykh, Vladislav Mikhailov, Maria Tikhonova, ... , Andrey Evlampiev

Computation and Language

Artificial Intelligence

In this paper, we introduce an advanced Russian general language understanding evaluation benchmark -- RussianGLUE. Recent advances in the field of universal language models and transformers require the development of a methodology for their broad diagnostics and testing for general intellectual skills - detection of natural language inference, commonsense reasoning, ability to perform simple logical operations regardless of text subject or lexicon. For the first time, a benc...

RuSentEval: Linguistic Source, Encoder Force!

February 28, 2021

91% Match

Vladislav Mikhailov, Ekaterina Taktasheva, ... , Ekaterina Artemova

Computation and Language

The success of pre-trained transformer language models has brought a great deal of interest in how these models work, and what they learn about language. However, prior research in the field is mainly devoted to English, and little is known regarding other languages. To this end, we introduce RuSentEval, an enhanced set of 14 probing tasks for Russian, including ones that have not been explored yet. We apply a combination of complementary probing methods to explore the distri...

Impact of Tokenization on LLaMa Russian Adaptation

December 5, 2023

90% Match

Mikhail Tikhomirov, Daniil Chernyshev

Computation and Language

Artificial Intelligence

The latest instruction-tuned large language models (LLMs) show great results on various tasks; however, they often face performance degradation for non-English input. There is evidence that the reason lies in inefficient tokenization caused by low language representation in pre-training data, which hinders the comprehension of non-English instructions, limiting the potential of target language instruction-tuning. In this work we investigate the possibility of addressing the issue w...

A Primer on Pretrained Multilingual Language Models

July 1, 2021

90% Match

Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M. Khapra, ... , Pratyush Kumar

Computation and Language

Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, there has emerged a large body of work in (i) building bigger MLLMs covering a large number of languages (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs (iii) analysing the performance of...

Larger-Scale Transformers for Multilingual Masked Language Modeling

May 3, 2021

90% Match

Naman Goyal, Jingfei Du, Myle Ott, ... , Alexis Conneau

Computation and Language

Recent work has demonstrated the effectiveness of cross-lingual language model pretraining for cross-lingual understanding. In this study, we present the results of two larger multilingual masked language models, with 3.5B and 10.7B parameters. Our two new models dubbed XLM-R XL and XLM-R XXL outperform XLM-R by 1.8% and 2.4% average accuracy on XNLI. Our model also outperforms the RoBERTa-Large model on several English tasks of the GLUE benchmark by 0.3% on average while han...

Knowledge Distillation of Russian Language Models with Reduction of Vocabulary

May 4, 2022

90% Match

Alina Kolesnikova, Yuri Kuratov, ... , Mikhail Burtsev

Computation and Language

Machine Learning

Today, transformer language models serve as a core component for the majority of natural language processing tasks. Industrial application of such models requires minimization of computation time and memory footprint. Knowledge distillation is one of the approaches to address this goal. Existing methods in this field are mainly focused on reducing the number of layers or the dimension of embeddings/hidden representations. An alternative option is to reduce the number of tokens in vocabulary...

Exploiting Out-of-Domain Parallel Data through Multilingual Transfer Learning for Low-Resource Neural Machine Translation

July 6, 2019

90% Match

Aizhan Imankulova, Raj Dabre, ... , Kenji Imamura

Computation and Language

This paper proposes a novel multilingual multistage fine-tuning approach for low-resource neural machine translation (NMT), taking a challenging Japanese--Russian pair for benchmarking. Although there are many solutions for low-resource scenarios, such as multilingual NMT and back-translation, we have empirically confirmed their limited success when restricted to in-domain data. We therefore propose to exploit out-of-domain data through transfer learning, by using it to first...

MERA: A Comprehensive LLM Evaluation in Russian

January 9, 2024

90% Match

Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, ... , Sergei Markov

Computation and Language

Artificial Intelligence

Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these is...
