RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark

Long Input Benchmark for Russian Analysis

August 5, 2024

91% Match

Igor Churin, Murat Apishev, Maria Tikhonova, Denis Shevelev, Aydar Bulatov, Yuri Kuratov, ... , Alena Fenogenova

Computation and Language

Artificial Intelligence

Recent advancements in Natural Language Processing (NLP) have fostered the development of Large Language Models (LLMs) that can solve an immense variety of tasks. One of the key aspects of their application is their ability to work with long text documents and to process long sequences of tokens. This has created a demand for proper evaluation of long-context understanding. To address this need for the Russian language, we propose LIBRA (Long Input Benchmark for Russian Analy...

SberQuAD -- Russian Reading Comprehension Dataset: Description and Analysis

December 20, 2019

90% Match

Pavel Efimov, Andrey Chertok, ... , Pavel Braslavski

Computation and Language

SberQuAD, a large-scale analog of Stanford SQuAD in the Russian language, is a valuable resource that has not been properly presented to the scientific community. We fill this gap by providing a description, a thorough analysis, and baseline experimental results.

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models

March 4, 2020

90% Match

Yada Pruksachatkun, Phil Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Wang, ... , Samuel R. Bowman

Computation and Language

We introduce jiant, an open source toolkit for conducting multitask and transfer learning experiments on English NLU tasks. jiant enables modular and configuration-driven experimentation with state-of-the-art models and implements a broad set of tasks for probing, transfer learning, and multitask training experiments. jiant implements over 50 NLU tasks, including all GLUE and SuperGLUE benchmark tasks. We demonstrate that jiant reproduces published performance on a variety of...

XGLUE: A New Benchmark Dataset for Cross-lingual Pre-training, Understanding and Generation

April 3, 2020

90% Match

Yaobo Liang, Nan Duan, Yeyun Gong, Ning Wu, Fenfei Guo, Weizhen Qi, Ming Gong, Linjun Shou, Daxin Jiang, Guihong Cao, Xiaodong Fan, Ruofei Zhang, Rahul Agrawal, Edward Cui, Sining Wei, Taroon Bharti, Ying Qiao, Jiun-Hung Chen, Winnie Wu, Shuguang Liu, Fan Yang, Daniel Campos, ... , Ming Zhou

Computation and Language

In this paper, we introduce XGLUE, a new benchmark dataset that can be used to train large-scale cross-lingual pre-trained models using multilingual and bilingual corpora and evaluate their performance across a diverse set of cross-lingual tasks. Compared to GLUE (Wang et al., 2019), which is labeled in English for natural language understanding tasks only, XGLUE has two main advantages: (1) it provides 11 diversified tasks that cover both natural language understanding and g...

Impact of Tokenization on LLaMa Russian Adaptation

December 5, 2023

90% Match

Mikhail Tikhomirov, Daniil Chernyshev

Computation and Language

Artificial Intelligence

The latest instruction-tuned large language models (LLMs) show great results on various tasks; however, they often face performance degradation for non-English input. There is evidence that the reason lies in inefficient tokenization caused by low language representation in pre-training data, which hinders the comprehension of non-English instructions and limits the potential of target-language instruction-tuning. In this work we investigate the possibility of addressing the issue w...

MERA: A Comprehensive LLM Evaluation in Russian

January 9, 2024

90% Match

Alena Fenogenova, Artem Chervyakov, Nikita Martynov, Anastasia Kozlova, Maria Tikhonova, Albina Akhmetgareeva, Anton Emelyanov, Denis Shevelev, Pavel Lebedev, Leonid Sinev, Ulyana Isaeva, Katerina Kolomeytseva, Daniil Moskovskiy, Elizaveta Goncharova, Nikita Savushkin, Polina Mikhailova, Denis Dimitrov, ... , Sergei Markov

Computation and Language

Artificial Intelligence

Over the past few years, one of the most notable advancements in AI research has been in foundation models (FMs), headlined by the rise of language models (LMs). As the models' size increases, LMs demonstrate enhancements in measurable aspects and the development of new qualitative features. However, despite researchers' attention and the rapid growth in LM application, the capabilities, limitations, and associated risks still need to be better understood. To address these is...

What's the Meaning of Superhuman Performance in Today's NLU?

May 15, 2023

90% Match

Simone Tedeschi, Johan Bos, Thierry Declerck, Jan Hajic, Daniel Hershcovich, Eduard H. Hovy, Alexander Koller, Simon Krek, Steven Schockaert, Rico Sennrich, ... , Roberto Navigli

Computation and Language

Artificial Intelligence

In the last five years, there has been a significant focus in Natural Language Processing (NLP) on developing larger Pretrained Language Models (PLMs) and introducing benchmarks such as SuperGLUE and SQuAD to measure their abilities in language understanding, reasoning, and reading comprehension. These PLMs have achieved impressive results on these benchmarks, even surpassing human performance in some cases. This has led to claims of superhuman capabilities and the provocativ...

CLUE: A Chinese Language Understanding Evaluation Benchmark

April 13, 2020

90% Match

Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, Yudong Li, Yechen Xu, Kai Sun, Dian Yu, Cong Yu, Yin Tian, Qianqian Dong, Weitang Liu, Bo Shi, Yiming Cui, Junyi Li, Jun Zeng, Rongzhao Wang, Weijian Xie, Yanting Li, Yina Patterson, Zuoyu Tian, Yiwen Zhang, He Zhou, Shaoweihua Liu, Zhe Zhao, Qipeng Zhao, Cong Yue, Xinrui Zhang, Zhengliang Yang, ... , Zhenzhong Lan

Computation and Language

Machine Learning

The advent of natural language understanding (NLU) benchmarks for English, such as GLUE and SuperGLUE, allows new NLU models to be evaluated across a diverse set of tasks. These comprehensive benchmarks have facilitated a broad range of research and applications in natural language processing (NLP). The problem, however, is that most such benchmarks are limited to English, which has made it difficult to replicate many of the successes in English NLU for other languages. To hel...

Vikhr: The Family of Open-Source Instruction-Tuned Large Language Models for Russian

May 22, 2024

90% Match

Aleksandr Nikolich, Konstantin Korolev, Artem Shelmanov

Computation and Language

Artificial Intelligence

There has been a surge in the development of various Large Language Models (LLMs). However, text generation for languages other than English often faces significant challenges, including poor generation quality and reduced computational performance due to the disproportionate representation of tokens in the model's vocabulary. In this work, we address these issues and introduce Vikhr, a new state-of-the-art open-source instruction-tuned LLM designed specifically for the Russi...

GLUE-X: Evaluating Natural Language Understanding Models from an Out-of-distribution Generalization Perspective

November 15, 2022

89% Match

Linyi Yang, Shuibai Zhang, Libo Qin, Yafu Li, Yidong Wang, Hanmeng Liu, Jindong Wang, ... , Yue Zhang

Computation and Language

Artificial Intelligence

Machine Learning

Performance

Pre-trained language models (PLMs) are known to improve the generalization performance of natural language understanding models by leveraging large amounts of data during the pre-training phase. However, the out-of-distribution (OOD) generalization problem remains a challenge in many NLP tasks, limiting the real-world deployment of these methods. This paper presents the first attempt at creating a unified benchmark named GLUE-X for evaluating OOD robustness in NLP models, hig...
