ID: 2106.16171

Revisiting the Primacy of English in Zero-shot Cross-lingual Transfer

June 30, 2021

Iulia Turc, Kenton Lee, Jacob Eisenstein, Ming-Wei Chang, Kristina Toutanova
Computer Science
Computation and Language

Despite their success, large pre-trained multilingual models have not completely alleviated the need for labeled data, which is cumbersome to collect for all target languages. Zero-shot cross-lingual transfer is emerging as a practical solution: pre-trained models later fine-tuned on one transfer language exhibit surprising performance when tested on many target languages. English is the dominant source language for transfer, as reinforced by popular zero-shot benchmarks. However, this default choice has not been systematically vetted. In our study, we compare English against other transfer languages for fine-tuning, on two pre-trained multilingual models (mBERT and mT5) and multiple classification and question answering tasks. We find that other high-resource languages such as German and Russian often transfer more effectively, especially when the set of target languages is diverse or unknown a priori. Unexpectedly, this can be true even when the training sets were automatically translated from English. This finding can have immediate impact on multilingual zero-shot systems, and should inform future benchmark designs.
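
To make the setup concrete, the sketch below fine-tunes mBERT on classification data in a single transfer language and then evaluates the same model zero-shot on several target languages. It assumes the Hugging Face transformers and datasets libraries and uses XNLI as a stand-in classification task; the checkpoint name, hyperparameters, and language choices are illustrative assumptions rather than the paper's exact configuration.

# Minimal sketch of zero-shot cross-lingual transfer, assuming Hugging Face
# `transformers` and `datasets` and XNLI as a stand-in task. Checkpoint,
# hyperparameters, and languages are illustrative, not the paper's setup.
import numpy as np
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

TRANSFER_LANG = "en"               # the default the paper questions; try "de" or "ru"
TARGET_LANGS = ["ar", "hi", "sw"]  # evaluated zero-shot, no labels used for training

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-multilingual-cased", num_labels=3)

def encode(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128, padding="max_length")

def accuracy(eval_pred):
    logits, labels = eval_pred
    return {"accuracy": float((np.argmax(logits, axis=-1) == labels).mean())}

# Fine-tune on labeled data in the transfer language only.
train = load_dataset("xnli", TRANSFER_LANG, split="train").map(encode, batched=True)
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mbert-xnli", num_train_epochs=2,
                           per_device_train_batch_size=32),
    train_dataset=train,
    compute_metrics=accuracy,
)
trainer.train()

# Zero-shot evaluation: the same fine-tuned model is scored on each target
# language without ever seeing target-language training labels.
for lang in TARGET_LANGS:
    test = load_dataset("xnli", lang, split="test").map(encode, batched=True)
    print(lang, trainer.evaluate(eval_dataset=test))

Swapping TRANSFER_LANG from "en" to another high-resource language such as "de" or "ru" reproduces the kind of transfer-language comparison the paper reports.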

Similar papers

A Primer on Pretrained Multilingual Language Models

July 1, 2021

93% Match
Sumanth Doddapaneni, Gowtham Ramesh, Mitesh M. Khapra, ..., Pratyush Kumar
Computation and Language

Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, there has emerged a large body of work in (i) building bigger MLLMs covering a large number of languages (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs (iii) analysing the performance of...


From Zero to Hero: On the Limitations of Zero-Shot Cross-Lingual Transfer with Multilingual Transformers

May 1, 2020

93% Match
Anne Lauscher, Vinit Ravishankar, ..., Goran Glavaš
Computation and Language

Massively multilingual transformers pretrained with language modeling objectives (e.g., mBERT, XLM-R) have become a de facto default transfer paradigm for zero-shot cross-lingual transfer in NLP, offering unmatched transfer performance. Current downstream evaluations, however, verify their efficacy predominantly in transfer settings involving languages with sufficient amounts of pretraining data, and with lexically and typologically close languages. In this work, we analyze t...


Model Selection for Cross-Lingual Transfer

October 13, 2020

93% Match
Yang Chen, Alan Ritter
Computation and Language
Machine Learning

Transformers that are pre-trained on multilingual corpora, such as mBERT and XLM-RoBERTa, have achieved impressive cross-lingual transfer capabilities. In the zero-shot transfer setting, only English training data is used, and the fine-tuned model is evaluated on another target language. While this works surprisingly well, substantial variance has been observed in target language performance between different fine-tuning runs, and in the zero-shot setup, no target-language d...


Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

September 1, 2019

93% Match
Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, ..., Karthik Raman
Computation and Language

The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream cl...


A Closer Look at Few-Shot Crosslingual Transfer: The Choice of Shots Matters

December 31, 2020

93% Match
Mengjie Zhao, Yi Zhu, Ehsan Shareghi, Ivan Vulić, Roi Reichart, ..., Hinrich Schütze
Computation and Language

Few-shot crosslingual transfer has been shown to outperform its zero-shot counterpart with pretrained encoders like multilingual BERT. Despite its growing popularity, little to no attention has been paid to standardizing and analyzing the design of few-shot experiments. In this work, we highlight a fundamental risk posed by this shortcoming, illustrating that the model exhibits a high degree of sensitivity to the selection of few shots. We conduct a large-scale experimental s...


Zero-Shot Cross-Lingual Transfer with Meta Learning

March 5, 2020

93% Match
Farhad Nooralahzadeh, Giannis Bekoulis, ..., Isabelle Augenstein
Computation and Language

Learning what to share between tasks has been a topic of great importance recently, as strategic sharing of knowledge has been shown to improve downstream task performance. This is particularly important for multilingual applications, as most languages in the world are under-resourced. Here, we consider the setting of training models on multiple different languages at the same time, when little or no data is available for languages other than English. We show that this challe...


Analyzing the Evaluation of Cross-Lingual Knowledge Transfer in Multilingual Language Models

February 3, 2024

93% Match
Sara Rajaee, Christof Monz
Computation and Language
Artificial Intelligence
Machine Learning

Recent advances in training multilingual language models on large datasets seem to have shown promising results in knowledge transfer across languages, achieving high performance on downstream tasks. However, we question to what extent the current evaluation benchmarks and setups accurately measure zero-shot cross-lingual knowledge transfer. In this work, we challenge the assumption that high zero-shot performance on target tasks reflects high cross-lingual ability by introd...


Zero-shot Reading Comprehension by Cross-lingual Transfer Learning with Multi-lingual Language Representation Model

September 15, 2019

93% Match
Tsung-yuan Hsu, Chi-liang Liu, Hung-yi Lee
Computation and Language
Machine Learning

Because it is not feasible to collect training data for every language, there is a growing interest in cross-lingual transfer learning. In this paper, we systematically explore zero-shot cross-lingual transfer learning on reading comprehension tasks with a language representation model pre-trained on multi-lingual corpus. The experimental results show that with pre-trained language representation zero-shot learning is feasible, and translating the source data into the target ...


How multilingual is Multilingual BERT?

June 4, 2019

93% Match
Telmo Pires, Eva Schlinger, Dan Garrette
Computation and Language
Artificial Intelligence
Machine Learning

In this paper, we show that Multilingual BERT (M-BERT), released by Devlin et al. (2018) as a single language model pre-trained from monolingual corpora in 104 languages, is surprisingly good at zero-shot cross-lingual model transfer, in which task-specific annotations in one language are used to fine-tune the model for evaluation in another language. To understand why, we present a large number of probing experiments, showing that transfer is possible even to languages in di...


Multi Task Learning For Zero Shot Performance Prediction of Multilingual Models

May 12, 2022

93% Match
Kabir Ahuja, Shanu Kumar, ..., Monojit Choudhury
Computation and Language

Massively Multilingual Transformer based Language Models have been observed to be surprisingly effective on zero-shot transfer across languages, though the performance varies from language to language depending on the pivot language(s) used for fine-tuning. In this work, we build upon some of the existing techniques for predicting the zero-shot performance on a task, by modeling it as a multi-task learning problem. We jointly train predictive models for different tasks which ...
