ID: 2406.11028

Universal Cross-Lingual Text Classification

June 16, 2024

Similar papers 2

Cross-Lingual Transfer for Low-Resource Natural Language Processing

February 4, 2025

91% Match
Iker García-Ferrero
Computation and Language

Natural Language Processing (NLP) has seen remarkable advances in recent years, particularly with the emergence of Large Language Models that have achieved unprecedented performance across many tasks. However, these developments have mainly benefited a small number of high-resource languages such as English. The majority of languages still face significant challenges due to the scarcity of training data and computational resources. To address this issue, this thesis focuses o...

Establishing Baselines for Text Classification in Low-Resource Languages

May 5, 2020

91% Match
Jan Christian Blaise Cruz, Charibeth Cheng
Computation and Language

While transformer-based finetuning techniques have proven effective in tasks that involve low-resource, low-data environments, a lack of properly established baselines and benchmark datasets makes it hard to compare different approaches aimed at tackling the low-resource setting. In this work, we provide three contributions. First, we introduce two previously unreleased datasets as benchmark datasets for text classification and low-resource multilabel text classificat...

Low-Resource Text Classification using Domain-Adversarial Learning

July 13, 2018

91% Match
Daniel Grießhaber, Ngoc Thang Vu, Johannes Maucher
Computation and Language

Deep learning techniques have recently been shown to be successful in many natural language processing tasks, forming state-of-the-art systems. They require, however, a large amount of annotated data, which is often missing. This paper explores the use of domain-adversarial learning as a regularizer to avoid overfitting when training domain-invariant features for deep, complex neural networks in low-resource and zero-resource settings in new target domains or languages. In case of n...
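
Below is a minimal sketch of the core idea, domain-adversarial regularization via a gradient reversal layer in the spirit of DANN; the layer sizes, lambda value, and toy batch are illustrative assumptions, not the paper's exact architecture.

    import torch
    import torch.nn as nn

    class GradReverse(torch.autograd.Function):
        """Identity on the forward pass; multiplies gradients by -lambda on backward."""
        @staticmethod
        def forward(ctx, x, lambd):
            ctx.lambd = lambd
            return x.view_as(x)

        @staticmethod
        def backward(ctx, grad_output):
            return -ctx.lambd * grad_output, None

    class DANNClassifier(nn.Module):
        def __init__(self, in_dim=768, hidden=256, n_classes=2, n_domains=2):
            super().__init__()
            self.features = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.task_head = nn.Linear(hidden, n_classes)    # supervised on labelled source data
            self.domain_head = nn.Linear(hidden, n_domains)  # sees both source and target data

        def forward(self, x, lambd=1.0):
            h = self.features(x)
            return self.task_head(h), self.domain_head(GradReverse.apply(h, lambd))

    # One illustrative training step: the reversed gradient from the domain loss
    # pushes the shared features toward domain invariance, acting as a regularizer.
    model = DANNClassifier()
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    x = torch.randn(8, 768)
    y_task, y_domain = torch.randint(0, 2, (8,)), torch.randint(0, 2, (8,))
    task_logits, dom_logits = model(x, lambd=0.5)
    loss = nn.functional.cross_entropy(task_logits, y_task) \
         + nn.functional.cross_entropy(dom_logits, y_domain)
    opt.zero_grad(); loss.backward(); opt.step()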

T3L: Translate-and-Test Transfer Learning for Cross-Lingual Text Classification

June 8, 2023

91% Match
Inigo Jauregi Unanue, Gholamreza Haffari, Massimo Piccardi
Computation and Language

Cross-lingual text classification leverages text classifiers trained in a high-resource language to perform text classification in other languages with no or minimal fine-tuning (zero-/few-shot cross-lingual transfer). Nowadays, cross-lingual text classifiers are typically built on large-scale, multilingual language models (LMs) pretrained on a variety of languages of interest. However, the performance of these models varies significantly across languages and classification tas...
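
As a point of reference, here is a minimal sketch of the plain translate-and-test pipeline that T3L builds on: translate the low-resource input into the high-resource language, then classify with a model trained in that language. The checkpoint names are illustrative assumptions, and the paper itself goes further by coupling the two steps for end-to-end fine-tuning.

    from transformers import pipeline

    # Spanish -> English translator and an English sentiment classifier (illustrative checkpoints).
    translator = pipeline("translation", model="Helsinki-NLP/opus-mt-es-en")
    classifier = pipeline("text-classification",
                          model="distilbert-base-uncased-finetuned-sst-2-english")

    def translate_and_test(target_language_text: str):
        english = translator(target_language_text)[0]["translation_text"]
        return classifier(english)[0]  # e.g. {"label": "POSITIVE", "score": 0.99}

    print(translate_and_test("La película fue sorprendentemente buena."))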

XAMPLER: Learning to Retrieve Cross-Lingual In-Context Examples

May 8, 2024

91% Match
Peiqin Lin, André F. T. Martins, Hinrich Schütze
Computation and Language

Recent studies have shown that leveraging off-the-shelf or fine-tuned retrievers, capable of retrieving high-quality in-context examples, significantly improves in-context learning of English. However, adapting these methods to other languages, especially low-resource ones, presents challenges due to the scarcity of available cross-lingual retrievers and annotated data. In this paper, we introduce XAMPLER: Cross-Lingual Example Retrieval, a method tailored to tackle the chall...
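
A minimal sketch of the underlying retrieval step, using an off-the-shelf multilingual encoder to pick English labelled examples for a low-resource-language query; XAMPLER additionally trains the retriever, and the model name and toy pool below are illustrative assumptions.

    from sentence_transformers import SentenceTransformer, util

    encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

    # English labelled pool from which in-context demonstrations are retrieved.
    english_pool = [
        ("The plot was dull and predictable.", "negative"),
        ("An absolute joy from start to finish.", "positive"),
        ("Mediocre acting, but a decent script.", "negative"),
    ]
    pool_emb = encoder.encode([text for text, _ in english_pool], convert_to_tensor=True)

    def build_prompt(query: str, k: int = 2) -> str:
        q_emb = encoder.encode(query, convert_to_tensor=True)
        top = util.cos_sim(q_emb, pool_emb)[0].topk(k).indices.tolist()
        demos = "\n".join(f"Text: {english_pool[i][0]}\nLabel: {english_pool[i][1]}" for i in top)
        return f"{demos}\nText: {query}\nLabel:"

    # A German query paired with retrieved English demonstrations for in-context learning.
    print(build_prompt("Die Schauspieler waren großartig."))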

Multi-Source Cross-Lingual Model Transfer: Learning What to Share

October 8, 2018

91% Match
Xilun Chen, Ahmed Hassan Awadallah, Hany Hassan, ..., Claire Cardie
Computation and Language
Machine Learning

Modern NLP applications have enjoyed a great boost from neural network models. Such deep neural models, however, are not applicable to most human languages due to the lack of annotated training data for various NLP tasks. Cross-lingual transfer learning (CLTL) is a viable method for building NLP models for a low-resource target language by leveraging labeled data from other (source) languages. In this work, we focus on the multilingual transfer setting where training da...
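
A minimal sketch of the multi-source idea: one classifier head per source language plus a learned gate that decides, per example, how much each source contributes. The dimensions and the simple softmax gate are illustrative assumptions rather than the paper's exact mixture-of-experts design.

    import torch
    import torch.nn as nn

    class MultiSourceClassifier(nn.Module):
        def __init__(self, in_dim=768, n_classes=3, n_sources=4):
            super().__init__()
            self.experts = nn.ModuleList([nn.Linear(in_dim, n_classes) for _ in range(n_sources)])
            self.gate = nn.Linear(in_dim, n_sources)  # learns "what to share" per example

        def forward(self, x):
            weights = torch.softmax(self.gate(x), dim=-1)               # (batch, n_sources)
            logits = torch.stack([e(x) for e in self.experts], dim=1)   # (batch, n_sources, n_classes)
            return (weights.unsqueeze(-1) * logits).sum(dim=1)          # gated mixture of source experts

    features = torch.randn(2, 768)  # e.g. multilingual-encoder features for target-language text
    print(MultiSourceClassifier()(features).shape)  # torch.Size([2, 3])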

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

September 1, 2019

91% Match
Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, ..., Karthik Raman
Computation and Language

The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low-resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream cl...
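
A minimal sketch of this kind of probing setup: freeze a multilingual encoder, mean-pool its hidden states, and train only a linear head on top. The paper evaluates the encoder of a massively multilingual NMT model; the publicly available encoder used below is an illustrative stand-in.

    import torch
    import torch.nn as nn
    from transformers import AutoTokenizer, AutoModel

    tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
    encoder = AutoModel.from_pretrained("bert-base-multilingual-cased")
    encoder.requires_grad_(False)  # the encoder stays frozen; only the head is trained

    head = nn.Linear(encoder.config.hidden_size, 2)
    opt = torch.optim.Adam(head.parameters(), lr=1e-3)

    batch = tok(["great movie", "terrible movie"], padding=True, return_tensors="pt")
    labels = torch.tensor([1, 0])

    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state       # (batch, seq_len, hidden)
        mask = batch["attention_mask"].unsqueeze(-1)
        pooled = (hidden * mask).sum(1) / mask.sum(1)     # masked mean pooling

    loss = nn.functional.cross_entropy(head(pooled), labels)
    opt.zero_grad(); loss.backward(); opt.step()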

Zero-shot Cross-lingual Transfer without Parallel Corpus

October 7, 2023

91% Match
Yuyang Zhang, Xiaofeng Han, Baojun Wang
Computation and Language

Although pre-trained language models have recently achieved great success on multilingual NLP (Natural Language Processing) tasks, the lack of training data for many tasks in low-resource languages still limits their performance. One effective way of solving this problem is to transfer knowledge from rich-resource languages to low-resource languages. However, many previous works on cross-lingual transfer rely heavily on parallel corpora or translation models, which are oft...
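
A minimal sketch of the parallel-corpus-free baseline such work starts from: fine-tune a multilingual pretrained model on source-language labels only, then apply it unchanged to the target language. The checkpoint, toy batches, and the Swahili example are illustrative assumptions.

    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    name = "xlm-roberta-base"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
    opt = torch.optim.AdamW(model.parameters(), lr=2e-5)

    # Fine-tune on labelled English data (a single toy step shown).
    en_batch = tok(["I loved it", "I hated it"], padding=True, return_tensors="pt")
    loss = model(**en_batch, labels=torch.tensor([1, 0])).loss
    opt.zero_grad(); loss.backward(); opt.step()

    # Zero-shot inference on unlabelled target-language text (Swahili here).
    model.eval()
    with torch.no_grad():
        sw_batch = tok(["Filamu hii ilikuwa nzuri sana"], padding=True, return_tensors="pt")
        print(model(**sw_batch).logits.argmax(dim=-1))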

Multilingual Text Classification for Dravidian Languages

December 3, 2021

91% Match
Xiaotian Lin, Nankai Lin, Kanoksak Wattanachote, ..., Lianxi Wang
Computation and Language

As the fourth-largest language family in the world, the Dravidian languages have become a research hotspot in natural language processing (NLP). Although the family comprises many languages, relatively few publicly available resources exist. Moreover, for text classification, a basic NLP task, how to apply it across the multiple Dravidian languages remains a major difficulty in Dravidian Natural Language Pro...

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings

October 23, 2022

91% Match
Iker García-Ferrero, Rodrigo Agerri, German Rigau
Computation and Language

Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages. In this paper we perform an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling, based either on data or model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence l...
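
A minimal sketch of the data-transfer side (annotation projection): given a labelled source sentence, its translation, and word alignments, copy each token label onto the aligned target tokens. Real pipelines obtain the alignments from a word aligner; here they are written by hand as an illustrative assumption.

    def project_labels(src_labels, alignments, tgt_len, default="O"):
        """alignments: list of (source_token_index, target_token_index) pairs."""
        tgt_labels = [default] * tgt_len
        for s, t in alignments:
            tgt_labels[t] = src_labels[s]
        return tgt_labels

    # English source "Barack Obama visited Paris" with NER labels,
    # projected onto the Spanish translation "Barack Obama visitó París".
    src_labels = ["B-PER", "I-PER", "O", "B-LOC"]
    alignments = [(0, 0), (1, 1), (2, 2), (3, 3)]
    print(project_labels(src_labels, alignments, tgt_len=4))
    # ['B-PER', 'I-PER', 'O', 'B-LOC']  -> a silver training example for the target language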
