Universal Cross-Lingual Text Classification

MetaXL: Meta Representation Transformation for Low-resource Cross-lingual Learning

April 16, 2021

90% Match

Mengzhou Xia, Guoqing Zheng, Subhabrata Mukherjee, Milad Shokouhi, ... , Ahmed Hassan Awadallah

Computation and Language

Machine Learning

The combination of multilingual pre-trained representations and cross-lingual transfer learning is one of the most effective methods for building functional NLP systems for low-resource languages. However, for extremely low-resource languages without large-scale monolingual corpora for pre-training or sufficient annotated data for fine-tuning, transfer learning remains an under-studied and challenging task. Moreover, recent work shows that multilingual representations are sur...


Cross-lingual Transfer of Sentiment Classifiers

May 15, 2020

90% Match

Marko Robnik-Sikonja, Kristjan Reba, Igor Mozetic

Computation and Language

Machine Learning

Word embeddings represent words in a numeric space so that semantic relations between words are represented as distances and directions in the vector space. Cross-lingual word embeddings transform vector spaces of different languages so that similar words are aligned. This is done by constructing a mapping between vector spaces of two languages or learning a joint vector space for multiple languages. Cross-lingual embeddings can be used to transfer machine learning models bet...


Adaptive Cross-lingual Text Classification through In-Context One-Shot Demonstrations

April 3, 2024

90% Match

Emilio Villa-Cueva, A. Pastor López-Monroy, ... , Thamar Solorio

Computation and Language

Zero-Shot Cross-lingual Transfer (ZS-XLT) utilizes a model trained in a source language to make predictions in another language, often with a performance loss. To alleviate this, additional improvements can be achieved through subsequent adaptation using examples in the target language. In this paper, we exploit In-Context Tuning (ICT) for One-Shot Cross-lingual transfer in the classification task by introducing In-Context Cross-lingual Transfer (IC-XLT). The novel concept in...


Transfer Learning for Multi-lingual Tasks -- a Survey

August 28, 2021

90% Match

Amir Reza Jafari, Behnam Heidary, Reza Farahbakhsh, ... , Mahdi Jalili

Computation and Language

These days different platforms such as social media provide their clients from different backgrounds and languages the possibility to connect and exchange information. It is not surprising anymore to see comments from different languages in posts published by international celebrities or data providers. In this era, understanding cross languages content and multilingualism in natural language processing (NLP) are hot topics, and multiple efforts have tried to leverage existin...


Exploring Multilingual Text Data Distillation

August 9, 2023

90% Match

Shivam Sahni, Harsh Patel

Computation and Language

Artificial Intelligence

With the rise of deep learning, large datasets and complex models have become common, requiring significant computing power. To address this, data distillation has emerged as a technique to quickly train models with lower memory and time requirements. However, data distillation on text-based datasets hasn't been explored much because of the challenges arising from its discrete nature. Additionally, existing dataset distillation methods often struggle to generalize to new arc...


Cross-Lingual Relevance Transfer for Document Retrieval

November 8, 2019

90% Match

Peng Shi, Jimmy Lin

Information Retrieval

Computation and Language

Recent work has shown the surprising ability of multi-lingual BERT to serve as a zero-shot cross-lingual transfer model for a number of language processing tasks. We combine this finding with a similarly recent proposal on sentence-level relevance modeling for document retrieval to demonstrate the ability of multi-lingual BERT to transfer models of relevance across languages. Experiments on test collections in five different languages from diverse language families (Chinese...


Explicit Alignment Objectives for Multilingual Bidirectional Encoders

October 15, 2020

90% Match

Junjie Hu, Melvin Johnson, Orhan Firat, ... , Graham Neubig

Computation and Language

Artificial Intelligence

Pre-trained cross-lingual encoders such as mBERT (Devlin et al., 2019) and XLMR (Conneau et al., 2020) have proven to be impressively effective at enabling transfer-learning of NLP systems from high-resource languages to low-resource languages. This success comes despite the fact that there is no explicit objective to align the contextual embeddings of words/sentences with similar meanings across languages together in the same space. In this paper, we present a new method for...


DocBERT: BERT for Document Classification

April 17, 2019

90% Match

Ashutosh Adhikari, Achyudh Ram, ... , Jimmy Lin

Computation and Language

We present, to our knowledge, the first application of BERT to document classification. A few characteristics of the task might lead one to think that BERT is not the most appropriate model: syntactic structures matter less for content categories, documents can often be longer than typical BERT input, and documents often have multiple labels. Nevertheless, we show that a straightforward classification model using BERT is able to achieve the state of the art across four popula...


Cross-Lingual Retrieval Augmented Prompt for Low-Resource Languages

December 19, 2022

90% Match

Ercong Nie, Sheng Liang, ... , Hinrich Schütze

Computation and Language

Multilingual Pretrained Language Models (MPLMs) have shown their strong multilinguality in recent empirical cross-lingual transfer studies. In this paper, we propose the Prompts Augmented by Retrieval Crosslingually (PARC) pipeline to improve the zero-shot performance on low-resource languages (LRLs) by augmenting the context with semantically similar sentences retrieved from a high-resource language (HRL) as prompts. PARC improves the zero-shot performance on three downstrea...


Unsupervised Self-Training for Sentiment Analysis of Code-Switched Data

March 27, 2021

90% Match

Akshat Gupta, Sargam Menghani, ... , Alan W Black

Computation and Language

Machine Learning

Sentiment analysis is an important task in understanding social media content like customer reviews, Twitter and Facebook feeds etc. In multilingual communities around the world, a large amount of social media text is characterized by the presence of Code-Switching. Thus, it has become important to build models that can handle code-switched data. However, annotated code-switched data is scarce and there is a need for unsupervised models and algorithms. We propose a general fr...

