Zero-Shot Cross-Lingual Transfer in Lega...

Adopting the Multi-answer Questioning Task with an Auxiliary Metric for Extreme Multi-label Text Classification Utilizing the Label Hierarchy

March 2, 2023

90% Match

Li Wang, Ying Wah Teh, Mohammed Ali Al-Garadi

Computation and Language

Extreme multi-label text classification utilizes the label hierarchy to partition extreme labels into multiple label groups, turning the task into simple multi-group multi-label classification tasks. Current research encodes labels as a fixed-length vector, which requires establishing multiple classifiers for different label groups. The problem is how to build only one classifier without sacrificing the label relationship in the hierarchy. This paper adopts the multi-answer que...

Evaluating the Cross-Lingual Effectiveness of Massively Multilingual Neural Machine Translation

September 1, 2019

90% Match

Aditya Siddhant, Melvin Johnson, Henry Tsai, Naveen Arivazhagan, Jason Riesa, Ankur Bapna, ... , Karthik Raman

Computation and Language

The recently proposed massively multilingual neural machine translation (NMT) system has been shown to be capable of translating over 100 languages to and from English within a single model. Its improved translation performance on low resource languages hints at potential cross-lingual transfer capability for downstream tasks. In this paper, we evaluate the cross-lingual effectiveness of representations from the encoder of a massively multilingual NMT model on 5 downstream cl...

LegalTurk Optimized BERT for Multi-Label Text Classification and NER

June 30, 2024

90% Match

Farnaz Zeidi, Mehmet Fatih Amasyali, Çiğdem Erol

Computation and Language

The introduction of the Transformer neural network, along with techniques like self-supervised pre-training and transfer learning, has paved the way for advanced models like BERT. Despite BERT's impressive performance, opportunities for further enhancement exist. To our knowledge, most efforts are focusing on improving BERT's performance in English and in general domains, with no study specifically addressing the legal Turkish domain. Our study is primarily dedicated to enhan...

Multi-granular Legal Topic Classification on Greek Legislation

September 30, 2021

90% Match

Christos Papaloukas, Ilias Chalkidis, Konstantinos Athinaios, ... , Manolis Koubarakis

Computation and Language

In this work, we study the task of classifying legal texts written in the Greek language. We introduce and make publicly available a novel dataset based on Greek legislation, consisting of more than 47 thousand official, categorized Greek legislation resources. We experiment with this dataset and evaluate a battery of advanced methods and classifiers, ranging from traditional machine learning and RNN-based methods to state-of-the-art Transformer-based methods. We show that re...

LegaLMFiT: Efficient Short Legal Text Classification with LSTM Language Model Pre-Training

September 2, 2021

90% Match

Benjamin Clavié, Akshita Gheewala, Paul Briton, Marc Alphonsus, ... , Francesco Piccoli

Computation and Language

Large Transformer-based language models such as BERT have led to broad performance improvements on many NLP tasks. Domain-specific variants of these models have demonstrated excellent performance on a variety of specialised tasks. In legal NLP, BERT-based models have led to new state-of-the-art results on multiple tasks. The exploration of these models has demonstrated the importance of capturing the specificity of the legal language and its vocabulary. However, such approach...

Model and Data Transfer for Cross-Lingual Sequence Labelling in Zero-Resource Settings

October 23, 2022

90% Match

Iker García-Ferrero, Rodrigo Agerri, German Rigau

Computation and Language

Zero-resource cross-lingual transfer approaches aim to apply supervised models from a source language to unlabelled target languages. In this paper we perform an in-depth study of the two main techniques employed so far for cross-lingual zero-resource sequence labelling, based either on data or model transfer. Although previous research has proposed translation and annotation projection (data-based cross-lingual transfer) as an effective technique for cross-lingual sequence l...

A Multilingual Bag-of-Entities Model for Zero-Shot Cross-Lingual Text Classification

October 15, 2021

90% Match

Sosuke Nishikawa, Ikuya Yamada, ... , Isao Echizen

Computation and Language

We present a multilingual bag-of-entities model that effectively boosts the performance of zero-shot cross-lingual text classification by extending a multilingual pre-trained language model (e.g., M-BERT). It leverages the multilingual nature of Wikidata: entities in multiple languages representing the same concept are defined with a unique identifier. This enables entities described in multiple languages to be represented using shared embeddings. A model trained on entity fe...

Universal Cross-Lingual Text Classification

June 16, 2024

89% Match

Riya Savant, Anushka Shelke, Sakshi Todmal, Sanskruti Kanphade, ... , Raviraj Joshi

Computation and Language

Machine Learning

Text classification, an integral task in natural language processing, involves the automatic categorization of text into predefined classes. Creating supervised labeled datasets for low-resource languages poses a considerable challenge. Unlocking the language potential of low-resource languages requires robust datasets with supervised labels. However, such datasets are scarce, and the label space is often limited. In our pursuit to address this gap, we aim to optimize existin...

Multi-Task Deep Learning for Legal Document Translation, Summarization and Multi-Label Classification

October 16, 2018

89% Match

Ahmed Elnaggar, Christoph Gebendorfer, ... , Florian Matthes

Computation and Language

Information Retrieval

Machine Learning

The digitalization of the legal domain has been ongoing for a couple of years. In that process, the application of different machine learning (ML) techniques is crucial. Tasks such as the classification of legal documents or contract clauses as well as the translation of those are highly relevant. On the other hand, digitized documents are barely accessible in this field, particularly in Germany. Today, deep learning (DL) is one of the hot topics with many publications and va...

Zero-shot Cross-lingual Stance Detection via Adversarial Language Adaptation

April 22, 2024

89% Match

Bharathi A, Arkaitz Zubiaga

Computation and Language

Stance detection has been widely studied as the task of determining if a social media post is positive, negative or neutral towards a specific issue, such as support towards vaccines. Research in stance detection has however often been limited to a single language and, where more than one language has been studied, research has focused on few-shot settings, overlooking the challenges of developing a zero-shot cross-lingual stance detection model. This paper makes the first su...

Zero-Shot Cross-Lingual Transfer in Legal Domain Using Transformer Models