A Primer in BERTology: What we know about how BERT works

The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures

March 23, 2021

Sushant Singh, Ausif Mahmood

Computation and Language

Machine Learning

In recent years, Natural Language Processing (NLP) models have achieved phenomenal success in linguistic and semantic tasks like text classification, machine translation, cognitive dialogue systems, information retrieval via Natural Language Understanding (NLU), and Natural Language Generation (NLG). This feat is primarily attributed to the seminal Transformer architecture, leading to designs such as BERT, GPT (I, II, III), etc. Although these large-size models have achie...

HuggingFace's Transformers: State-of-the-art Natural Language Processing

October 9, 2019

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, ... , Alexander M. Rush

Computation and Language

Recent progress in natural language processing has been driven by advances in both model architecture and model pretraining. Transformer architectures have facilitated building higher-capacity models and pretraining has made it possible to effectively utilize this capacity for a wide variety of tasks. Transformers is an open-source library with the goal of opening up these advances to the wider machine learning community. The library consists of carefully engineered ...

A Primer on the Inner Workings of Transformer-based Language Models

April 30, 2024

Javier Ferrando, Gabriele Sarti, ... , Marta R. Costa-jussà

Computation and Language

The rapid progress of research aimed at interpreting the inner workings of advanced language models has highlighted a need for contextualizing the insights gained from years of work in this area. This primer provides a concise technical introduction to the current techniques used to interpret the inner workings of Transformer-based language models, focusing on the generative decoder-only architecture. We conclude by presenting a comprehensive overview of the known internal me...

Revealing the Dark Secrets of BERT

August 21, 2019

Olga Kovaleva, Alexey Romanov, ... , Anna Rumshisky

Computation and Language

Machine Learning

BERT-based architectures currently give state-of-the-art performance on many NLP tasks, but little is known about the exact mechanisms that contribute to their success. In the current work, we focus on the interpretation of self-attention, which is one of the fundamental underlying components of BERT. Using a subset of GLUE tasks and a set of handcrafted features-of-interest, we propose the methodology and carry out a qualitative and quantitative analysis of the information enc...

Emergent Properties of Finetuned Language Representation Models

October 24, 2019

Alexandre Matton, Luke de Oliveira

Computation and Language

Machine Learning

Large, self-supervised transformer-based language representation models have recently received significant amounts of attention, and have produced state-of-the-art results across a variety of tasks simply by scaling up pre-training on larger and larger corpora. Such models usually produce high dimensional vectors, on top of which additional task-specific layers and architectural modifications are added to adapt them to specific downstream tasks. Though there exists ample evid...

BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT

January 25, 2020

Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, ... , Hung-Yi Lee

Computation and Language

Machine Learning

Although Bidirectional Encoder Representations from Transformers (BERT) has achieved tremendous success in many natural language processing (NLP) tasks, it remains a black box. A variety of previous works have tried to lift the veil of BERT and understand each layer's functionality. In this paper, we found that, surprisingly, the output layer of BERT can reconstruct the input sentence by directly taking each layer of BERT as input, even though the output layer has never seen t...

Not all layers are equally as important: Every Layer Counts BERT

November 3, 2023

Lucas Georges Gabriel Charpentier, David Samuel

Computation and Language

This paper introduces a novel modification of the transformer architecture, tailored for the data-efficient pretraining of language models. This aspect is evaluated by participating in the BabyLM challenge, where our solution won both the strict and strict-small tracks. Our approach allows each transformer layer to select which outputs of previous layers to process. The empirical results verify the potential of this simple modification and show that not all layers are equally...

A Survey of Techniques for Optimizing Transformer Inference

July 16, 2023

Krishna Teja Chitty-Venkata, Sparsh Mittal, Murali Emani, ... , Arun K. Somani

Machine Learning

Hardware Architecture

Computation and Language

Computer Vision and Pattern ...

Recent years have seen a phenomenal rise in performance and applications of transformer neural networks. The family of transformer networks, including Bidirectional Encoder Representations from Transformer (BERT), Generative Pretrained Transformer (GPT) and Vision Transformer (ViT), have shown their effectiveness across Natural Language Processing (NLP) and Computer Vision (CV) domains. Transformer-based networks such as ChatGPT have impacted the lives of common men. However,...

Machine Learning Meets Natural Language Processing -- The story so far

March 27, 2021

N.-I. Galanis, P. Vafiadis, ... , G. A. Papakostas

Computation and Language

Artificial Intelligence

Machine Learning

Natural Language Processing (NLP) has evolved significantly over the last decade. This paper highlights the most important milestones of this period while trying to pinpoint the contribution of each individual model and algorithm to the overall progress. Furthermore, it focuses on issues still remaining to be solved, emphasizing the groundbreaking proposals of Transformers, BERT, and all the similar attention-based models.

Lessons Learned from Applying off-the-shelf BERT: There is no Silver Bullet

September 15, 2020

Victor Makarenkov, Lior Rokach

Computation and Language

Artificial Intelligence

Machine Learning

One of the challenges in the NLP field is training large classification models, a task that is both difficult and tedious. It is even harder when GPU hardware is unavailable. The increased availability of pre-trained and off-the-shelf word embeddings, models, and modules aims at easing the process of training large models and achieving a competitive performance. We explore the use of off-the-shelf BERT models and share the results of our experiments and compare their results t...
