Performance evaluation of Reddit Comments using Machine Learning and Natural Language Processing methods in Sentiment Analysis

May 27, 2024

Xiaoxia Zhang, Xiuyuan Qi, Zixin Teng

Computer Science

Computation and Language

Sentiment analysis, an increasingly vital field in both academia and industry, plays a pivotal role in machine learning applications, particularly on social media platforms like Reddit. However, the efficacy of sentiment analysis models is hindered by the lack of expansive, fine-grained emotion datasets. To address this gap, our study uses the GoEmotions dataset, which covers a diverse range of emotions, to evaluate sentiment analysis methods on a corpus of 58,000 comments. Unlike prior studies by the Google team, which limited their analysis to two models, our research evaluates a broader array of models. We investigate the performance of traditional classifiers such as Naive Bayes and Support Vector Machines (SVM), as well as state-of-the-art transformer-based models including BERT, RoBERTa, and GPT. Our evaluation criteria extend beyond accuracy to include hierarchical classification at varying levels of granularity in emotion categorization, and we also consider computational efficiency to provide a comprehensive evaluation framework. Our findings reveal that the RoBERTa model consistently outperforms the baseline models, demonstrating superior accuracy in fine-grained sentiment classification tasks. This underscores the potential of the RoBERTa model for advancing sentiment analysis capabilities.
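The hierarchical evaluation mentioned in the abstract, scoring the same predictions at more than one level of label granularity, can be illustrated with a minimal sketch. It is not the authors' evaluation code. The function names, the toy labels, and the small fine-to-coarse mapping are assumptions made for illustration, and the sketch treats each comment as having a single label, whereas GoEmotions comments can carry several.

```python
# Illustrative subset of a fine-to-coarse emotion mapping.
# This mapping is an assumption for the sketch, not the paper's taxonomy.
FINE_TO_COARSE = {
    "joy": "positive",
    "gratitude": "positive",
    "anger": "negative",
    "sadness": "negative",
    "surprise": "ambiguous",
    "curiosity": "ambiguous",
}


def accuracy(gold, pred):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    if not gold:
        raise ValueError("cannot score an empty label list")
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def hierarchical_accuracy(gold, pred, mapping):
    """Score predictions at the fine level and again after coarsening both sides."""
    coarse_gold = [mapping[g] for g in gold]
    coarse_pred = [mapping[p] for p in pred]
    return {
        "fine": accuracy(gold, pred),
        "coarse": accuracy(coarse_gold, coarse_pred),
    }


# Toy single-label data (hypothetical, not drawn from GoEmotions).
gold = ["joy", "anger", "surprise", "sadness"]
pred = ["gratitude", "anger", "curiosity", "joy"]

scores = hierarchical_accuracy(gold, pred, FINE_TO_COARSE)
print(scores)  # {'fine': 0.25, 'coarse': 0.75}
```

In this toy case, two of the fine-grained errors (joy vs. gratitude, surprise vs. curiosity) fall within the same coarse group, so the coarse score is higher than the fine score. This is the kind of difference a multi-granularity evaluation is meant to expose.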

GoEmotions: A Dataset of Fine-Grained Emotions

May 1, 2020

94% Match

Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan Cowen, ... , Ravi Sujith

Computation and Language

Understanding emotion expressed in language has a wide range of applications, from building empathetic chatbots to detecting harmful online behavior. Advancement in this area can be improved using large-scale datasets with a fine-grained typology, adaptable to multiple downstream tasks. We introduce GoEmotions, the largest manually annotated dataset of 58k English Reddit comments, labeled for 27 emotion categories or Neutral. We demonstrate the high quality of the annotations...

Emotion Detection in Reddit: Comparative Study of Machine Learning and Deep Learning Techniques

November 15, 2024

93% Match

Maliheh Alaeddini

Computation and Language

Emotion detection is pivotal in human communication, as it significantly influences behavior, relationships, and decision-making processes. This study concentrates on text-based emotion detection by leveraging the GoEmotions dataset, which annotates Reddit comments with 27 distinct emotions. These emotions are subsequently mapped to Ekman's six basic categories: joy, anger, fear, sadness, disgust, and surprise. We employed a range of models for this task, including six machin...

Automatically Classifying Emotions based on Text: A Comparative Exploration of Different Datasets

February 28, 2023

92% Match

Anna Koufakou, Jairo Garciga, Adam Paul, ... , Frank Christopher

Computation and Language

Emotion Classification based on text is a task with many applications which has received growing interest in recent years. This paper presents a preliminary study with the goal to help researchers and practitioners gain insight into relatively new datasets as well as emotion classification in general. We focus on three datasets that were recently presented in the related literature, and we explore the performance of traditional as well as state-of-the-art deep learning models...

Sentiment analysis of texts from social networks based on machine learning methods for monitoring public sentiment

February 24, 2025

91% Match

Arsen Tolebay Nurlanuly

Computation and Language

A sentiment analysis system powered by machine learning was created in this study to improve real-time social network public opinion monitoring. For sophisticated sentiment identification, the suggested approach combines cutting-edge transformer-based architectures (DistilBERT, RoBERTa) with traditional machine learning models (Logistic Regression, SVM, Naive Bayes). The system achieved an accuracy of up to 80-85% using transformer models in real-world scenarios after being t...

Language Representation Models for Fine-Grained Sentiment Classification

May 27, 2020

90% Match

Brian Cheang, Bailey Wei, David Kogan, ... , Ahmed Masud

Computation and Language

Sentiment classification is a quickly advancing field of study with applications in almost any field. While various models and datasets have shown high accuracy in the task of binary classification, the task of fine-grained sentiment classification is still an area with room for significant improvement. Analyzing the SST-5 dataset, previous work by Munikar et al. (2019) showed that the embedding tool BERT allowed a simple model to achieve state-of-the-art accuracy. Since that p...

Large Language Models on Fine-grained Emotion Detection Dataset with Data Augmentation and Transfer Learning

March 10, 2024

90% Match

Kaipeng Wang, Zhi Jing, ... , Han Yikun

Computation and Language

Artificial Intelligence

This paper delves into enhancing the classification performance on the GoEmotions dataset, a large, manually annotated dataset for emotion detection in text. The primary goal of this paper is to address the challenges of detecting subtle emotions in text, a complex issue in Natural Language Processing (NLP) with significant practical applications. The findings offer valuable insights into addressing the challenges of emotion detection in text and suggest directions for future...

Emotion Classification In Software Engineering Texts: A Comparative Analysis of Pre-trained Transformers Language Models

January 19, 2024

90% Match

Mia Mohammad Imran

Software Engineering

Emotion recognition in software engineering texts is critical for understanding developer expressions and improving collaboration. This paper presents a comparative analysis of state-of-the-art Pre-trained Language Models (PTMs) for fine-grained emotion classification on two benchmark datasets from GitHub and Stack Overflow. We evaluate six transformer models - BERT, RoBERTa, ALBERT, DeBERTa, CodeBERT and GraphCodeBERT against the current best-performing tool SEntiMoji. Our a...

Emotion Classification in Short English Texts using Deep Learning Techniques

February 25, 2024

90% Match

Siddhanth Bhat

Computation and Language

Artificial Intelligence

Detecting emotions in limited text datasets from under-resourced languages presents a formidable obstacle, demanding specialized frameworks and computational strategies. This study conducts a thorough examination of deep learning techniques for discerning emotions in short English texts. Deep learning approaches employ transfer learning and word embedding, notably BERT, to attain superior accuracy. To evaluate these methods, we introduce the "SmallEnglishEmotions" dataset, co...

Research on the Application of Deep Learning-based BERT Model in Sentiment Analysis

March 13, 2024

90% Match

Yichao Wu, Zhengyu Jin, Chenxi Shi, ... , Zhan Tong

Computation and Language

Machine Learning

This paper explores the application of deep learning techniques, particularly focusing on BERT models, in sentiment analysis. It begins by introducing the fundamental concept of sentiment analysis and how deep learning methods are utilized in this domain. Subsequently, it delves into the architecture and characteristics of BERT models. Through detailed explanation, it elucidates the application effects and optimization strategies of BERT models in sentiment analysis, supporte...

SentimentGPT: Exploiting GPT for Advanced Sentiment Analysis and its Departure from Current Machine Learning

July 16, 2023

90% Match

Kiana Kheiri, Hamid Karimi

Computation and Language

Artificial Intelligence

Machine Learning

Social and Information Netwo...

This study presents a thorough examination of various Generative Pretrained Transformer (GPT) methodologies in sentiment analysis, specifically in the context of Task 4 on the SemEval 2017 dataset. Three primary strategies are employed: 1) prompt engineering using the advanced GPT-3.5 Turbo, 2) fine-tuning GPT models, and 3) an inventive approach to embedding classification. The research yields detailed comparative insights among these strategies and individual GPT models, re...
