Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset

Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed the use of a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method by using a double-layer approach to addres...

Bibliographic Details
Published in: IEEE ACCESS
Main Authors: Riyadi, Slamet; Andriyani, Annisa Divayu; Sulaiman, Siti Noraini
Format: Article
Language: English
Published: IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC 2024
Subjects: Computer Science; Engineering; Telecommunications
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001349751700001
title Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset
container_title IEEE ACCESS
language English
format Article
description Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed the use of a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method by using a double-layer approach to address imbalanced datasets. The novelty lies in using double layers of hybrid CNN-RNN to enhance hate speech detection accuracy. This research also employed an oversampling technique alongside the double-layer model. The process included preprocessing, feature extraction, training tuning, testing, and performance evaluation. On imbalanced data, the double-layer hybrid CNN-RNN model achieved an accuracy of 0.827, a precision of 0.797, a recall of 0.759, and an F1 score of 0.883. On balanced data, it achieved a higher accuracy of 0.908, a precision of 0.943, a recall of 0.894, and an F1 score of 0.914. Moreover, on the imbalanced dataset the proposed model outperformed the baseline hybrid CNN-RNN, which achieved an accuracy of 0.752, a precision of 0.797, a recall of 0.559, and an F1 score of 0.657. Dropout and early stopping techniques addressed overfitting in complex models and large datasets. This research has advanced hate speech detection methodologies by demonstrating the effectiveness of a double-layer hybrid CNN-RNN model, especially for imbalanced data. It underscores the importance of addressing imbalanced datasets for improved model accuracy. Future work could explore alternative data augmentation techniques or compare the proposed model with other architectures.
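The precision, recall, and F1 figures quoted in the description can be cross-checked with a few lines of arithmetic. The sketch below assumes that F1 is the plain harmonic mean of the reported precision and recall for a single class; the description does not say which averaging scheme (binary, macro, or weighted) was used, so this is an assumption rather than the authors' procedure. The helper name `f1_score` and the dictionary labels are illustrative only.

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) exactly as quoted in the description.
# The labels are illustrative names, not terms taken from the record.
reported = {
    "double-layer, imbalanced": (0.797, 0.759, 0.883),
    "double-layer, balanced": (0.943, 0.894, 0.914),
    "baseline hybrid, imbalanced": (0.797, 0.559, 0.657),
}

for name, (p, r, f1_reported) in reported.items():
    f1_computed = f1_score(p, r)
    print(f"{name}: computed F1 = {f1_computed:.3f}, reported F1 = {f1_reported:.3f}")
```

Under this assumption the baseline figure is reproduced (about 0.657), the balanced-data figure comes out near 0.918 against the reported 0.914, and the imbalanced double-layer figure comes out near 0.778 against the reported 0.883. Differences of this kind can arise when metrics are averaged across classes, which the description does not specify.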
publisher IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC
issn 2169-3536

publishDate 2024
container_volume 12
container_issue
doi_str_mv 10.1109/ACCESS.2024.3487433
topic Computer Science; Engineering; Telecommunications
topic_facet Computer Science; Engineering; Telecommunications
accesstype gold
id WOS:001349751700001
url https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001349751700001
record_format wos
collection Web of Science (WoS)
_version_ 1818940499063472128