Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset

Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed the use of a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method by using a double-layer approach to addres...


Bibliographic Details
Published in: IEEE Access
Main Authors: Riyadi S.; Divayu Andriyani A.; Noraini Sulaiman S.
Format: Article
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209134904&doi=10.1109%2fACCESS.2024.3487433&partnerID=40&md5=000fd7c0ed4f6df694652599b4e2dd55
id 2-s2.0-85209134904
Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method on imbalanced datasets by using a double-layer approach. The novelty lies in using double layers of hybrid CNN-RNN to enhance hate speech detection accuracy. The research also employed an oversampling technique alongside the double-layer model. The process comprised preprocessing, feature extraction, training, tuning, testing, and performance evaluation. On imbalanced data, the double-layer hybrid CNN-RNN model achieved an accuracy of 0.827, a precision of 0.797, a recall of 0.759, and an F1 score of 0.883. Balanced data yielded a higher accuracy of 0.908, a precision of 0.943, a recall of 0.894, and an F1 score of 0.914. The proposed model also outperformed the single hybrid CNN-RNN on the imbalanced dataset, which achieved an accuracy of 0.752, a precision of 0.797, a recall of 0.559, and an F1 score of 0.657. Dropout and early stopping were used to address overfitting in complex models and large datasets. This research advances hate speech detection methodology by demonstrating the effectiveness of a double-layer hybrid CNN-RNN model, especially for imbalanced data. It underscores the importance of addressing dataset imbalance for improved model accuracy. Future work could explore alternative data augmentation techniques or compare the proposed model with other architectures. © 2013 IEEE.
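The abstract reports precision, recall, and F1 for three configurations. As a reading aid, the sketch below recomputes F1 as the harmonic mean of the stated precision and recall. It assumes the figures are single-class (binary) values; the abstract does not say how the metrics were averaged, so a gap between a recomputed and a reported F1 may simply reflect a different averaging scheme. The function name `f1_score` and the configuration labels are illustrative, not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (single-class F1)."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) exactly as stated in the abstract.
# The labels are shorthand introduced here for illustration.
reported = {
    "double-layer hybrid CNN-RNN, imbalanced": (0.797, 0.759, 0.883),
    "double-layer hybrid CNN-RNN, balanced": (0.943, 0.894, 0.914),
    "hybrid CNN-RNN baseline, imbalanced": (0.797, 0.559, 0.657),
}

for name, (p, r, f1_reported) in reported.items():
    f1 = f1_score(p, r)
    print(f"{name}: recomputed F1 = {f1:.3f}, reported F1 = {f1_reported:.3f}")
```

Under this assumption the baseline figure is reproduced (about 0.657) and the balanced figure is close (about 0.918 against 0.914), while the imbalanced double-layer figure comes out near 0.778 rather than 0.883. The full paper should be consulted to see which averaging the authors used for that value.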

author Riyadi S.; Divayu Andriyani A.; Noraini Sulaiman S.
title Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset
publishDate 2024
container_title IEEE Access
container_volume 12
container_issue
doi_str_mv 10.1109/ACCESS.2024.3487433
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209134904&doi=10.1109%2fACCESS.2024.3487433&partnerID=40&md5=000fd7c0ed4f6df694652599b4e2dd55
publisher Institute of Electrical and Electronics Engineers Inc.
issn 21693536
language English
format Article
accesstype
record_format scopus
collection Scopus