Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset
Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed the use of a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method by using a double-layer approach to addres...
Published in: | IEEE ACCESS |
---|---|
Main Authors: | Riyadi, Slamet; Andriyani, Annisa Divayu; Sulaiman, Siti Noraini |
Format: | Article |
Language: | English |
Published: | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC, 2024 |
Subjects: | Computer Science; Engineering; Telecommunications |
Online Access: | https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001349751700001 |
author |
Riyadi Slamet; Andriyani Annisa Divayu; Sulaiman Siti Noraini |
---|---|
spellingShingle |
Riyadi Slamet; Andriyani Annisa Divayu; Sulaiman Siti Noraini Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset Computer Science; Engineering; Telecommunications |
author_facet |
Riyadi Slamet; Andriyani Annisa Divayu; Sulaiman Siti Noraini |
author_sort |
Riyadi |
spelling |
Riyadi, Slamet; Andriyani, Annisa Divayu; Sulaiman, Siti Noraini Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset IEEE ACCESS English Article Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed the use of a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method by using a double-layer approach to address imbalanced datasets. The novelty lies in using double layers of hybrid CNN-RNN to enhance hate speech detection accuracy. This research also employed an oversampling technique alongside the double-layer model. The process included preprocessing, feature extraction, training tuning, testing, and performance evaluation. The results demonstrated that the double-layer hybrid CNN-RNN model achieved an accuracy of 0.827, a precision of 0.797, a recall of 0.759, and an F1 score of 0.883, with imbalanced data. Meanwhile, balanced data yielded a higher accuracy of 0.908, a precision of 0.943, a recall of 0.894, and an F1 score of 0.914. Moreover, the proposed model outperformed the hybrid CNN-RNN with an imbalanced dataset, generating an accuracy of 0.752, a precision of 0.797, a recall of 0.559, and an F1 score of 0.657. Dropout and early stopping techniques addressed overfitting in complex models and large datasets. This research has advanced hate speech detection methodologies by demonstrating the effectiveness of a double-layer hybrid CNN-RNN model, especially for imbalanced data. It underscores the importance of addressing imbalanced datasets for improved model accuracy. Future work could explore alternative data augmentation techniques or compare the proposed model with other architectures. 
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC 2169-3536 2024 12 10.1109/ACCESS.2024.3487433 Computer Science; Engineering; Telecommunications gold WOS:001349751700001 https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001349751700001 |
title |
Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset |
title_short |
Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset |
title_full |
Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset |
title_fullStr |
Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset |
title_full_unstemmed |
Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset |
title_sort |
Improving Hate Speech Detection Using Double-Layers Hybrid CNN-RNN Model on Imbalanced Dataset |
container_title |
IEEE ACCESS |
language |
English |
format |
Article |
description |
Hate speech detection is crucial in curbing online toxicity and fostering a safer digital environment. Previous research has proposed the use of a hybrid CNN-RNN model for this purpose. This study aims to improve the performance of the hybrid CNN-RNN method by using a double-layer approach to address imbalanced datasets. The novelty lies in using double layers of hybrid CNN-RNN to enhance hate speech detection accuracy. This research also employed an oversampling technique alongside the double-layer model. The process included preprocessing, feature extraction, training, tuning, testing, and performance evaluation. The results demonstrated that, with imbalanced data, the double-layer hybrid CNN-RNN model achieved an accuracy of 0.827, a precision of 0.797, a recall of 0.759, and an F1 score of 0.883. Meanwhile, balanced data yielded a higher accuracy of 0.908, a precision of 0.943, a recall of 0.894, and an F1 score of 0.914. Moreover, the proposed model outperformed the hybrid CNN-RNN baseline on the imbalanced dataset, which achieved an accuracy of 0.752, a precision of 0.797, a recall of 0.559, and an F1 score of 0.657. Dropout and early stopping techniques addressed overfitting in complex models and large datasets. This research has advanced hate speech detection methodologies by demonstrating the effectiveness of a double-layer hybrid CNN-RNN model, especially for imbalanced data. It underscores the importance of addressing imbalanced datasets for improved model accuracy. Future work could explore alternative data augmentation techniques or compare the proposed model with other architectures. |
publisher |
IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC |
issn |
2169-3536 |
publishDate |
2024 |
container_volume |
12 |
container_issue |
|
doi_str_mv |
10.1109/ACCESS.2024.3487433 |
topic |
Computer Science; Engineering; Telecommunications |
topic_facet |
Computer Science; Engineering; Telecommunications |
accesstype |
gold |
id |
WOS:001349751700001 |
url |
https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001349751700001 |
record_format |
wos |
collection |
Web of Science (WoS) |
_version_ |
1818940499063472128 |
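
The description field reports precision, recall, and F1 for three configurations. As a reading aid, the sketch below recomputes F1 as the harmonic mean of the quoted precision and recall, assuming each figure is a single-class (or micro-averaged) score. The record does not state the averaging scheme, so that assumption may not match the authors' evaluation. The names `f1_score`, `REPORTED`, and `compare` are illustrative and do not come from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) exactly as quoted in the description field.
REPORTED = {
    "double-layer, imbalanced": (0.797, 0.759, 0.883),
    "double-layer, balanced": (0.943, 0.894, 0.914),
    "hybrid CNN-RNN, imbalanced": (0.797, 0.559, 0.657),
}


def compare(rows):
    """Return {name: (recomputed F1, recomputed minus reported)}."""
    result = {}
    for name, (precision, recall, reported_f1) in rows.items():
        f1 = f1_score(precision, recall)
        result[name] = (f1, f1 - reported_f1)
    return result


if __name__ == "__main__":
    for name, (f1, diff) in compare(REPORTED).items():
        print(f"{name}: recomputed F1 = {f1:.3f}, difference = {diff:+.3f}")
```

Under this assumption, the baseline and balanced-data figures agree with the harmonic mean to within about 0.004, while the imbalanced double-layer row recomputes to roughly 0.78 against the quoted 0.883. That gap may reflect a different averaging scheme (for example macro or weighted F1) or a transcription issue in the record; the record itself gives no way to tell which.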