Detection of Harassment Toward Women in Twitter During Pandemic Based on Machine Learning



Bibliographic Details
Published in: International Journal of Advanced Computer Science and Applications
Authors: Mustapha W.N.A.W.; Sabri N.M.; Bakar N.A.A.A.; Daud N.M.N.; Azizan A.
Format: Article
Language: English
Published: Science and Information Organization 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85189937398&doi=10.14569%2fIJACSA.2024.01503103&partnerID=40&md5=bf042f8a4354c662d354d3f9294d4d6b
Description
Summary: Harassment is offensive, intimidating behavior that can cause discomfort to its victims. In some cases, it can lead to a traumatic experience for vulnerable victims. Harassment of women on social media is currently rising and becoming more daring. The increase in the number of social media users since the Covid-19 pandemic began in 2020 might be one of the factors. To address this problem, this research aims to assist in detecting harassment sentiment toward women on Twitter. The sentiment analysis is based on a machine learning approach, and the Support Vector Machine (SVM) was chosen due to its acceptable performance in sentiment classification. The objective of the research is to explore the capability of SVM in detecting harassment toward women on Twitter. The research methodology covers data collection using Tweepy, data preprocessing, data labelling using TextBlob, feature extraction using a TF-IDF vectorizer, and dataset splitting using the hold-out method. The algorithm was evaluated using a confusion matrix and ROC analysis, and was integrated with a graphical user interface (GUI) built with Streamlit for ease of use. The implementation of the SVM algorithm for detecting harassment toward women was successful and reliable, achieving 81% accuracy. The recommendations for improving the SVM model are to train it on datasets in other languages and to collect Twitter data regularly. The performance of SVM should also be compared with other machine learning algorithms for further validation. © (2024), (Science and Information Organization). All Rights Reserved.
ISSN: 2158-107X
DOI: 10.14569/IJACSA.2024.01503103
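
The summary reports that the classifier was evaluated with a confusion matrix and reached 81% accuracy. As a reading aid only, the following minimal sketch shows how accuracy, precision, and recall are derived from a binary confusion matrix. It is not the authors' code: the class names ("harassment", "neutral"), the helper functions, and the eight example labels are all hypothetical and do not come from the paper's dataset.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="harassment"):
    """Count true/false positives and negatives for a binary task.

    The positive class name is an assumption made for this illustration.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def summarize(counts):
    """Derive accuracy, precision, and recall from the four counts."""
    total = sum(counts.values())
    predicted_positive = counts["tp"] + counts["fp"]
    actual_positive = counts["tp"] + counts["fn"]
    return {
        "accuracy": (counts["tp"] + counts["tn"]) / total if total else 0.0,
        "precision": counts["tp"] / predicted_positive if predicted_positive else 0.0,
        "recall": counts["tp"] / actual_positive if actual_positive else 0.0,
    }


# Made-up labels for eight tweets; not data from the paper.
y_true = ["harassment", "harassment", "neutral", "neutral",
          "harassment", "neutral", "neutral", "harassment"]
y_pred = ["harassment", "neutral", "neutral", "harassment",
          "harassment", "neutral", "neutral", "harassment"]

counts = confusion_counts(y_true, y_pred)
metrics = summarize(counts)
print(counts)
print(metrics)
```

With these made-up labels, six of the eight predictions match, so the accuracy is 0.75. An accuracy figure such as the reported 81% is the same ratio computed over the held-out test tweets.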