MyDAS Corpus: Malay Social Media Texts for Detecting Depression, Anxiety, and Stress on Facebook

Bibliographic Details
Published in: 2024 5th International Conference on Artificial Intelligence and Data Sciences, AiDAS 2024 - Proceedings
Main Author: Ahmad Z.; Mohamed A.; Conway M.; Zakaria R.; Teo N.H.I.; Maskat R.
Format: Conference paper
Language: English
Published: Institute of Electrical and Electronics Engineers Inc. 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85209658607&doi=10.1109%2fAiDAS63860.2024.10730385&partnerID=40&md5=abcccc169bf9db99d4e2264e40ef4e41
Description
Summary: The application of Natural Language Processing (NLP) in mental health monitoring has expanded significantly; however, the specific challenges of interpreting Depression, Anxiety, and Stress (DAS) in Malay-language social media texts have not been adequately addressed. This gap underscores the need for NLP solutions that are sensitised to the linguistic and cultural specificities of Malay-speaking populations. This study develops and validates a specialised Malay-language corpus built from social media content and targeting DAS. Using a hybrid ground truth strategy that integrates self-reports with expert assessments, the research offers methodological refinements in the analysis of Malay linguistic patterns and in the deployment of machine learning classifiers to identify mental health indicators efficiently. The paper reviews existing methodologies, outlines a novel corpus development strategy, and discusses classifier performance. The Decision Tree classifier achieved the highest F1 score (0.75), followed by the Support Vector Machine (SVM) at 0.73 and Random Forest at 0.70. Multinomial Naive Bayes (MNB) and K-Nearest Neighbors (KNN) performed worse, with F1 scores of 0.55 and 0.52, respectively. Analyses using bi-gram networks and t-SNE visualisations explore the nuanced linguistic indicators of mental health states, culminating in a discussion of the implications for future NLP applications in mental health monitoring. © 2024 IEEE.
DOI: 10.1109/AiDAS63860.2024.10730385
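The summary mentions bi-gram networks as one way of exploring linguistic indicators of DAS, but the record does not say how the authors built them. What follows is a minimal sketch under stated assumptions, not the paper's method. Each word is treated as a node, and each edge weight counts how often one word directly follows another. Tokenisation is assumed to be plain lowercase whitespace splitting. The function name `bigram_edges` and the two toy Malay posts are hypothetical illustrations and are not drawn from the MyDAS corpus.

```python
from collections import Counter
from typing import Iterable


def bigram_edges(texts: Iterable[str]) -> Counter:
    """Count adjacent word pairs across texts; each pair is a weighted edge.

    Hypothetical helper for illustration only.
    """
    edges: Counter = Counter()
    for text in texts:
        # Assumption: naive lowercase whitespace tokenisation. Real Malay
        # preprocessing would likely also handle slang, affixes and stop words.
        tokens = text.lower().split()
        # zip pairs each token with its successor: (w1, w2), (w2, w3), ...
        edges.update(zip(tokens, tokens[1:]))
    return edges


# Hypothetical toy posts ("I feel very tired", "I feel very worried")
posts = [
    "Saya rasa sangat penat",
    "saya rasa sangat risau",
]
edges = bigram_edges(posts)
for (src, dst), weight in edges.most_common():
    print(f"{src} -> {dst}: {weight}")
```

Heavier edges mark word sequences that recur across posts. A network library could then lay out or prune this edge list, for example by keeping only edges above a minimum weight, to support the kind of visual inspection the summary describes.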