Summary: | The application of Natural Language Processing (NLP) to mental health monitoring has expanded significantly; however, the specific challenges of interpreting Depression, Anxiety, and Stress (DAS) in Malay-language social media texts have not been adequately addressed. This gap underscores the need for NLP solutions that are sensitive to the linguistic and cultural specificities of Malay-speaking populations. This study develops and validates a specialised Malay-language corpus of social media content targeting DAS. Using a hybrid ground truth strategy that integrates self-reports with expert assessments, the research offers methodological refinements in the analysis of Malay linguistic patterns and in the deployment of machine learning classifiers to identify mental health indicators efficiently. The paper reviews existing methodologies, outlines a novel corpus development strategy, and discusses classifier performance. The Decision Tree classifier achieved the highest F1 score of 0.75, followed by the Support Vector Machine (SVM) with an F1 score of 0.73 and Random Forest with 0.70. Multinomial Naive Bayes (MNB) and K-Nearest Neighbors (KNN) performed worse, with F1 scores of 0.55 and 0.52, respectively. Analyses using bi-gram networks and t-SNE visualisations explore the linguistic indicators of mental health states, and the paper concludes with a discussion of the implications for future NLP applications in mental health monitoring. © 2024 IEEE.
|