A scoping review of topic modelling on online data

Bibliographic Details
Published in: Indonesian Journal of Electrical Engineering and Computer Science
Main Authors: Sharif M.M.M.; Maskat R.; Baharum Z.; Maskat K.
Format: Article
Language: English
Published: Institute of Advanced Engineering and Science 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85171730967&doi=10.11591%2fijeecs.v31.i3.pp1633-1641&partnerID=40&md5=c4531f3b7fa588e66414209db105ef8f
id 2-s2.0-85171730967
With the growing volume of unstructured online data (e.g., social media, online forums), mining such data is important because it offers a genuine view of public opinion. This has made topic modelling more important than ever. Topic modelling is a natural language processing (NLP) technique that reveals relevant topics hidden in text corpora. This paper reviews recent research trends in topic modelling and the state-of-the-art techniques used with online data. The scoping review followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology and covered research published from 2020 to 2022. We constructed five research questions of interest to many researchers. The 36 relevant papers revealed that more work on non-English languages is needed; that common pre-processing techniques (e.g., stop word removal) were applied to all datasets regardless of language; that latent Dirichlet allocation (LDA) is the most used modelling technique and also one of the best performing; and that results are most often evaluated using topic coherence. In conclusion, topic modelling has largely benefited from LDA, and it will be interesting to see whether this trend continues across languages. © 2023 Institute of Advanced Engineering and Science. All rights reserved.
publishDate 2023
container_title Indonesian Journal of Electrical Engineering and Computer Science
container_volume 31
container_issue 3
doi_str_mv 10.11591/ijeecs.v31.i3.pp1633-1641
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85171730967&doi=10.11591%2fijeecs.v31.i3.pp1633-1641&partnerID=40&md5=c4531f3b7fa588e66414209db105ef8f
publisher Institute of Advanced Engineering and Science
issn 25024752
language English
format Article
accesstype All Open Access; Gold Open Access; Green Open Access
record_format scopus
collection Scopus