A scoping review of topic modelling on online data
Published in: | Indonesian Journal of Electrical Engineering and Computer Science |
---|---|
Main Author: | Sharif M.M.M. |
Format: | Article |
Language: | English |
Published: | Institute of Advanced Engineering and Science, 2023 |
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-85171730967&doi=10.11591%2fijeecs.v31.i3.pp1633-1641&partnerID=40&md5=c4531f3b7fa588e66414209db105ef8f |
id | 2-s2.0-85171730967 |
---|---|
author | Sharif M.M.M.; Maskat R.; Baharum Z.; Maskat K. |
container_volume | 31 |
container_issue | 3 |
doi_str_mv | 10.11591/ijeecs.v31.i3.pp1633-1641 |
description | With the increasing prevalence of unstructured data generated online (e.g., on social media and online forums), mining this data is important because it provides a genuine view of public opinion. Owing to this significant advantage, topic modelling has become more important than ever. Topic modelling is a natural language processing (NLP) technique used mainly to reveal relevant topics hidden in text corpora. This paper aims to review recent research trends in topic modelling and the state-of-the-art techniques used when dealing with online data. The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodology was used in this scoping review, which covers research published from 2020 to 2022. We formulated five research questions of interest to many researchers. Analysis of the 36 relevant papers revealed that more work on non-English languages is needed; that common pre-processing techniques, e.g., stop word removal, were applied to all datasets regardless of language; that latent Dirichlet allocation (LDA) is the most used modelling technique and also one of the best performing; and that the resulting topics are most often evaluated using topic coherence. In conclusion, topic modelling has benefited largely from LDA, and it will be interesting to see whether this trend continues across languages in the future. © 2023 Institute of Advanced Engineering and Science. All rights reserved. |
issn | 2502-4752 |
accesstype | All Open Access; Gold Open Access; Green Open Access |
record_format | scopus |
collection | Scopus |