Identifying Topic Modeling Technique in Evaluating Textual Datasets

Bibliographic details
Series: Lecture Notes on Data Engineering and Communications Technologies
First author: Mangsor N.S.M.N.
Scopus EID: 2-s2.0-85151958013
Format: Book chapter
Language: English
Publisher: Springer Science and Business Media Deutschland GmbH, 2023
Online access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85151958013&doi=10.1007%2f978-981-99-0741-0_36&partnerID=40&md5=906626bf37657f029d821b55e8579d0e
Authors: Mangsor N.S.M.N.; Nasir S.A.M.; Abdul-Rahman S.; Ismail Z.
Volume: 165
DOI: 10.1007/978-981-99-0741-0_36
Abstract: One of the most popular methods of topic modeling is Latent Dirichlet Allocation (LDA). To date, philanthropic corporate social responsibility (PCSR) activities have been ad hoc in nature, with assistance directed mostly at basic needs and very little attention paid to activities that can contribute to eradicating poverty. Previous related literature indicates that PCSR-related activities have not been properly categorized or documented. Therefore, this research aims to identify the most suitable LDA approach for categorizing PCSR activities. The analysis used five years of data from the annual reports of 19 CSR-award-winning companies in Malaysia. Three LDA techniques were considered and compared, namely Variational Bayes inference, Gibbs sampling, and Expectation Maximization. Performance was then measured using coherence values and the pyLDAvis visualization tool. The results showed that LDA with Expectation Maximization was the best-performing of the three techniques for clustering PCSR documents. Furthermore, this approach can estimate parameters in probabilistic models when dealing with partial, noisy, or missing data. The findings offer insights that companies can consider when strategizing their CSR activities, particularly philanthropic responsibility, to ensure optimum impact in innovatively supporting society. © 2023, The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
ISSN: 2367-4512
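The abstract compares three ways of fitting LDA and scores the resulting topics by coherence. As an illustration only, and not the authors' implementation, the following Python sketch shows one of those techniques, collapsed Gibbs sampling, together with the UMass coherence measure. This record does not state which coherence measure the chapter used, so UMass is an assumption; the toy corpus, the hyperparameters alpha and beta, and the helper names gibbs_lda, top_words, and umass_coherence are hypothetical. Variational Bayes (the method behind gensim's LdaModel and scikit-learn's LatentDirichletAllocation) and Expectation Maximization are not shown.

```python
import math
import random

# Hypothetical helpers: an illustrative sketch, not the chapter's code.


def gibbs_lda(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA.

    docs: list of non-empty documents, each a list of integer word ids.
    Returns (doc_topic_counts, topic_word_counts) as nested lists.
    """
    rng = random.Random(seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    ndk = [[0] * num_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(num_topics)]  # topic-word counts
    nk = [0] * num_topics                                # tokens per topic
    z = []                                               # topic of each token
    # Assign every token to a random initial topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from all counts.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta)
                    / (nk[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                # Resample the token's topic and add it back to the counts.
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    return ndk, nkw


def top_words(topic_row, n):
    """Ids of the n most frequent words in a topic, skipping zero counts."""
    ranked = sorted(range(len(topic_row)), key=lambda w: -topic_row[w])
    return [w for w in ranked[:n] if topic_row[w] > 0]


def umass_coherence(words, docs):
    """UMass coherence: sum of log((D(w_m, w_l) + 1) / D(w_l)) over ordered pairs l < m."""
    doc_sets = [set(doc) for doc in docs]

    def doc_freq(*ws):
        # Number of documents containing every word in ws.
        return sum(1 for s in doc_sets if all(w in s for w in ws))

    score = 0.0
    for m in range(1, len(words)):
        for l in range(m):
            score += math.log((doc_freq(words[m], words[l]) + 1) / doc_freq(words[l]))
    return score


if __name__ == "__main__":
    # Toy corpus invented for illustration, not the study's annual-report data.
    vocab = ["school", "scholarship", "student", "flood", "relief", "food"]
    texts = [
        "school scholarship student school",
        "student scholarship school",
        "flood relief food flood",
        "food relief flood relief",
    ]
    index = {w: i for i, w in enumerate(vocab)}
    docs = [[index[t] for t in text.split()] for text in texts]
    ndk, nkw = gibbs_lda(docs, num_topics=2, iters=100)
    for k, row in enumerate(nkw):
        words = top_words(row, 3)
        print(k, [vocab[w] for w in words], round(umass_coherence(words, docs), 3))
```

The fixed seed makes runs reproducible. Higher UMass scores indicate that a topic's top words co-occur more consistently in the same documents, which is the intuition behind using coherence to compare fitting methods.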
