Sentiment classification from reviews for tourism analytics

Bibliographic Details
Published in: International Journal of Advances in Intelligent Informatics
Main Authors: Haris N.A.K.M.; Mutalib S.; Malik A.M.A.; Abdul-Rahman S.; Kamarudin S.N.K.
Format: Article
Language: English
Published: Universitas Ahmad Dahlan 2023
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85153593325&doi=10.26555%2fijain.v9i1.1077&partnerID=40&md5=a9116f6f333c1a9a702c2b0ae15c9e47
id 2-s2.0-85153593325
User-generated content is critical for tourism destination management because it helps organizations identify their customers' opinions and devise ways to improve their tourism organizations. Social media hosts a large number of reviews, and it is difficult for these organizations to analyze them manually. Sentiment classification sorts reviews into a small number of classes, which eases decision-making. The reviews contain noisy content, such as typos and emoticons, that could affect the accuracy of the classifiers. This study evaluates the reviews using Support Vector Machine, Naive Bayes, and Random Forest models to identify a suitable classifier. The main phases of the study are data collection, preparation, labeling, and modeling. The data is collected from TripAdvisor and Google reviews using web scraping tools. Each review is labeled with one of three sentiments: positive, neutral, or negative. Pre-processing consists of removing missing values, tokenization, case folding, stop-word removal, stemming, and applying n-grams. The models are compared on accuracy, and the one with the highest accuracy is chosen as the solution. The findings show that the Support Vector Machine model with 5-fold cross-validation is the most suitable classifier, with an accuracy of 67.97%, compared with 63.55% for the Random Forest classifier and 61.33% for Naive Bayes. In conclusion, the results provide useful information for tourism and identify a suitable algorithm for sentiment analysis in the tourism domain. © 2023, Universitas Ahmad Dahlan. All rights reserved.
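The pre-processing steps and the 5-fold evaluation named in the abstract can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' implementation: the stop-word list, the suffix-stripping stemmer, and the function names `preprocess`, `k_fold_indices`, and `accuracy` are all assumptions introduced here, and the record does not say which tools, stemmer, n-gram sizes, or fold assignment the study used.

```python
import re

# Hypothetical, deliberately tiny stop-word list (the study's list is not given).
STOP_WORDS = {"the", "a", "an", "is", "was", "were", "and", "to", "of", "it"}


def crude_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(review, n=2):
    """Return 1-gram to n-gram features for one review ([] if it is missing)."""
    # Missing-value handling: treat None or blank text as an empty review.
    if review is None or not review.strip():
        return []
    # Case folding, then tokenization; keeping only letters drops emoticons
    # and punctuation.
    tokens = re.findall(r"[a-z]+", review.lower())
    # Stop-word removal on the raw tokens, then stemming of what remains.
    tokens = [crude_stem(t) for t in tokens if t not in STOP_WORDS]
    # n-grams: every run of `size` consecutive tokens, joined by a space.
    features = []
    for size in range(1, n + 1):
        for i in range(len(tokens) - size + 1):
            features.append(" ".join(tokens[i:i + size]))
    return features


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous test folds of near-equal size.

    No shuffling or stratification is applied here; whether the study used
    either is not stated in the record.
    """
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

In use, each fold from `k_fold_indices` would serve once as the test set while a classifier is trained on the remaining reviews, and the five `accuracy` values would be averaged to give a single figure per model.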
publishDate 2023
container_title International Journal of Advances in Intelligent Informatics
container_volume 9
container_issue 1
doi_str_mv 10.26555/ijain.v9i1.1077
publisher Universitas Ahmad Dahlan
issn 24426571
language English
format Article
accesstype All Open Access; Gold Open Access
record_format scopus
collection Scopus