Dataset on sentiment-based cryptocurrency-related news and tweets in English and Malay language

Bibliographic Details
Published in: LANGUAGE RESOURCES AND EVALUATION
Main Authors: Zamani, Nur Azmina Mohamad; Kamaruddin, Norhaslinda; Yusof, Ahmad Muhyiddin B.
Format: Article; Early Access
Language: English
Published: SPRINGER 2024
Subjects: Computer Science
Online Access: https://www-webofscience-com.uitm.idm.oclc.org/wos/woscc/full-record/WOS:001228216000001
Cryptocurrency trading is growing in popularity because of its potential for profit, and it has led to worldwide involvement in buying and selling cryptocurrency assets. Sentiments that cryptocurrency enthusiasts express about news on social media or other online platforms may affect cryptocurrency market activity. It has thus become a challenge to determine the level of positivity or negativity present in a text (regression) rather than simply classifying its sentiment into categorical classes. Regression offers more detailed information than simple classification and can be robust to noisy data because it considers the entire range of possible target values. In contrast, classification can produce biased models when the dataset is imbalanced and tends to cause overfitting. Hence, this work emphasises the creation of sentiment-based cryptocurrency-related corpora in English and Malay, focusing on Bitcoin and Ethereum. The data were collected from January to December 2021 from publicly available online news and from tweets on Twitter, in English and Malay. The dataset contains 29,694 instances in total, comprising 5,694 news items and 24,000 tweets. During the annotation process, the annotators were trained until a Krippendorff's alpha agreement above 60% was achieved, which is considered an applicable benchmark given the complexity of the annotation. The corpora are available on GitHub for cryptocurrency-related experiments using various machine learning or deep learning models to study the effect of English and Malay sentiment on the global market, particularly the Malaysian market, and can be extended to further analysis of the volatile nature of the Bitcoin and Ethereum markets.
ISSN: 1574-020X; 1574-0218
DOI: 10.1007/s10579-024-09733-z
Record ID: WOS:001228216000001
Collection: Web of Science (WoS)