Clustering the unlabeled data using a modified cat swarm optimization

This paper presents a modified version of the Cat Swarm Optimization (CSO) algorithm aimed at addressing the limitations of traditional clustering methods in handling complex, high-dimensional datasets. The primary objective of this research is to improve clustering accuracy and stability by elimina...


Bibliographic Details
Published in: Journal of Applied Data Sciences
Main Author: Dewi D.A.; Kurniawan T.B.; Mohd Zakizakaria; Armoogum S.
Format: Article
Language: English
Published: Bright Publisher 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85210851113&doi=10.47738%2fjads.v5i3.349&partnerID=40&md5=b9bbaca40cab8a163ad06306ff30229f
id 2-s2.0-85210851113
spelling 2-s2.0-85210851113
Dewi D.A.; Kurniawan T.B.; Mohd Zakizakaria; Armoogum S.
Clustering the unlabeled data using a modified cat swarm optimization
2024
Journal of Applied Data Sciences
5
3
10.47738/jads.v5i3.349
https://www.scopus.com/inward/record.uri?eid=2-s2.0-85210851113&doi=10.47738%2fjads.v5i3.349&partnerID=40&md5=b9bbaca40cab8a163ad06306ff30229f
This paper presents a modified version of the Cat Swarm Optimization (CSO) algorithm that addresses the limitations of traditional clustering methods on complex, high-dimensional datasets. The primary objective is to improve clustering accuracy and stability through three changes to CSO: eliminating the mixture ratio (MR), setting the counts of dimensions to change (CDC) to 100%, and incorporating a new search equation in the tracing mode. The modified algorithm was evaluated on five classic datasets from the UCI Machine Learning Repository (Iris, Cancer, Glass, Wine, and Contraceptive Method Choice (CMC)) and compared against K-Means and the original CSO. Clustering quality was assessed with intra-cluster distance, standard deviation, and F-measure. The results showed that the modified CSO consistently outperformed the competing algorithms. For example, on the Iris dataset, the modified CSO achieved a best intra-cluster distance of 96.78 and an F-measure of 0.786, compared to 97.12 and 0.781 for K-Means. Similarly, on the Wine dataset, the modified CSO reached a best intra-cluster distance of 16399, compared to 16768 for K-Means. In conclusion, the modifications significantly enhance the clustering performance of CSO across diverse datasets, producing tighter and more accurate clusters with improved stability. These findings suggest that the modified CSO is a robust and effective tool for data clustering tasks, particularly in high-dimensional spaces. Future work will focus on dynamic parameter tuning and on testing the scalability of the algorithm on larger and more complex datasets. © Authors retain all copyrights.
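The abstract reports intra-cluster distance as its main quality measure but does not define it. A minimal sketch follows, assuming the definition commonly used in centroid-based clustering studies: the sum of Euclidean distances from each data point to its nearest cluster centroid. The function names and the toy data are hypothetical and are not taken from the paper.

```python
import math


def assign_to_nearest(points, centroids):
    """Return, for each point, the index of its nearest centroid (Euclidean)."""
    return [
        min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
        for p in points
    ]


def intra_cluster_distance(points, centroids):
    """Sum of Euclidean distances from each point to its nearest centroid.

    Assumed definition; lower values mean tighter clusters.
    """
    labels = assign_to_nearest(points, centroids)
    return sum(math.dist(p, centroids[k]) for p, k in zip(points, labels))


# Toy data (hypothetical, not from the paper): two well-separated pairs.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

labels = assign_to_nearest(points, centroids)
total = intra_cluster_distance(points, centroids)
print(labels, total)
```

Under this assumed definition, a swarm-based method such as CSO would treat each candidate solution as a set of centroids and use the returned total as the fitness value to minimize.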
Bright Publisher
27236471
English
Article

author Dewi D.A.; Kurniawan T.B.; Mohd Zakizakaria; Armoogum S.
spellingShingle Dewi D.A.; Kurniawan T.B.; Mohd Zakizakaria; Armoogum S.
Clustering the unlabeled data using a modified cat swarm optimization
author_facet Dewi D.A.; Kurniawan T.B.; Mohd Zakizakaria; Armoogum S.
author_sort Dewi D.A.; Kurniawan T.B.; Mohd Zakizakaria; Armoogum S.
title Clustering the unlabeled data using a modified cat swarm optimization
title_short Clustering the unlabeled data using a modified cat swarm optimization
title_full Clustering the unlabeled data using a modified cat swarm optimization
title_fullStr Clustering the unlabeled data using a modified cat swarm optimization
title_full_unstemmed Clustering the unlabeled data using a modified cat swarm optimization
title_sort Clustering the unlabeled data using a modified cat swarm optimization
publishDate 2024
container_title Journal of Applied Data Sciences
container_volume 5
container_issue 3
doi_str_mv 10.47738/jads.v5i3.349
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85210851113&doi=10.47738%2fjads.v5i3.349&partnerID=40&md5=b9bbaca40cab8a163ad06306ff30229f
description This paper presents a modified version of the Cat Swarm Optimization (CSO) algorithm that addresses the limitations of traditional clustering methods on complex, high-dimensional datasets. The primary objective is to improve clustering accuracy and stability through three changes to CSO: eliminating the mixture ratio (MR), setting the counts of dimensions to change (CDC) to 100%, and incorporating a new search equation in the tracing mode. The modified algorithm was evaluated on five classic datasets from the UCI Machine Learning Repository (Iris, Cancer, Glass, Wine, and Contraceptive Method Choice (CMC)) and compared against K-Means and the original CSO. Clustering quality was assessed with intra-cluster distance, standard deviation, and F-measure. The results showed that the modified CSO consistently outperformed the competing algorithms. For example, on the Iris dataset, the modified CSO achieved a best intra-cluster distance of 96.78 and an F-measure of 0.786, compared to 97.12 and 0.781 for K-Means. Similarly, on the Wine dataset, the modified CSO reached a best intra-cluster distance of 16399, compared to 16768 for K-Means. In conclusion, the modifications significantly enhance the clustering performance of CSO across diverse datasets, producing tighter and more accurate clusters with improved stability. These findings suggest that the modified CSO is a robust and effective tool for data clustering tasks, particularly in high-dimensional spaces. Future work will focus on dynamic parameter tuning and on testing the scalability of the algorithm on larger and more complex datasets. © Authors retain all copyrights.
publisher Bright Publisher
issn 27236471
language English
format Article
accesstype
record_format scopus
collection Scopus
_version_ 1820775433680977920