Clustering the unlabeled data using a modified cat swarm optimization

Bibliographic Details
Published in: Journal of Applied Data Sciences
Main Author: Dewi D.A.; Kurniawan T.B.; Mohd Zakizakaria; Armoogum S.
Format: Article
Language: English
Published: Bright Publisher 2024
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85210851113&doi=10.47738%2fjads.v5i3.349&partnerID=40&md5=b9bbaca40cab8a163ad06306ff30229f
Description
Summary: This paper presents a modified version of the Cat Swarm Optimization (CSO) algorithm aimed at addressing the limitations of traditional clustering methods in handling complex, high-dimensional datasets. The primary objective of this research is to improve clustering accuracy and stability through three changes to the CSO algorithm: eliminating the mixture ratio (MR), setting the counts of dimensions to change (CDC) to 100%, and incorporating a new search equation in the tracing mode. To evaluate the performance of the modified algorithm, five classic datasets from the UCI Machine Learning Repository were used: Iris, Cancer, Glass, Wine, and Contraceptive Method Choice (CMC). The proposed algorithm was compared against K-Means and the original CSO. Performance metrics such as intra-cluster distance, standard deviation, and F-measure were used to assess the quality of clustering. The results demonstrated that the modified CSO consistently outperformed the competing algorithms. For example, on the Iris dataset, the modified CSO achieved a best intra-cluster distance of 96.78 and an F-measure of 0.786, compared to 97.12 and 0.781 for K-Means. Similarly, on the Wine dataset, the modified CSO reached a best intra-cluster distance of 16399, surpassing K-Means, which recorded 16768. In conclusion, the modifications introduced to the CSO algorithm significantly enhance its clustering performance across diverse datasets, producing tighter and more accurate clusters with improved stability. These findings suggest that the modified CSO is a robust and effective tool for data clustering tasks, particularly in high-dimensional spaces. Future work will focus on dynamic parameter tuning and testing the scalability of the algorithm on larger and more complex datasets. © Authors retain all copyrights.
ISSN: 2723-6471
DOI: 10.47738/jads.v5i3.349
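
Note on the intra-cluster distance metric: the summary reports intra-cluster distance values (lower is tighter) but this record does not define the metric. The following is a minimal sketch, assuming the definition commonly used in the clustering literature: the sum of Euclidean distances from each data point to its nearest cluster centroid. The function name `intra_cluster_distance` and the toy data are hypothetical and are not taken from the paper.

```python
import math


def intra_cluster_distance(points, centroids):
    """Sum of Euclidean distances from each point to its nearest centroid.

    Assumed definition (not confirmed by the record): each point is
    assigned to the closest centroid, and the distances are summed.
    """
    total = 0.0
    for p in points:
        # math.dist computes the Euclidean distance between two points
        total += min(math.dist(p, c) for c in centroids)
    return total


# Toy data (hypothetical): two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centroids = [(0.0, 1.0), (10.0, 1.0)]

# Every point lies exactly 1.0 from its nearest centroid.
print(intra_cluster_distance(points, centroids))  # 4.0
```

Under this assumed definition, an optimizer such as K-Means or CSO would search for the centroid positions that minimize the returned value.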