An optimal and stable algorithm for clustering numerical data

Bibliographic Details
Published in: Algorithms
Main Authors: Seman A.; Sapawi A.M.
Format: Article
Language: English
Published: MDPI AG 2021
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85109399419&doi=10.3390%2fa14070197&partnerID=40&md5=28e9e298aa7e11e0021cdaf07826ebad
Description
Summary: In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. Random seeding raises two main issues: the clustering results may be less than optimal, and different results may be obtained on every run. In real-world applications, optimal and stable clustering is highly desirable. This report introduces a new clustering algorithm, the zero k-approximate modal haplotype (Zk-AMH) algorithm, which uses a simple and novel seeding mechanism known as zero-point multidimensional spaces. The Zk-AMH algorithm provides cluster optimality and stability, thereby resolving both issues. Notably, its mean, maximum, and minimum scores were identical across 100 runs, giving a standard deviation of zero and demonstrating its stability. Additionally, when the Zk-AMH algorithm was applied to eight datasets, it achieved the highest mean scores for four datasets, produced an approximately equal score for one dataset, and yielded marginally lower scores for the other three datasets. With its optimality and stability, the Zk-AMH algorithm could be a suitable alternative for developing future clustering tools. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
ISSN: 1999-4893
DOI: 10.3390/a14070197