An optimal and stable algorithm for clustering numerical data

In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. In random seeding, two main issues arise: the clustering results may be less than optimal and different clustering results may be obtained for every run. In real-world applications,...

Bibliographic Details
Published in: Algorithms
Main Author: Seman A.; Sapawi A.M.
Format: Article
Language: English
Published: MDPI AG 2021
Online Access: https://www.scopus.com/inward/record.uri?eid=2-s2.0-85109399419&doi=10.3390%2fa14070197&partnerID=40&md5=28e9e298aa7e11e0021cdaf07826ebad
id 2-s2.0-85109399419
author Seman A.; Sapawi A.M.
title An optimal and stable algorithm for clustering numerical data
publishDate 2021
container_title Algorithms
container_volume 14
container_issue 7
doi_str_mv 10.3390/a14070197
url https://www.scopus.com/inward/record.uri?eid=2-s2.0-85109399419&doi=10.3390%2fa14070197&partnerID=40&md5=28e9e298aa7e11e0021cdaf07826ebad
description In the conventional k-means framework, seeding is the first step toward optimization before the objects are clustered. Random seeding raises two main issues: the clustering results may be less than optimal, and different clustering results may be obtained on every run. In real-world applications, optimal and stable clustering is highly desirable. This report introduces a new clustering algorithm, the zero k-approximate modal haplotype (Zk-AMH) algorithm, which uses a simple and novel seeding mechanism known as zero-point multidimensional spaces. The Zk-AMH algorithm provides cluster optimality and stability, thereby resolving both issues. Notably, its mean scores were identical to its maximum and minimum scores across 100 runs, giving a standard deviation of zero, which demonstrates its stability. Additionally, when the Zk-AMH algorithm was applied to eight datasets, it achieved the highest mean scores for four datasets, produced an approximately equal score for one dataset, and yielded marginally lower scores for the other three datasets. With its optimality and stability, the Zk-AMH algorithm could be a suitable alternative for developing future clustering tools. © 2021 by the authors. Licensee MDPI, Basel, Switzerland.
publisher MDPI AG
issn 19994893
language English
format Article
accesstype All Open Access; Gold Open Access
record_format scopus
collection Scopus
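The stability issue named in the description (random seeding can give a different result on every run, whereas a fixed seeding rule gives identical scores and therefore zero standard deviation) can be illustrated with a minimal sketch. This is not the Zk-AMH algorithm and not its zero-point multidimensional spaces seeding, neither of which is specified in this record. It is plain k-means on synthetic data, and `deterministic_seeds` is a hypothetical stand-in rule (points ordered by distance from the origin, picked at even spacing) chosen only because it is repeatable. The data, the function names, and the use of within-cluster sum of squares as the score are all assumptions made for illustration.

```python
import random
import statistics


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_score(points, seeds, iters=50):
    """Plain Lloyd-style k-means; returns the within-cluster sum of squares."""
    centers = list(seeds)
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its members; keep empty clusters in place.
        new_centers = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def deterministic_seeds(points, k):
    """Hypothetical repeatable seeding rule (NOT the paper's mechanism):
    order points by distance from the origin and pick k evenly spaced ones."""
    origin = (0.0,) * len(points[0])
    ordered = sorted(points, key=lambda p: (sq_dist(p, origin), p))
    step = len(ordered) / k
    return [ordered[int(i * step + step / 2)] for i in range(k)]


# Synthetic data (an assumption for this sketch): three Gaussian blobs in 2-D.
data_rng = random.Random(0)
blob_centers = [(1.0, 1.0), (5.0, 5.0), (9.0, 1.0)]
points = [
    (data_rng.gauss(cx, 0.8), data_rng.gauss(cy, 0.8))
    for cx, cy in blob_centers
    for _ in range(30)
]

k, runs = 3, 100
seed_rng = random.Random(1)

# Random seeding: a fresh random sample of k points on each of the 100 runs.
random_scores = [kmeans_score(points, seed_rng.sample(points, k)) for _ in range(runs)]

# Deterministic seeding: the same seeds on every run, so every score is the same.
det_scores = [kmeans_score(points, deterministic_seeds(points, k)) for _ in range(runs)]

random_std = statistics.pstdev(random_scores)
det_std = statistics.pstdev(det_scores)

print("random seeding:        min/mean/max =",
      min(random_scores), statistics.mean(random_scores), max(random_scores),
      "std =", random_std)
print("deterministic seeding: min/mean/max =",
      min(det_scores), statistics.mean(det_scores), max(det_scores),
      "std =", det_std)
```

With a repeatable seeding rule the minimum, mean, and maximum scores coincide by construction, which is the kind of stability the description reports for 100 runs. Whether a given rule also yields good clusters is a separate question that this sketch does not address.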