An efficient clustering algorithm for partitioning Y-short tandem repeats data
Background: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering resul...
Published in: | BMC Research Notes |
---|---|
Main Author: | |
Format: | Article |
Language: | English |
Published: |
BioMed Central Ltd.
2012
|
Online Access: | https://www.scopus.com/inward/record.uri?eid=2-s2.0-84874099937&doi=10.1186%2f1756-0500-5-557&partnerID=40&md5=65f04f86299e6afca8effb90b834dadd |
id |
2-s2.0-84874099937 |
---|---|
spelling |
2-s2.0-84874099937 Seman A.; Bakar Z.A.; Isa M.N. An efficient clustering algorithm for partitioning Y-short tandem repeats data 2012 BMC Research Notes 5 10.1186/1756-0500-5-557 https://www.scopus.com/inward/record.uri?eid=2-s2.0-84874099937&doi=10.1186%2f1756-0500-5-557&partnerID=40&md5=65f04f86299e6afca8effb90b834dadd Background: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear. © 2012 Seman et al.; licensee BioMed Central Ltd. BioMed Central Ltd. 17560500 English Article All Open Access; Gold Open Access; Green Open Access |
author |
Seman A.; Bakar Z.A.; Isa M.N. |
spellingShingle |
Seman A.; Bakar Z.A.; Isa M.N. An efficient clustering algorithm for partitioning Y-short tandem repeats data |
author_facet |
Seman A.; Bakar Z.A.; Isa M.N. |
author_sort |
Seman A.; Bakar Z.A.; Isa M.N. |
title |
An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_short |
An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_full |
An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_fullStr |
An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_full_unstemmed |
An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_sort |
An efficient clustering algorithm for partitioning Y-short tandem repeats data |
publishDate |
2012 |
container_title |
BMC Research Notes |
container_volume |
5 |
container_issue |
|
doi_str_mv |
10.1186/1756-0500-5-557 |
url |
https://www.scopus.com/inward/record.uri?eid=2-s2.0-84874099937&doi=10.1186%2f1756-0500-5-557&partnerID=40&md5=65f04f86299e6afca8effb90b834dadd |
description |
Background: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. Results: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). Conclusions: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear. © 2012 Seman et al.; licensee BioMed Central Ltd. |
publisher |
BioMed Central Ltd. |
issn |
17560500 |
language |
English |
format |
Article |
accesstype |
All Open Access; Gold Open Access; Green Open Access |
record_format |
scopus |
collection |
Scopus |
_version_ |
1820775481578881024 |