Towards development of clustering applications for large-scale comparative genotyping and kinship analysis using Y-short tandem repeats

Bibliographic Details
Published in:	OMICS A Journal of Integrative Biology, Vol. 19, No. 6
Authors:	Seman A.; Sapawi A.M.; Salleh M.Z.
Format:	Article
Language:	English
Published:	Mary Ann Liebert Inc. 2015
DOI:	10.1089/omi.2014.0136
ISSN:	1536-2310
Access:	All Open Access; Green Open Access
Online Access:	https://www.scopus.com/inward/record.uri?eid=2-s2.0-84930578698&doi=10.1089%2fomi.2014.0136&partnerID=40&md5=ed5e9921a6b98f34903609f098d3df16
Scopus ID:	2-s2.0-84930578698

Abstract

Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data. © Copyright 2015, Mary Ann Liebert, Inc.
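The abstract reports clustering accuracy scores (0.84-1.00) but this record does not define the metric. The sketch below assumes the majority-class accuracy r = (a_1 + ... + a_k) / n that is common in categorical clustering work, where a_l is the number of objects in cluster l that share that cluster's most frequent true label; it is not confirmed here as the authors' exact definition. The function name `clustering_accuracy` and the toy labels are hypothetical and illustrate the arithmetic only.

```python
from collections import Counter
from typing import Hashable, Sequence


def clustering_accuracy(labels_true: Sequence[Hashable],
                        labels_pred: Sequence[Hashable]) -> float:
    """Assumed majority-class clustering accuracy r = sum(a_l) / n.

    labels_true: known reference labels (e.g. a hypothetical haplogroup per sample).
    labels_pred: cluster index assigned to each sample by a clustering algorithm.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have equal length")
    if not labels_true:
        raise ValueError("need at least one object")

    # Group the true labels of all objects by their predicted cluster.
    members: dict[Hashable, list[Hashable]] = {}
    for truth, cluster in zip(labels_true, labels_pred):
        members.setdefault(cluster, []).append(truth)

    # a_l = size of the most frequent true label inside cluster l.
    correct = sum(Counter(group).most_common(1)[0][1]
                  for group in members.values())
    return correct / len(labels_true)


if __name__ == "__main__":
    # Hypothetical toy data: six samples, three reference groups, three clusters.
    truth = ["A", "A", "B", "B", "B", "C"]
    clusters = [0, 0, 1, 1, 0, 2]
    print(round(clustering_accuracy(truth, clusters), 3))
```

In the toy run, cluster 0 holds {A, A, B} (2 correct), cluster 1 holds {B, B} (2 correct) and cluster 2 holds {C} (1 correct), so r = 5/6, about 0.833. A score of 1.00 would mean every cluster is pure with respect to the reference labels.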

