YHP: Y-chromosome Haplogroup Predictor for predicting male lineages based on Y-STRs

Bibliographic Details
Published in: Forensic science international, 2024-08, Vol. 361, p. 112113, Article 112113
Main Authors: Song, Mengyuan; Zhou, Yuxiang; Zhao, Chenxi; Song, Feng; Hou, Yiping
Format: Article
Language: English
Description
Summary: The human Y chromosome reflects the evolutionary history of males, and tracing male lineages through it is of great use in evolutionary, forensic, and anthropological studies. Identifying a male lineage from the specific distribution of Y haplogroups narrows the scope of an investigation, an approach that has been used in forensic scenarios. Although existing software aids familial searching by using Y-STRs (Y-chromosome short tandem repeats) to predict Y-SNP (Y-chromosome single nucleotide polymorphism) haplogroups, it often lacks resolution. In this study, we developed YHP (Y Haplogroup Predictor), a novel software tool offering high-resolution haplogroup inference without requiring extensive Y-SNP sequencing. Trained on existing datasets (219 haplogroups, 4064 samples in total) with a random forest algorithm, YHP predicts haplogroups with an accuracy of 0.923 at the highest haplogroup resolution. YHP, available on GitHub (https://github.com/cissy123/YHP-Y-Haplogroup-Predictor-), supports high-resolution haplogroup prediction, haplotype mismatch analysis, and haplotype similarity comparison. Notably, it performs well in East Asian populations, benefiting from training data drawn from eight distinct East Asian ethnic populations. Moreover, additional training sets can be integrated, extending its utility to diverse populations.

• A prediction accuracy of 0.923 was achieved at the highest haplogroup resolution.
• The significance of the 27 Y-STRs used was systematically ranked.
• The "Match&Count" and "Similarity" functions provide comprehensive information.
• Predictions can be made for any population, provided that training data exist.
ISSN:0379-0738
1872-6283
DOI:10.1016/j.forsciint.2024.112113