FastNGSadmix: Admixture Proportions and Principal Component Analysis of a Single Low-Depth Sequencing Sample
Published in: | Human Heredity, 2018-06, Vol. 83 (1), p. 12 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Summary: | We present fastNGSadmix, a method for fast and easy estimation of admixture proportions and principal component analysis (PCA) for a single low-depth next-generation sequencing (NGS) sample, using a panel of reference populations with population-specific allele frequencies. We show that fastNGSadmix is more accurate than established methods for estimating admixture with reference panels, such as iAdmix and ADMIXTURE. fastNGSadmix corrects for the bias introduced by a reference panel of limited size, which is a substantial problem for other methods. fastNGSadmix works for samples with very low sequencing depth: through down-sampling we show that it works for samples with a depth of less than 0.005X. This is because the method uses genotype likelihoods and thereby incorporates genotype uncertainty into the model. fastNGSadmix maximises the likelihood of the sequencing data, given the admixture proportions and the population-specific allele frequencies, using the EM algorithm. A bootstrapping approach has been implemented for the admixture estimation, so the uncertainty of the admixture estimates can easily be obtained. We use the estimated admixture proportions to perform a PCA that incorporates both population structure and genotype uncertainty. Existing PCA methods based on NGS data do not model population structure; this method models it via the admixture proportions estimated by fastNGSadmix. The method has been applied to ancient DNA samples. Ancient DNA is usually characterised by low-quality data, which is why it is crucial to take genotype uncertainty into account. In addition, the samples are modelled independently, which allows related samples to be analysed. |
---|---|
ISSN: | 0001-5652, 1423-0062 |
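The summary says that fastNGSadmix uses the EM algorithm to maximise the likelihood of the sequencing data, given the admixture proportions and fixed population-specific allele frequencies, and that bootstrapping gives the uncertainty of the estimates. The Python sketch below illustrates that idea under simplifying assumptions. It is not the fastNGSadmix implementation. It assumes biallelic sites, genotype likelihoods supplied as a J × 3 array, and reference frequencies treated as known. It omits the correction for limited reference-panel size and the PCA step. The helper names (`em_admixture`, `simulate`, `bootstrap_q`), the toy data, and the site-resampling bootstrap are hypothetical choices made for illustration.

```python
import numpy as np


def em_admixture(gl, freqs, n_iter=1000, tol=1e-7):
    """EM estimate of the admixture proportions q of ONE sample (sketch).

    gl    : (J, 3) genotype likelihoods p(D_j | g) for g = 0, 1, 2 copies of the
            allele whose reference-panel frequencies are given in `freqs`.
    freqs : (J, K) population-specific allele frequencies, held fixed.
    Assumption: no correction for reference-panel size is applied here.
    """
    J, K = freqs.shape
    f = np.clip(freqs, 1e-6, 1 - 1e-6)  # keep h strictly inside (0, 1)
    q = np.full(K, 1.0 / K)  # start from equal proportions
    for _ in range(n_iter):
        h = f @ q  # expected allele frequency of the sample at each site
        prior = np.stack([(1 - h) ** 2, 2 * h * (1 - h), h ** 2], axis=1)
        post = gl * prior  # unnormalised P(g_j | D_j, q)
        post /= post.sum(axis=1, keepdims=True)
        eg = post[:, 1] + 2 * post[:, 2]  # posterior expected allele count
        # E-step: expected number of the two allele copies at site j
        # that descend from population k (each row sums to 2)
        a = eg / h
        b = (2 - eg) / (1 - h)
        counts = q * (a[:, None] * f + b[:, None] * (1 - f))
        # M-step: fraction of all 2J allele copies assigned to each population
        q_new = counts.sum(axis=0) / (2 * J)
        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break
    return q


def simulate(J=5000, q_true=(0.6, 0.3, 0.1), depth=1.0, err=0.01, seed=1):
    """Hypothetical toy data: random frequencies, Poisson depth, symmetric error."""
    rng = np.random.default_rng(seed)
    q_true = np.asarray(q_true)
    f = rng.uniform(0.05, 0.95, size=(J, q_true.size))
    g = rng.binomial(2, f @ q_true)  # true genotypes (allele counts)
    n = rng.poisson(depth, size=J)  # reads per site; many sites have none

    def p_allele(gg):
        # probability that a single read shows the allele, given genotype gg
        return (gg / 2) * (1 - err) + (1 - gg / 2) * err

    k = rng.binomial(n, p_allele(g))  # reads showing the allele
    gl = np.stack([p_allele(gg) ** k * (1 - p_allele(gg)) ** (n - k)
                   for gg in (0, 1, 2)], axis=1)
    return gl, f


def bootstrap_q(gl, freqs, n_boot=20, seed=0):
    """Percentile interval from resampling sites with replacement (assumed scheme)."""
    rng = np.random.default_rng(seed)
    J = gl.shape[0]
    reps = []
    for _ in range(n_boot):
        idx = rng.integers(0, J, size=J)
        reps.append(em_admixture(gl[idx], freqs[idx]))
    return np.percentile(np.array(reps), [2.5, 97.5], axis=0)  # shape (2, K)


if __name__ == "__main__":
    gl, f = simulate()
    q_hat = em_admixture(gl, f)
    lo, hi = bootstrap_q(gl, f)
    print("q_hat:", np.round(q_hat, 3))
    print("95% interval:", np.round(lo, 3), np.round(hi, 3))
```

Only the posterior expected allele count of each site enters the update, so one iteration costs O(JK). A site without reads has flat genotype likelihoods and contributes exactly 2q_k to the counts. It therefore leaves q unchanged: such sites carry no information, although they do slow convergence.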