Inferring the ancestry of parents and grandparents from genetic data

Inference of admixture proportions is a classical statistical problem in population genetics. Standard methods implicitly assume that both parents of an individual have the same admixture fraction. However, this is rarely the case in real data. In this paper we show that the distribution of admixtur...

Bibliographic Details
Published in: PLoS computational biology 2020-08, Vol.16 (8), p.e1008065-e1008065
Main Authors: Pei, Jingwen; Zhang, Yiming; Nielsen, Rasmus; Wu, Yufeng
Format: Article
Language: English
container_end_page e1008065
container_issue 8
container_start_page e1008065
container_title PLoS computational biology
container_volume 16
creator Pei, Jingwen
Zhang, Yiming
Nielsen, Rasmus
Wu, Yufeng
description Inference of admixture proportions is a classical statistical problem in population genetics. Standard methods implicitly assume that both parents of an individual have the same admixture fraction. However, this is rarely the case in real data. In this paper we show that the distribution of admixture tract lengths in a genome contains information about the admixture proportions of the ancestors of an individual. We develop a Hidden Markov Model (HMM) framework for estimating the admixture proportions of the immediate ancestors of an individual, i.e. a type of decomposition of an individual's admixture proportions into further subsets of ancestral proportions in the ancestors. Based on a genealogical model for admixture tracts, we develop an efficient algorithm for computing the sampling probability of the genome from a single individual, as a function of the admixture proportions of the ancestors of this individual. This allows us to perform probabilistic inference of admixture proportions of ancestors only using the genome of an extant individual. We perform extensive simulations to quantify the error in the estimation of ancestral admixture proportions under various conditions. To illustrate the utility of the method, we apply it to real genetic data.
doi_str_mv 10.1371/journal.pcbi.1008065
format article
contributor Kosakovsky Pond, Sergei L.
creationdate 2020-08-01
eissn 1553-7358
oa free_for_read
orcid https://orcid.org/0000-0001-9735-0418
https://orcid.org/0000-0003-4988-3521
https://orcid.org/0000-0002-7657-0144
https://orcid.org/0000-0003-0513-6591
pmid 32797037
publisher San Francisco: Public Library of Science
rights 2020 Pei et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
toplevel peer_reviewed
identifier ISSN: 1553-7358
ispartof PLoS computational biology, 2020-08, Vol.16 (8), p.e1008065-e1008065
issn 1553-7358
1553-734X
1553-7358
language eng
recordid cdi_plos_journals_2443611455
source PubMed Central (Open Access); Publicly Available Content Database
subjects Algorithms
Biology and Life Sciences
Computer and Information Sciences
Computer science
Computer simulation
Computers
Engineering and Technology
Families & family life
Family relations
Genealogy
Genetic aspects
Genetics
Genomes
Genotype & phenotype
Grandparents
Haplotypes
Markov analysis
Markov chains
Markov processes
Methods
Parents & parenting
Physical sciences
Population
Population genetics
Probabilistic inference
Research and Analysis Methods
Software
Statistical analysis
Statistical inference
title Inferring the ancestry of parents and grandparents from genetic data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T07%3A18%3A04IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Inferring%20the%20ancestry%20of%20parents%20and%20grandparents%20from%20genetic%20data&rft.jtitle=PLoS%20computational%20biology&rft.au=Pei,%20Jingwen&rft.date=2020-08-01&rft.volume=16&rft.issue=8&rft.spage=e1008065&rft.epage=e1008065&rft.pages=e1008065-e1008065&rft.issn=1553-7358&rft.eissn=1553-7358&rft_id=info:doi/10.1371/journal.pcbi.1008065&rft_dat=%3Cgale_plos_%3EA634243235%3C/gale_plos_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c610t-584291814ef8042ab416c7e9d7f68da6fb4a26151985e59a27aa8dcf68fde0ea3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2443611455&rft_id=info:pmid/32797037&rft_galeid=A634243235&rfr_iscdi=true