
The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process

Bibliographic Details
Published in: Nucleic Acids Research, 2012-03, Vol. 40 (6), pp. 2426-2431
Main Authors: Heinrich, Verena, Stange, Jens, Dickhaus, Thorsten, Imkeller, Peter, Krüger, Ulrike, Bauer, Sebastian, Mundlos, Stefan, Robinson, Peter N, Hecht, Jochen, Krawitz, Peter M
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c479t-be8ef78b390b154289bcac646c2688829761874f4d0ddae8879c919cdceecf943
cites cdi_FETCH-LOGICAL-c479t-be8ef78b390b154289bcac646c2688829761874f4d0ddae8879c919cdceecf943
container_end_page 2431
container_issue 6
container_start_page 2426
container_title Nucleic acids research
container_volume 40
creator Heinrich, Verena
Stange, Jens
Dickhaus, Thorsten
Imkeller, Peter
Krüger, Ulrike
Bauer, Sebastian
Mundlos, Stefan
Robinson, Peter N
Hecht, Jochen
Krawitz, Peter M
description With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement, that is, sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected by a simple binomial distribution. Due to this high variance, mutation callers relying on binomially distributed priors are less sensitive for heterozygous variants that deviate strongly from the expected mean frequency. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth.
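The argument in the description field above can be illustrated with a small Monte Carlo sketch. This is not the authors' derivation; it assumes a deliberately simple model in which PCR is a Galton-Watson branching process (every molecule duplicates with a fixed probability in each cycle), each allele at a heterozygous locus starts from the same fixed number of template molecules, and sequencing is binomial sampling of reads from the amplified library. The names `amplify` and `replicate` and all parameter values (`N_TEMPLATES`, `CYCLES`, `EFFICIENCY`, `DEPTH`) are hypothetical choices for illustration. It requires NumPy.

```python
import numpy as np

# Hypothetical illustration parameters (not taken from the paper).
N_TEMPLATES = 5      # starting molecules per allele at one heterozygous locus
CYCLES = 12          # number of PCR cycles
EFFICIENCY = 0.8     # per-molecule duplication probability per cycle
DEPTH = 40           # reads covering the locus
N_SIMS = 100_000     # number of simulated loci

rng = np.random.default_rng(seed=1)


def amplify(n_templates, cycles, efficiency, n_sims):
    """Model PCR as a Galton-Watson branching process (assumed model).

    In every cycle each molecule independently duplicates with probability
    `efficiency`, i.e. it leaves one or two descendants. Returns an
    (n_sims, 2) array of molecule counts for the two alleles.
    """
    counts = np.full((n_sims, 2), n_templates, dtype=np.int64)
    for _ in range(cycles):
        counts = counts + rng.binomial(counts, efficiency)
    return counts


def replicate(depth):
    """One technical replicate: amplify a fresh library, then draw `depth` reads.

    Returns the observed fraction of allele A at each simulated locus.
    """
    counts = amplify(N_TEMPLATES, CYCLES, EFFICIENCY, N_SIMS)
    library_fraction = counts[:, 0] / counts.sum(axis=1)
    reads_a = rng.binomial(depth, library_fraction)  # sequencing as binomial sampling
    return reads_a / depth


# 1) Overdispersion: compare with the variance a pure binomial model predicts.
observed = replicate(DEPTH)
binomial_var = 0.25 / DEPTH  # p(1 - p) / n with p = 0.5
print(f"binomial variance : {binomial_var:.5f}")
print(f"simulated variance: {observed.var():.5f}")

# 2) The same total number of reads spent in two different ways.
deeper = replicate(2 * DEPTH)                       # one library, double depth
pooled = (replicate(DEPTH) + replicate(DEPTH)) / 2  # two independent libraries
print(f"one library, {2 * DEPTH} reads  : {deeper.var():.5f}")
print(f"two replicates, {DEPTH} each: {pooled.var():.5f}")
```

Under these assumptions the observed allele fraction has variance of roughly Var(f) + E[f(1 - f)]/DEPTH, where f is the allele fraction in the amplified library. The first term comes from the branching process and does not shrink when more reads are drawn from the same library, so the simulated variance exceeds the binomial 0.25/DEPTH. Pooling independent libraries averages that term down, which gives an intuition for why technical replicates can reduce errors more than extra depth.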
doi_str_mv 10.1093/nar/gkr1073
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3315291</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>963488458</sourcerecordid><originalsourceid>FETCH-LOGICAL-c479t-be8ef78b390b154289bcac646c2688829761874f4d0ddae8879c919cdceecf943</originalsourceid><addsrcrecordid>eNp9kT1vFDEQhi0EIpdARY_cgRQt8dd57QYJRXxJkdKE2vJ6Z-8MPu_h8SLyC_jbOMkRkSaVNTOPHnnmJeQVZ-84s_Is-3K2-VE46-UTsuJSi05ZLZ6SFZNs3XGmzBE5RvzOGFd8rZ6TIyG46I0WK_LnagvUpwQJ6BixljgsNc6Zxkwz_K7dBjIUf9tC-LlADjFv6Oirb3VFGpH6EJaGQLqmI2BoChipR1qbugAuqdJ5oo2vc9h6rDHQofgctjemfZkDIL4gzyafEF4e3hPy7dPHq_Mv3cXl56_nHy66oHpbuwEMTL0ZpGVDW0UYOwQftNJBaGOMsL3mpleTGtk4ejCmt8FyG8YAECar5Al5f-fdL8MOWjvX4pPbl7jz5drNPrqHkxy3bjP_clLytbC8Cd4cBGVu58DqdhEDpOQzzAs6q6UyRq1NI98-SnLGjJHGMN3Q0zs0lBmxwHT_Ic7cTciuhewOITf69f873LP_UpV_AYIGp_Q</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1008838806</pqid></control><display><type>article</type><title>The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process</title><source>PubMed Central Free</source><source>Open Access: Oxford University Press Open Journals</source><creator>Heinrich, Verena ; Stange, Jens ; Dickhaus, Thorsten ; Imkeller, Peter ; Krüger, Ulrike ; Bauer, Sebastian ; Mundlos, Stefan ; Robinson, Peter N ; Hecht, Jochen ; Krawitz, Peter M</creator><creatorcontrib>Heinrich, Verena ; Stange, Jens ; Dickhaus, Thorsten ; Imkeller, Peter ; Krüger, Ulrike ; Bauer, Sebastian ; Mundlos, Stefan ; Robinson, Peter N ; Hecht, Jochen ; Krawitz, Peter M</creatorcontrib><description>With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. 
Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement that is sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected by a simple binomial distribution. Due to this high variance, mutation callers relying on binomial distributed priors are less sensitive for heterozygous variants that deviate strongly from the expected mean frequency. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth.</description><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gkr1073</identifier><identifier>PMID: 22127862</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Alleles ; Computational Biology ; Data processing ; Exome ; Gene Frequency ; genomics ; Heterozygote ; High-Throughput Nucleotide Sequencing ; Humans ; Mathematical models ; Mutation ; Sequence Analysis, DNA ; Stochastic Processes ; Stochasticity</subject><ispartof>Nucleic acids research, 2012-03, Vol.40 (6), p.2426-2431</ispartof><rights>The Author(s) 2011. Published by Oxford University Press. 
2011</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c479t-be8ef78b390b154289bcac646c2688829761874f4d0ddae8879c919cdceecf943</citedby><cites>FETCH-LOGICAL-c479t-be8ef78b390b154289bcac646c2688829761874f4d0ddae8879c919cdceecf943</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315291/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315291/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/22127862$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Heinrich, Verena</creatorcontrib><creatorcontrib>Stange, Jens</creatorcontrib><creatorcontrib>Dickhaus, Thorsten</creatorcontrib><creatorcontrib>Imkeller, Peter</creatorcontrib><creatorcontrib>Krüger, Ulrike</creatorcontrib><creatorcontrib>Bauer, Sebastian</creatorcontrib><creatorcontrib>Mundlos, Stefan</creatorcontrib><creatorcontrib>Robinson, Peter N</creatorcontrib><creatorcontrib>Hecht, Jochen</creatorcontrib><creatorcontrib>Krawitz, Peter M</creatorcontrib><title>The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. 
Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement that is sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected by a simple binomial distribution. Due to this high variance, mutation callers relying on binomial distributed priors are less sensitive for heterozygous variants that deviate strongly from the expected mean frequency. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth.</description><subject>Alleles</subject><subject>Computational Biology</subject><subject>Data processing</subject><subject>Exome</subject><subject>Gene Frequency</subject><subject>genomics</subject><subject>Heterozygote</subject><subject>High-Throughput Nucleotide Sequencing</subject><subject>Humans</subject><subject>Mathematical models</subject><subject>Mutation</subject><subject>Sequence Analysis, DNA</subject><subject>Stochastic 
Processes</subject><subject>Stochasticity</subject><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2012</creationdate><recordtype>article</recordtype><recordid>eNp9kT1vFDEQhi0EIpdARY_cgRQt8dd57QYJRXxJkdKE2vJ6Z-8MPu_h8SLyC_jbOMkRkSaVNTOPHnnmJeQVZ-84s_Is-3K2-VE46-UTsuJSi05ZLZ6SFZNs3XGmzBE5RvzOGFd8rZ6TIyG46I0WK_LnagvUpwQJ6BixljgsNc6Zxkwz_K7dBjIUf9tC-LlADjFv6Oirb3VFGpH6EJaGQLqmI2BoChipR1qbugAuqdJ5oo2vc9h6rDHQofgctjemfZkDIL4gzyafEF4e3hPy7dPHq_Mv3cXl56_nHy66oHpbuwEMTL0ZpGVDW0UYOwQftNJBaGOMsL3mpleTGtk4ejCmt8FyG8YAECar5Al5f-fdL8MOWjvX4pPbl7jz5drNPrqHkxy3bjP_clLytbC8Cd4cBGVu58DqdhEDpOQzzAs6q6UyRq1NI98-SnLGjJHGMN3Q0zs0lBmxwHT_Ic7cTciuhewOITf69f873LP_UpV_AYIGp_Q</recordid><startdate>20120301</startdate><enddate>20120301</enddate><creator>Heinrich, Verena</creator><creator>Stange, Jens</creator><creator>Dickhaus, Thorsten</creator><creator>Imkeller, Peter</creator><creator>Krüger, Ulrike</creator><creator>Bauer, Sebastian</creator><creator>Mundlos, Stefan</creator><creator>Robinson, Peter N</creator><creator>Hecht, Jochen</creator><creator>Krawitz, Peter M</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope><scope>5PM</scope></search><sort><creationdate>20120301</creationdate><title>The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process</title><author>Heinrich, Verena ; Stange, Jens ; Dickhaus, Thorsten ; Imkeller, Peter ; Krüger, Ulrike ; Bauer, Sebastian ; Mundlos, Stefan ; Robinson, Peter N ; Hecht, Jochen ; Krawitz, Peter 
M</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c479t-be8ef78b390b154289bcac646c2688829761874f4d0ddae8879c919cdceecf943</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2012</creationdate><topic>Alleles</topic><topic>Computational Biology</topic><topic>Data processing</topic><topic>Exome</topic><topic>Gene Frequency</topic><topic>genomics</topic><topic>Heterozygote</topic><topic>High-Throughput Nucleotide Sequencing</topic><topic>Humans</topic><topic>Mathematical models</topic><topic>Mutation</topic><topic>Sequence Analysis, DNA</topic><topic>Stochastic Processes</topic><topic>Stochasticity</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Heinrich, Verena</creatorcontrib><creatorcontrib>Stange, Jens</creatorcontrib><creatorcontrib>Dickhaus, Thorsten</creatorcontrib><creatorcontrib>Imkeller, Peter</creatorcontrib><creatorcontrib>Krüger, Ulrike</creatorcontrib><creatorcontrib>Bauer, Sebastian</creatorcontrib><creatorcontrib>Mundlos, Stefan</creatorcontrib><creatorcontrib>Robinson, Peter N</creatorcontrib><creatorcontrib>Hecht, Jochen</creatorcontrib><creatorcontrib>Krawitz, Peter M</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Heinrich, Verena</au><au>Stange, Jens</au><au>Dickhaus, Thorsten</au><au>Imkeller, Peter</au><au>Krüger, Ulrike</au><au>Bauer, Sebastian</au><au>Mundlos, Stefan</au><au>Robinson, Peter N</au><au>Hecht, Jochen</au><au>Krawitz, Peter M</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids Res</addtitle><date>2012-03-01</date><risdate>2012</risdate><volume>40</volume><issue>6</issue><spage>2426</spage><epage>2431</epage><pages>2426-2431</pages><issn>0305-1048</issn><eissn>1362-4962</eissn><abstract>With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement that is sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected by a simple binomial distribution. Due to this high variance, mutation callers relying on binomial distributed priors are less sensitive for heterozygous variants that deviate strongly from the expected mean frequency. 
Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>22127862</pmid><doi>10.1093/nar/gkr1073</doi><tpages>6</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0305-1048
ispartof Nucleic acids research, 2012-03, Vol.40 (6), p.2426-2431
issn 0305-1048
1362-4962
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3315291
source PubMed Central Free; Open Access: Oxford University Press Open Journals
subjects Alleles
Computational Biology
Data processing
Exome
Gene Frequency
genomics
Heterozygote
High-Throughput Nucleotide Sequencing
Humans
Mathematical models
Mutation
Sequence Analysis, DNA
Stochastic Processes
Stochasticity
title The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T21%3A34%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20allele%20distribution%20in%20next-generation%20sequencing%20data%20sets%20is%20accurately%20described%20as%20the%20result%20of%20a%20stochastic%20branching%20process&rft.jtitle=Nucleic%20acids%20research&rft.au=Heinrich,%20Verena&rft.date=2012-03-01&rft.volume=40&rft.issue=6&rft.spage=2426&rft.epage=2431&rft.pages=2426-2431&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gkr1073&rft_dat=%3Cproquest_pubme%3E963488458%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c479t-be8ef78b390b154289bcac646c2688829761874f4d0ddae8879c919cdceecf943%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1008838806&rft_id=info:pmid/22127862&rfr_iscdi=true