Comprehensive analysis to improve the validation rate for single nucleotide variants detected by next-generation sequencing

Next-generation sequencing (NGS) has enabled the high-throughput discovery of germline and somatic mutations. However, NGS-based variant detection is still prone to errors, resulting in inaccurate variant calls. Here, we categorized the variants detected by NGS according to total read depth (TD) and...

Published in: PloS one, 2014-01, Vol. 9 (1), p. e86664-e86664
Main Authors: Park, Mi-Hyun, Rhee, Hwanseok, Park, Jung Hoon, Woo, Hae-Mi, Choi, Byung-Ok, Kim, Bo-Young, Chung, Ki Wha, Cho, Yoo-Bok, Kim, Hyung Jin, Jung, Ji-Won, Koo, Soo Kyung
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c692t-eaccc3b3ec6a3346f07a88353ac32735e440ec2970aab2d33a60d736ff886aac3
cites cdi_FETCH-LOGICAL-c692t-eaccc3b3ec6a3346f07a88353ac32735e440ec2970aab2d33a60d736ff886aac3
container_end_page e86664
container_issue 1
container_start_page e86664
container_title PloS one
container_volume 9
creator Park, Mi-Hyun
Rhee, Hwanseok
Park, Jung Hoon
Woo, Hae-Mi
Choi, Byung-Ok
Kim, Bo-Young
Chung, Ki Wha
Cho, Yoo-Bok
Kim, Hyung Jin
Jung, Ji-Won
Koo, Soo Kyung
description Next-generation sequencing (NGS) has enabled the high-throughput discovery of germline and somatic mutations. However, NGS-based variant detection is still prone to errors, resulting in inaccurate variant calls. Here, we categorized the variants detected by NGS according to total read depth (TD) and SNP quality (SNPQ), and performed Sanger sequencing with 348 selected non-synonymous single nucleotide variants (SNVs) for validation. Using the SAMtools and GATK algorithms, the validation rate was positively correlated with SNPQ but showed no correlation with TD. In addition, common variants called by both programs had a higher validation rate than caller-specific variants. We further examined several parameters to improve the validation rate, and found that strand bias (SB) was a key parameter. SB in NGS data showed a strong difference between the variants passing validation and those that failed validation, showing a validation rate of more than 92% (filtering cutoff value: alternate allele forward [AF] ≥ 20 and AF < 80 in SAMtools, SB < -10 in GATK). Moreover, the validation rate increased significantly (up to 97-99%) when the variant was filtered together with the suggested values of mapping quality (MQ), SNPQ and SB. This detailed and systematic study provides comprehensive recommendations for improving validation rates, saving time and lowering cost in NGS analyses.
doi_str_mv 10.1371/journal.pone.0086664
format article
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2014-01, Vol.9 (1), p.e86664-e86664
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_1492542244
source PubMed (Medline); Publicly Available Content Database
subjects Algorithms
Analysis
Bias
Bioinformatics
Biology
Charcot-Marie-Tooth Disease - genetics
Computer Science
Cost analysis
Deoxyribonucleic acid
Disease
DNA
DNA sequencing
Exome
Filtration
Genomes
High-Throughput Nucleotide Sequencing
Humans
Medicine
Mutation
Polymorphism, Single Nucleotide
Researchers
Single-nucleotide polymorphism
Statistical analysis
title Comprehensive analysis to improve the validation rate for single nucleotide variants detected by next-generation sequencing
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T03%3A42%3A42IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comprehensive%20analysis%20to%20improve%20the%20validation%20rate%20for%20single%20nucleotide%20variants%20detected%20by%20next-generation%20sequencing&rft.jtitle=PloS%20one&rft.au=Park,%20Mi-Hyun&rft.date=2014-01-29&rft.volume=9&rft.issue=1&rft.spage=e86664&rft.epage=e86664&rft.pages=e86664-e86664&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0086664&rft_dat=%3Cgale_plos_%3EA478835621%3C/gale_plos_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c692t-eaccc3b3ec6a3346f07a88353ac32735e440ec2970aab2d33a60d736ff886aac3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1492542244&rft_id=info:pmid/24489763&rft_galeid=A478835621&rfr_iscdi=true
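
The strand-bias cutoffs quoted in the description (AF ≥ 20 and AF < 80 for SAMtools calls, SB < -10 for GATK calls) can be sketched as a small filter. This is only an illustration under stated assumptions, not the authors' pipeline: it assumes AF is the percentage of alternate-allele reads that map to the forward strand, and that the GATK SB value has already been read from the call's annotations. The function names are hypothetical, and the MQ and SNPQ cutoffs are left out because the record does not state them.

```python
from typing import Optional


def alt_forward_pct(alt_fwd: int, alt_rev: int) -> Optional[float]:
    """Percentage of alternate-allele reads on the forward strand.

    ASSUMPTION: this is how "alternate allele forward (AF)" is computed;
    the catalog record does not define the calculation.
    """
    total = alt_fwd + alt_rev
    if total == 0:
        return None  # no alternate-allele reads, AF is undefined
    return 100.0 * alt_fwd / total


def passes_samtools_sb(alt_fwd: int, alt_rev: int,
                       low: float = 20.0, high: float = 80.0) -> bool:
    """Keep a SAMtools call when low <= AF < high (cutoffs from the abstract)."""
    af = alt_forward_pct(alt_fwd, alt_rev)
    return af is not None and low <= af < high


def passes_gatk_sb(sb: float, cutoff: float = -10.0) -> bool:
    """Keep a GATK call when its SB value is below the cutoff (SB < -10)."""
    return sb < cutoff


if __name__ == "__main__":
    # Hypothetical read counts and SB values, for illustration only.
    print(passes_samtools_sb(alt_fwd=12, alt_rev=10))  # balanced strands
    print(passes_samtools_sb(alt_fwd=1, alt_rev=19))   # mostly reverse strand
    print(passes_gatk_sb(-35.2))
```

Keeping the cutoffs as parameters rather than constants makes it easy to re-tune them for a different sequencing platform or caller version, where the values reported in the abstract may not transfer directly.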