Exome sequence read depth methods for identifying copy number changes

Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease causal CNVs span or are found in genome coding regions (exons), whi...

Bibliographic Details
Published in: Briefings in bioinformatics 2015-05, Vol.16 (3), p.380-392
Main Authors: Kadalayil, Latha; Rafiq, Sajjad; Rose-Zerilli, Matthew J J; Pengelly, Reuben J; Parker, Helen; Oscier, David; Strefford, Jonathan C; Tapper, William J; Gibson, Jane; Ennis, Sarah; Collins, Andrew
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c347t-4147053748ea1262266b26558fd8b8101c7e5ba3365c3684cfb4a0a296bd6eef3
cites cdi_FETCH-LOGICAL-c347t-4147053748ea1262266b26558fd8b8101c7e5ba3365c3684cfb4a0a296bd6eef3
container_end_page 392
container_issue 3
container_start_page 380
container_title Briefings in bioinformatics
container_volume 16
creator Kadalayil, Latha
Rafiq, Sajjad
Rose-Zerilli, Matthew J J
Pengelly, Reuben J
Parker, Helen
Oscier, David
Strefford, Jonathan C
Tapper, William J
Gibson, Jane
Ennis, Sarah
Collins, Andrew
description Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease causal CNVs span or are found in genome coding regions (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection and comparisons made to find the most suitable methods for particular data samples. The results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrated its utility using the VarScan 2 and eXome Hidden Markov Model (XHMM) programs using paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.
doi_str_mv 10.1093/bib/bbu027
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1709781824</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1701473875</sourcerecordid><originalsourceid>FETCH-LOGICAL-c347t-4147053748ea1262266b26558fd8b8101c7e5ba3365c3684cfb4a0a296bd6eef3</originalsourceid><addsrcrecordid>eNqN0U1Lw0AQBuBFFKvViz9AFryIELvfuzlKqR9Q8KLnsLuZtClNUncTsP_eLa0evOhp5vAwvMOL0BUl95TkfOJqN3FuIEwfoTMqtM4EkeJ4tyudSaH4CJ3HuCKEEW3oKRoxSVWeS3mGZrPPrgEc4WOA1gMOYEtcwqZf4gb6ZVdGXHUB1yW0fV1t63aBfbfZ4nZoHATsl7ZdQLxAJ5VdR7g8zDF6f5y9TZ-z-evTy_RhnnkudJ-JFI5IroUBS5liTCnHlJSmKo0zlFCvQTrLuZKeKyN85YQlluXKlQqg4mN0u7-7CV0KHPuiqaOH9dq20A2xoJrk6UPDxH9oSsONln9TZRgjXBKW6M0vuuqG0Kafd0oSSkWukrrbKx-6GANUxSbUjQ3bgpJiV1mRKiv2lSV8fTg5uAbKH_rdEf8Ci3-PsQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1685011496</pqid></control><display><type>article</type><title>Exome sequence read depth methods for identifying copy number changes</title><source>Oxford Journals Open Access Collection</source><source>Business Source Ultimate</source><source>PubMed Central</source><creator>Kadalayil, Latha ; Rafiq, Sajjad ; Rose-Zerilli, Matthew J J ; Pengelly, Reuben J ; Parker, Helen ; Oscier, David ; Strefford, Jonathan C ; Tapper, William J ; Gibson, Jane ; Ennis, Sarah ; Collins, Andrew</creator><creatorcontrib>Kadalayil, Latha ; Rafiq, Sajjad ; Rose-Zerilli, Matthew J J ; Pengelly, Reuben J ; Parker, Helen ; Oscier, David ; Strefford, Jonathan C ; Tapper, William J ; Gibson, Jane ; Ennis, Sarah ; Collins, Andrew</creatorcontrib><description>Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. 
Many disease causal CNVs span or are found in genome coding regions (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection and comparisons made to find the most suitable methods for particular data samples. The results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrated its utility using the VarScan 2 and eXome Hidden Markov Model (XHMM) programs using paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. 
We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.</description><identifier>ISSN: 1467-5463</identifier><identifier>EISSN: 1477-4054</identifier><identifier>DOI: 10.1093/bib/bbu027</identifier><identifier>PMID: 25169955</identifier><language>eng</language><publisher>England: Oxford Publishing Limited (England)</publisher><subject>Algorithms ; Base Sequence ; Chromosome Mapping - methods ; Comparative analysis ; Data Interpretation, Statistical ; Diseases ; DNA Copy Number Variations - genetics ; DNA, Neoplasm - genetics ; Exome - genetics ; Gene sequencing ; Genomes ; Humans ; Leukemia ; Leukemia, Lymphocytic, Chronic, B-Cell - genetics ; Likelihood ratio ; Markov analysis ; Markov chains ; Mathematical models ; Molecular Sequence Data ; Pattern Recognition, Automated - methods ; Reproducibility of Results ; Reproduction ; Sensitivity and Specificity ; Sequence Analysis, DNA - methods ; Statistical methods</subject><ispartof>Briefings in bioinformatics, 2015-05, Vol.16 (3), p.380-392</ispartof><rights>The Author 2014. Published by Oxford University Press. 
For Permissions, please email: journals.permissions@oup.com.</rights><rights>Copyright Oxford Publishing Limited(England) May 2015</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c347t-4147053748ea1262266b26558fd8b8101c7e5ba3365c3684cfb4a0a296bd6eef3</citedby><cites>FETCH-LOGICAL-c347t-4147053748ea1262266b26558fd8b8101c7e5ba3365c3684cfb4a0a296bd6eef3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/25169955$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kadalayil, Latha</creatorcontrib><creatorcontrib>Rafiq, Sajjad</creatorcontrib><creatorcontrib>Rose-Zerilli, Matthew J J</creatorcontrib><creatorcontrib>Pengelly, Reuben J</creatorcontrib><creatorcontrib>Parker, Helen</creatorcontrib><creatorcontrib>Oscier, David</creatorcontrib><creatorcontrib>Strefford, Jonathan C</creatorcontrib><creatorcontrib>Tapper, William J</creatorcontrib><creatorcontrib>Gibson, Jane</creatorcontrib><creatorcontrib>Ennis, Sarah</creatorcontrib><creatorcontrib>Collins, Andrew</creatorcontrib><title>Exome sequence read depth methods for identifying copy number changes</title><title>Briefings in bioinformatics</title><addtitle>Brief Bioinform</addtitle><description>Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease causal CNVs span or are found in genome coding regions (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. 
If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection and comparisons made to find the most suitable methods for particular data samples. The results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrated its utility using the VarScan 2 and eXome Hidden Markov Model (XHMM) programs using paired normal and tumour exome data from chronic lymphocytic leukaemia patients. We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. 
We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.</description><subject>Algorithms</subject><subject>Base Sequence</subject><subject>Chromosome Mapping - methods</subject><subject>Comparative analysis</subject><subject>Data Interpretation, Statistical</subject><subject>Diseases</subject><subject>DNA Copy Number Variations - genetics</subject><subject>DNA, Neoplasm - genetics</subject><subject>Exome - genetics</subject><subject>Gene sequencing</subject><subject>Genomes</subject><subject>Humans</subject><subject>Leukemia</subject><subject>Leukemia, Lymphocytic, Chronic, B-Cell - genetics</subject><subject>Likelihood ratio</subject><subject>Markov analysis</subject><subject>Markov chains</subject><subject>Mathematical models</subject><subject>Molecular Sequence Data</subject><subject>Pattern Recognition, Automated - methods</subject><subject>Reproducibility of Results</subject><subject>Reproduction</subject><subject>Sensitivity and Specificity</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Statistical methods</subject><issn>1467-5463</issn><issn>1477-4054</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNqN0U1Lw0AQBuBFFKvViz9AFryIELvfuzlKqR9Q8KLnsLuZtClNUncTsP_eLa0evOhp5vAwvMOL0BUl95TkfOJqN3FuIEwfoTMqtM4EkeJ4tyudSaH4CJ3HuCKEEW3oKRoxSVWeS3mGZrPPrgEc4WOA1gMOYEtcwqZf4gb6ZVdGXHUB1yW0fV1t63aBfbfZ4nZoHATsl7ZdQLxAJ5VdR7g8zDF6f5y9TZ-z-evTy_RhnnkudJ-JFI5IroUBS5liTCnHlJSmKo0zlFCvQTrLuZKeKyN85YQlluXKlQqg4mN0u7-7CV0KHPuiqaOH9dq20A2xoJrk6UPDxH9oSsONln9TZRgjXBKW6M0vuuqG0Kafd0oSSkWukrrbKx-6GANUxSbUjQ3bgpJiV1mRKiv2lSV8fTg5uAbKH_rdEf8Ci3-PsQ</recordid><startdate>201505</startdate><enddate>201505</enddate><creator>Kadalayil, Latha</creator><creator>Rafiq, Sajjad</creator><creator>Rose-Zerilli, Matthew J J</creator><creator>Pengelly, Reuben 
J</creator><creator>Parker, Helen</creator><creator>Oscier, David</creator><creator>Strefford, Jonathan C</creator><creator>Tapper, William J</creator><creator>Gibson, Jane</creator><creator>Ennis, Sarah</creator><creator>Collins, Andrew</creator><general>Oxford Publishing Limited (England)</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7QO</scope><scope>7SC</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>K9.</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>P64</scope><scope>RC3</scope><scope>7X8</scope></search><sort><creationdate>201505</creationdate><title>Exome sequence read depth methods for identifying copy number changes</title><author>Kadalayil, Latha ; Rafiq, Sajjad ; Rose-Zerilli, Matthew J J ; Pengelly, Reuben J ; Parker, Helen ; Oscier, David ; Strefford, Jonathan C ; Tapper, William J ; Gibson, Jane ; Ennis, Sarah ; Collins, Andrew</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c347t-4147053748ea1262266b26558fd8b8101c7e5ba3365c3684cfb4a0a296bd6eef3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Algorithms</topic><topic>Base Sequence</topic><topic>Chromosome Mapping - methods</topic><topic>Comparative analysis</topic><topic>Data Interpretation, Statistical</topic><topic>Diseases</topic><topic>DNA Copy Number Variations - genetics</topic><topic>DNA, Neoplasm - genetics</topic><topic>Exome - genetics</topic><topic>Gene sequencing</topic><topic>Genomes</topic><topic>Humans</topic><topic>Leukemia</topic><topic>Leukemia, Lymphocytic, Chronic, B-Cell - genetics</topic><topic>Likelihood ratio</topic><topic>Markov analysis</topic><topic>Markov chains</topic><topic>Mathematical models</topic><topic>Molecular Sequence Data</topic><topic>Pattern Recognition, Automated - 
methods</topic><topic>Reproducibility of Results</topic><topic>Reproduction</topic><topic>Sensitivity and Specificity</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Statistical methods</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kadalayil, Latha</creatorcontrib><creatorcontrib>Rafiq, Sajjad</creatorcontrib><creatorcontrib>Rose-Zerilli, Matthew J J</creatorcontrib><creatorcontrib>Pengelly, Reuben J</creatorcontrib><creatorcontrib>Parker, Helen</creatorcontrib><creatorcontrib>Oscier, David</creatorcontrib><creatorcontrib>Strefford, Jonathan C</creatorcontrib><creatorcontrib>Tapper, William J</creatorcontrib><creatorcontrib>Gibson, Jane</creatorcontrib><creatorcontrib>Ennis, Sarah</creatorcontrib><creatorcontrib>Collins, Andrew</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>MEDLINE - Academic</collection><jtitle>Briefings in bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kadalayil, Latha</au><au>Rafiq, 
Sajjad</au><au>Rose-Zerilli, Matthew J J</au><au>Pengelly, Reuben J</au><au>Parker, Helen</au><au>Oscier, David</au><au>Strefford, Jonathan C</au><au>Tapper, William J</au><au>Gibson, Jane</au><au>Ennis, Sarah</au><au>Collins, Andrew</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Exome sequence read depth methods for identifying copy number changes</atitle><jtitle>Briefings in bioinformatics</jtitle><addtitle>Brief Bioinform</addtitle><date>2015-05</date><risdate>2015</risdate><volume>16</volume><issue>3</issue><spage>380</spage><epage>392</epage><pages>380-392</pages><issn>1467-5463</issn><eissn>1477-4054</eissn><abstract>Copy number variants (CNVs) play important roles in a number of human diseases and in pharmacogenetics. Powerful methods exist for CNV detection in whole genome sequencing (WGS) data, but such data are costly to obtain. Many disease causal CNVs span or are found in genome coding regions (exons), which makes CNV detection using whole exome sequencing (WES) data attractive. If reliably validated against WGS-based CNVs, exome-derived CNVs have potential applications in a clinical setting. Several algorithms have been developed to exploit exome data for CNV detection and comparisons made to find the most suitable methods for particular data samples. The results are not consistent across studies. Here, we review some of the exome CNV detection methods based on depth of coverage profiles and examine their performance to identify problems contributing to discrepancies in published results. We also present a streamlined strategy that uses a single metric, the likelihood ratio, to compare exome methods, and we demonstrated its utility using the VarScan 2 and eXome Hidden Markov Model (XHMM) programs using paired normal and tumour exome data from chronic lymphocytic leukaemia patients. 
We use array-based somatic CNV (SCNV) calls as a reference standard to compute prevalence-independent statistics, such as sensitivity, specificity and likelihood ratio, for validation of the exome-derived SCNVs. We also account for factors known to influence the performance of exome read depth methods, such as CNV size and frequency, while comparing our findings with published results.</abstract><cop>England</cop><pub>Oxford Publishing Limited (England)</pub><pmid>25169955</pmid><doi>10.1093/bib/bbu027</doi><tpages>13</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1467-5463
ispartof Briefings in bioinformatics, 2015-05, Vol.16 (3), p.380-392
issn 1467-5463
1477-4054
language eng
recordid cdi_proquest_miscellaneous_1709781824
source Oxford Journals Open Access Collection; Business Source Ultimate; PubMed Central
subjects Algorithms
Base Sequence
Chromosome Mapping - methods
Comparative analysis
Data Interpretation, Statistical
Diseases
DNA Copy Number Variations - genetics
DNA, Neoplasm - genetics
Exome - genetics
Gene sequencing
Genomes
Humans
Leukemia
Leukemia, Lymphocytic, Chronic, B-Cell - genetics
Likelihood ratio
Markov analysis
Markov chains
Mathematical models
Molecular Sequence Data
Pattern Recognition, Automated - methods
Reproducibility of Results
Reproduction
Sensitivity and Specificity
Sequence Analysis, DNA - methods
Statistical methods
title Exome sequence read depth methods for identifying copy number changes
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T02%3A07%3A21IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Exome%20sequence%20read%20depth%20methods%20for%20identifying%20copy%20number%20changes&rft.jtitle=Briefings%20in%20bioinformatics&rft.au=Kadalayil,%20Latha&rft.date=2015-05&rft.volume=16&rft.issue=3&rft.spage=380&rft.epage=392&rft.pages=380-392&rft.issn=1467-5463&rft.eissn=1477-4054&rft_id=info:doi/10.1093/bib/bbu027&rft_dat=%3Cproquest_cross%3E1701473875%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c347t-4147053748ea1262266b26558fd8b8101c7e5ba3365c3684cfb4a0a296bd6eef3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1685011496&rft_id=info:pmid/25169955&rfr_iscdi=true
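
The description field above says that array-based SCNV calls are used as a reference standard to compute prevalence-independent statistics (sensitivity, specificity and likelihood ratio) for the exome-derived SCNVs. The following is a minimal sketch of how such statistics are conventionally computed from a 2x2 comparison against a reference. It is not the authors' implementation: the `ConfusionCounts` type, the function names, the choice of counting unit (for example, one count per exon or region) and the example counts are all assumptions made for illustration, and the record does not say which form of the likelihood ratio the paper uses; the positive likelihood ratio is shown here.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical 2x2 tally of exome calls against a reference standard.

    The counting unit (exon, region, etc.) is an assumption; the record
    does not specify how the paper defines it.
    """
    tp: int  # called by exome method, present in reference
    fp: int  # called by exome method, absent from reference
    fn: int  # missed by exome method, present in reference
    tn: int  # not called by exome method, absent from reference


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of reference-positive units that the exome method detects.
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of reference-negative units that the exome method leaves uncalled.
    return c.tn / (c.tn + c.fp)


def positive_likelihood_ratio(c: ConfusionCounts) -> float:
    # LR+ = sensitivity / (1 - specificity); infinite when there are no false positives.
    false_positive_rate = 1.0 - specificity(c)
    if false_positive_rate == 0.0:
        return float("inf")
    return sensitivity(c) / false_positive_rate


if __name__ == "__main__":
    # Illustrative counts only; these are not results from the paper.
    counts = ConfusionCounts(tp=80, fp=10, fn=20, tn=90)
    print(f"sensitivity = {sensitivity(counts):.3f}")
    print(f"specificity = {specificity(counts):.3f}")
    print(f"LR+         = {positive_likelihood_ratio(counts):.3f}")
```

Because sensitivity and specificity are each computed within the reference-positive or reference-negative group, their ratio does not depend on how common CNVs are in the sample, which is why a single likelihood ratio can be used to compare methods across data sets with different CNV frequencies.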