CODEX: a normalization and copy number variation detection method for whole exome sequencing

High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high leve...

Bibliographic Details
Published in: Nucleic acids research 2015-03, Vol.43 (6), p.e39-e39
Main Authors: Jiang, Yuchao; Oldridge, Derek A; Diskin, Sharon J; Zhang, Nancy R
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3
cites cdi_FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3
container_end_page e39
container_issue 6
container_start_page e39
container_title Nucleic acids research
container_volume 43
creator Jiang, Yuchao
Oldridge, Derek A
Diskin, Sharon J
Zhang, Nancy R
description High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures.
doi_str_mv 10.1093/nar/gku1363
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4381046</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1790940793</sourcerecordid><originalsourceid>FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3</originalsourceid><addsrcrecordid>eNqFkc1LHEEQxZtgiBvNKXfpoxAm9tf2dHsQZGNiQPASwYPQ1PTU7I7OdK_dMybmr8_orpKccqqC9-PVKx4hHzn7zJmVRwHS0fJu5FLLN2Q2DVEoq8UOmTHJ5gVnyuyS9znfMsYVn6t3ZFfMNTdG2Rm5WVx-Obs-pkBDTD107W8Y2hgohJr6uH6kYewrTPQBUrtRahzQP289DqtY0yYm-nMVO6T4K_ZIM96PGHwblvvkbQNdxg_buUeuvp79WJwXF5ffvi9OLwqvDBsKgdLWWihhLUDZmLkHabguWVVpAUKiNqIBU8mGVdaWwCVWSikGqvYNkyD3yMnGdz1WPdYew5Cgc-vU9pAeXYTW_auEduWW8cGp6Q5TejI43BqkOIXPg-vb7LHrIGAcs-OlZVax0sr_o1pbI60U5YR-2qA-xZwTNq-JOHNP1bmpOretbqIP_n7ilX3pSv4BGB-XGw</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1669839327</pqid></control><display><type>article</type><title>CODEX: a normalization and copy number variation detection method for whole exome sequencing</title><source>PubMed (Medline)</source><source>Open Access: Oxford University Press Open Journals</source><creator>Jiang, Yuchao ; Oldridge, Derek A ; Diskin, Sharon J ; Zhang, Nancy R</creator><creatorcontrib>Jiang, Yuchao ; Oldridge, Derek A ; Diskin, Sharon J ; Zhang, Nancy R</creatorcontrib><description>High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. 
The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures.</description><identifier>ISSN: 0305-1048</identifier><identifier>EISSN: 1362-4962</identifier><identifier>DOI: 10.1093/nar/gku1363</identifier><identifier>PMID: 25618849</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Algorithms ; Base Composition ; Bias ; Case-Control Studies ; Databases, Nucleic Acid - statistics &amp; numerical data ; DNA Copy Number Variations ; DNA Helicases - genetics ; DNA, Neoplasm - genetics ; Exome ; Female ; High-Throughput Nucleotide Sequencing - methods ; High-Throughput Nucleotide Sequencing - statistics &amp; numerical data ; Humans ; Likelihood Functions ; Male ; Methods Online ; Neuroblastoma - genetics ; Nuclear Proteins - genetics ; Sequence Analysis, DNA - methods ; Sequence Analysis, DNA - statistics &amp; numerical data ; X-linked Nuclear Protein</subject><ispartof>Nucleic acids research, 2015-03, Vol.43 (6), p.e39-e39</ispartof><rights>The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.</rights><rights>The Author(s) 2015. 
Published by Oxford University Press on behalf of Nucleic Acids Research. 2015</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3</citedby><cites>FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381046/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381046/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/25618849$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Jiang, Yuchao</creatorcontrib><creatorcontrib>Oldridge, Derek A</creatorcontrib><creatorcontrib>Diskin, Sharon J</creatorcontrib><creatorcontrib>Zhang, Nancy R</creatorcontrib><title>CODEX: a normalization and copy number variation detection method for whole exome sequencing</title><title>Nucleic acids research</title><addtitle>Nucleic Acids Res</addtitle><description>High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. 
The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures.</description><subject>Algorithms</subject><subject>Base Composition</subject><subject>Bias</subject><subject>Case-Control Studies</subject><subject>Databases, Nucleic Acid - statistics &amp; numerical data</subject><subject>DNA Copy Number Variations</subject><subject>DNA Helicases - genetics</subject><subject>DNA, Neoplasm - genetics</subject><subject>Exome</subject><subject>Female</subject><subject>High-Throughput Nucleotide Sequencing - methods</subject><subject>High-Throughput Nucleotide Sequencing - statistics &amp; numerical data</subject><subject>Humans</subject><subject>Likelihood Functions</subject><subject>Male</subject><subject>Methods Online</subject><subject>Neuroblastoma - genetics</subject><subject>Nuclear Proteins - genetics</subject><subject>Sequence Analysis, DNA - methods</subject><subject>Sequence Analysis, DNA - statistics &amp; numerical data</subject><subject>X-linked Nuclear 
Protein</subject><issn>0305-1048</issn><issn>1362-4962</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNqFkc1LHEEQxZtgiBvNKXfpoxAm9tf2dHsQZGNiQPASwYPQ1PTU7I7OdK_dMybmr8_orpKccqqC9-PVKx4hHzn7zJmVRwHS0fJu5FLLN2Q2DVEoq8UOmTHJ5gVnyuyS9znfMsYVn6t3ZFfMNTdG2Rm5WVx-Obs-pkBDTD107W8Y2hgohJr6uH6kYewrTPQBUrtRahzQP289DqtY0yYm-nMVO6T4K_ZIM96PGHwblvvkbQNdxg_buUeuvp79WJwXF5ffvi9OLwqvDBsKgdLWWihhLUDZmLkHabguWVVpAUKiNqIBU8mGVdaWwCVWSikGqvYNkyD3yMnGdz1WPdYew5Cgc-vU9pAeXYTW_auEduWW8cGp6Q5TejI43BqkOIXPg-vb7LHrIGAcs-OlZVax0sr_o1pbI60U5YR-2qA-xZwTNq-JOHNP1bmpOretbqIP_n7ilX3pSv4BGB-XGw</recordid><startdate>20150331</startdate><enddate>20150331</enddate><creator>Jiang, Yuchao</creator><creator>Oldridge, Derek A</creator><creator>Diskin, Sharon J</creator><creator>Zhang, Nancy R</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7TM</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope><scope>5PM</scope></search><sort><creationdate>20150331</creationdate><title>CODEX: a normalization and copy number variation detection method for whole exome sequencing</title><author>Jiang, Yuchao ; Oldridge, Derek A ; Diskin, Sharon J ; Zhang, Nancy R</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Algorithms</topic><topic>Base Composition</topic><topic>Bias</topic><topic>Case-Control Studies</topic><topic>Databases, Nucleic Acid - statistics &amp; numerical data</topic><topic>DNA Copy Number Variations</topic><topic>DNA Helicases - genetics</topic><topic>DNA, Neoplasm - 
genetics</topic><topic>Exome</topic><topic>Female</topic><topic>High-Throughput Nucleotide Sequencing - methods</topic><topic>High-Throughput Nucleotide Sequencing - statistics &amp; numerical data</topic><topic>Humans</topic><topic>Likelihood Functions</topic><topic>Male</topic><topic>Methods Online</topic><topic>Neuroblastoma - genetics</topic><topic>Nuclear Proteins - genetics</topic><topic>Sequence Analysis, DNA - methods</topic><topic>Sequence Analysis, DNA - statistics &amp; numerical data</topic><topic>X-linked Nuclear Protein</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Jiang, Yuchao</creatorcontrib><creatorcontrib>Oldridge, Derek A</creatorcontrib><creatorcontrib>Diskin, Sharon J</creatorcontrib><creatorcontrib>Zhang, Nancy R</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Nucleic Acids Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Nucleic acids research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Jiang, Yuchao</au><au>Oldridge, Derek A</au><au>Diskin, Sharon J</au><au>Zhang, Nancy R</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>CODEX: a normalization and copy number variation detection method for whole exome sequencing</atitle><jtitle>Nucleic acids research</jtitle><addtitle>Nucleic Acids 
Res</addtitle><date>2015-03-31</date><risdate>2015</risdate><volume>43</volume><issue>6</issue><spage>e39</spage><epage>e39</epage><pages>e39-e39</pages><issn>0305-1048</issn><eissn>1362-4962</eissn><abstract>High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>25618849</pmid><doi>10.1093/nar/gku1363</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0305-1048
ispartof Nucleic acids research, 2015-03, Vol.43 (6), p.e39-e39
issn 0305-1048
1362-4962
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4381046
source PubMed (Medline); Open Access: Oxford University Press Open Journals
subjects Algorithms
Base Composition
Bias
Case-Control Studies
Databases, Nucleic Acid - statistics & numerical data
DNA Copy Number Variations
DNA Helicases - genetics
DNA, Neoplasm - genetics
Exome
Female
High-Throughput Nucleotide Sequencing - methods
High-Throughput Nucleotide Sequencing - statistics & numerical data
Humans
Likelihood Functions
Male
Methods Online
Neuroblastoma - genetics
Nuclear Proteins - genetics
Sequence Analysis, DNA - methods
Sequence Analysis, DNA - statistics & numerical data
X-linked Nuclear Protein
title CODEX: a normalization and copy number variation detection method for whole exome sequencing
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T07%3A09%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CODEX:%20a%20normalization%20and%20copy%20number%20variation%20detection%20method%20for%20whole%20exome%20sequencing&rft.jtitle=Nucleic%20acids%20research&rft.au=Jiang,%20Yuchao&rft.date=2015-03-31&rft.volume=43&rft.issue=6&rft.spage=e39&rft.epage=e39&rft.pages=e39-e39&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gku1363&rft_dat=%3Cproquest_pubme%3E1790940793%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1669839327&rft_id=info:pmid/25618849&rfr_iscdi=true