CODEX: a normalization and copy number variation detection method for whole exome sequencing
High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high leve...
Published in: | Nucleic acids research 2015-03, Vol.43 (6), p.e39-e39 |
---|---|
Main Authors: | Jiang, Yuchao; Oldridge, Derek A; Diskin, Sharon J; Zhang, Nancy R |
Format: | Article |
Language: | English |
|
cited_by | cdi_FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3 |
---|---|
container_end_page | e39 |
container_issue | 6 |
container_start_page | e39 |
container_title | Nucleic acids research |
container_volume | 43 |
creator | Jiang, Yuchao; Oldridge, Derek A; Diskin, Sharon J; Zhang, Nancy R |
description | High-throughput sequencing of DNA coding regions has become a common way of assaying genomic variation in the study of human diseases. Copy number variation (CNV) is an important type of genomic variation, but detecting and characterizing CNV from exome sequencing is challenging due to the high level of biases and artifacts. We propose CODEX, a normalization and CNV calling procedure for whole exome sequencing data. The Poisson latent factor model in CODEX includes terms that specifically remove biases due to GC content, exon capture and amplification efficiency, and latent systemic artifacts. CODEX also includes a Poisson likelihood-based recursive segmentation procedure that explicitly models the count-based exome sequencing data. CODEX is compared to existing methods on a population analysis of HapMap samples from the 1000 Genomes Project, and shown to be more accurate on three microarray-based validation data sets. We further evaluate performance on 222 neuroblastoma samples with matched normals and focus on a well-studied rare somatic CNV within the ATRX gene. We show that the cross-sample normalization procedure of CODEX removes more noise than normalizing the tumor against the matched normal and that the segmentation procedure performs well in detecting CNVs with nested structures. |
doi_str_mv | 10.1093/nar/gku1363 |
format | article |
pmid | 25618849 |
publisher | England: Oxford University Press |
date | 2015-03-31 |
rights | The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. |
oa | free_for_read |
linktohtml | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381046/ |
linktopdf | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4381046/pdf/ |
identifier | ISSN: 0305-1048 |
ispartof | Nucleic acids research, 2015-03, Vol.43 (6), p.e39-e39 |
issn | 0305-1048 (ISSN); 1362-4962 (EISSN) |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_4381046 |
source | PubMed (Medline); Open Access: Oxford University Press Open Journals |
subjects | Algorithms; Base Composition; Bias; Case-Control Studies; Databases, Nucleic Acid - statistics & numerical data; DNA Copy Number Variations; DNA Helicases - genetics; DNA, Neoplasm - genetics; Exome; Female; High-Throughput Nucleotide Sequencing - methods; High-Throughput Nucleotide Sequencing - statistics & numerical data; Humans; Likelihood Functions; Male; Methods Online; Neuroblastoma - genetics; Nuclear Proteins - genetics; Sequence Analysis, DNA - methods; Sequence Analysis, DNA - statistics & numerical data; X-linked Nuclear Protein |
title | CODEX: a normalization and copy number variation detection method for whole exome sequencing |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-02T07%3A09%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=CODEX:%20a%20normalization%20and%20copy%20number%20variation%20detection%20method%20for%20whole%20exome%20sequencing&rft.jtitle=Nucleic%20acids%20research&rft.au=Jiang,%20Yuchao&rft.date=2015-03-31&rft.volume=43&rft.issue=6&rft.spage=e39&rft.epage=e39&rft.pages=e39-e39&rft.issn=0305-1048&rft.eissn=1362-4962&rft_id=info:doi/10.1093/nar/gku1363&rft_dat=%3Cproquest_pubme%3E1790940793%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c480t-2e39d624299aa7f85ca381670bb62a23e682fa8b3f0b997a13eb4440a4dcf03a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1669839327&rft_id=info:pmid/25618849&rfr_iscdi=true |
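
The description in this record mentions a Poisson likelihood-based segmentation procedure that models count-based exome data directly. As a rough illustration of that general idea only, the sketch below scores contiguous windows with a generic Poisson likelihood-ratio statistic, assuming each observed count is Poisson with mean equal to a relative copy-number factor times a normalized expected count. The function names, the exhaustive single-pass scan, and the toy inputs are assumptions made for illustration; this is not the CODEX implementation, and the recursive step described in the abstract is not reproduced.

```python
import math


def poisson_segment_llr(observed, expected):
    """Log-likelihood ratio for one window (hypothetical helper).

    Model: Y_i ~ Poisson(mu * lambda_i), testing mu free against mu = 1.
    The MLE is mu_hat = sum(Y) / sum(lambda), which gives
    LLR = sum(Y) * log(mu_hat) + sum(lambda) - sum(Y).
    """
    total_obs = sum(observed)
    total_exp = sum(expected)
    if total_exp <= 0:
        raise ValueError("expected counts must sum to a positive value")
    if total_obs == 0:
        # mu_hat is 0 here; the log term vanishes in the limit.
        return float(total_exp)
    mu_hat = total_obs / total_exp
    return total_obs * math.log(mu_hat) + total_exp - total_obs


def best_segment(observed, expected):
    """Scan every contiguous window and return (llr, start, end, mu_hat).

    `end` is exclusive. This is an O(n^2) scan meant for short toy inputs.
    """
    n = len(observed)
    if n == 0 or n != len(expected):
        raise ValueError("inputs must be non-empty and of equal length")
    best = None
    for start in range(n):
        obs_sum = 0.0
        exp_sum = 0.0
        for end in range(start + 1, n + 1):
            # Extend the window by one exon and update the running sums.
            obs_sum += observed[end - 1]
            exp_sum += expected[end - 1]
            llr = poisson_segment_llr([obs_sum], [exp_sum])
            if best is None or llr > best[0]:
                best = (llr, start, end, obs_sum / exp_sum)
    return best


# Toy read depths: exons 3 to 5 carry roughly twice the expected coverage.
toy_observed = [10, 11, 9, 20, 21, 19, 10, 10]
toy_expected = [10.0] * 8
toy_llr, toy_start, toy_end, toy_mu = best_segment(toy_observed, toy_expected)
```

In this toy input the highest-scoring window covers the three exons with doubled depth, and the estimated factor `toy_mu` plays the role of a relative copy number (about 2 for a duplication under this simplified model). A practical caller would also need a significance threshold and a way to handle nested events, neither of which is addressed here.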