A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data

Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate...

Bibliographic Details
Published in: BMC bioinformatics 2018-04, Vol.19 (1), p.147-147, Article 147
Main Authors: Bertl, Johanna, Guo, Qianyun, Juul, Malene, Besenbacher, Søren, Nielsen, Morten Muhlig, Hornshøj, Henrik, Pedersen, Jakob Skou, Hobolth, Asger
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c465t-3d9ae9c86dea661a95640a5ec9cfb86e4f035de91c04fa4eb443305bbcac6d613
cites cdi_FETCH-LOGICAL-c465t-3d9ae9c86dea661a95640a5ec9cfb86e4f035de91c04fa4eb443305bbcac6d613
container_end_page 147
container_issue 1
container_start_page 147
container_title BMC bioinformatics
container_volume 19
creator Bertl, Johanna
Guo, Qianyun
Juul, Malene
Besenbacher, Søren
Nielsen, Morten Muhlig
Hornshøj, Henrik
Pedersen, Jakob Skou
Hobolth, Asger
description Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.
doi_str_mv 10.1186/s12859-018-2141-2
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_1b55cf0177f64be48276f4a107619169</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_1b55cf0177f64be48276f4a107619169</doaj_id><sourcerecordid>2028952619</sourcerecordid><originalsourceid>FETCH-LOGICAL-c465t-3d9ae9c86dea661a95640a5ec9cfb86e4f035de91c04fa4eb443305bbcac6d613</originalsourceid><addsrcrecordid>eNpVkc1u1TAQhS1ERUvLA7BBXrIJeBzbiTdIVcVPpUps6NqaOON7UyXxxU5AfXuc3lK1C2tG4zPfjH0Yew_iE0BrPmeQrbaVgLaSoKCSr9gZqKYkIPTrZ_kpe5vznRDQtEK_YafSmqauQZ0xf8nzsBDPB_JDGDyfYk8jx7kvB8f7PGQeA1_2xGdal4Qjz3HCZVOuS4lx5gkLYJj5330cqdrRHCfiHmdPife44AU7CThmevcYz9ntt6-_rn5UNz-_X19d3lReGb1UdW-RrG9NT2gMoNVGCdTkrQ9da0gFUeueLHihAirqlKprobvOoze9gfqcXR-5fcQ7d0jDhOneRRzcQyGmncNUNh_JQae1D-U_mmBUR6qVjQkKQTQGLBhbWF-OrMPaTdR7mre3v4C-vJmHvdvFP05bYaXeAB8fASn-Xikvbhqyp3HEmeKanRSytVqWcUUKR6lPMedE4WkMCLcZ7Y5Gu2K024x2svR8eL7fU8d_Z-t_P4-lIQ</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2028952619</pqid></control><display><type>article</type><title>A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data</title><source>Open Access: PubMed Central</source><source>Publicly Available Content Database</source><creator>Bertl, Johanna ; Guo, Qianyun ; Juul, Malene ; Besenbacher, Søren ; Nielsen, Morten Muhlig ; Hornshøj, Henrik ; Pedersen, Jakob Skou ; Hobolth, Asger</creator><creatorcontrib>Bertl, Johanna ; Guo, Qianyun ; Juul, Malene ; Besenbacher, Søren ; Nielsen, Morten Muhlig ; Hornshøj, Henrik ; Pedersen, Jakob Skou ; Hobolth, Asger</creatorcontrib><description>Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. 
The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-018-2141-2</identifier><identifier>PMID: 29673314</identifier><language>eng</language><publisher>England: BioMed Central</publisher><subject>Multinomial logistic regression ; Site-specific model ; Somatic cancer mutations</subject><ispartof>BMC bioinformatics, 2018-04, Vol.19 (1), p.147-147, Article 147</ispartof><rights>The Author(s) 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c465t-3d9ae9c86dea661a95640a5ec9cfb86e4f035de91c04fa4eb443305bbcac6d613</citedby><cites>FETCH-LOGICAL-c465t-3d9ae9c86dea661a95640a5ec9cfb86e4f035de91c04fa4eb443305bbcac6d613</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909259/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC5909259/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,37013,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29673314$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Bertl, Johanna</creatorcontrib><creatorcontrib>Guo, Qianyun</creatorcontrib><creatorcontrib>Juul, Malene</creatorcontrib><creatorcontrib>Besenbacher, Søren</creatorcontrib><creatorcontrib>Nielsen, Morten Muhlig</creatorcontrib><creatorcontrib>Hornshøj, Henrik</creatorcontrib><creatorcontrib>Pedersen, Jakob Skou</creatorcontrib><creatorcontrib>Hobolth, Asger</creatorcontrib><title>A 
site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data</title><title>BMC bioinformatics</title><addtitle>BMC Bioinformatics</addtitle><description>Detailed modelling of the neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. 
The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.</description><subject>Multinomial logistic regression</subject><subject>Site-specific model</subject><subject>Somatic cancer mutations</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNpVkc1u1TAQhS1ERUvLA7BBXrIJeBzbiTdIVcVPpUps6NqaOON7UyXxxU5AfXuc3lK1C2tG4zPfjH0Yew_iE0BrPmeQrbaVgLaSoKCSr9gZqKYkIPTrZ_kpe5vznRDQtEK_YafSmqauQZ0xf8nzsBDPB_JDGDyfYk8jx7kvB8f7PGQeA1_2xGdal4Qjz3HCZVOuS4lx5gkLYJj5330cqdrRHCfiHmdPife44AU7CThmevcYz9ntt6-_rn5UNz-_X19d3lReGb1UdW-RrG9NT2gMoNVGCdTkrQ9da0gFUeueLHihAirqlKprobvOoze9gfqcXR-5fcQ7d0jDhOneRRzcQyGmncNUNh_JQae1D-U_mmBUR6qVjQkKQTQGLBhbWF-OrMPaTdR7mre3v4C-vJmHvdvFP05bYaXeAB8fASn-Xikvbhqyp3HEmeKanRSytVqWcUUKR6lPMedE4WkMCLcZ7Y5Gu2K024x2svR8eL7fU8d_Z-t_P4-lIQ</recordid><startdate>20180419</startdate><enddate>20180419</enddate><creator>Bertl, Johanna</creator><creator>Guo, Qianyun</creator><creator>Juul, Malene</creator><creator>Besenbacher, Søren</creator><creator>Nielsen, Morten Muhlig</creator><creator>Hornshøj, Henrik</creator><creator>Pedersen, Jakob Skou</creator><creator>Hobolth, Asger</creator><general>BioMed Central</general><general>BMC</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20180419</creationdate><title>A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data</title><author>Bertl, Johanna ; Guo, Qianyun ; Juul, Malene ; Besenbacher, Søren ; Nielsen, Morten Muhlig ; Hornshøj, Henrik ; 
Pedersen, Jakob Skou ; Hobolth, Asger</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c465t-3d9ae9c86dea661a95640a5ec9cfb86e4f035de91c04fa4eb443305bbcac6d613</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Multinomial logistic regression</topic><topic>Site-specific model</topic><topic>Somatic cancer mutations</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Bertl, Johanna</creatorcontrib><creatorcontrib>Guo, Qianyun</creatorcontrib><creatorcontrib>Juul, Malene</creatorcontrib><creatorcontrib>Besenbacher, Søren</creatorcontrib><creatorcontrib>Nielsen, Morten Muhlig</creatorcontrib><creatorcontrib>Hornshøj, Henrik</creatorcontrib><creatorcontrib>Pedersen, Jakob Skou</creatorcontrib><creatorcontrib>Hobolth, Asger</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Open Access: DOAJ - Directory of Open Access Journals</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Bertl, Johanna</au><au>Guo, Qianyun</au><au>Juul, Malene</au><au>Besenbacher, Søren</au><au>Nielsen, Morten Muhlig</au><au>Hornshøj, Henrik</au><au>Pedersen, Jakob Skou</au><au>Hobolth, Asger</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data</atitle><jtitle>BMC bioinformatics</jtitle><addtitle>BMC Bioinformatics</addtitle><date>2018-04-19</date><risdate>2018</risdate><volume>19</volume><issue>1</issue><spage>147</spage><epage>147</epage><pages>147-147</pages><artnum>147</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>Detailed modelling of the 
neutral mutational process in cancer cells is crucial for identifying driver mutations and understanding the mutational mechanisms that act during cancer development. The neutral mutational process is very complex: whole-genome analyses have revealed that the mutation rate differs between cancer types, between patients and along the genome depending on the genetic and epigenetic context. Therefore, methods that predict the number of different types of mutations in regions or specific genomic elements must consider local genomic explanatory variables. A major drawback of most methods is the need to average the explanatory variables across the entire region or genomic element. This procedure is particularly problematic if the explanatory variable varies dramatically in the element under consideration. To take into account the fine scale of the explanatory variables, we model the probabilities of different types of mutations for each position in the genome by multinomial logistic regression. We analyse 505 cancer genomes from 14 different cancer types and compare the performance in predicting mutation rate for both regional based models and site-specific models. We show that for 1000 randomly selected genomic positions, the site-specific model predicts the mutation rate much better than regional based models. We use a forward selection procedure to identify the most important explanatory variables. The procedure identifies site-specific conservation (phyloP), replication timing, and expression level as the best predictors for the mutation rate. Finally, our model confirms and quantifies certain well-known mutational signatures. We find that our site-specific multinomial regression model outperforms the regional based models. The possibility of including genomic variables on different scales and patient specific variables makes it a versatile framework for studying different mutational mechanisms. 
Our model can serve as the neutral null model for the mutational process; regions that deviate from the null model are candidates for elements that drive cancer development.</abstract><cop>England</cop><pub>BioMed Central</pub><pmid>29673314</pmid><doi>10.1186/s12859-018-2141-2</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2105
ispartof BMC bioinformatics, 2018-04, Vol.19 (1), p.147-147, Article 147
issn 1471-2105
1471-2105
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_1b55cf0177f64be48276f4a107619169
source Open Access: PubMed Central; Publicly Available Content Database
subjects Multinomial logistic regression
Site-specific model
Somatic cancer mutations
title A site specific model and analysis of the neutral somatic mutation rate in whole-genome cancer data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T19%3A48%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20site%20specific%20model%20and%20analysis%20of%20the%20neutral%20somatic%20mutation%20rate%20in%20whole-genome%20cancer%20data&rft.jtitle=BMC%20bioinformatics&rft.au=Bertl,%20Johanna&rft.date=2018-04-19&rft.volume=19&rft.issue=1&rft.spage=147&rft.epage=147&rft.pages=147-147&rft.artnum=147&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-018-2141-2&rft_dat=%3Cproquest_doaj_%3E2028952619%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c465t-3d9ae9c86dea661a95640a5ec9cfb86e4f035de91c04fa4eb443305bbcac6d613%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2028952619&rft_id=info:pmid/29673314&rfr_iscdi=true