Bayesian data integration and variable selection for pan‐cancer survival prediction using protein expression data

Bibliographic Details
Published in:Biometrics 2020-03, Vol.76 (1), p.316-325
Main Authors: Maity, Arnab Kumar, Bhattacharya, Anirban, Mallick, Bani K., Baladandayuthapani, Veerabhadran
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c4482-17c0afb9f65491d62661a5112338cdd796fdce9698aa00f058b7fdca0c82e6623
cites cdi_FETCH-LOGICAL-c4482-17c0afb9f65491d62661a5112338cdd796fdce9698aa00f058b7fdca0c82e6623
container_end_page 325
container_issue 1
container_start_page 316
container_title Biometrics
container_volume 76
creator Maity, Arnab Kumar
Bhattacharya, Anirban
Mallick, Bani K.
Baladandayuthapani, Veerabhadran
description Accurate prognostic prediction from molecular information is a challenging area of research and is essential to the development of precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes such as patient survival time. The high dimensionality of the problem poses considerable statistical and computational challenges. Furthermore, data are available for different tumor types, so integrating data across tumors is desirable. Censored survival outcomes add a further level of complexity to the inferential procedure. We develop Bayesian hierarchical survival models that accommodate all of these challenges. We use a hierarchical Bayesian accelerated failure time model for survival regression and place a sparse horseshoe prior on the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods are used to analyze data from the recently curated “The Cancer Proteome Atlas” (TCPA), which contains high‐quality protein expression data based on reverse‐phase protein arrays as well as detailed clinical annotation, including survival times. Our simulation study and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors through the correlated prior structure.
doi_str_mv 10.1111/biom.13132
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7007312</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2375889737</sourcerecordid><originalsourceid>FETCH-LOGICAL-c4482-17c0afb9f65491d62661a5112338cdd796fdce9698aa00f058b7fdca0c82e6623</originalsourceid><addsrcrecordid>eNp9kc1u1DAUhS0EokNhwwMgS2wQUlr_JHayQaIVtJVadQMSO-vGuRlcZezBTgZmxyPwjDwJTlMqYIE3ls_5dO61DiHPOTvi-Ry3LmyOuORSPCArXpW8YKVgD8mKMaYKWfJPB-RJSjf52VRMPCYHmW0kY3JF0gnsMTnwtIMRqPMjriOMLngKvqM7iA7aAWnCAe2t3IdIt-B_fv9hwVuMNE1x53Yw0G3Ezi3QlJxfZyGM6DzFb9lKaTbmKU_Jox6GhM_u7kPy8f27D6fnxeX12cXp28vClmUtCq4tg75telWVDe-UUIpDxbmQsrZdpxvVdxYb1dQAjPWsqludFWC2FqiUkIfkzZK7ndoNZtaPEQazjW4DcW8COPO3491nsw47oxnTks8Br-4CYvgyYRrNxiWLwwAew5SMEJnkXIsmoy__QW_CFH3-nhFSV3XdaKkz9XqhbAwpRezvl-HMzF2auUtz22WGX_y5_j36u7wM8AX46gbc_yfKnFxcXy2hvwBRD61E</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2375889737</pqid></control><display><type>article</type><title>Bayesian data integration and variable selection for pan‐cancer survival prediction using protein expression data</title><source>Oxford Journals Online</source><source>SPORTDiscus with Full Text</source><creator>Maity, Arnab Kumar ; Bhattacharya, Anirban ; Mallick, Bani K. ; Baladandayuthapani, Veerabhadran</creator><creatorcontrib>Maity, Arnab Kumar ; Bhattacharya, Anirban ; Mallick, Bani K. ; Baladandayuthapani, Veerabhadran</creatorcontrib><description>Accurate prognostic prediction using molecular information is a challenging area of research, which is essential to develop precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, like the survival time of patients. There are considerable statistical and computational challenges due to the large dimension of the problems. 
Furthermore, data are available for different tumor types; hence data integration for various tumors is desirable. Having censored survival outcomes escalates one more level of complexity in the inferential procedure. We develop Bayesian hierarchical survival models, which accommodate all the challenges mentioned here. We use the hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated “The Cancer Proteome Atlas” (TCPA), which contains reverse‐phase protein arrays–based high‐quality protein expression data as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors with the correlated prior structures.</description><identifier>ISSN: 0006-341X</identifier><identifier>EISSN: 1541-0420</identifier><identifier>DOI: 10.1111/biom.13132</identifier><identifier>PMID: 31393003</identifier><language>eng</language><publisher>United States: Blackwell Publishing Ltd</publisher><subject>AFT regression ; Annotations ; Bayesian analysis ; borrowing strength ; Cancer ; Computer applications ; Computer simulation ; Data analysis ; Data integration ; Failure times ; horseshoe ; Integration ; Mathematical models ; pan‐cancer model ; Precision medicine ; Protein arrays ; Protein expression ; Proteins ; Proteomes ; Regression analysis ; Regression coefficients ; Regression models ; Statistical analysis ; Survival ; TCPA ; Tumors</subject><ispartof>Biometrics, 2020-03, Vol.76 (1), p.316-325</ispartof><rights>2019 The International Biometric Society</rights><rights>2019 The International Biometric 
Society.</rights><rights>2020 The International Biometric Society</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c4482-17c0afb9f65491d62661a5112338cdd796fdce9698aa00f058b7fdca0c82e6623</citedby><cites>FETCH-LOGICAL-c4482-17c0afb9f65491d62661a5112338cdd796fdce9698aa00f058b7fdca0c82e6623</cites><orcidid>0000-0002-6692-0155</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31393003$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Maity, Arnab Kumar</creatorcontrib><creatorcontrib>Bhattacharya, Anirban</creatorcontrib><creatorcontrib>Mallick, Bani K.</creatorcontrib><creatorcontrib>Baladandayuthapani, Veerabhadran</creatorcontrib><title>Bayesian data integration and variable selection for pan‐cancer survival prediction using protein expression data</title><title>Biometrics</title><addtitle>Biometrics</addtitle><description>Accurate prognostic prediction using molecular information is a challenging area of research, which is essential to develop precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, like the survival time of patients. There are considerable statistical and computational challenges due to the large dimension of the problems. Furthermore, data are available for different tumor types; hence data integration for various tumors is desirable. Having censored survival outcomes escalates one more level of complexity in the inferential procedure. We develop Bayesian hierarchical survival models, which accommodate all the challenges mentioned here. 
We use the hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated “The Cancer Proteome Atlas” (TCPA), which contains reverse‐phase protein arrays–based high‐quality protein expression data as well as detailed clinical annotation, including survival times. Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors with the correlated prior structures.</description><subject>AFT regression</subject><subject>Annotations</subject><subject>Bayesian analysis</subject><subject>borrowing strength</subject><subject>Cancer</subject><subject>Computer applications</subject><subject>Computer simulation</subject><subject>Data analysis</subject><subject>Data integration</subject><subject>Failure times</subject><subject>horseshoe</subject><subject>Integration</subject><subject>Mathematical models</subject><subject>pan‐cancer model</subject><subject>Precision medicine</subject><subject>Protein arrays</subject><subject>Protein expression</subject><subject>Proteins</subject><subject>Proteomes</subject><subject>Regression analysis</subject><subject>Regression coefficients</subject><subject>Regression models</subject><subject>Statistical 
analysis</subject><subject>Survival</subject><subject>TCPA</subject><subject>Tumors</subject><issn>0006-341X</issn><issn>1541-0420</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNp9kc1u1DAUhS0EokNhwwMgS2wQUlr_JHayQaIVtJVadQMSO-vGuRlcZezBTgZmxyPwjDwJTlMqYIE3ls_5dO61DiHPOTvi-Ry3LmyOuORSPCArXpW8YKVgD8mKMaYKWfJPB-RJSjf52VRMPCYHmW0kY3JF0gnsMTnwtIMRqPMjriOMLngKvqM7iA7aAWnCAe2t3IdIt-B_fv9hwVuMNE1x53Yw0G3Ezi3QlJxfZyGM6DzFb9lKaTbmKU_Jox6GhM_u7kPy8f27D6fnxeX12cXp28vClmUtCq4tg75telWVDe-UUIpDxbmQsrZdpxvVdxYb1dQAjPWsqludFWC2FqiUkIfkzZK7ndoNZtaPEQazjW4DcW8COPO3491nsw47oxnTks8Br-4CYvgyYRrNxiWLwwAew5SMEJnkXIsmoy__QW_CFH3-nhFSV3XdaKkz9XqhbAwpRezvl-HMzF2auUtz22WGX_y5_j36u7wM8AX46gbc_yfKnFxcXy2hvwBRD61E</recordid><startdate>202003</startdate><enddate>202003</enddate><creator>Maity, Arnab Kumar</creator><creator>Bhattacharya, Anirban</creator><creator>Mallick, Bani K.</creator><creator>Baladandayuthapani, Veerabhadran</creator><general>Blackwell Publishing Ltd</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>JQ2</scope><scope>7X8</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0002-6692-0155</orcidid></search><sort><creationdate>202003</creationdate><title>Bayesian data integration and variable selection for pan‐cancer survival prediction using protein expression data</title><author>Maity, Arnab Kumar ; Bhattacharya, Anirban ; Mallick, Bani K. 
; Baladandayuthapani, Veerabhadran</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c4482-17c0afb9f65491d62661a5112338cdd796fdce9698aa00f058b7fdca0c82e6623</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>AFT regression</topic><topic>Annotations</topic><topic>Bayesian analysis</topic><topic>borrowing strength</topic><topic>Cancer</topic><topic>Computer applications</topic><topic>Computer simulation</topic><topic>Data analysis</topic><topic>Data integration</topic><topic>Failure times</topic><topic>horseshoe</topic><topic>Integration</topic><topic>Mathematical models</topic><topic>pan‐cancer model</topic><topic>Precision medicine</topic><topic>Protein arrays</topic><topic>Protein expression</topic><topic>Proteins</topic><topic>Proteomes</topic><topic>Regression analysis</topic><topic>Regression coefficients</topic><topic>Regression models</topic><topic>Statistical analysis</topic><topic>Survival</topic><topic>TCPA</topic><topic>Tumors</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Maity, Arnab Kumar</creatorcontrib><creatorcontrib>Bhattacharya, Anirban</creatorcontrib><creatorcontrib>Mallick, Bani K.</creatorcontrib><creatorcontrib>Baladandayuthapani, Veerabhadran</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Computer Science Collection</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Biometrics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Maity, Arnab Kumar</au><au>Bhattacharya, Anirban</au><au>Mallick, Bani K.</au><au>Baladandayuthapani, Veerabhadran</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Bayesian data integration and variable selection for pan‐cancer 
survival prediction using protein expression data</atitle><jtitle>Biometrics</jtitle><addtitle>Biometrics</addtitle><date>2020-03</date><risdate>2020</risdate><volume>76</volume><issue>1</issue><spage>316</spage><epage>325</epage><pages>316-325</pages><issn>0006-341X</issn><eissn>1541-0420</eissn><abstract>Accurate prognostic prediction using molecular information is a challenging area of research, which is essential to develop precision medicine. In this paper, we develop translational models to identify major actionable proteins that are associated with clinical outcomes, like the survival time of patients. There are considerable statistical and computational challenges due to the large dimension of the problems. Furthermore, data are available for different tumor types; hence data integration for various tumors is desirable. Having censored survival outcomes escalates one more level of complexity in the inferential procedure. We develop Bayesian hierarchical survival models, which accommodate all the challenges mentioned here. We use the hierarchical Bayesian accelerated failure time model for survival regression. Furthermore, we assume sparse horseshoe prior distribution for the regression coefficients to identify the major proteomic drivers. We borrow strength across tumor groups by introducing a correlation structure among the prior distributions. The proposed methods have been used to analyze data from the recently curated “The Cancer Proteome Atlas” (TCPA), which contains reverse‐phase protein arrays–based high‐quality protein expression data as well as detailed clinical annotation, including survival times. 
Our simulation and the TCPA data analysis illustrate the efficacy of the proposed integrative model, which links different tumors with the correlated prior structures.</abstract><cop>United States</cop><pub>Blackwell Publishing Ltd</pub><pmid>31393003</pmid><doi>10.1111/biom.13132</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-6692-0155</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0006-341X
ispartof Biometrics, 2020-03, Vol.76 (1), p.316-325
issn 0006-341X
1541-0420
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_7007312
source Oxford Journals Online; SPORTDiscus with Full Text
subjects AFT regression
Annotations
Bayesian analysis
borrowing strength
Cancer
Computer applications
Computer simulation
Data analysis
Data integration
Failure times
horseshoe
Integration
Mathematical models
pan‐cancer model
Precision medicine
Protein arrays
Protein expression
Proteins
Proteomes
Regression analysis
Regression coefficients
Regression models
Statistical analysis
Survival
TCPA
Tumors
title Bayesian data integration and variable selection for pan‐cancer survival prediction using protein expression data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-03T23%3A24%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bayesian%20data%20integration%20and%20variable%20selection%20for%20pan%E2%80%90cancer%20survival%20prediction%20using%20protein%20expression%20data&rft.jtitle=Biometrics&rft.au=Maity,%20Arnab%20Kumar&rft.date=2020-03&rft.volume=76&rft.issue=1&rft.spage=316&rft.epage=325&rft.pages=316-325&rft.issn=0006-341X&rft.eissn=1541-0420&rft_id=info:doi/10.1111/biom.13132&rft_dat=%3Cproquest_pubme%3E2375889737%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c4482-17c0afb9f65491d62661a5112338cdd796fdce9698aa00f058b7fdca0c82e6623%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2375889737&rft_id=info:pmid/31393003&rfr_iscdi=true