Module-based prediction approach for robust inter-study predictions in microarray data

Bibliographic Details
Published in: Bioinformatics 2010-10, Vol.26 (20), p.2586-2593
Main Authors: Mi, Zhibao; Shen, Kui; Song, Nan; Cheng, Chunrong; Song, Chi; Kaminski, Naftali; Tseng, George C.
Format: Article
Language: English
container_end_page 2593
container_issue 20
container_start_page 2586
container_title Bioinformatics
container_volume 26
creator Mi, Zhibao
Shen, Kui
Song, Nan
Cheng, Chunrong
Song, Chi
Kaminski, Naftali
Tseng, George C.
description Motivation: Traditional genomic prediction models based on individual genes suffer from low reproducibility across microarray studies because they are not robust to expression measurement noise or to genes that go missing when studies are matched across platforms. Genes in a prediction model established in a training study often cannot be matched to a test study because a different platform was used. The failure of inter-study predictions has severely hindered clinical applications of microarrays. To overcome the drawbacks of traditional gene-based prediction (GBP) models, we propose a module-based prediction (MBP) strategy via unsupervised gene clustering. Results: K-means clustering is used to group genes sharing similar expression profiles into gene modules, and small modules are merged into their nearest neighbors. A conventional univariate or multivariate feature selection procedure is applied, and a representative gene from each selected module is identified to construct the final prediction model. As a result, the prediction model is portable to any test study as long as at least some genes in each module exist in the test study. We demonstrate that K-means cluster sizes generally follow a multinomial distribution and that the failure probability of inter-study prediction due to missing genes is diminished by merging small clusters into their nearest neighbors. Using simulations and real datasets in inter-study predictions, we show that the proposed MBP provides slightly improved accuracy while being considerably more robust than traditional GBP. Availability: http://www.biostat.pitt.edu/bioinfo/ Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.
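The module-handling steps named in the description (merging small modules into their nearest neighbors, then choosing a representative gene per module) can be illustrated roughly as follows. This is a minimal sketch and not the authors' implementation: the function names `merge_small_modules` and `representative`, the use of Euclidean distance between module centroids, and the "closest to centroid" rule for picking a representative are all assumptions made for illustration. The cluster labels are taken as given, for example from any K-means routine.

```python
import numpy as np

def merge_small_modules(X, labels, min_size):
    """Merge each module with fewer than min_size genes into the module
    whose centroid is nearest (Euclidean distance; an assumption).

    X is a genes-by-samples expression matrix; labels[i] is gene i's module.
    """
    labels = np.asarray(labels).copy()
    while True:
        ids, counts = np.unique(labels, return_counts=True)
        small = ids[counts < min_size]
        if small.size == 0 or ids.size == 1:
            return labels
        centroids = {k: X[labels == k].mean(axis=0) for k in ids}
        k = small[0]
        others = [j for j in ids if j != k]
        nearest = min(others,
                      key=lambda j: np.linalg.norm(centroids[k] - centroids[j]))
        labels[labels == k] = nearest  # one fewer module per iteration

def representative(X, labels, module, available=None):
    """Index of the gene in `module` closest to the module centroid.

    If `available` (a set of gene indices present in the test study) is
    given, only those genes are eligible; returns None if none is present.
    """
    members = np.flatnonzero(np.asarray(labels) == module)
    centroid = X[members].mean(axis=0)
    if available is not None:
        members = np.array([g for g in members if int(g) in available],
                           dtype=int)
    if members.size == 0:
        return None
    dists = np.linalg.norm(X[members] - centroid, axis=1)
    return int(members[np.argmin(dists)])

# Hypothetical toy data: 6 genes, 3 samples, with a singleton module (2).
X = np.array([[0.0, 0, 0], [0.2, 0, 0], [0.1, 0, 0],
              [10.0, 10, 10], [10.2, 10, 10], [1.0, 0, 0]])
merged = merge_small_modules(X, [0, 0, 0, 1, 1, 2], min_size=2)
print(merged.tolist())                                   # [0, 0, 0, 1, 1, 0]
print(representative(X, merged, 0))                      # 1
print(representative(X, merged, 0, available={0, 2, 5})) # 2
```

In this toy case the model stays usable on a test study that lacks gene 1, because another member of the same module stands in for it; a gene-based model built on gene 1 alone would have no substitute.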
doi_str_mv 10.1093/bioinformatics/btq472
format article
pmid 20719761
publisher Oxford: Oxford University Press
rights The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org
linktohtml https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2951088/
fulltext fulltext
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2010-10, Vol.26 (20), p.2586-2593
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2951088
source PubMed (Medline); Oxford Academic Journals (Open Access)
subjects Biological and medical sciences
Cluster Analysis
Computational Biology - methods
Databases, Factual
Fundamental and applied biological sciences. Psychology
Gene Expression Profiling - methods
General aspects
Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)
Oligonucleotide Array Sequence Analysis - methods
Original Paper
title Module-based prediction approach for robust inter-study predictions in microarray data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T21%3A52%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Module-based%20prediction%20approach%20for%20robust%20inter-study%20predictions%20in%20microarray%20data&rft.jtitle=Bioinformatics&rft.au=Mi,%20Zhibao&rft.date=2010-10-15&rft.volume=26&rft.issue=20&rft.spage=2586&rft.epage=2593&rft.pages=2586-2593&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btq472&rft_dat=%3Cproquest_pubme%3E758131070%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c510t-9c73e4633ea5d6ae3dcae0302335c30152ffc5b73a23e001017f85de7c90156a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=758131070&rft_id=info:pmid/20719761&rfr_iscdi=true