Module-based prediction approach for robust inter-study predictions in microarray data
Motivation: Traditional genomic prediction models based on individual genes suffer from low reproducibility across microarray studies due to the lack of robustness to expression measurement noise and gene missingness when they are matched across platforms. It is common that some of the genes in the...
Published in: | Bioinformatics 2010-10, Vol.26 (20), p.2586-2593 |
---|---|
Main Authors: | Mi, Zhibao; Shen, Kui; Song, Nan; Cheng, Chunrong; Song, Chi; Kaminski, Naftali; Tseng, George C. |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c510t-9c73e4633ea5d6ae3dcae0302335c30152ffc5b73a23e001017f85de7c90156a3 |
---|---|
cites | cdi_FETCH-LOGICAL-c510t-9c73e4633ea5d6ae3dcae0302335c30152ffc5b73a23e001017f85de7c90156a3 |
container_end_page | 2593 |
container_issue | 20 |
container_start_page | 2586 |
container_title | Bioinformatics |
container_volume | 26 |
creator | Mi, Zhibao; Shen, Kui; Song, Nan; Cheng, Chunrong; Song, Chi; Kaminski, Naftali; Tseng, George C. |
description | Motivation: Traditional genomic prediction models based on individual genes suffer from low reproducibility across microarray studies because they are not robust to expression measurement noise or to genes that go missing when studies are matched across platforms. It is common that some of the genes in a prediction model established in a training study cannot be matched to a test study because a different platform was used. The failure of inter-study predictions has severely hindered the clinical application of microarrays. To overcome the drawbacks of traditional gene-based prediction (GBP) models, we propose a module-based prediction (MBP) strategy via unsupervised gene clustering. Results: K-means clustering is used to group genes sharing similar expression profiles into gene modules, and small modules are merged into their nearest neighbors. A conventional univariate or multivariate feature selection procedure is then applied, and a representative gene from each selected module is identified to construct the final prediction model. As a result, the prediction model is portable to any test study as long as at least some of the genes in each module exist in the test study. We demonstrate that K-means cluster sizes generally follow a multinomial distribution and that the failure probability of inter-study prediction due to missing genes is diminished by merging small clusters into their nearest neighbors. By simulation and applications to real datasets in inter-study predictions, we show that the proposed MBP provides slightly improved accuracy while being considerably more robust than traditional GBP. Availability: http://www.biostat.pitt.edu/bioinfo/ Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online. |
doi_str_mv | 10.1093/bioinformatics/btq472 |
format | article |
fullrecord | <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2951088</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>758131070</sourcerecordid><originalsourceid>FETCH-LOGICAL-c510t-9c73e4633ea5d6ae3dcae0302335c30152ffc5b73a23e001017f85de7c90156a3</originalsourceid><addsrcrecordid>eNqFkUFv1DAQhS0EoqXlJ4ByQZxCx5k4Ti5ItNAWqaVCQIW4WBPHoYYkTm2nYv89RrvdtidOtvy-eTOex9gLDm84NHjQWmen3vmRotXhoI3XpSwesV1eVpAXIJrH6Y6VzMsacIc9C-EXgOBlWT5lOwVI3siK77LLc9ctg8lbCqbLZm86q6N1U0bz7B3pqyz1yLxrlxAzO0Xj8xCXbnUPDek9G61OuPe0yjqKtM-e9DQE83xz7rFvxx--Hp3mZxcnH4_eneVacIh5oyWaskI0JLqKDHaaDCAUiEIjcFH0vRatRCrQAHDgsq9FZ6RuklgR7rG3a995aUfTaTNFT4OavR3Jr5Qjqx4qk71SP92NKpo0QF0ng9cbA--uFxOiGm3QZhhoMm4JqhaiqkVV_J-UoubIQUIixZpMKwnBm347Dwf1Lzz1MDy1Di_Vvbz_mW3VbVoJeLUBKGgaek-TtuGOw7S5EkXi8jVnQzR_tjr536qSKIU6_f5DHcLJ-eWnL-_VZ_wLCRG5wQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>758131070</pqid></control><display><type>article</type><title>Module-based prediction approach for robust inter-study predictions in microarray data</title><source>PubMed (Medline)</source><source>Oxford Academic Journals (Open Access)</source><creator>Mi, Zhibao ; Shen, Kui ; Song, Nan ; Cheng, Chunrong ; Song, Chi ; Kaminski, Naftali ; Tseng, George C.</creator><creatorcontrib>Mi, Zhibao ; Shen, Kui ; Song, Nan ; Cheng, Chunrong ; Song, Chi ; Kaminski, Naftali ; Tseng, George C.</creatorcontrib><description>Motivation: Traditional genomic prediction models based on individual genes suffer from low reproducibility across microarray studies due to the lack of robustness to expression measurement noise and gene missingness when they are matched across platforms. It is common that some of the genes in the prediction model established in a training study cannot be matched to another test study because a different platform is applied. 
The failure of inter-study predictions has severely hindered the clinical applications of microarray. To overcome the drawbacks of traditional gene-based prediction (GBP) models, we propose a module-based prediction (MBP) strategy via unsupervised gene clustering. Results: K-means clustering is used to group genes sharing similar expression profiles into gene modules, and small modules are merged into their nearest neighbors. Conventional univariate or multivariate feature selection procedure is applied and a representative gene from each selected module is identified to construct the final prediction model. As a result, the prediction model is portable to any test study as long as partial genes in each module exist in the test study. We demonstrate that K-means cluster sizes generally follow a multinomial distribution and the failure probability of inter-study prediction due to missing genes is diminished by merging small clusters into their nearest neighbors. By simulation and applications of real datasets in inter-study predictions, we show that the proposed MBP provides slightly improved accuracy while is considerably more robust than traditional GBP. Availability: http://www.biostat.pitt.edu/bioinfo/ Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btq472</identifier><identifier>PMID: 20719761</identifier><language>eng</language><publisher>Oxford: Oxford University Press</publisher><subject>Biological and medical sciences ; Cluster Analysis ; Computational Biology - methods ; Databases, Factual ; Fundamental and applied biological sciences. Psychology ; Gene Expression Profiling - methods ; General aspects ; Mathematics in biology. Statistical analysis. Models. Metrology. 
Data processing in biology (general aspects) ; Oligonucleotide Array Sequence Analysis - methods ; Original Paper</subject><ispartof>Bioinformatics, 2010-10, Vol.26 (20), p.2586-2593</ispartof><rights>2015 INIST-CNRS</rights><rights>The Author 2010. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oxfordjournals.org 2010</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c510t-9c73e4633ea5d6ae3dcae0302335c30152ffc5b73a23e001017f85de7c90156a3</citedby><cites>FETCH-LOGICAL-c510t-9c73e4633ea5d6ae3dcae0302335c30152ffc5b73a23e001017f85de7c90156a3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2951088/pdf/$$EPDF$$P50$$Gpubmedcentral$$H</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC2951088/$$EHTML$$P50$$Gpubmedcentral$$H</linktohtml><link.rule.ids>230,314,727,780,784,885,27924,27925,53791,53793</link.rule.ids><backlink>$$Uhttp://pascal-francis.inist.fr/vibad/index.php?action=getRecordDetail&idt=23302435$$DView record in Pascal Francis$$Hfree_for_read</backlink><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/20719761$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Mi, Zhibao</creatorcontrib><creatorcontrib>Shen, Kui</creatorcontrib><creatorcontrib>Song, Nan</creatorcontrib><creatorcontrib>Cheng, Chunrong</creatorcontrib><creatorcontrib>Song, Chi</creatorcontrib><creatorcontrib>Kaminski, Naftali</creatorcontrib><creatorcontrib>Tseng, George C.</creatorcontrib><title>Module-based prediction approach for robust inter-study predictions in microarray data</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Motivation: Traditional genomic 
prediction models based on individual genes suffer from low reproducibility across microarray studies due to the lack of robustness to expression measurement noise and gene missingness when they are matched across platforms. It is common that some of the genes in the prediction model established in a training study cannot be matched to another test study because a different platform is applied. The failure of inter-study predictions has severely hindered the clinical applications of microarray. To overcome the drawbacks of traditional gene-based prediction (GBP) models, we propose a module-based prediction (MBP) strategy via unsupervised gene clustering. Results: K-means clustering is used to group genes sharing similar expression profiles into gene modules, and small modules are merged into their nearest neighbors. Conventional univariate or multivariate feature selection procedure is applied and a representative gene from each selected module is identified to construct the final prediction model. As a result, the prediction model is portable to any test study as long as partial genes in each module exist in the test study. We demonstrate that K-means cluster sizes generally follow a multinomial distribution and the failure probability of inter-study prediction due to missing genes is diminished by merging small clusters into their nearest neighbors. By simulation and applications of real datasets in inter-study predictions, we show that the proposed MBP provides slightly improved accuracy while is considerably more robust than traditional GBP. Availability: http://www.biostat.pitt.edu/bioinfo/ Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.</description><subject>Biological and medical sciences</subject><subject>Cluster Analysis</subject><subject>Computational Biology - methods</subject><subject>Databases, Factual</subject><subject>Fundamental and applied biological sciences. 
Psychology</subject><subject>Gene Expression Profiling - methods</subject><subject>General aspects</subject><subject>Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)</subject><subject>Oligonucleotide Array Sequence Analysis - methods</subject><subject>Original Paper</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><recordid>eNqFkUFv1DAQhS0EoqXlJ4ByQZxCx5k4Ti5ItNAWqaVCQIW4WBPHoYYkTm2nYv89RrvdtidOtvy-eTOex9gLDm84NHjQWmen3vmRotXhoI3XpSwesV1eVpAXIJrH6Y6VzMsacIc9C-EXgOBlWT5lOwVI3siK77LLc9ctg8lbCqbLZm86q6N1U0bz7B3pqyz1yLxrlxAzO0Xj8xCXbnUPDek9G61OuPe0yjqKtM-e9DQE83xz7rFvxx--Hp3mZxcnH4_eneVacIh5oyWaskI0JLqKDHaaDCAUiEIjcFH0vRatRCrQAHDgsq9FZ6RuklgR7rG3a995aUfTaTNFT4OavR3Jr5Qjqx4qk71SP92NKpo0QF0ng9cbA--uFxOiGm3QZhhoMm4JqhaiqkVV_J-UoubIQUIixZpMKwnBm347Dwf1Lzz1MDy1Di_Vvbz_mW3VbVoJeLUBKGgaek-TtuGOw7S5EkXi8jVnQzR_tjr536qSKIU6_f5DHcLJ-eWnL-_VZ_wLCRG5wQ</recordid><startdate>20101015</startdate><enddate>20101015</enddate><creator>Mi, Zhibao</creator><creator>Shen, Kui</creator><creator>Song, Nan</creator><creator>Cheng, Chunrong</creator><creator>Song, Chi</creator><creator>Kaminski, Naftali</creator><creator>Tseng, George C.</creator><general>Oxford University Press</general><scope>BSCLL</scope><scope>IQODW</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7QO</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>5PM</scope></search><sort><creationdate>20101015</creationdate><title>Module-based prediction approach for robust inter-study predictions in microarray data</title><author>Mi, Zhibao ; Shen, Kui ; Song, Nan ; Cheng, Chunrong ; Song, Chi ; Kaminski, Naftali ; Tseng, George 
C.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c510t-9c73e4633ea5d6ae3dcae0302335c30152ffc5b73a23e001017f85de7c90156a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Biological and medical sciences</topic><topic>Cluster Analysis</topic><topic>Computational Biology - methods</topic><topic>Databases, Factual</topic><topic>Fundamental and applied biological sciences. Psychology</topic><topic>Gene Expression Profiling - methods</topic><topic>General aspects</topic><topic>Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects)</topic><topic>Oligonucleotide Array Sequence Analysis - methods</topic><topic>Original Paper</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Mi, Zhibao</creatorcontrib><creatorcontrib>Shen, Kui</creatorcontrib><creatorcontrib>Song, Nan</creatorcontrib><creatorcontrib>Cheng, Chunrong</creatorcontrib><creatorcontrib>Song, Chi</creatorcontrib><creatorcontrib>Kaminski, Naftali</creatorcontrib><creatorcontrib>Tseng, George C.</creatorcontrib><collection>Istex</collection><collection>Pascal-Francis</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Biotechnology Research Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Mi, 
Zhibao</au><au>Shen, Kui</au><au>Song, Nan</au><au>Cheng, Chunrong</au><au>Song, Chi</au><au>Kaminski, Naftali</au><au>Tseng, George C.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Module-based prediction approach for robust inter-study predictions in microarray data</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2010-10-15</date><risdate>2010</risdate><volume>26</volume><issue>20</issue><spage>2586</spage><epage>2593</epage><pages>2586-2593</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Motivation: Traditional genomic prediction models based on individual genes suffer from low reproducibility across microarray studies due to the lack of robustness to expression measurement noise and gene missingness when they are matched across platforms. It is common that some of the genes in the prediction model established in a training study cannot be matched to another test study because a different platform is applied. The failure of inter-study predictions has severely hindered the clinical applications of microarray. To overcome the drawbacks of traditional gene-based prediction (GBP) models, we propose a module-based prediction (MBP) strategy via unsupervised gene clustering. Results: K-means clustering is used to group genes sharing similar expression profiles into gene modules, and small modules are merged into their nearest neighbors. Conventional univariate or multivariate feature selection procedure is applied and a representative gene from each selected module is identified to construct the final prediction model. As a result, the prediction model is portable to any test study as long as partial genes in each module exist in the test study. 
We demonstrate that K-means cluster sizes generally follow a multinomial distribution and the failure probability of inter-study prediction due to missing genes is diminished by merging small clusters into their nearest neighbors. By simulation and applications of real datasets in inter-study predictions, we show that the proposed MBP provides slightly improved accuracy while is considerably more robust than traditional GBP. Availability: http://www.biostat.pitt.edu/bioinfo/ Contact: ctseng@pitt.edu Supplementary information: Supplementary data are available at Bioinformatics online.</abstract><cop>Oxford</cop><pub>Oxford University Press</pub><pmid>20719761</pmid><doi>10.1093/bioinformatics/btq472</doi><tpages>8</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2010-10, Vol.26 (20), p.2586-2593 |
issn | 1367-4803; 1460-2059; 1367-4811 |
language | eng |
recordid | cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_2951088 |
source | PubMed (Medline); Oxford Academic Journals (Open Access) |
subjects | Biological and medical sciences; Cluster Analysis; Computational Biology - methods; Databases, Factual; Fundamental and applied biological sciences. Psychology; Gene Expression Profiling - methods; General aspects; Mathematics in biology. Statistical analysis. Models. Metrology. Data processing in biology (general aspects); Oligonucleotide Array Sequence Analysis - methods; Original Paper |
title | Module-based prediction approach for robust inter-study predictions in microarray data |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T21%3A52%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Module-based%20prediction%20approach%20for%20robust%20inter-study%20predictions%20in%20microarray%20data&rft.jtitle=Bioinformatics&rft.au=Mi,%20Zhibao&rft.date=2010-10-15&rft.volume=26&rft.issue=20&rft.spage=2586&rft.epage=2593&rft.pages=2586-2593&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btq472&rft_dat=%3Cproquest_pubme%3E758131070%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c510t-9c73e4633ea5d6ae3dcae0302335c30152ffc5b73a23e001017f85de7c90156a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=758131070&rft_id=info:pmid/20719761&rfr_iscdi=true |
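
The abstract in this record outlines the module-based prediction (MBP) idea: cluster genes with K-means, merge small modules into their nearest neighbors, and use a representative gene per module so the model can still be applied when some genes are missing on the test platform. The following is a minimal Python sketch of those steps, not the authors' implementation. The function names, the centroid-distance merge rule, the choice of the gene nearest the module centroid as the representative, the `min_size` threshold, and the toy data are all assumptions made for illustration; the feature selection and classifier stages are omitted.

```python
import random

def dist2(a, b):
    """Squared Euclidean distance between two expression profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def centroid(rows):
    """Column-wise mean of a non-empty list of profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def kmeans(profiles, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per gene."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(profiles, k)]
    labels = [0] * len(profiles)
    for _ in range(iters):
        # Assignment step: each gene goes to its nearest center.
        labels = [min(range(k), key=lambda c: dist2(p, centers[c])) for p in profiles]
        # Update step: recompute each center from its current members.
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # keep the old center if a cluster empties out
                centers[c] = centroid(members)
    return labels

def build_modules(genes, profiles, k, min_size):
    """Cluster genes, then merge modules smaller than min_size into the nearest module."""
    prof = dict(zip(genes, profiles))
    modules = {}
    for g, lab in zip(genes, kmeans(profiles, k)):
        modules.setdefault(lab, []).append(g)
    while len(modules) > 1:
        small = min(modules, key=lambda m: len(modules[m]))
        if len(modules[small]) >= min_size:
            break  # every remaining module is large enough
        c_small = centroid([prof[g] for g in modules[small]])
        # Assumed merge rule: nearest module by centroid distance.
        nearest = min((m for m in modules if m != small),
                      key=lambda m: dist2(c_small, centroid([prof[g] for g in modules[m]])))
        modules[nearest].extend(modules.pop(small))
    return list(modules.values()), prof

def representative(module, prof, available):
    """Gene nearest the module centroid among those measured in the test study."""
    c = centroid([prof[g] for g in module])
    candidates = [g for g in module if g in available]
    if not candidates:
        return None  # this module cannot be transferred to the test platform
    return min(candidates, key=lambda g: dist2(prof[g], c))

# Toy training data (hypothetical): nine genes measured in three samples.
genes = [f"g{i}" for i in range(1, 10)]
profiles = [[0.0, 0.1, 0.2], [0.1, 0.0, 0.1], [0.2, 0.1, 0.0], [0.0, 0.2, 0.1],
            [10.0, 10.1, 9.9], [10.1, 9.9, 10.0], [9.9, 10.0, 10.1], [10.0, 9.8, 10.2],
            [12.0, 12.1, 11.9]]
modules, prof = build_modules(genes, profiles, k=3, min_size=3)

# Hypothetical test platform on which only some genes could be matched.
test_platform = {"g2", "g3", "g6", "g9"}
reps = [representative(m, prof, test_platform) for m in modules]
print(modules, reps)
```

In this sketch a module only fails to transfer when none of its genes is present on the test platform, which is the situation that merging small modules is meant to make less likely.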