Estimation of relevant variables on high-dimensional biological patterns using iterated weighted kernel functions

The analysis of complex proteomic and genomic profiles involves the identification of significant markers within a set of hundreds or even thousands of variables that represent a high-dimensional problem space. The occurrence of noise, redundancy or combinatorial interactions in the profile makes th...

Bibliographic Details
Published in: PloS one 2008-03, Vol.3 (3), p.e1806-e1806
Main Authors: Rojas-Galeano, Sergio; Hsieh, Emily; Agranoff, Dan; Krishna, Sanjeev; Fernandez-Reyes, Delmiro
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c693t-e126279a9d657c21b9023c7fcc5588435f6bd8d25cbf17ca849af4b472af5a463
cites cdi_FETCH-LOGICAL-c693t-e126279a9d657c21b9023c7fcc5588435f6bd8d25cbf17ca849af4b472af5a463
container_end_page e1806
container_issue 3
container_start_page e1806
container_title PloS one
container_volume 3
creator Rojas-Galeano, Sergio
Hsieh, Emily
Agranoff, Dan
Krishna, Sanjeev
Fernandez-Reyes, Delmiro
description The analysis of complex proteomic and genomic profiles involves identifying significant markers among hundreds or even thousands of variables, which together form a high-dimensional problem space. Noise, redundancy, and combinatorial interactions in the profile make the selection of relevant variables harder. Here we propose a method that selects variables based on their estimated relevance to hidden patterns. Our method combines a weighted-kernel discriminant with an iterative stochastic probability estimation algorithm to discover the relevance distribution over the set of variables. We verified that the method selects predefined relevant variables in synthetic proteome-like data and then assessed its performance on high-dimensional biological problems. Experiments were run on serum proteomic datasets of infectious diseases. The resulting variable subsets achieved classification accuracies of 99% on Human African Trypanosomiasis, 91% on Tuberculosis, and 91% on Malaria serum proteomic profiles, with fewer than 20% of the variables selected. The method scaled up to dimensionalities orders of magnitude higher, as shown with gene expression microarray datasets, on which we obtained classification accuracies close to 90% with fewer than 1% of the total number of variables. It consistently found relevant variables that attained high classification accuracies across synthetic and biological datasets. Notably, it yielded very compact subsets relative to the original number of variables, which should simplify downstream biological experimentation.
doi_str_mv 10.1371/journal.pone.0001806
format article
pmid 18509521
date 2008-03-26
pub Public Library of Science
cop United States
contributor Stolovitzky, Gustavo
rights 2008 Rojas-Galeano et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
oa free_for_read
toplevel peer_reviewed
backlink https://www.ncbi.nlm.nih.gov/pubmed/18509521
fulltext fulltext
identifier ISSN: 1932-6203
ispartof PloS one, 2008-03, Vol.3 (3), p.e1806-e1806
issn 1932-6203
1932-6203
language eng
recordid cdi_plos_journals_1319483428
source PubMed (Medline); Publicly Available Content Database
subjects African trypanosomiasis
Algorithms
Analysis
Artificial intelligence
Cancer
Classification
Combinatorial analysis
Computational Biology - statistics & numerical data
Computational Biology/Systems Biology
Computer Science
Datasets
DNA microarrays
Experimentation
Gene expression
Genetic algorithms
Genomics - statistics & numerical data
Humans
Infectious Diseases
Infectious Diseases/Neglected Tropical Diseases
Infectious Diseases/Tropical and Travel-Associated Diseases
Information storage
Integral equations
Kernel functions
Malaria
Mathematical analysis
Medical research
Methods
Mycobacterium
Oligonucleotide Array Sequence Analysis - statistics & numerical data
Parasitology
Pattern Recognition, Automated
Proteomics
Proteomics - statistics & numerical data
Redundancy
Software
Stochasticity
Tuberculosis
Variables
Vector-borne diseases
title Estimation of relevant variables on high-dimensional biological patterns using iterated weighted kernel functions
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T01%3A03%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Estimation%20of%20relevant%20variables%20on%20high-dimensional%20biological%20patterns%20using%20iterated%20weighted%20kernel%20functions&rft.jtitle=PloS%20one&rft.au=Rojas-Galeano,%20Sergio&rft.date=2008-03-26&rft.volume=3&rft.issue=3&rft.spage=e1806&rft.epage=e1806&rft.pages=e1806-e1806&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0001806&rft_dat=%3Cgale_plos_%3EA472656029%3C/gale_plos_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c693t-e126279a9d657c21b9023c7fcc5588435f6bd8d25cbf17ca849af4b472af5a463%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1319483428&rft_id=info:pmid/18509521&rft_galeid=A472656029&rfr_iscdi=true
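
The description field says the method combines a weighted-kernel discriminant with an iterative stochastic probability estimation algorithm to obtain a relevance distribution over variables. The record gives no further detail, so the following is only a minimal sketch of that general idea and not the authors' algorithm. Everything in it is an assumption made for illustration: the Gaussian form of the kernel, the mean-similarity discriminant, the sample-and-reinforce update of the probabilities, the subset-size penalty, and all function and parameter names (weighted_rbf, kernel_discriminant, estimate_relevance, make_data, rate, size_penalty).

```python
import math
import random


def weighted_rbf(x, z, w, gamma=1.0):
    """Gaussian kernel in which variable j contributes in proportion to w[j] (assumed form)."""
    return math.exp(-gamma * sum(wj * (a - b) ** 2 for wj, a, b in zip(w, x, z)))


def kernel_discriminant(x, X, y, w):
    """Label x as +1 or -1 by comparing its mean kernel similarity to each class.
    This simple rule is a stand-in for whatever discriminant the paper uses."""
    pos = [weighted_rbf(x, xi, w) for xi, yi in zip(X, y) if yi == 1]
    neg = [weighted_rbf(x, xi, w) for xi, yi in zip(X, y) if yi == -1]
    return 1 if sum(pos) / len(pos) >= sum(neg) / len(neg) else -1


def accuracy(w, X_tr, y_tr, X_va, y_va):
    """Fraction of validation samples classified correctly under weights w."""
    hits = sum(kernel_discriminant(x, X_tr, y_tr, w) == t for x, t in zip(X_va, y_va))
    return hits / len(y_va)


def estimate_relevance(X_tr, y_tr, X_va, y_va, iters=20, pop=30, elite=6,
                       rate=0.3, size_penalty=0.05, seed=0):
    """Iteratively estimate a per-variable selection probability.
    Each round samples binary masks from the current probabilities, scores
    them, and moves the probabilities toward the best-scoring masks."""
    rng = random.Random(seed)
    d = len(X_tr[0])
    p = [0.5] * d  # start with every variable equally likely to be relevant

    def score(mask):
        # Hypothetical objective: accuracy minus a small penalty on subset size.
        return accuracy(mask, X_tr, y_tr, X_va, y_va) - size_penalty * sum(mask) / d

    for _ in range(iters):
        masks = [[1 if rng.random() < pj else 0 for pj in p] for _ in range(pop)]
        best = sorted(masks, key=score, reverse=True)[:elite]
        freq = [sum(m[j] for m in best) / len(best) for j in range(d)]
        # Blend old probabilities with the elite frequencies, clamped to [0, 1].
        p = [min(1.0, max(0.0, (1 - rate) * pj + rate * fj)) for pj, fj in zip(p, freq)]
    return p


def make_data(n, d, relevant, seed=0):
    """Toy two-class data in which only the `relevant` indices carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 2 == 0 else -1
        row = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for j in relevant:
            row[j] += 2.0 * label
        X.append(row)
        y.append(label)
    return X, y
```

A typical call would build training and validation sets with make_data, pass them to estimate_relevance, and keep the variables whose returned probability exceeds a chosen threshold. The size penalty is one way to favour the compact subsets the description emphasises; the published method may achieve this differently.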