Estimation of relevant variables on high-dimensional biological patterns using iterated weighted kernel functions
The analysis of complex proteomic and genomic profiles involves the identification of significant markers within a set of hundreds or even thousands of variables that represent a high-dimensional problem space. The occurrence of noise, redundancy or combinatorial interactions in the profile makes th...
Published in: | PloS one 2008-03, Vol.3 (3), p.e1806-e1806 |
---|---|
Main Authors: | Rojas-Galeano, Sergio ; Hsieh, Emily ; Agranoff, Dan ; Krishna, Sanjeev ; Fernandez-Reyes, Delmiro |
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c693t-e126279a9d657c21b9023c7fcc5588435f6bd8d25cbf17ca849af4b472af5a463 |
---|---|
cites | cdi_FETCH-LOGICAL-c693t-e126279a9d657c21b9023c7fcc5588435f6bd8d25cbf17ca849af4b472af5a463 |
container_end_page | e1806 |
container_issue | 3 |
container_start_page | e1806 |
container_title | PloS one |
container_volume | 3 |
creator | Rojas-Galeano, Sergio ; Hsieh, Emily ; Agranoff, Dan ; Krishna, Sanjeev ; Fernandez-Reyes, Delmiro |
description | The analysis of complex proteomic and genomic profiles involves the identification of significant markers within a set of hundreds or even thousands of variables that represent a high-dimensional problem space. The occurrence of noise, redundancy or combinatorial interactions in the profile makes the selection of relevant variables harder.
Here we propose a method to select variables based on their estimated relevance to hidden patterns. Our method combines a weighted-kernel discriminant with an iterative stochastic probability estimation algorithm to discover the relevance distribution over the set of variables. We verified the ability of our method to select predefined relevant variables in synthetic proteome-like data and then assessed its performance on high-dimensional biological problems. Experiments were run on serum proteomic datasets of infectious diseases. The resulting variable subsets achieved classification accuracies of 99% on Human African Trypanosomiasis, 91% on Tuberculosis, and 91% on Malaria serum proteomic profiles with fewer than 20% of variables selected. Our method scaled up to dimensionalities orders of magnitude higher, as shown with gene expression microarray datasets, in which we obtained classification accuracies close to 90% with fewer than 1% of the total number of variables.
Our method consistently found relevant variables attaining high classification accuracies across synthetic and biological datasets. Notably, it yielded very compact subsets compared to the original number of variables, which should simplify downstream biological experimentation. |
doi_str_mv | 10.1371/journal.pone.0001806 |
format | article |
pmid | 18509521 |
publisher | United States: Public Library of Science |
contributor | Stolovitzky, Gustavo |
date | 2008-03-26 |
rights | 2008 Rojas-Galeano et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
oa | free_for_read |
fulltext | fulltext |
identifier | ISSN: 1932-6203 |
ispartof | PloS one, 2008-03, Vol.3 (3), p.e1806-e1806 |
issn | 1932-6203 1932-6203 |
language | eng |
recordid | cdi_plos_journals_1319483428 |
source | PubMed (Medline); Publicly Available Content Database |
subjects | African trypanosomiasis ; Algorithms ; Analysis ; Artificial intelligence ; Cancer ; Classification ; Combinatorial analysis ; Computational Biology - statistics & numerical data ; Computational Biology/Systems Biology ; Computer Science ; Datasets ; DNA microarrays ; Experimentation ; Gene expression ; Genetic algorithms ; Genomics - statistics & numerical data ; Humans ; Infectious Diseases ; Infectious Diseases/Neglected Tropical Diseases ; Infectious Diseases/Tropical and Travel-Associated Diseases ; Information storage ; Integral equations ; Kernel functions ; Malaria ; Mathematical analysis ; Medical research ; Methods ; Mycobacterium ; Oligonucleotide Array Sequence Analysis - statistics & numerical data ; Parasitology ; Pattern Recognition, Automated ; Proteomics ; Proteomics - statistics & numerical data ; Redundancy ; Software ; Stochasticity ; Tuberculosis ; Variables ; Vector-borne diseases |
title | Estimation of relevant variables on high-dimensional biological patterns using iterated weighted kernel functions |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T01%3A03%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_plos_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Estimation%20of%20relevant%20variables%20on%20high-dimensional%20biological%20patterns%20using%20iterated%20weighted%20kernel%20functions&rft.jtitle=PloS%20one&rft.au=Rojas-Galeano,%20Sergio&rft.date=2008-03-26&rft.volume=3&rft.issue=3&rft.spage=e1806&rft.epage=e1806&rft.pages=e1806-e1806&rft.issn=1932-6203&rft.eissn=1932-6203&rft_id=info:doi/10.1371/journal.pone.0001806&rft_dat=%3Cgale_plos_%3EA472656029%3C/gale_plos_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c693t-e126279a9d657c21b9023c7fcc5588435f6bd8d25cbf17ca849af4b472af5a463%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1319483428&rft_id=info:pmid/18509521&rft_galeid=A472656029&rfr_iscdi=true |