
A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms

An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of labora...

Bibliographic Details
Published in: BMC bioinformatics 2013-11, Vol.14 (1), p.315-315
Main Authors: Goodswen, Stephen J, Kennedy, Paul J, Ellis, John T
Format: Article
Language: English
container_end_page 315
container_issue 1
container_start_page 315
container_title BMC bioinformatics
container_volume 14
creator Goodswen, Stephen J
Kennedy, Paul J
Ellis, John T
description An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions inherently contain hidden inaccuracies and contradictions. This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98, respectively) for proteins observed to induce immune responses experimentally. Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory.
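The description reports sensitivity and specificity for a pool of machine learning algorithms but does not say how the pool's outputs are combined. As a rough illustration only, the sketch below assumes a simple majority vote over binary labels (1 = expected true candidate, 0 = expected false candidate) and computes the two metrics from their standard definitions. The function names, the voting rule, and the toy labels are all hypothetical and are not taken from the article.

```python
def majority_vote(predictions_per_classifier):
    """Combine binary predictions from several classifiers.

    Assumption for illustration: a protein is labelled 1 only when
    strictly more than half of the classifiers predict 1 (ties give 0).
    """
    pooled = []
    for votes in zip(*predictions_per_classifier):
        pooled.append(1 if sum(votes) * 2 > len(votes) else 0)
    return pooled


def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    sensitivity = TP / (TP + FN), specificity = TN / (TN + FP).
    Assumes y_true contains at least one positive and one negative.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels, invented purely to exercise the functions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
clf_a = [1, 1, 1, 0, 0, 0, 0, 1]
clf_b = [1, 1, 0, 1, 0, 0, 1, 0]
clf_c = [1, 0, 1, 1, 0, 1, 0, 0]

pooled = majority_vote([clf_a, clf_b, clf_c])
single = sensitivity_specificity(y_true, clf_a)
combined = sensitivity_specificity(y_true, pooled)
```

In this toy case each classifier makes different mistakes, so the vote corrects them; real pipeline output would not necessarily behave this way.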
doi_str_mv 10.1186/1471-2105-14-315
format article
fullrecord <record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3826511</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1642331314</sourcerecordid><originalsourceid>FETCH-LOGICAL-b420t-8403dedc96b014704638ffe5ff79617da3f94aa0e8662522315c741f578fa95a3</originalsourceid><addsrcrecordid>eNqNkk1rFTEUhoNYbL26dyUBN27G5ntmNkK5-AWFbnQdcnOTmdRMMiaZC_d39A-bobW0gthVDue8PJz3PQHgDUYfMO7EOWYtbghGvMGsoZg_A2f3recP6lPwMudrhHDbIf4CnBKGa0HEGbi5gCEejIe5JFXMcIQ2Jqi9ytnZowsDLKOBcSnzUqBNcYIqQBdgdt7pCA9KaxcM3LusKyYd4exm49fWyjHLT5WOsTgNZ1XGOJiQ4ZJX7KT0uMq8USmsDeWHmFwZp_wKnFjls3l9927Aj8-fvm-_NpdXX75tLy6bHSOoNB1DdG_2uhc7VJ0iJmhnreHWtr3A7V5R2zOlkOmEIJyQmo9uGba87azquaIb8PGWOy-7qYJMqBl4OSc31a1lVE4-ngQ3yiEeJO2I4BhXwPYWsHPxH4DHEx0nuR5Frkeplaw7Vcr7uzVS_LWYXORU0zTeq2DikiUWjFCKKWb_l7IeiZ7xvn2CVHTVB0OkSt_9Jb2OSwo1-qriVYVIzXYD3j5M697ln79EfwM6Os5H</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1458380246</pqid></control><display><type>article</type><title>A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms</title><source>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</source><source>PubMed Central</source><creator>Goodswen, Stephen J ; Kennedy, Paul J ; Ellis, John T</creator><creatorcontrib>Goodswen, Stephen J ; Kennedy, Paul J ; Ellis, John T</creatorcontrib><description>An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions are inherent with hidden inaccuracies and contradictions. 
This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally. Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/1471-2105-14-315</identifier><identifier>PMID: 24180526</identifier><language>eng</language><publisher>England: BioMed Central</publisher><subject>Algorithms ; Animals ; Antigens - chemistry ; Antigens - immunology ; Artificial Intelligence ; Caenorhabditis elegans ; Caenorhabditis elegans Proteins - chemistry ; Caenorhabditis elegans Proteins - immunology ; Candidates ; Classification ; Computational Biology - methods ; Computer Simulation ; Drug Discovery ; Feasibility studies ; Immune system ; Machine learning ; Methodology ; Pathogens ; Pipelines ; Plasmodium ; Proteins ; Protozoan Proteins - chemistry ; Protozoan Proteins - immunology ; Sensitivity and Specificity ; Strategy ; Toxoplasma gondii ; Training ; Vaccines ; Vaccines - chemistry ; Vaccines - immunology</subject><ispartof>BMC bioinformatics, 2013-11, Vol.14 (1), p.315-315</ispartof><rights>2013 Goodswen et al.; licensee BioMed Central Ltd. 
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</rights><rights>Copyright © 2013 Goodswen et al.; licensee BioMed Central Ltd. 2013 Goodswen et al.; licensee BioMed Central Ltd.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC3826511/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/1458380246?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,25731,27901,27902,36989,36990,44566,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/24180526$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Goodswen, Stephen J</creatorcontrib><creatorcontrib>Kennedy, Paul J</creatorcontrib><creatorcontrib>Ellis, John T</creatorcontrib><title>A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms</title><title>BMC bioinformatics</title><addtitle>BMC Bioinformatics</addtitle><description>An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions are inherent with hidden inaccuracies and contradictions. 
This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally. Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory.</description><subject>Algorithms</subject><subject>Animals</subject><subject>Antigens - chemistry</subject><subject>Antigens - immunology</subject><subject>Artificial Intelligence</subject><subject>Caenorhabditis elegans</subject><subject>Caenorhabditis elegans Proteins - chemistry</subject><subject>Caenorhabditis elegans Proteins - immunology</subject><subject>Candidates</subject><subject>Classification</subject><subject>Computational Biology - methods</subject><subject>Computer Simulation</subject><subject>Drug Discovery</subject><subject>Feasibility studies</subject><subject>Immune system</subject><subject>Machine learning</subject><subject>Methodology</subject><subject>Pathogens</subject><subject>Pipelines</subject><subject>Plasmodium</subject><subject>Proteins</subject><subject>Protozoan Proteins - chemistry</subject><subject>Protozoan Proteins - immunology</subject><subject>Sensitivity and Specificity</subject><subject>Strategy</subject><subject>Toxoplasma gondii</subject><subject>Training</subject><subject>Vaccines</subject><subject>Vaccines - chemistry</subject><subject>Vaccines - 
immunology</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2013</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><recordid>eNqNkk1rFTEUhoNYbL26dyUBN27G5ntmNkK5-AWFbnQdcnOTmdRMMiaZC_d39A-bobW0gthVDue8PJz3PQHgDUYfMO7EOWYtbghGvMGsoZg_A2f3recP6lPwMudrhHDbIf4CnBKGa0HEGbi5gCEejIe5JFXMcIQ2Jqi9ytnZowsDLKOBcSnzUqBNcYIqQBdgdt7pCA9KaxcM3LusKyYd4exm49fWyjHLT5WOsTgNZ1XGOJiQ4ZJX7KT0uMq8USmsDeWHmFwZp_wKnFjls3l9927Aj8-fvm-_NpdXX75tLy6bHSOoNB1DdG_2uhc7VJ0iJmhnreHWtr3A7V5R2zOlkOmEIJyQmo9uGba87azquaIb8PGWOy-7qYJMqBl4OSc31a1lVE4-ngQ3yiEeJO2I4BhXwPYWsHPxH4DHEx0nuR5Frkeplaw7Vcr7uzVS_LWYXORU0zTeq2DikiUWjFCKKWb_l7IeiZ7xvn2CVHTVB0OkSt_9Jb2OSwo1-qriVYVIzXYD3j5M697ln79EfwM6Os5H</recordid><startdate>20131102</startdate><enddate>20131102</enddate><creator>Goodswen, Stephen J</creator><creator>Kennedy, Paul J</creator><creator>Ellis, John T</creator><general>BioMed Central</general><general>BioMed Central Ltd</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PHGZM</scope><scope>PHGZT</scope><scope>PIMPY</s
cope><scope>PJZUB</scope><scope>PKEHL</scope><scope>PPXIY</scope><scope>PQEST</scope><scope>PQGLB</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>C1K</scope><scope>F1W</scope><scope>H95</scope><scope>H97</scope><scope>L.G</scope><scope>7X8</scope><scope>7TB</scope><scope>5PM</scope></search><sort><creationdate>20131102</creationdate><title>A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms</title><author>Goodswen, Stephen J ; Kennedy, Paul J ; Ellis, John T</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-b420t-8403dedc96b014704638ffe5ff79617da3f94aa0e8662522315c741f578fa95a3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2013</creationdate><topic>Algorithms</topic><topic>Animals</topic><topic>Antigens - chemistry</topic><topic>Antigens - immunology</topic><topic>Artificial Intelligence</topic><topic>Caenorhabditis elegans</topic><topic>Caenorhabditis elegans Proteins - chemistry</topic><topic>Caenorhabditis elegans Proteins - immunology</topic><topic>Candidates</topic><topic>Classification</topic><topic>Computational Biology - methods</topic><topic>Computer Simulation</topic><topic>Drug Discovery</topic><topic>Feasibility studies</topic><topic>Immune system</topic><topic>Machine learning</topic><topic>Methodology</topic><topic>Pathogens</topic><topic>Pipelines</topic><topic>Plasmodium</topic><topic>Proteins</topic><topic>Protozoan Proteins - chemistry</topic><topic>Protozoan Proteins - immunology</topic><topic>Sensitivity and Specificity</topic><topic>Strategy</topic><topic>Toxoplasma gondii</topic><topic>Training</topic><topic>Vaccines</topic><topic>Vaccines - chemistry</topic><topic>Vaccines - immunology</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Goodswen, Stephen 
J</creatorcontrib><creatorcontrib>Kennedy, Paul J</creatorcontrib><creatorcontrib>Ellis, John T</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health &amp; Medical Complete (ProQuest Database)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Database‎ (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection 
(Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection (Proquest) (PQ_SDU_P3)</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Biological Sciences</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Biological Science Database</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>ProQuest Central (New)</collection><collection>ProQuest One Academic (New)</collection><collection>Publicly Available Content Database (Proquest) (PQ_SDU_P3)</collection><collection>ProQuest Health &amp; Medical Research Collection</collection><collection>ProQuest One Academic Middle East (New)</collection><collection>ProQuest One Health &amp; Nursing</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Applied &amp; Life Sciences</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries Abstracts</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) 1: Biological Sciences &amp; 
Living Resources</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) 3: Aquatic Pollution &amp; Environmental Quality</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) Professional</collection><collection>MEDLINE - Academic</collection><collection>Mechanical &amp; Transportation Engineering Abstracts</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Goodswen, Stephen J</au><au>Kennedy, Paul J</au><au>Ellis, John T</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms</atitle><jtitle>BMC bioinformatics</jtitle><addtitle>BMC Bioinformatics</addtitle><date>2013-11-02</date><risdate>2013</risdate><volume>14</volume><issue>1</issue><spage>315</spage><epage>315</epage><pages>315-315</pages><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>An in silico vaccine discovery pipeline for eukaryotic pathogens typically consists of several computational tools to predict protein characteristics. The aim of the in silico approach to discovering subunit vaccines is to use predicted characteristics to identify proteins which are worthy of laboratory investigation. A major challenge is that these predictions are inherent with hidden inaccuracies and contradictions. This study focuses on how to reduce the number of false candidates using machine learning algorithms rather than relying on expensive laboratory validation. Proteins from Toxoplasma gondii, Plasmodium sp., and Caenorhabditis elegans were used as training and test datasets. 
The results show that machine learning algorithms can effectively distinguish expected true from expected false vaccine candidates (with an average sensitivity and specificity of 0.97 and 0.98 respectively), for proteins observed to induce immune responses experimentally. Vaccine candidates from an in silico approach can only be truly validated in a laboratory. Given any in silico output and appropriate training data, the number of false candidates allocated for validation can be dramatically reduced using a pool of machine learning algorithms. This will ultimately save time and money in the laboratory.</abstract><cop>England</cop><pub>BioMed Central</pub><pmid>24180526</pmid><doi>10.1186/1471-2105-14-315</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2105
ispartof BMC bioinformatics, 2013-11, Vol.14 (1), p.315-315
issn 1471-2105
1471-2105
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_3826511
source Publicly Available Content Database (Proquest) (PQ_SDU_P3); PubMed Central
subjects Algorithms
Animals
Antigens - chemistry
Antigens - immunology
Artificial Intelligence
Caenorhabditis elegans
Caenorhabditis elegans Proteins - chemistry
Caenorhabditis elegans Proteins - immunology
Candidates
Classification
Computational Biology - methods
Computer Simulation
Drug Discovery
Feasibility studies
Immune system
Machine learning
Methodology
Pathogens
Pipelines
Plasmodium
Proteins
Protozoan Proteins - chemistry
Protozoan Proteins - immunology
Sensitivity and Specificity
Strategy
Toxoplasma gondii
Training
Vaccines
Vaccines - chemistry
Vaccines - immunology
title A novel strategy for classifying the output from an in silico vaccine discovery pipeline for eukaryotic pathogens using machine learning algorithms
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-23T16%3A45%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20novel%20strategy%20for%20classifying%20the%20output%20from%20an%20in%20silico%20vaccine%20discovery%20pipeline%20for%20eukaryotic%20pathogens%20using%20machine%20learning%20algorithms&rft.jtitle=BMC%20bioinformatics&rft.au=Goodswen,%20Stephen%20J&rft.date=2013-11-02&rft.volume=14&rft.issue=1&rft.spage=315&rft.epage=315&rft.pages=315-315&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/1471-2105-14-315&rft_dat=%3Cproquest_pubme%3E1642331314%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-b420t-8403dedc96b014704638ffe5ff79617da3f94aa0e8662522315c741f578fa95a3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1458380246&rft_id=info:pmid/24180526&rfr_iscdi=true