Quantitative assessment of protein function prediction programs

Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function...

Bibliographic Details
Published in: Genetics and molecular research 2015-12, Vol.14 (4), p.17555-17566
Main Authors: Rodrigues, B N, Steffens, M B R, Raittz, R T, Santos-Weiss, I C R, Marchaukoski, J N
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c383t-e7632584f540818ef5a8b773172a3c0d058dadb53b9c808254721b83060bfea53
cites
container_end_page 17566
container_issue 4
container_start_page 17555
container_title Genetics and molecular research
container_volume 14
creator Rodrigues, B N
Steffens, M B R
Raittz, R T
Santos-Weiss, I C R
Marchaukoski, J N
description Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate.
doi_str_mv 10.4238/2015.December.21.28
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_1827922687</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1760895853</sourcerecordid><originalsourceid>FETCH-LOGICAL-c383t-e7632584f540818ef5a8b773172a3c0d058dadb53b9c808254721b83060bfea53</originalsourceid><addsrcrecordid>eNqFkE9LxDAQxYMo7rr6CQTp0UvrJGmS2ZPI-hcWRNBzSNupVLbtmqSC394uuyvePM1jeG_e8GPsnEOWC4lXArjKbqmktiCfCZ4JPGBTro1OlUY4_KMn7CSEDwChcoRjNhHaoMgBpuz6ZXBdbKKLzRclLgQKoaUuJn2drH0fqemSeujK2PTduKCq2cv-3bs2nLKj2q0Cne3mjL3d370uHtPl88PT4maZlhJlTMloKRTmtcoBOVKtHBbGSG6EkyVUoLByVaFkMS8RcPzTCF6gBA1FTU7JGbvc3h2LPwcK0bZNKGm1ch31Q7AchZkLodH8bzUacK5QydEqt9bS9yF4qu3aN63z35aD3UC2G8h2D9kKbgWOqYtdwVC0VP1m9lTlD1STeWs</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1760895853</pqid></control><display><type>article</type><title>Quantitative assessment of protein function prediction programs</title><source>Alma/SFX Local Collection</source><creator>Rodrigues, B N ; Steffens, M B R ; Raittz, R T ; Santos-Weiss, I C R ; Marchaukoski, J N</creator><creatorcontrib>Rodrigues, B N ; Steffens, M B R ; Raittz, R T ; Santos-Weiss, I C R ; Marchaukoski, J N</creatorcontrib><description>Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. 
This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate.</description><identifier>ISSN: 1676-5680</identifier><identifier>EISSN: 1676-5680</identifier><identifier>DOI: 10.4238/2015.December.21.28</identifier><identifier>PMID: 26782400</identifier><language>eng</language><publisher>Brazil</publisher><subject>Algorithms ; Computational Biology ; Databases, Protein ; Proteins - genetics ; Sequence Analysis, Protein ; Software</subject><ispartof>Genetics and molecular research, 2015-12, Vol.14 (4), p.17555-17566</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c383t-e7632584f540818ef5a8b773172a3c0d058dadb53b9c808254721b83060bfea53</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,778,782,27907,27908</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/26782400$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Rodrigues, B N</creatorcontrib><creatorcontrib>Steffens, M B R</creatorcontrib><creatorcontrib>Raittz, R T</creatorcontrib><creatorcontrib>Santos-Weiss, I C 
R</creatorcontrib><creatorcontrib>Marchaukoski, J N</creatorcontrib><title>Quantitative assessment of protein function prediction programs</title><title>Genetics and molecular research</title><addtitle>Genet Mol Res</addtitle><description>Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. 
Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate.</description><subject>Algorithms</subject><subject>Computational Biology</subject><subject>Databases, Protein</subject><subject>Proteins - genetics</subject><subject>Sequence Analysis, Protein</subject><subject>Software</subject><issn>1676-5680</issn><issn>1676-5680</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2015</creationdate><recordtype>article</recordtype><recordid>eNqFkE9LxDAQxYMo7rr6CQTp0UvrJGmS2ZPI-hcWRNBzSNupVLbtmqSC394uuyvePM1jeG_e8GPsnEOWC4lXArjKbqmktiCfCZ4JPGBTro1OlUY4_KMn7CSEDwChcoRjNhHaoMgBpuz6ZXBdbKKLzRclLgQKoaUuJn2drH0fqemSeujK2PTduKCq2cv-3bs2nLKj2q0Cne3mjL3d370uHtPl88PT4maZlhJlTMloKRTmtcoBOVKtHBbGSG6EkyVUoLByVaFkMS8RcPzTCF6gBA1FTU7JGbvc3h2LPwcK0bZNKGm1ch31Q7AchZkLodH8bzUacK5QydEqt9bS9yF4qu3aN63z35aD3UC2G8h2D9kKbgWOqYtdwVC0VP1m9lTlD1STeWs</recordid><startdate>20151221</startdate><enddate>20151221</enddate><creator>Rodrigues, B N</creator><creator>Steffens, M B R</creator><creator>Raittz, R T</creator><creator>Santos-Weiss, I C R</creator><creator>Marchaukoski, J N</creator><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>8FD</scope><scope>FR3</scope><scope>P64</scope><scope>RC3</scope></search><sort><creationdate>20151221</creationdate><title>Quantitative assessment of protein function prediction programs</title><author>Rodrigues, B N ; Steffens, M B R ; Raittz, R T ; Santos-Weiss, I C R ; Marchaukoski, J 
N</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c383t-e7632584f540818ef5a8b773172a3c0d058dadb53b9c808254721b83060bfea53</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2015</creationdate><topic>Algorithms</topic><topic>Computational Biology</topic><topic>Databases, Protein</topic><topic>Proteins - genetics</topic><topic>Sequence Analysis, Protein</topic><topic>Software</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Rodrigues, B N</creatorcontrib><creatorcontrib>Steffens, M B R</creatorcontrib><creatorcontrib>Raittz, R T</creatorcontrib><creatorcontrib>Santos-Weiss, I C R</creatorcontrib><creatorcontrib>Marchaukoski, J N</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Genetics Abstracts</collection><jtitle>Genetics and molecular research</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Rodrigues, B N</au><au>Steffens, M B R</au><au>Raittz, R T</au><au>Santos-Weiss, I C R</au><au>Marchaukoski, J N</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Quantitative assessment of protein function prediction programs</atitle><jtitle>Genetics and molecular research</jtitle><addtitle>Genet Mol Res</addtitle><date>2015-12-21</date><risdate>2015</risdate><volume>14</volume><issue>4</issue><spage>17555</spage><epage>17566</epage><pages>17555-17566</pages><issn>1676-5680</issn><eissn>1676-5680</eissn><abstract>Fast 
prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the process of protein sequence characterization. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs, and, additionally, compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. This selection was based on the high number of citations (over 500), fully automatic analysis, and the possibility of returning a single best classification per sequence. We tested these programs using 12 gold standard datasets from four different sources. The gold standard classification of the databases was based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the miss rate among the programs is globally over 50%. Furthermore, we observed little overlap in the correct predictions from each program. Therefore, a combination of multiple types of sources and methods, including experimental data, protein-protein interaction, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate.</abstract><cop>Brazil</cop><pmid>26782400</pmid><doi>10.4238/2015.December.21.28</doi><tpages>12</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1676-5680
ispartof Genetics and molecular research, 2015-12, Vol.14 (4), p.17555-17566
issn 1676-5680
1676-5680
language eng
recordid cdi_proquest_miscellaneous_1827922687
source Alma/SFX Local Collection
subjects Algorithms
Computational Biology
Databases, Protein
Proteins - genetics
Sequence Analysis, Protein
Software
title Quantitative assessment of protein function prediction programs
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T22%3A37%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Quantitative%20assessment%20of%20protein%20function%20prediction%20programs&rft.jtitle=Genetics%20and%20molecular%20research&rft.au=Rodrigues,%20B%20N&rft.date=2015-12-21&rft.volume=14&rft.issue=4&rft.spage=17555&rft.epage=17566&rft.pages=17555-17566&rft.issn=1676-5680&rft.eissn=1676-5680&rft_id=info:doi/10.4238/2015.December.21.28&rft_dat=%3Cproquest_cross%3E1760895853%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c383t-e7632584f540818ef5a8b773172a3c0d058dadb53b9c808254721b83060bfea53%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1760895853&rft_id=info:pmid/26782400&rfr_iscdi=true