Quantitative assessment of protein function prediction programs
Published in: | Genetics and molecular research, 2015-12, Vol.14 (4), p.17555-17566 |
---|---|
Main Authors: | Rodrigues, B N ; Steffens, M B R ; Raittz, R T ; Santos-Weiss, I C R ; Marchaukoski, J N |
Format: | Article |
Language: | English |
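Subjects: | Algorithms ; Computational Biology ; Databases, Protein ; Proteins - genetics ; Sequence Analysis, Protein ; Software |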
cited_by | cdi_FETCH-LOGICAL-c383t-e7632584f540818ef5a8b773172a3c0d058dadb53b9c808254721b83060bfea53 |
---|---|
container_end_page | 17566 |
container_issue | 4 |
container_start_page | 17555 |
container_title | Genetics and molecular research |
container_volume | 14 |
creator | Rodrigues, B N ; Steffens, M B R ; Raittz, R T ; Santos-Weiss, I C R ; Marchaukoski, J N |
description | Fast prediction of protein function is essential for high-throughput sequencing analysis. Bioinformatic resources provide cheaper and faster techniques for function prediction and have helped to accelerate the characterization of protein sequences. In this study, we assessed protein function prediction programs that accept amino acid sequences as input. We analyzed the classification, equality, and similarity between programs and also compared program performance. The following programs were selected for our assessment: Blast2GO, InterProScan, PANTHER, Pfam, and ScanProsite. They were chosen because each has a high number of citations (over 500), performs fully automatic analysis, and can return a single best classification per sequence. We tested these programs using 12 gold-standard datasets from four different sources. The gold-standard classifications were based on expert analysis, the Protein Data Bank, or the Structure-Function Linkage Database. We found that the overall miss rate of the programs is over 50%. Furthermore, we observed little overlap among the correct predictions of the different programs. Therefore, combining multiple types of sources and methods, including experimental data, protein-protein interaction data, and data mining, may be the best way to generate more reliable predictions and decrease the miss rate. |
doi_str_mv | 10.4238/2015.December.21.28 |
format | article |
identifier | ISSN: 1676-5680 |
ispartof | Genetics and molecular research, 2015-12, Vol.14 (4), p.17555-17566 |
issn | 1676-5680 (ISSN and EISSN) |
language | eng |
recordid | cdi_proquest_miscellaneous_1827922687 |
source | Alma/SFX Local Collection |
subjects | Algorithms ; Computational Biology ; Databases, Protein ; Proteins - genetics ; Sequence Analysis, Protein ; Software |
title | Quantitative assessment of protein function prediction programs |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T22%3A37%3A51IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Quantitative%20assessment%20of%20protein%20function%20prediction%20programs&rft.jtitle=Genetics%20and%20molecular%20research&rft.au=Rodrigues,%20B%20N&rft.date=2015-12-21&rft.volume=14&rft.issue=4&rft.spage=17555&rft.epage=17566&rft.pages=17555-17566&rft.issn=1676-5680&rft.eissn=1676-5680&rft_id=info:doi/10.4238/2015.December.21.28&rft_dat=%3Cproquest_cross%3E1760895853%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c383t-e7632584f540818ef5a8b773172a3c0d058dadb53b9c808254721b83060bfea53%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1760895853&rft_id=info:pmid/26782400&rfr_iscdi=true |
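The abstract reports per-program miss rates above 50%, little overlap among the programs' correct predictions, and argues that combining sources could lower the miss rate. The Python sketch below illustrates that arithmetic on made-up data. It is not the authors' evaluation code: the definition of a miss (no prediction or a wrong one), the use of the Jaccard index to measure overlap, and all sequence IDs, labels, and program names are hypothetical assumptions chosen for illustration.

```python
"""Toy sketch: per-program miss rate and overlap of correct predictions.

Assumptions (not taken from the paper): a "miss" is any gold-standard
sequence that a program leaves unclassified or classifies incorrectly,
and overlap is measured as the Jaccard index of correctly predicted IDs.
All sequence IDs, labels, and program outputs below are hypothetical.
"""
from itertools import combinations
from typing import Dict, Optional, Set

# Hypothetical gold standard: sequence ID -> expected class label.
GOLD: Dict[str, str] = {"s1": "kinase", "s2": "protease", "s3": "transporter", "s4": "kinase"}

# Hypothetical single best prediction per sequence; None means no prediction.
PREDICTIONS: Dict[str, Dict[str, Optional[str]]] = {
    "program_a": {"s1": "kinase", "s2": None, "s3": "kinase", "s4": None},
    "program_b": {"s1": None, "s2": "protease", "s3": None, "s4": "kinase"},
}


def correct_ids(gold: Dict[str, str], pred: Dict[str, Optional[str]]) -> Set[str]:
    """Return the IDs whose prediction matches the gold-standard label."""
    return {sid for sid, label in gold.items() if pred.get(sid) == label}


def miss_rate(gold: Dict[str, str], correct: Set[str]) -> float:
    """Fraction of gold-standard sequences that were not correctly classified."""
    return 1.0 - len(correct) / len(gold)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sets (defined as 0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


correct = {name: correct_ids(GOLD, pred) for name, pred in PREDICTIONS.items()}
for name, ids in correct.items():
    print(f"{name}: miss rate = {miss_rate(GOLD, ids):.2f}")
for x, y in combinations(correct, 2):
    print(f"overlap({x}, {y}) = {jaccard(correct[x], correct[y]):.2f}")

# Best case for a combination: a sequence counts as a hit if any program got it right.
combined = set().union(*correct.values())
print(f"best-case combined miss rate = {miss_rate(GOLD, combined):.2f}")
```

Because the two hypothetical programs get disjoint sequences right, taking the union lowers the miss rate from 0.50 for the better program to 0.25. This union is a best case: it assumes some way of knowing which program to trust for each sequence, which a real combination method would still have to provide.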