
Ranking near-native candidate protein structures via random forest classification

In ab initio protein-structure prediction, a large set of structural decoys is often generated, with the requirement to select the best five or three candidates from the decoys. The clustered central structures with the largest number of neighbors are frequently regarded as the near-native protein struct...

Bibliographic Details
Published in: BMC bioinformatics 2019-12, Vol.20 (Suppl 25), p.683-683, Article 683
Main Authors: Wu, Hongjie; Huang, Hongmei; Lu, Weizhong; Fu, Qiming; Ding, Yijie; Qiu, Jing; Li, Haiou
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c594t-eb87d815d00f2552d38c88ae714cd8a205049dc2aa1bb2a4713e02c15112732b3
cites cdi_FETCH-LOGICAL-c594t-eb87d815d00f2552d38c88ae714cd8a205049dc2aa1bb2a4713e02c15112732b3
container_end_page 683
container_issue Suppl 25
container_start_page 683
container_title BMC bioinformatics
container_volume 20
creator Wu, Hongjie
Huang, Hongmei
Lu, Weizhong
Fu, Qiming
Ding, Yijie
Qiu, Jing
Li, Haiou
description In ab initio protein-structure prediction, a large set of structural decoys is often generated, with the requirement to select the best five or three candidates from the decoys. The clustered central structures with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make it difficult to identify the exact order of the best five or three near-native candidate structures. To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features from the results of the clustering. Comparative analysis indicated that our method was better able to identify the order of the candidate structures compared with the current methods SPICKER, Calibur, and Durandal. The results confirmed that the identification of the first model was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal. In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification. Our results indicate that this method is effective for the problem and outperforms the methods it was compared with.
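The re-ranking idea in the description can be illustrated with a toy sketch. The abstract does not state the actual features, labels, or implementation, so everything here is an assumption made for illustration: the two features (cluster-size fraction and mean intra-cluster RMSD), the training values, the function names, and the use of a small ensemble of one-split decision stumps with a stratified bootstrap as a stand-in for a full random forest. It is not the authors' method, only one way that scoring candidates with a binary classifier and sorting by that score might look.

```python
import random

# Hypothetical training data: (cluster-size fraction, mean intra-cluster RMSD).
# Label 1 = near-native centroid, 0 = not near-native. All values are invented.
TRAIN_X = [(0.40, 1.5), (0.35, 2.0), (0.30, 2.2), (0.28, 2.5),
           (0.10, 4.0), (0.12, 5.5), (0.08, 4.8), (0.15, 6.0)]
TRAIN_Y = [1, 1, 1, 1, 0, 0, 0, 0]

def predict(stump, x):
    """Vote 1 (near-native) or 0 for one feature vector."""
    feature, threshold, high_is_positive = stump
    return int((x[feature] >= threshold) == high_is_positive)

def train_stump(rows, labels, rng):
    """Fit a one-split stump on a stratified bootstrap sample and one random feature."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Resample each class separately so every stump sees both classes (an assumption).
    sample = [rng.choice(pos) for _ in pos] + [rng.choice(neg) for _ in neg]
    feature = rng.randrange(len(rows[0]))
    best = None
    for i in sample:
        for high_is_positive in (True, False):
            stump = (feature, rows[i][feature], high_is_positive)
            errors = sum(predict(stump, rows[j]) != labels[j] for j in sample)
            if best is None or errors < best[0]:
                best = (errors, stump)
    return best[1]

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train an ensemble of stumps; a toy stand-in for a random forest."""
    rng = random.Random(seed)
    return [train_stump(rows, labels, rng) for _ in range(n_trees)]

def score(forest, x):
    """Fraction of stumps that vote 'near-native' for feature vector x."""
    return sum(predict(s, x) for s in forest) / len(forest)

def rerank(forest, candidates):
    """Sort (name, features) pairs by descending near-native score."""
    return sorted(candidates, key=lambda c: score(forest, c[1]), reverse=True)

forest = train_forest(TRAIN_X, TRAIN_Y)
# Candidates in their original clustering order (hypothetical values).
candidates = [("model1", (0.05, 6.5)), ("model2", (0.20, 3.0)), ("model3", (0.45, 1.2))]
for name, features in rerank(forest, candidates):
    print(name, score(forest, features))
```

In this sketch the classifier only decides near-native versus not; the ordering comes from sorting by the vote fraction, which is one plausible reading of turning re-ranking into binary classification.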
doi_str_mv 10.1186/s12859-019-3257-8
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_89ed5f274aaa4173aa8bdbd229a5a1a3</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A610270714</galeid><doaj_id>oai_doaj_org_article_89ed5f274aaa4173aa8bdbd229a5a1a3</doaj_id><sourcerecordid>A610270714</sourcerecordid><originalsourceid>FETCH-LOGICAL-c594t-eb87d815d00f2552d38c88ae714cd8a205049dc2aa1bb2a4713e02c15112732b3</originalsourceid><addsrcrecordid>eNptkl-P1CAUxRujcdfVD-CLaeKLPnTlQin0xWSz8c8kmxhXfSa3QCtjB0agE_32Ms667hjDA-TyO4dwcqrqKZBzANm9SkAl7xsCfcMoF428V51CK6ChQPj9O-eT6lFKa0JASMIfVicMpGh5351WH6_Rf3N-qr3F2HjMbmdrjd44g9nW2xiydb5OOS46L9GmeuewjgUIm3oMZZBrPWNKbnS6qIN_XD0YcU72yc1-Vn15--bz5fvm6sO71eXFVaN53-bGDlIYCdwQMlLOqWFSS4lWQKuNREo4aXujKSIMA8XyE2YJ1cABqGB0YGfV6uBrAq7VNroNxp8qoFO_ByFOCmN2erZK9tbwkYoWEVsQDFEOZjCU9sgRkBWv1wev7TJsrNHW54jzkenxjXdf1RR2qutpz5goBi9uDGL4vpRQ1MYlbecZvQ1LUpQxwnvJGS3o83_QdViiL1EVqiVdR4DCX2rC8gHnx1De1XtTddEBoYKUoAp1_h-qLGM3TgdvR1fmR4KXR4LCZPsjT7ikpFafro9ZOLA6hpSiHW_zAKL2_VOH_qnSP7Xvn5JF8-xukLeKP4VjvwB1m9Qz</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2340660121</pqid></control><display><type>article</type><title>Ranking near-native candidate protein structures via random forest classification</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Wu, Hongjie ; Huang, Hongmei ; Lu, Weizhong ; Fu, Qiming ; Ding, Yijie ; Qiu, Jing ; Li, Haiou</creator><creatorcontrib>Wu, Hongjie ; Huang, Hongmei ; Lu, Weizhong ; Fu, Qiming ; Ding, Yijie ; Qiu, Jing ; Li, Haiou</creatorcontrib><description>In ab initio protein-structure predictions, a large set of structural decoys are often generated, with the requirement to select best five or three candidates from the decoys. 
The clustered central structures with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make it difficult to identify the exact order of the best five or three near-native candidate structures. To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features from the results of the clustering. Comparative analysis indicated that our method was better able to identify the order of the candidate structures compared with the current methods SPICKER, Calibur, and Durandal. The results confirmed that the identification of the first model was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal. In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification. 
Our results indicate that this method is a powerful method for the problem and the effect of this method is better than other methods.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-019-3257-8</identifier><identifier>PMID: 31874596</identifier><language>eng</language><publisher>England: BioMed Central Ltd</publisher><subject>Accuracy ; Algorithms ; Analysis ; Candidates ; Classification ; Cluster Analysis ; Clustering ; Comparative analysis ; Decoys ; Forecasts and trends ; Free energy ; Identification and classification ; Methods ; Protein Conformation ; Protein structural prediction ; Protein structure ; Proteins ; Proteins - chemistry ; Proteomics ; Random forest ; Ranking ; SPICKER</subject><ispartof>BMC bioinformatics, 2019-12, Vol.20 (Suppl 25), p.683-683, Article 683</ispartof><rights>COPYRIGHT 2019 BioMed Central Ltd.</rights><rights>2019. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>The Author(s). 
2019</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c594t-eb87d815d00f2552d38c88ae714cd8a205049dc2aa1bb2a4713e02c15112732b3</citedby><cites>FETCH-LOGICAL-c594t-eb87d815d00f2552d38c88ae714cd8a205049dc2aa1bb2a4713e02c15112732b3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929337/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/2340660121?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31874596$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Wu, Hongjie</creatorcontrib><creatorcontrib>Huang, Hongmei</creatorcontrib><creatorcontrib>Lu, Weizhong</creatorcontrib><creatorcontrib>Fu, Qiming</creatorcontrib><creatorcontrib>Ding, Yijie</creatorcontrib><creatorcontrib>Qiu, Jing</creatorcontrib><creatorcontrib>Li, Haiou</creatorcontrib><title>Ranking near-native candidate protein structures via random forest classification</title><title>BMC bioinformatics</title><addtitle>BMC Bioinformatics</addtitle><description>In ab initio protein-structure predictions, a large set of structural decoys are often generated, with the requirement to select best five or three candidates from the decoys. 
The clustered central structures with the largest number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make it difficult to identify the exact order of the best five or three near-native candidate structures. To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features from the results of the clustering. Comparative analysis indicated that our method was better able to identify the order of the candidate structures compared with the current methods SPICKER, Calibur, and Durandal. The results confirmed that the identification of the first model was closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal. In this study, we presented an improved method based on random forest classification that transforms the problem of re-ranking the candidate structures into a binary classification. 
Our results indicate that this method is a powerful method for the problem and the effect of this method is better than other methods.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Analysis</subject><subject>Candidates</subject><subject>Classification</subject><subject>Cluster Analysis</subject><subject>Clustering</subject><subject>Comparative analysis</subject><subject>Decoys</subject><subject>Forecasts and trends</subject><subject>Free energy</subject><subject>Identification and classification</subject><subject>Methods</subject><subject>Protein Conformation</subject><subject>Protein structural prediction</subject><subject>Protein structure</subject><subject>Proteins</subject><subject>Proteins - chemistry</subject><subject>Proteomics</subject><subject>Random forest</subject><subject>Ranking</subject><subject>SPICKER</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkl-P1CAUxRujcdfVD-CLaeKLPnTlQin0xWSz8c8kmxhXfSa3QCtjB0agE_32Ms667hjDA-TyO4dwcqrqKZBzANm9SkAl7xsCfcMoF428V51CK6ChQPj9O-eT6lFKa0JASMIfVicMpGh5351WH6_Rf3N-qr3F2HjMbmdrjd44g9nW2xiydb5OOS46L9GmeuewjgUIm3oMZZBrPWNKbnS6qIN_XD0YcU72yc1-Vn15--bz5fvm6sO71eXFVaN53-bGDlIYCdwQMlLOqWFSS4lWQKuNREo4aXujKSIMA8XyE2YJ1cABqGB0YGfV6uBrAq7VNroNxp8qoFO_ByFOCmN2erZK9tbwkYoWEVsQDFEOZjCU9sgRkBWv1wev7TJsrNHW54jzkenxjXdf1RR2qutpz5goBi9uDGL4vpRQ1MYlbecZvQ1LUpQxwnvJGS3o83_QdViiL1EVqiVdR4DCX2rC8gHnx1De1XtTddEBoYKUoAp1_h-qLGM3TgdvR1fmR4KXR4LCZPsjT7ikpFafro9ZOLA6hpSiHW_zAKL2_VOH_qnSP7Xvn5JF8-xukLeKP4VjvwB1m9Qz</recordid><startdate>20191224</startdate><enddate>20191224</enddate><creator>Wu, Hongjie</creator><creator>Huang, Hongmei</creator><creator>Lu, Weizhong</creator><creator>Fu, Qiming</creator><creator>Ding, Yijie</creator><creator>Qiu, Jing</creator><creator>Li, Haiou</creator><general>BioMed Central 
Ltd</general><general>BioMed Central</general><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20191224</creationdate><title>Ranking near-native candidate protein structures via random forest classification</title><author>Wu, Hongjie ; Huang, Hongmei ; Lu, Weizhong ; Fu, Qiming ; Ding, Yijie ; Qiu, Jing ; Li, Haiou</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c594t-eb87d815d00f2552d38c88ae714cd8a205049dc2aa1bb2a4713e02c15112732b3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Analysis</topic><topic>Candidates</topic><topic>Classification</topic><topic>Cluster Analysis</topic><topic>Clustering</topic><topic>Comparative 
analysis</topic><topic>Decoys</topic><topic>Forecasts and trends</topic><topic>Free energy</topic><topic>Identification and classification</topic><topic>Methods</topic><topic>Protein Conformation</topic><topic>Protein structural prediction</topic><topic>Protein structure</topic><topic>Proteins</topic><topic>Proteins - chemistry</topic><topic>Proteomics</topic><topic>Random forest</topic><topic>Ranking</topic><topic>SPICKER</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wu, Hongjie</creatorcontrib><creatorcontrib>Huang, Hongmei</creatorcontrib><creatorcontrib>Lu, Weizhong</creatorcontrib><creatorcontrib>Fu, Qiming</creatorcontrib><creatorcontrib>Ding, Yijie</creatorcontrib><creatorcontrib>Qiu, Jing</creatorcontrib><creatorcontrib>Li, Haiou</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest 
Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>Medical Database</collection><collection>Biological Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest 
Central Basic</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wu, Hongjie</au><au>Huang, Hongmei</au><au>Lu, Weizhong</au><au>Fu, Qiming</au><au>Ding, Yijie</au><au>Qiu, Jing</au><au>Li, Haiou</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Ranking near-native candidate protein structures via random forest classification</atitle><jtitle>BMC bioinformatics</jtitle><addtitle>BMC Bioinformatics</addtitle><date>2019-12-24</date><risdate>2019</risdate><volume>20</volume><issue>Suppl 25</issue><spage>683</spage><epage>683</epage><pages>683-683</pages><artnum>683</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>In ab initio protein-structure predictions, a large set of structural decoys are often generated, with the requirement to select best five or three candidates from the decoys. The clustered central structures with the most number of neighbors are frequently regarded as the near-native protein structures with the lowest free energy; however, limitations in clustering methods and three-dimensional structural-distance assessments make identifying exact order of the best five or three near-native candidate structures difficult. To address this issue, we propose a method that re-ranks the candidate structures via random forest classification using intra- and inter-cluster features from the results of the clustering. Comparative analysis indicated that our method was better able to identify the order of the candidate structures as comparing with current methods SPICKR, Calibur, and Durandal. 
The results confirmed that the identification of the first model were closer to the native structure in 12 of 43 cases versus four for SPICKER, and the same as the native structure in up to 27 of 43 cases versus 14 for Calibur and up to eight of 43 cases versus two for Durandal. In this study, we presented an improved method based on random forest classification to transform the problem of re-ranking the candidate structures by an binary classification. Our results indicate that this method is a powerful method for the problem and the effect of this method is better than other methods.</abstract><cop>England</cop><pub>BioMed Central Ltd</pub><pmid>31874596</pmid><doi>10.1186/s12859-019-3257-8</doi><tpages>1</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2105
ispartof BMC bioinformatics, 2019-12, Vol.20 (Suppl 25), p.683-683, Article 683
issn 1471-2105
1471-2105
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_89ed5f274aaa4173aa8bdbd229a5a1a3
source Publicly Available Content Database; PubMed Central
subjects Accuracy
Algorithms
Analysis
Candidates
Classification
Cluster Analysis
Clustering
Comparative analysis
Decoys
Forecasts and trends
Free energy
Identification and classification
Methods
Protein Conformation
Protein structural prediction
Protein structure
Proteins
Proteins - chemistry
Proteomics
Random forest
Ranking
SPICKER
title Ranking near-native candidate protein structures via random forest classification
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T22%3A21%3A22IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Ranking%20near-native%20candidate%20protein%20structures%20via%20random%20forest%20classification&rft.jtitle=BMC%20bioinformatics&rft.au=Wu,%20Hongjie&rft.date=2019-12-24&rft.volume=20&rft.issue=Suppl%2025&rft.spage=683&rft.epage=683&rft.pages=683-683&rft.artnum=683&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-019-3257-8&rft_dat=%3Cgale_doaj_%3EA610270714%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c594t-eb87d815d00f2552d38c88ae714cd8a205049dc2aa1bb2a4713e02c15112732b3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2340660121&rft_id=info:pmid/31874596&rft_galeid=A610270714&rfr_iscdi=true