ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures
Abstract. Motivation: An accurate estimation of the quality of protein model structures is a cornerstone of protein structure prediction regimes. Despite the recent groundbreaking success in the field of protein structure prediction, there are certain prospects for the improvement in model qu...
Published in: | Bioinformatics 2022-01, Vol.38 (2), p.369-376 |
---|---|
Main Authors: | Kaushik, Rahul; Zhang, Kam Y J |
Format: | Article |
Language: | English |
Subjects: | Amino Acids; Computational Biology - methods; Machine Learning; Proteins - chemistry |
cited_by | cdi_FETCH-LOGICAL-c353t-c45d691e2e9d3b92f461770f0c8c4d59bb55289d8a2b13e192b7323f59a62e863 |
---|---|
cites | cdi_FETCH-LOGICAL-c353t-c45d691e2e9d3b92f461770f0c8c4d59bb55289d8a2b13e192b7323f59a62e863 |
container_end_page | 376 |
container_issue | 2 |
container_start_page | 369 |
container_title | Bioinformatics |
container_volume | 38 |
creator | Kaushik, Rahul ; Zhang, Kam Y J |
description | Abstract
Motivation
An accurate estimation of the quality of protein model structures is a cornerstone of protein structure prediction regimes. Despite the recent groundbreaking success in the field of protein structure prediction, there remains room to improve model quality estimation at multiple stages of protein structure prediction and thereby to push prediction accuracy further. Here, a novel approach, named ProFitFun, is proposed for assessing the quality of protein models. It harnesses the sequence and structural features of experimental protein structures, expressed as the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The approach captures the backbone dihedral angle and surface accessibility preferences of each residue by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and are tested on an extensive dataset of diverse proteins.
Results
The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was used to validate the proposed method. Both datasets were further used to benchmark the proposed method against four state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman’s and Pearson’s correlation coefficients, average GDT-TS loss, sum of z-scores and the average absolute difference between predictions and the corresponding observed values. The high accuracy of the proposed approach points to a potential use of the sequence and structural features in computational protein design.
Availability and implementation
http://github.com/KYZ-LSB/ProTerS-FitFun.
Supplementary information
Supplementary data are available at Bioinformatics online. |
doi_str_mv | 10.1093/bioinformatics/btab666 |
format | article |
fullrecord | <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_2574740264</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/btab666</oup_id><sourcerecordid>2574740264</sourcerecordid><originalsourceid>FETCH-LOGICAL-c353t-c45d691e2e9d3b92f461770f0c8c4d59bb55289d8a2b13e192b7323f59a62e863</originalsourceid><addsrcrecordid>eNqNkLtOHDEUhq0oKMAmr7BymWZY32ecLkJZQEJKClKPbM9xYtixF18K3p6JdgGlozqn-C_6P4TWlFxQovnGhhSiT3k2NbiysdVYpdQHdEaFIh0jUn9cfq76TgyEn6LzUu4JkVQI8QmdciEFU0SdoYdfOW1D3bb4DRu8z6lCiLhCrsHkJ1xqbq62DNiHGqEU7Ft0NaSIl3L82EyswT-F-AfXv4CNcy0bF6Dg5PGcJti9RZTP6MSbXYEvx7tCv7c_7i6vu9ufVzeX3287xyWvnRNyUpoCAz1xq5kXivY98cQNTkxSWyslG_Q0GGYpB6qZ7TnjXmqjGAyKr9DXQ-4y57FBqeMcioPdzkRIrYxM9qIXhCmxSNVB6nIqJYMf9znMy_KRkvEf6PF_0OMR9GJcHzuanWF6tb2QXQT0IEht_97QZ_utk2A</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2574740264</pqid></control><display><type>article</type><title>ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures</title><source>Open Access: Oxford University Press Open Journals</source><creator>Kaushik, Rahul ; Zhang, Kam Y J</creator><contributor>Gorodkin, Jan</contributor><creatorcontrib>Kaushik, Rahul ; Zhang, Kam Y J ; Gorodkin, Jan</creatorcontrib><description>Abstract
Motivation
An accurate estimation of the quality of protein model structures typifies as a cornerstone in protein structure prediction regimes. Despite the recent groundbreaking success in the field of protein structure prediction, there are certain prospects for the improvement in model quality estimation at multiple stages of protein structure prediction and thus, to further push the prediction accuracy. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages upon the backbone dihedral angle and surface accessibility preferences of the residues by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins.
Results
The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further used for benchmarking the proposed method with four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman’s and Pearson’s correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference of predictions over corresponding observed values. The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design.
Availability and implementation
http://github.com/KYZ-LSB/ProTerS-FitFun.
Supplementary information
Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/btab666</identifier><identifier>PMID: 34542606</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><subject>Amino Acids ; Computational Biology - methods ; Machine Learning ; Proteins - chemistry</subject><ispartof>Bioinformatics, 2022-01, Vol.38 (2), p.369-376</ispartof><rights>The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com 2021</rights><rights>The Author(s) 2021. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c353t-c45d691e2e9d3b92f461770f0c8c4d59bb55289d8a2b13e192b7323f59a62e863</citedby><cites>FETCH-LOGICAL-c353t-c45d691e2e9d3b92f461770f0c8c4d59bb55289d8a2b13e192b7323f59a62e863</cites><orcidid>0000-0002-9282-8045</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1604,27924,27925</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/btab666$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/34542606$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Gorodkin, Jan</contributor><creatorcontrib>Kaushik, Rahul</creatorcontrib><creatorcontrib>Zhang, Kam Y J</creatorcontrib><title>ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model 
structures</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Abstract
Motivation
An accurate estimation of the quality of protein model structures typifies as a cornerstone in protein structure prediction regimes. Despite the recent groundbreaking success in the field of protein structure prediction, there are certain prospects for the improvement in model quality estimation at multiple stages of protein structure prediction and thus, to further push the prediction accuracy. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages upon the backbone dihedral angle and surface accessibility preferences of the residues by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins.
Results
The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further used for benchmarking the proposed method with four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman’s and Pearson’s correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference of predictions over corresponding observed values. The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design.
Availability and implementation
http://github.com/KYZ-LSB/ProTerS-FitFun.
Supplementary information
Supplementary data are available at Bioinformatics online.</description><subject>Amino Acids</subject><subject>Computational Biology - methods</subject><subject>Machine Learning</subject><subject>Proteins - chemistry</subject><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><recordid>eNqNkLtOHDEUhq0oKMAmr7BymWZY32ecLkJZQEJKClKPbM9xYtixF18K3p6JdgGlozqn-C_6P4TWlFxQovnGhhSiT3k2NbiysdVYpdQHdEaFIh0jUn9cfq76TgyEn6LzUu4JkVQI8QmdciEFU0SdoYdfOW1D3bb4DRu8z6lCiLhCrsHkJ1xqbq62DNiHGqEU7Ft0NaSIl3L82EyswT-F-AfXv4CNcy0bF6Dg5PGcJti9RZTP6MSbXYEvx7tCv7c_7i6vu9ufVzeX3287xyWvnRNyUpoCAz1xq5kXivY98cQNTkxSWyslG_Q0GGYpB6qZ7TnjXmqjGAyKr9DXQ-4y57FBqeMcioPdzkRIrYxM9qIXhCmxSNVB6nIqJYMf9znMy_KRkvEf6PF_0OMR9GJcHzuanWF6tb2QXQT0IEht_97QZ_utk2A</recordid><startdate>20220103</startdate><enddate>20220103</enddate><creator>Kaushik, Rahul</creator><creator>Zhang, Kam Y J</creator><general>Oxford University Press</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-9282-8045</orcidid></search><sort><creationdate>20220103</creationdate><title>ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures</title><author>Kaushik, Rahul ; Zhang, Kam Y J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c353t-c45d691e2e9d3b92f461770f0c8c4d59bb55289d8a2b13e192b7323f59a62e863</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Amino Acids</topic><topic>Computational Biology - methods</topic><topic>Machine Learning</topic><topic>Proteins - chemistry</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kaushik, 
Rahul</creatorcontrib><creatorcontrib>Zhang, Kam Y J</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Kaushik, Rahul</au><au>Zhang, Kam Y J</au><au>Gorodkin, Jan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2022-01-03</date><risdate>2022</risdate><volume>38</volume><issue>2</issue><spage>369</spage><epage>376</epage><pages>369-376</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract
Motivation
An accurate estimation of the quality of protein model structures typifies as a cornerstone in protein structure prediction regimes. Despite the recent groundbreaking success in the field of protein structure prediction, there are certain prospects for the improvement in model quality estimation at multiple stages of protein structure prediction and thus, to further push the prediction accuracy. Here, a novel approach, named ProFitFun, for assessing the quality of protein models is proposed by harnessing the sequence and structural features of experimental protein structures in terms of the preferences of backbone dihedral angles and relative surface accessibility of their amino acid residues at the tripeptide level. The proposed approach leverages upon the backbone dihedral angle and surface accessibility preferences of the residues by accounting for its N-terminal and C-terminal neighbors in the protein structure. These preferences are used to evaluate protein structures through a machine learning approach and tested on an extensive dataset of diverse proteins.
Results
The approach was extensively validated on a large test dataset (n = 25 005) of protein structures, comprising 23 661 models of 82 non-homologous proteins and 1344 non-homologous experimental structures. In addition, an external dataset of 40 000 models of 200 non-homologous proteins was also used for the validation of the proposed method. Both datasets were further used for benchmarking the proposed method with four different state-of-the-art methods for protein structure quality assessment. In the benchmarking, the proposed method outperformed some state-of-the-art methods in terms of Spearman’s and Pearson’s correlation coefficients, average GDT-TS loss, sum of z-scores and average absolute difference of predictions over corresponding observed values. The high accuracy of the proposed approach promises a potential use of the sequence and structural features in computational protein design.
Availability and implementation
http://github.com/KYZ-LSB/ProTerS-FitFun.
Supplementary information
Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>34542606</pmid><doi>10.1093/bioinformatics/btab666</doi><tpages>8</tpages><orcidid>https://orcid.org/0000-0002-9282-8045</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1367-4803 |
ispartof | Bioinformatics, 2022-01, Vol.38 (2), p.369-376 |
issn | 1367-4803 ; 1460-2059 ; 1367-4811 |
language | eng |
recordid | cdi_proquest_miscellaneous_2574740264 |
source | Open Access: Oxford University Press Open Journals |
subjects | Amino Acids ; Computational Biology - methods ; Machine Learning ; Proteins - chemistry |
title | ProFitFun: a protein tertiary structure fitness function for quantifying the accuracies of model structures |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T07%3A53%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=ProFitFun:%20a%20protein%20tertiary%20structure%20fitness%20function%20for%20quantifying%20the%20accuracies%20of%20model%20structures&rft.jtitle=Bioinformatics&rft.au=Kaushik,%20Rahul&rft.date=2022-01-03&rft.volume=38&rft.issue=2&rft.spage=369&rft.epage=376&rft.pages=369-376&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/btab666&rft_dat=%3Cproquest_TOX%3E2574740264%3C/proquest_TOX%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c353t-c45d691e2e9d3b92f461770f0c8c4d59bb55289d8a2b13e192b7323f59a62e863%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2574740264&rft_id=info:pmid/34542606&rft_oup_id=10.1093/bioinformatics/btab666&rfr_iscdi=true |
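
The abstract in this record names several benchmarking measures (Pearson's and Spearman's correlation coefficients, average GDT-TS loss, and the average absolute difference between predicted and observed values) without defining them. The following is a minimal Python sketch of how such measures are commonly computed for a single target; it is not the authors' code. The function names, the toy scores, and the definition of GDT-TS loss used here (observed score of the best model minus observed score of the top-ranked model) are assumptions made for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def gdt_ts_loss(predicted, observed):
    """Assumed definition: best observed score minus the observed score
    of the model that the predictor ranks first."""
    top = max(range(len(predicted)), key=lambda i: predicted[i])
    return max(observed) - observed[top]


def mean_absolute_difference(predicted, observed):
    """Average of |predicted - observed| over all models."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(predicted)


# Hypothetical scores for four models of one target (not data from the paper).
predicted = [0.2, 0.5, 0.9, 0.7]
observed = [0.3, 0.4, 0.8, 0.9]

print("Pearson:", pearson(predicted, observed))
print("Spearman:", spearman(predicted, observed))
print("GDT-TS loss:", gdt_ts_loss(predicted, observed))
print("Mean absolute difference:", mean_absolute_difference(predicted, observed))
```

In a benchmark over many targets, these per-target values would typically be averaged; the sum of z-scores mentioned in the abstract is not covered by this sketch.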