Prediction of protein function using a deep convolutional neural network ensemble
The availability of large databases containing high resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. In this work, novel shape features...
Published in: | PeerJ. Computer science 2017-07, Vol.3, p.e124-17, Article e124 |
---|---|
Main Author: | |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33 |
---|---|
cites | cdi_FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33 |
container_end_page | 17 |
container_issue | |
container_start_page | e124 |
container_title | PeerJ. Computer science |
container_volume | 3 |
creator | Zacharaki, Evangelia I |
description | The availability of large databases containing high-resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. In this work, novel shape features are extracted representing protein structure in the form of local (per amino acid) distribution of angles and amino acid distances, respectively. Each of the multi-channel feature maps is introduced into a deep convolutional neural network (CNN) for function prediction and the outputs are fused through support vector machines or a correlation-based k-nearest neighbor classifier. Two different architectures are investigated employing either one CNN per multi-channel feature set, or one CNN per image channel. Cross-validation experiments on single-functional enzymes (n=44,661) from the PDB database achieved 90.1% correct classification, demonstrating an improvement over previous results on the same dataset when sequence similarity was not considered. The automatic prediction of protein function can provide quick annotations on extensive datasets, opening the path for relevant applications, such as pharmacological target identification. The proposed method shows promise for structure-based protein function prediction, but sufficient data may not yet be available to properly assess the method's performance on non-homologous proteins and thus reduce the confounding factor of evolutionary relationships. |
doi_str_mv | 10.7717/peerj-cs.124 |
format | article |
fullrecord | <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_81d17cc6c71e497fac494d285c92fa92</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A498685477</galeid><doaj_id>oai_doaj_org_article_81d17cc6c71e497fac494d285c92fa92</doaj_id><sourcerecordid>A498685477</sourcerecordid><originalsourceid>FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33</originalsourceid><addsrcrecordid>eNptkk1rGzEQhpfSQkOaW3_AQk-BritppdXqaEKbGAz9Pgt5NHLlrleupE2bfx_ZW0oMlQ4jXj3zMjNMVb2mZCElle8OiHHXQFpQxp9VF6yVXSOUYs-fvF9WVyntCCFU0HLURfX5U0TrIfsw1sHVhxgy-rF20zhrU_Ljtja1RTzUEMb7MEzHDzPUI07xFPLvEH_WOCbcbwZ8Vb1wZkh49TdeVt8_vP92c9esP96ubpbrBgRhuWFoCW-J4oY4dK0SDJhUjqPa9M4CWpBWcr7pAQUy6qxiRhhwTFjDNtC2l9Vq9rXB7PQh-r2JDzoYr09CiFttYvYwoO6ppRKgA0mRK-kMcMUt6wUo5oxixet69vphhjOru-VaHzVCO96Llt_Twr6Z2TKrXxOmrHdhimUgSdPSRSdJz59QW1MK8KMLORrY-wR6yVXf9YJLWajFf6hyLe59mTY6X_SzhOuzhMJk_JO3ZkpJr75-OWffzizEkFJE968zSvRxZfRpZTSUwhlvHwG0p7PD</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1952670841</pqid></control><display><type>article</type><title>Prediction of protein function using a deep convolutional neural network ensemble</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Zacharaki, Evangelia I</creator><creatorcontrib>Zacharaki, Evangelia I</creatorcontrib><description>The availability of large databases containing high resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. In this work, novel shape features are extracted representing protein structure in the form of local (per amino acid) distribution of angles and amino acid distances, respectively. 
Each of the multi-channel feature maps is introduced into a deep convolutional neural network (CNN) for function prediction and the outputs are fused through support vector machines or a correlation-based k-nearest neighbor classifier. Two different architectures are investigated employing either one CNN per multi-channel feature set, or one CNN per image channel. Cross validation experiments on single-functional enzymes (n=44,661) from the PDB database achieved 90.1% correct classification, demonstrating an improvement over previous results on the same dataset when sequence similarity was not considered. The automatic prediction of protein function can provide quick annotations on extensive datasets opening the path for relevant applications, such as pharmacological target identification. The proposed method shows promise for structure-based protein function prediction, but sufficient data may not yet be available to properly assess the method's performance on non-homologous proteins and thus reduce the confounding factor of evolutionary relationships.</description><identifier>ISSN: 2376-5992</identifier><identifier>EISSN: 2376-5992</identifier><identifier>DOI: 10.7717/peerj-cs.124</identifier><language>eng</language><publisher>San Diego: PeerJ. Ltd</publisher><subject>Amino acids ; Analysis ; Annotations ; Applied research ; Artificial intelligence ; Artificial neural networks ; Bioinformatics ; Classification ; Computer Science ; Convolutional neural networks ; Deep learning ; Enzyme classification ; Enzymes ; Exploitation ; Feature extraction ; Feature maps ; Function predition ; Homology ; International conferences ; K-nearest neighbors algorithm ; Machine Learning ; Methods ; Molecular biology ; Pharmacology ; Protein structure prediction ; Proteins ; Structure representation ; Support vector machines ; Target recognition ; Three dimensional models ; Wavelet transforms</subject><ispartof>PeerJ. 
Computer science, 2017-07, Vol.3, p.e124-17, Article e124</ispartof><rights>COPYRIGHT 2017 PeerJ. Ltd.</rights><rights>2017 Zacharaki. This is an open access article distributed under the terms of the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33</citedby><cites>FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/1952670841/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/1952670841?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,778,782,883,25740,27911,27912,36999,44577,74881</link.rule.ids><backlink>$$Uhttps://inria.hal.science/hal-01648534$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Zacharaki, Evangelia I</creatorcontrib><title>Prediction of protein function using a deep convolutional neural network ensemble</title><title>PeerJ. 
Computer science</title><description>The availability of large databases containing high resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. In this work, novel shape features are extracted representing protein structure in the form of local (per amino acid) distribution of angles and amino acid distances, respectively. Each of the multi-channel feature maps is introduced into a deep convolutional neural network (CNN) for function prediction and the outputs are fused through support vector machines or a correlation-based k-nearest neighbor classifier. Two different architectures are investigated employing either one CNN per multi-channel feature set, or one CNN per image channel. Cross validation experiments on single-functional enzymes (n=44,661) from the PDB database achieved 90.1% correct classification, demonstrating an improvement over previous results on the same dataset when sequence similarity was not considered. The automatic prediction of protein function can provide quick annotations on extensive datasets opening the path for relevant applications, such as pharmacological target identification. 
The proposed method shows promise for structure-based protein function prediction, but sufficient data may not yet be available to properly assess the method's performance on non-homologous proteins and thus reduce the confounding factor of evolutionary relationships.</description><subject>Amino acids</subject><subject>Analysis</subject><subject>Annotations</subject><subject>Applied research</subject><subject>Artificial intelligence</subject><subject>Artificial neural networks</subject><subject>Bioinformatics</subject><subject>Classification</subject><subject>Computer Science</subject><subject>Convolutional neural networks</subject><subject>Deep learning</subject><subject>Enzyme classification</subject><subject>Enzymes</subject><subject>Exploitation</subject><subject>Feature extraction</subject><subject>Feature maps</subject><subject>Function predition</subject><subject>Homology</subject><subject>International conferences</subject><subject>K-nearest neighbors algorithm</subject><subject>Machine Learning</subject><subject>Methods</subject><subject>Molecular biology</subject><subject>Pharmacology</subject><subject>Protein structure prediction</subject><subject>Proteins</subject><subject>Structure representation</subject><subject>Support vector machines</subject><subject>Target recognition</subject><subject>Three dimensional models</subject><subject>Wavelet 
transforms</subject><issn>2376-5992</issn><issn>2376-5992</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkk1rGzEQhpfSQkOaW3_AQk-BritppdXqaEKbGAz9Pgt5NHLlrleupE2bfx_ZW0oMlQ4jXj3zMjNMVb2mZCElle8OiHHXQFpQxp9VF6yVXSOUYs-fvF9WVyntCCFU0HLURfX5U0TrIfsw1sHVhxgy-rF20zhrU_Ljtja1RTzUEMb7MEzHDzPUI07xFPLvEH_WOCbcbwZ8Vb1wZkh49TdeVt8_vP92c9esP96ubpbrBgRhuWFoCW-J4oY4dK0SDJhUjqPa9M4CWpBWcr7pAQUy6qxiRhhwTFjDNtC2l9Vq9rXB7PQh-r2JDzoYr09CiFttYvYwoO6ppRKgA0mRK-kMcMUt6wUo5oxixet69vphhjOru-VaHzVCO96Llt_Twr6Z2TKrXxOmrHdhimUgSdPSRSdJz59QW1MK8KMLORrY-wR6yVXf9YJLWajFf6hyLe59mTY6X_SzhOuzhMJk_JO3ZkpJr75-OWffzizEkFJE968zSvRxZfRpZTSUwhlvHwG0p7PD</recordid><startdate>20170717</startdate><enddate>20170717</enddate><creator>Zacharaki, Evangelia I</creator><general>PeerJ. Ltd</general><general>PeerJ, Inc</general><general>PeerJ</general><general>PeerJ Inc</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7XB</scope><scope>8AL</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope><scope>1XC</scope><scope>VOOES</scope><scope>DOA</scope></search><sort><creationdate>20170717</creationdate><title>Prediction of protein function using a deep convolutional neural network ensemble</title><author>Zacharaki, Evangelia 
I</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Amino acids</topic><topic>Analysis</topic><topic>Annotations</topic><topic>Applied research</topic><topic>Artificial intelligence</topic><topic>Artificial neural networks</topic><topic>Bioinformatics</topic><topic>Classification</topic><topic>Computer Science</topic><topic>Convolutional neural networks</topic><topic>Deep learning</topic><topic>Enzyme classification</topic><topic>Enzymes</topic><topic>Exploitation</topic><topic>Feature extraction</topic><topic>Feature maps</topic><topic>Function predition</topic><topic>Homology</topic><topic>International conferences</topic><topic>K-nearest neighbors algorithm</topic><topic>Machine Learning</topic><topic>Methods</topic><topic>Molecular biology</topic><topic>Pharmacology</topic><topic>Protein structure prediction</topic><topic>Proteins</topic><topic>Structure representation</topic><topic>Support vector machines</topic><topic>Target recognition</topic><topic>Three dimensional models</topic><topic>Wavelet transforms</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Zacharaki, Evangelia I</creatorcontrib><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace 
Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Computing Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><collection>Hyper Article en Ligne (HAL)</collection><collection>Hyper Article en Ligne (HAL) (Open Access)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>PeerJ. Computer science</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Zacharaki, Evangelia I</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Prediction of protein function using a deep convolutional neural network ensemble</atitle><jtitle>PeerJ. 
Computer science</jtitle><date>2017-07-17</date><risdate>2017</risdate><volume>3</volume><spage>e124</spage><epage>17</epage><pages>e124-17</pages><artnum>e124</artnum><issn>2376-5992</issn><eissn>2376-5992</eissn><abstract>The availability of large databases containing high resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. In this work, novel shape features are extracted representing protein structure in the form of local (per amino acid) distribution of angles and amino acid distances, respectively. Each of the multi-channel feature maps is introduced into a deep convolutional neural network (CNN) for function prediction and the outputs are fused through support vector machines or a correlation-based k-nearest neighbor classifier. Two different architectures are investigated employing either one CNN per multi-channel feature set, or one CNN per image channel. Cross validation experiments on single-functional enzymes (n=44,661) from the PDB database achieved 90.1% correct classification, demonstrating an improvement over previous results on the same dataset when sequence similarity was not considered. The automatic prediction of protein function can provide quick annotations on extensive datasets opening the path for relevant applications, such as pharmacological target identification. The proposed method shows promise for structure-based protein function prediction, but sufficient data may not yet be available to properly assess the method's performance on non-homologous proteins and thus reduce the confounding factor of evolutionary relationships.</abstract><cop>San Diego</cop><pub>PeerJ. Ltd</pub><doi>10.7717/peerj-cs.124</doi><tpages>e124</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 2376-5992 |
ispartof | PeerJ. Computer science, 2017-07, Vol.3, p.e124-17, Article e124 |
issn | 2376-5992 2376-5992 |
language | eng |
recordid | cdi_doaj_primary_oai_doaj_org_article_81d17cc6c71e497fac494d285c92fa92 |
source | Publicly Available Content Database; PubMed Central |
subjects | Amino acids; Analysis; Annotations; Applied research; Artificial intelligence; Artificial neural networks; Bioinformatics; Classification; Computer Science; Convolutional neural networks; Deep learning; Enzyme classification; Enzymes; Exploitation; Feature extraction; Feature maps; Function prediction; Homology; International conferences; K-nearest neighbors algorithm; Machine Learning; Methods; Molecular biology; Pharmacology; Protein structure prediction; Proteins; Structure representation; Support vector machines; Target recognition; Three dimensional models; Wavelet transforms |
title | Prediction of protein function using a deep convolutional neural network ensemble |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T16%3A40%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Prediction%20of%20protein%20function%20using%20a%20deep%20convolutional%20neural%20network%20ensemble&rft.jtitle=PeerJ.%20Computer%20science&rft.au=Zacharaki,%20Evangelia%20I&rft.date=2017-07-17&rft.volume=3&rft.spage=e124&rft.epage=17&rft.pages=e124-17&rft.artnum=e124&rft.issn=2376-5992&rft.eissn=2376-5992&rft_id=info:doi/10.7717/peerj-cs.124&rft_dat=%3Cgale_doaj_%3EA498685477%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1952670841&rft_id=info:pmid/&rft_galeid=A498685477&rfr_iscdi=true |