Prediction of protein function using a deep convolutional neural network ensemble

Bibliographic Details
Published in: PeerJ. Computer science 2017-07, Vol.3, p.e124-17, Article e124
Main Author: Zacharaki, Evangelia I
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33
cites cdi_FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33
container_end_page 17
container_issue
container_start_page e124
container_title PeerJ. Computer science
container_volume 3
creator Zacharaki, Evangelia I
description The availability of large databases containing high resolution three-dimensional (3D) models of proteins in conjunction with functional annotation allows the exploitation of advanced supervised machine learning techniques for automatic protein function prediction. In this work, novel shape features are extracted representing protein structure in the form of local (per amino acid) distribution of angles and amino acid distances, respectively. Each of the multi-channel feature maps is introduced into a deep convolutional neural network (CNN) for function prediction and the outputs are fused through support vector machines or a correlation-based k-nearest neighbor classifier. Two different architectures are investigated employing either one CNN per multi-channel feature set, or one CNN per image channel. Cross validation experiments on single-functional enzymes (n=44,661) from the PDB database achieved 90.1% correct classification, demonstrating an improvement over previous results on the same dataset when sequence similarity was not considered. The automatic prediction of protein function can provide quick annotations on extensive datasets opening the path for relevant applications, such as pharmacological target identification. The proposed method shows promise for structure-based protein function prediction, but sufficient data may not yet be available to properly assess the method's performance on non-homologous proteins and thus reduce the confounding factor of evolutionary relationships.
doi_str_mv 10.7717/peerj-cs.124
format article
publisher San Diego: PeerJ. Ltd
date 2017-07-17
eissn 2376-5992
rights 2017 Zacharaki. This is an open access article distributed under the terms of the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
license Distributed under a Creative Commons Attribution 4.0 International License
backlink https://inria.hal.science/hal-01648534 (record in HAL)
fulltext fulltext
identifier ISSN: 2376-5992
ispartof PeerJ. Computer science, 2017-07, Vol.3, p.e124-17, Article e124
issn 2376-5992
2376-5992
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_81d17cc6c71e497fac494d285c92fa92
source Publicly Available Content Database; PubMed Central
subjects Amino acids
Analysis
Annotations
Applied research
Artificial intelligence
Artificial neural networks
Bioinformatics
Classification
Computer Science
Convolutional neural networks
Deep learning
Enzyme classification
Enzymes
Exploitation
Feature extraction
Feature maps
Function prediction
Homology
International conferences
K-nearest neighbors algorithm
Machine Learning
Methods
Molecular biology
Pharmacology
Protein structure prediction
Proteins
Structure representation
Support vector machines
Target recognition
Three dimensional models
Wavelet transforms
title Prediction of protein function using a deep convolutional neural network ensemble
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T16%3A40%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Prediction%20of%20protein%20function%20using%20a%20deep%20convolutional%20neural%20network%20ensemble&rft.jtitle=PeerJ.%20Computer%20science&rft.au=Zacharaki,%20Evangelia%20I&rft.date=2017-07-17&rft.volume=3&rft.spage=e124&rft.epage=17&rft.pages=e124-17&rft.artnum=e124&rft.issn=2376-5992&rft.eissn=2376-5992&rft_id=info:doi/10.7717/peerj-cs.124&rft_dat=%3Cgale_doaj_%3EA498685477%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c502t-2ed043094a0fef3952c279f4e9b8fdcedc7d744b8ce5e21fd92a5acf25da2bc33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1952670841&rft_id=info:pmid/&rft_galeid=A498685477&rfr_iscdi=true