
Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods

DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The main current prediction methods are based on machine learning, and their accuracy depends mainly on the feature extraction method. Therefo...

Bibliographic Details
Published in: Molecules (Basel, Switzerland), 2017-09, Vol.22 (10), p.1602
Main Authors: Qu, Kaiyang, Han, Ke, Wu, Song, Wang, Guohua, Wei, Leyi
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c493t-e0d00e190d4bba44557a5f48c9318de1c9f5921fdaf3cd85663a6f59b20fe8f13
cites cdi_FETCH-LOGICAL-c493t-e0d00e190d4bba44557a5f48c9318de1c9f5921fdaf3cd85663a6f59b20fe8f13
container_end_page
container_issue 10
container_start_page 1602
container_title Molecules (Basel, Switzerland)
container_volume 22
creator Qu, Kaiyang
Han, Ke
Wu, Song
Wang, Guohua
Wei, Leyi
description DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The main current prediction methods are based on machine learning, and their accuracy depends mainly on the feature extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. The classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors obtained by combining all three feature extraction methods show the best performance in 10-fold cross-validation, both without dimensionality reduction and with dimensionality reduction by max-relevance-max-distance. Moreover, the reduced mixed-feature method performs better than the non-reduced mixed-feature method. Feature vectors that combine SSF and K-Skip-N-Grams show the best performance on the test set. Overall, the mixed features outperform the single features.
doi_str_mv 10.3390/molecules22101602
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_88cb3b5406fd4d01813b16bf4a838026</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_88cb3b5406fd4d01813b16bf4a838026</doaj_id><sourcerecordid>1965692921</sourcerecordid><originalsourceid>FETCH-LOGICAL-c493t-e0d00e190d4bba44557a5f48c9318de1c9f5921fdaf3cd85663a6f59b20fe8f13</originalsourceid><addsrcrecordid>eNplkV9vFSEQxYnR2Fr9AL6YTXzxZZUBloUXk1qt3qT1X-wzYWG45WbvcoVdo99e6q1Nq0_AcM4vc2YIeQr0JeeavtqmEd0yYmEMKEjK7pFDEIy2nAp9_9b9gDwqZUMpAwHdQ3LAlOa9FP0h-bLyOM0xRGfnmKYmhebtx-P2TZx8nNbN55xmjFNpLsrV8zz-RN-cop2XjM1X3GUs1b63nuN8mXx5TB4EOxZ8cn0ekYvTd99OPrRnn96vTo7PWic0n1uknlIETb0YBitE1_W2C0I5zUF5BKdDpxkEbwN3XnVScitraWA0oArAj8hqz_XJbswux63Nv0yy0fwppLw2Ns_RjWiUcgMfOkFl8MJTUMAHkEMQVnFFmays13vWbhm26F3NlO14B3r3Z4qXZp1-GAkd1M4r4MU1IKfvC5bZbGNxOI52wrQUA1ow2Qveqyp9_o90k5Y81VFVleykZjV2VcFe5XIqJWO4aQaouVq--W_51fPsdoobx99t89-FLq0D</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>1965692921</pqid></control><display><type>article</type><title>Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods</title><source>Open Access: PubMed Central</source><source>Publicly Available Content Database</source><creator>Qu, Kaiyang ; Han, Ke ; Wu, Song ; Wang, Guohua ; Wei, Leyi</creator><creatorcontrib>Qu, Kaiyang ; Han, Ke ; Wu, Song ; Wang, Guohua ; Wei, Leyi</creatorcontrib><description>DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the features extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. 
However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.</description><identifier>ISSN: 1420-3049</identifier><identifier>EISSN: 1420-3049</identifier><identifier>DOI: 10.3390/molecules22101602</identifier><identifier>PMID: 28937647</identifier><language>eng</language><publisher>Switzerland: MDPI AG</publisher><subject>Amino Acid Sequence ; Computational Biology - methods ; Deoxyribonucleic acid ; DNA ; DNA - chemistry ; DNA biosynthesis ; DNA-binding protein ; DNA-Binding Proteins - metabolism ; Gene regulation ; Identification methods ; Information theory ; Learning algorithms ; Machine Learning ; Methods ; mixed feature representation methods ; Packaging ; Proteins ; Representations ; Support Vector Machine ; Test procedures ; Transcription</subject><ispartof>Molecules (Basel, Switzerland), 2017-09, Vol.22 (10), p.1602</ispartof><rights>Copyright MDPI AG 2017</rights><rights>2017 by the authors. 
2017</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c493t-e0d00e190d4bba44557a5f48c9318de1c9f5921fdaf3cd85663a6f59b20fe8f13</citedby><cites>FETCH-LOGICAL-c493t-e0d00e190d4bba44557a5f48c9318de1c9f5921fdaf3cd85663a6f59b20fe8f13</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/1965692921/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/1965692921?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>230,314,727,780,784,885,25753,27924,27925,37012,37013,44590,53791,53793,75126</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/28937647$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Qu, Kaiyang</creatorcontrib><creatorcontrib>Han, Ke</creatorcontrib><creatorcontrib>Wu, Song</creatorcontrib><creatorcontrib>Wang, Guohua</creatorcontrib><creatorcontrib>Wei, Leyi</creatorcontrib><title>Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods</title><title>Molecules (Basel, Switzerland)</title><addtitle>Molecules</addtitle><description>DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the features extraction method. Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. 
In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.</description><subject>Amino Acid Sequence</subject><subject>Computational Biology - methods</subject><subject>Deoxyribonucleic acid</subject><subject>DNA</subject><subject>DNA - chemistry</subject><subject>DNA biosynthesis</subject><subject>DNA-binding protein</subject><subject>DNA-Binding Proteins - metabolism</subject><subject>Gene regulation</subject><subject>Identification methods</subject><subject>Information theory</subject><subject>Learning algorithms</subject><subject>Machine Learning</subject><subject>Methods</subject><subject>mixed feature representation methods</subject><subject>Packaging</subject><subject>Proteins</subject><subject>Representations</subject><subject>Support Vector Machine</subject><subject>Test 
procedures</subject><subject>Transcription</subject><issn>1420-3049</issn><issn>1420-3049</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNplkV9vFSEQxYnR2Fr9AL6YTXzxZZUBloUXk1qt3qT1X-wzYWG45WbvcoVdo99e6q1Nq0_AcM4vc2YIeQr0JeeavtqmEd0yYmEMKEjK7pFDEIy2nAp9_9b9gDwqZUMpAwHdQ3LAlOa9FP0h-bLyOM0xRGfnmKYmhebtx-P2TZx8nNbN55xmjFNpLsrV8zz-RN-cop2XjM1X3GUs1b63nuN8mXx5TB4EOxZ8cn0ekYvTd99OPrRnn96vTo7PWic0n1uknlIETb0YBitE1_W2C0I5zUF5BKdDpxkEbwN3XnVScitraWA0oArAj8hqz_XJbswux63Nv0yy0fwppLw2Ns_RjWiUcgMfOkFl8MJTUMAHkEMQVnFFmays13vWbhm26F3NlO14B3r3Z4qXZp1-GAkd1M4r4MU1IKfvC5bZbGNxOI52wrQUA1ow2Qveqyp9_o90k5Y81VFVleykZjV2VcFe5XIqJWO4aQaouVq--W_51fPsdoobx99t89-FLq0D</recordid><startdate>20170922</startdate><enddate>20170922</enddate><creator>Qu, Kaiyang</creator><creator>Han, Ke</creator><creator>Wu, Song</creator><creator>Wang, Guohua</creator><creator>Wei, Leyi</creator><general>MDPI AG</general><general>MDPI</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>K9.</scope><scope>M0S</scope><scope>M1P</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20170922</creationdate><title>Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods</title><author>Qu, Kaiyang ; Han, Ke ; Wu, Song ; Wang, Guohua ; Wei, 
Leyi</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c493t-e0d00e190d4bba44557a5f48c9318de1c9f5921fdaf3cd85663a6f59b20fe8f13</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Amino Acid Sequence</topic><topic>Computational Biology - methods</topic><topic>Deoxyribonucleic acid</topic><topic>DNA</topic><topic>DNA - chemistry</topic><topic>DNA biosynthesis</topic><topic>DNA-binding protein</topic><topic>DNA-Binding Proteins - metabolism</topic><topic>Gene regulation</topic><topic>Identification methods</topic><topic>Information theory</topic><topic>Learning algorithms</topic><topic>Machine Learning</topic><topic>Methods</topic><topic>mixed feature representation methods</topic><topic>Packaging</topic><topic>Proteins</topic><topic>Representations</topic><topic>Support Vector Machine</topic><topic>Test procedures</topic><topic>Transcription</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Qu, Kaiyang</creatorcontrib><creatorcontrib>Han, Ke</creatorcontrib><creatorcontrib>Wu, Song</creatorcontrib><creatorcontrib>Wang, Guohua</creatorcontrib><creatorcontrib>Wei, Leyi</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest 
Central</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>Open Access: DOAJ - Directory of Open Access Journals</collection><jtitle>Molecules (Basel, Switzerland)</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Qu, Kaiyang</au><au>Han, Ke</au><au>Wu, Song</au><au>Wang, Guohua</au><au>Wei, Leyi</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods</atitle><jtitle>Molecules (Basel, Switzerland)</jtitle><addtitle>Molecules</addtitle><date>2017-09-22</date><risdate>2017</risdate><volume>22</volume><issue>10</issue><spage>1602</spage><pages>1602-</pages><issn>1420-3049</issn><eissn>1420-3049</eissn><abstract>DNA-binding proteins play vital roles in cellular processes, such as DNA packaging, replication, transcription, regulation, and other DNA-associated activities. The current main prediction method is based on machine learning, and its accuracy mainly depends on the features extraction method. 
Therefore, using an efficient feature representation method is important to enhance the classification accuracy. However, existing feature representation methods cannot efficiently distinguish DNA-binding proteins from non-DNA-binding proteins. In this paper, a multi-feature representation method, which combines three feature representation methods, namely, K-Skip-N-Grams, Information theory, and Sequential and structural features (SSF), is used to represent the protein sequences and improve feature representation ability. In addition, the classifier is a support vector machine. The mixed-feature representation method is evaluated using 10-fold cross-validation and a test set. Feature vectors, which are obtained from a combination of three feature extractions, show the best performance in 10-fold cross-validation both under non-dimensional reduction and dimensional reduction by max-relevance-max-distance. Moreover, the reduced mixed feature method performs better than the non-reduced mixed feature technique. The feature vectors, which are a combination of SSF and K-Skip-N-Grams, show the best performance in the test set. Among these methods, mixed features exhibit superiority over the single features.</abstract><cop>Switzerland</cop><pub>MDPI AG</pub><pmid>28937647</pmid><doi>10.3390/molecules22101602</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1420-3049
ispartof Molecules (Basel, Switzerland), 2017-09, Vol.22 (10), p.1602
issn 1420-3049
1420-3049
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_88cb3b5406fd4d01813b16bf4a838026
source Open Access: PubMed Central; Publicly Available Content Database
subjects Amino Acid Sequence
Computational Biology - methods
Deoxyribonucleic acid
DNA
DNA - chemistry
DNA biosynthesis
DNA-binding protein
DNA-Binding Proteins - metabolism
Gene regulation
Identification methods
Information theory
Learning algorithms
Machine Learning
Methods
mixed feature representation methods
Packaging
Proteins
Representations
Support Vector Machine
Test procedures
Transcription
title Identification of DNA-Binding Proteins Using Mixed Feature Representation Methods
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T17%3A55%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Identification%20of%20DNA-Binding%20Proteins%20Using%20Mixed%20Feature%20Representation%20Methods&rft.jtitle=Molecules%20(Basel,%20Switzerland)&rft.au=Qu,%20Kaiyang&rft.date=2017-09-22&rft.volume=22&rft.issue=10&rft.spage=1602&rft.pages=1602-&rft.issn=1420-3049&rft.eissn=1420-3049&rft_id=info:doi/10.3390/molecules22101602&rft_dat=%3Cproquest_doaj_%3E1965692921%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c493t-e0d00e190d4bba44557a5f48c9318de1c9f5921fdaf3cd85663a6f59b20fe8f13%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=1965692921&rft_id=info:pmid/28937647&rfr_iscdi=true