
The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier

Abstract. Motivation: The long non-coding RNA (lncRNA) studies have been hot topics in the field of RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Considering the costly and time-consuming ex...

Bibliographic Details
Published in: Bioinformatics 2018-07, Vol.34 (13), p.2185-2194
Main Authors: Cao, Zhen; Pan, Xiaoyong; Yang, Yang; Huang, Yan; Shen, Hong-Bin
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c397t-bf72fd496b49e0b64bc90b032cd42d9548726c673c5df677b8d592bbf17c97ed3
cites cdi_FETCH-LOGICAL-c397t-bf72fd496b49e0b64bc90b032cd42d9548726c673c5df677b8d592bbf17c97ed3
container_end_page 2194
container_issue 13
container_start_page 2185
container_title Bioinformatics
container_volume 34
creator Cao, Zhen
Pan, Xiaoyong
Yang, Yang
Huang, Yan
Shen, Hong-Bin
description Abstract. Motivation: The long non-coding RNA (lncRNA) studies have been hot topics in the field of RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Considering the costly and time-consuming experiments for identifying subcellular localization of lncRNAs, computational methods are urgently desired. However, to the best of our knowledge, there are no computational tools for predicting the lncRNA subcellular locations to date. Results: In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting the lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to support vector machine (SVM) and random forest (RF), respectively. Then we use a stacked ensemble strategy to combine the four classifiers and get the final prediction results. The current lncLocator can predict five subcellular localizations of lncRNAs, including cytoplasm, nucleus, cytosol, ribosome and exosome, and yield an overall accuracy of 0.59 on the constructed benchmark dataset. Availability and implementation: The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator. Supplementary information: Supplementary data are available at Bioinformatics online.
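The k-mer features mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `kmer_frequencies`, the choice of k, the normalization by window count, and the mapping of U to T are all assumptions made here for illustration.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGT"):
    """Return normalized k-mer frequencies as a fixed-length list.

    Hypothetical helper: the abstract does not specify k, the
    normalization, or how ambiguous bases are handled.
    """
    # Treat RNA and DNA spellings alike (assumption: U is mapped to T).
    seq = sequence.upper().replace("U", "T")
    # Fixed feature order: every possible k-mer over the alphabet.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only windows made of alphabet letters contribute to the total.
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


features = kmer_frequencies("ACGUACGU", k=2)
```

A vector like `features` has one entry per possible k-mer (16 entries for k=2), so sequences of different lengths map to inputs of the same size, which is the kind of input a classifier such as an SVM or random forest expects.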
doi_str_mv 10.1093/bioinformatics/bty085
format article
fullrecord <record><control><sourceid>proquest_TOX</sourceid><recordid>TN_cdi_proquest_miscellaneous_2007113794</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><oup_id>10.1093/bioinformatics/bty085</oup_id><sourcerecordid>2007113794</sourcerecordid><originalsourceid>FETCH-LOGICAL-c397t-bf72fd496b49e0b64bc90b032cd42d9548726c673c5df677b8d592bbf17c97ed3</originalsourceid><addsrcrecordid>eNqNkDtPwzAUhS0EoqXwE0AeWUIdx4ljtqriJVUgoTJHfoLBiYudDOXX4yoFiY3pPvzdc-QDwHmOrnLEirmw3nbGh5b3Vsa56LeoLg_ANCcVyjAq2WHqi4pmpEbFBJzE-I5QmRNCjsEEM1JhXKIp6NdvGrpOrrzkvQ_XkMM4CKmdGxwP0KW1s1_Jw3dwE7SyMlEw-aan7hV2vsukVza1z4-LCAWPWsHEJpmey4806C7qVjgNpeMxWmN1OAVHhruoz_Z1Bl5ub9bL-2z1dPewXKwyWTDaZ8JQbBRhlSBMI1ERIRkSqMBSEaxYSWqKK1nRQpbKVJSKWpUMC2FyKhnVqpiBy1F3E_znoGPftDbu_sY77YfYYIRonheUkYSWIyqDjzFo02yCbXnYNjlqdoE3fwNvxsDT3cXeYhCtVr9XPwknAI2AHzb_1PwG0guVEQ</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2007113794</pqid></control><display><type>article</type><title>The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier</title><source>Oxford University Press Open Access</source><creator>Cao, Zhen ; Pan, Xiaoyong ; Yang, Yang ; Huang, Yan ; Shen, Hong-Bin</creator><contributor>Hancock, John</contributor><creatorcontrib>Cao, Zhen ; Pan, Xiaoyong ; Yang, Yang ; Huang, Yan ; Shen, Hong-Bin ; Hancock, John</creatorcontrib><description>Abstract Motivation The long non-coding RNA (lncRNA) studies have been hot topics in the field of RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Considering the costly and time-consuming experiments for identifying subcellular localization of lncRNAs, computational methods are urgently desired. 
However, to the best of our knowledge, there are no computational tools for predicting the lncRNA subcellular locations to date. Results In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting the lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to support vector machine (SVM) and random forest (RF), respectively. Then we use a stacked ensemble strategy to combine the four classifiers and get the final prediction results. The current lncLocator can predict five subcellular localizations of lncRNAs, including cytoplasm, nucleus, cytosol, ribosome and exosome, and yield an overall accuracy of 0.59 on the constructed benchmark dataset. Availability and implementation The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator. Supplementary information Supplementary data are available at Bioinformatics online.</description><identifier>ISSN: 1367-4803</identifier><identifier>EISSN: 1460-2059</identifier><identifier>EISSN: 1367-4811</identifier><identifier>DOI: 10.1093/bioinformatics/bty085</identifier><identifier>PMID: 29462250</identifier><language>eng</language><publisher>England: Oxford University Press</publisher><ispartof>Bioinformatics, 2018-07, Vol.34 (13), p.2185-2194</ispartof><rights>The Author(s) 2018. Published by Oxford University Press. All rights reserved. 
For Permissions, please e-mail: journals.permissions@oup.com 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c397t-bf72fd496b49e0b64bc90b032cd42d9548726c673c5df677b8d592bbf17c97ed3</citedby><cites>FETCH-LOGICAL-c397t-bf72fd496b49e0b64bc90b032cd42d9548726c673c5df677b8d592bbf17c97ed3</cites><orcidid>0000-0002-4029-3325</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,1604,27924,27925</link.rule.ids><linktorsrc>$$Uhttps://dx.doi.org/10.1093/bioinformatics/bty085$$EView_record_in_Oxford_University_Press$$FView_record_in_$$GOxford_University_Press</linktorsrc><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/29462250$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><contributor>Hancock, John</contributor><creatorcontrib>Cao, Zhen</creatorcontrib><creatorcontrib>Pan, Xiaoyong</creatorcontrib><creatorcontrib>Yang, Yang</creatorcontrib><creatorcontrib>Huang, Yan</creatorcontrib><creatorcontrib>Shen, Hong-Bin</creatorcontrib><title>The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier</title><title>Bioinformatics</title><addtitle>Bioinformatics</addtitle><description>Abstract Motivation The long non-coding RNA (lncRNA) studies have been hot topics in the field of RNA biology. Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Considering the costly and time-consuming experiments for identifying subcellular localization of lncRNAs, computational methods are urgently desired. However, to the best of our knowledge, there are no computational tools for predicting the lncRNA subcellular locations to date. 
Results In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting the lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to support vector machine (SVM) and random forest (RF), respectively. Then we use a stacked ensemble strategy to combine the four classifiers and get the final prediction results. The current lncLocator can predict five subcellular localizations of lncRNAs, including cytoplasm, nucleus, cytosol, ribosome and exosome, and yield an overall accuracy of 0.59 on the constructed benchmark dataset. Availability and implementation The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator. Supplementary information Supplementary data are available at Bioinformatics online.</description><issn>1367-4803</issn><issn>1460-2059</issn><issn>1367-4811</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNqNkDtPwzAUhS0EoqXwE0AeWUIdx4ljtqriJVUgoTJHfoLBiYudDOXX4yoFiY3pPvzdc-QDwHmOrnLEirmw3nbGh5b3Vsa56LeoLg_ANCcVyjAq2WHqi4pmpEbFBJzE-I5QmRNCjsEEM1JhXKIp6NdvGrpOrrzkvQ_XkMM4CKmdGxwP0KW1s1_Jw3dwE7SyMlEw-aan7hV2vsukVza1z4-LCAWPWsHEJpmey4806C7qVjgNpeMxWmN1OAVHhruoz_Z1Bl5ub9bL-2z1dPewXKwyWTDaZ8JQbBRhlSBMI1ERIRkSqMBSEaxYSWqKK1nRQpbKVJSKWpUMC2FyKhnVqpiBy1F3E_znoGPftDbu_sY77YfYYIRonheUkYSWIyqDjzFo02yCbXnYNjlqdoE3fwNvxsDT3cXeYhCtVr9XPwknAI2AHzb_1PwG0guVEQ</recordid><startdate>20180701</startdate><enddate>20180701</enddate><creator>Cao, Zhen</creator><creator>Pan, Xiaoyong</creator><creator>Yang, Yang</creator><creator>Huang, Yan</creator><creator>Shen, Hong-Bin</creator><general>Oxford University 
Press</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-4029-3325</orcidid></search><sort><creationdate>20180701</creationdate><title>The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier</title><author>Cao, Zhen ; Pan, Xiaoyong ; Yang, Yang ; Huang, Yan ; Shen, Hong-Bin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c397t-bf72fd496b49e0b64bc90b032cd42d9548726c673c5df677b8d592bbf17c97ed3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cao, Zhen</creatorcontrib><creatorcontrib>Pan, Xiaoyong</creatorcontrib><creatorcontrib>Yang, Yang</creatorcontrib><creatorcontrib>Huang, Yan</creatorcontrib><creatorcontrib>Shen, Hong-Bin</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Bioinformatics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Cao, Zhen</au><au>Pan, Xiaoyong</au><au>Yang, Yang</au><au>Huang, Yan</au><au>Shen, Hong-Bin</au><au>Hancock, John</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier</atitle><jtitle>Bioinformatics</jtitle><addtitle>Bioinformatics</addtitle><date>2018-07-01</date><risdate>2018</risdate><volume>34</volume><issue>13</issue><spage>2185</spage><epage>2194</epage><pages>2185-2194</pages><issn>1367-4803</issn><eissn>1460-2059</eissn><eissn>1367-4811</eissn><abstract>Abstract Motivation The long non-coding RNA (lncRNA) studies have been hot topics in the field of RNA biology. 
Recent studies have shown that their subcellular localizations carry important information for understanding their complex biological functions. Considering the costly and time-consuming experiments for identifying subcellular localization of lncRNAs, computational methods are urgently desired. However, to the best of our knowledge, there are no computational tools for predicting the lncRNA subcellular locations to date. Results In this study, we report an ensemble classifier-based predictor, lncLocator, for predicting the lncRNA subcellular localizations. To fully exploit lncRNA sequence information, we adopt both k-mer features and high-level abstraction features generated by unsupervised deep models, and construct four classifiers by feeding these two types of features to support vector machine (SVM) and random forest (RF), respectively. Then we use a stacked ensemble strategy to combine the four classifiers and get the final prediction results. The current lncLocator can predict five subcellular localizations of lncRNAs, including cytoplasm, nucleus, cytosol, ribosome and exosome, and yield an overall accuracy of 0.59 on the constructed benchmark dataset. Availability and implementation The lncLocator is available at www.csbio.sjtu.edu.cn/bioinf/lncLocator. Supplementary information Supplementary data are available at Bioinformatics online.</abstract><cop>England</cop><pub>Oxford University Press</pub><pmid>29462250</pmid><doi>10.1093/bioinformatics/bty085</doi><tpages>10</tpages><orcidid>https://orcid.org/0000-0002-4029-3325</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1367-4803
ispartof Bioinformatics, 2018-07, Vol.34 (13), p.2185-2194
issn 1367-4803
1460-2059
1367-4811
language eng
recordid cdi_proquest_miscellaneous_2007113794
source Oxford University Press Open Access
title The lncLocator: a subcellular localization predictor for long non-coding RNAs based on a stacked ensemble classifier
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-24T20%3A33%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_TOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20lncLocator:%20a%20subcellular%20localization%20predictor%20for%20long%20non-coding%20RNAs%20based%20on%20a%20stacked%20ensemble%20classifier&rft.jtitle=Bioinformatics&rft.au=Cao,%20Zhen&rft.date=2018-07-01&rft.volume=34&rft.issue=13&rft.spage=2185&rft.epage=2194&rft.pages=2185-2194&rft.issn=1367-4803&rft.eissn=1460-2059&rft_id=info:doi/10.1093/bioinformatics/bty085&rft_dat=%3Cproquest_TOX%3E2007113794%3C/proquest_TOX%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c397t-bf72fd496b49e0b64bc90b032cd42d9548726c673c5df677b8d592bbf17c97ed3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2007113794&rft_id=info:pmid/29462250&rft_oup_id=10.1093/bioinformatics/bty085&rfr_iscdi=true