NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification
Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among such classes, Y RNAs have been gaining attention, being essential factors for the initiation of DNA replication on vertebrates, as well as potential tumor biomarkers. Homologs...
Published in: | IEEE/ACM transactions on computational biology and bioinformatics 2023-01, Vol.20 (1), p.557-565 |
---|---|
Main Authors: | Lima, Diego de S.; Amichi, Luiz J. A.; Fernandez, Maria A.; Constantino, Ademir A.; Seixas, Flavio A. V. |
Format: | Article |
Language: | English |
cited_by | cdi_FETCH-LOGICAL-c349t-11868c6007f58fedaf5a98be873a3670fd99b9ab16ca08eb87ab187eb3a5a5e93 |
---|---|
cites | cdi_FETCH-LOGICAL-c349t-11868c6007f58fedaf5a98be873a3670fd99b9ab16ca08eb87ab187eb3a5a5e93 |
container_end_page | 565 |
container_issue | 1 |
container_start_page | 557 |
container_title | IEEE/ACM transactions on computational biology and bioinformatics |
container_volume | 20 |
creator | Lima, Diego de S.; Amichi, Luiz J. A.; Fernandez, Maria A.; Constantino, Ademir A.; Seixas, Flavio A. V. |
description | Short non-coding RNAs (sncRNAs) are involved in multiple cellular processes and can be divided into dozens of classes. Among such classes, Y RNAs have been gaining attention, being essential factors for the initiation of DNA replication on vertebrates, as well as potential tumor biomarkers. Homologs have also been described in nematodes and insects, as well as related sequences in bacteria. Methods capable of accurately predicting Y RNA transcripts are lacking. In this work, we developed an attention-based LSTM network and built a classification model able to classify sncRNAs (including Y RNA) directly from nucleotide sequences. A dataset consisting of 45,447 sncRNA sequences, from a wide range of organisms, obtained from Rfam 14.3 was built. Performance evaluation demonstrated that our proposed method, NCYPred (Non-Coding/Y RNA Prediction), can accurately predict Y RNA sequences and their homologs, as well as 11 additional classes, achieving results comparable with state-of-the-art methods. We also demonstrate that applying t-SNE on learned sequence representations could be useful for sequence analysis. Our model is freely available as a web-server (https://www.gpea.uem.br/ncypred/). |
doi_str_mv | 10.1109/TCBB.2021.3131136 |
format | article |
fulltext | fulltext |
identifier | ISSN: 1545-5963 |
ispartof | IEEE/ACM transactions on computational biology and bioinformatics, 2023-01, Vol.20 (1), p.557-565 |
issn | 1545-5963 (ISSN); 1557-9964 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_2773453662 |
source | Association for Computing Machinery:Jisc Collections:ACM OPEN Journals 2023-2025 (reading list); IEEE Xplore (Online service) |
subjects | Animals; Bacteria - genetics; Biological system modeling; Biomarkers; Classification; Classification algorithms; Computers; DNA biosynthesis; Encoding; Feature extraction; Gene sequencing; Homology; Insects; Nematodes; Non-coding RNA; Nucleotides; Performance evaluation; Predictive models; recurrent neural network; Replication initiation; Ribonucleic acid; RNA; RNA, Small Untranslated - genetics; Sequence analysis; Sequence Analysis, RNA; sequence classification; Training; Vertebrates; web server; Y RNA |
title | NCYPred: A Bidirectional LSTM Network With Attention for Y RNA and Short Non-Coding RNA Classification |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T06%3A55%3A58IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=NCYPred:%20A%20Bidirectional%20LSTM%20Network%20With%20Attention%20for%20Y%20RNA%20and%20Short%20Non-Coding%20RNA%20Classification&rft.jtitle=IEEE/ACM%20transactions%20on%20computational%20biology%20and%20bioinformatics&rft.au=Lima,%20Diego%20de%20S.&rft.date=2023-01&rft.volume=20&rft.issue=1&rft.spage=557&rft.epage=565&rft.pages=557-565&rft.issn=1545-5963&rft.eissn=1557-9964&rft.coden=ITCBCY&rft_id=info:doi/10.1109/TCBB.2021.3131136&rft_dat=%3Cproquest_pubme%3E2604025311%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c349t-11868c6007f58fedaf5a98be873a3670fd99b9ab16ca08eb87ab187eb3a5a5e93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2773453662&rft_id=info:pmid/34826297&rft_ieee_id=9627779&rfr_iscdi=true |
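
The description states that the model classifies sncRNAs directly from nucleotide sequences. As an illustration only, the sketch below shows one common way such sequences might be turned into fixed-length integer tokens before being fed to a recurrent network. The vocabulary, the padding scheme, and the names `VOCAB`, `UNKNOWN`, `PAD` and `encode_sequence` are assumptions made for this example; the record does not say how NCYPred encodes its input.

```python
# Hypothetical vocabulary (an assumption, not taken from the record):
# 0 is reserved for padding and 5 stands for any symbol other than A, C, G, U.
VOCAB = {"A": 1, "C": 2, "G": 3, "U": 4}
UNKNOWN = 5
PAD = 0


def encode_sequence(seq, max_len):
    """Map a nucleotide string to a fixed-length list of integer tokens."""
    # Ignore case and treat DNA-style T as U.
    normalized = seq.upper().replace("T", "U")
    tokens = [VOCAB.get(base, UNKNOWN) for base in normalized]
    # Truncate sequences longer than max_len, then right-pad shorter ones.
    tokens = tokens[:max_len]
    return tokens + [PAD] * (max_len - len(tokens))


if __name__ == "__main__":
    print(encode_sequence("GGCUGGUCCGA", 16))
```

Reserving a dedicated padding token is a frequent design choice because it lets a downstream embedding or masking layer ignore the padded positions; whether the published model does this is not stated here.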