A novel two-way rebalancing strategy for identifying carbonylation sites

As an irreversible post-translational modification, protein carbonylation is closely related to aging and many diseases. Predicting protein carbonylation in affected patients is clinically significant because it can help clinicians choose appropriate therapeutic schemes. Because carbonylation sites can be used to in...

Bibliographic Details
Published in:BMC bioinformatics 2023-11, Vol.24 (1), p.1-429, Article 429
Main Authors: Chen, Linjun, Jing, Xiao-Yuan, Hao, Yaru, Liu, Wei, Zhu, Xiaoke, Han, Wei
Format: Article
Language:English
cited_by cdi_FETCH-LOGICAL-c519t-77ce4cde02f7321870a945301fe72db9cee228d989d572b68503c37eb47bcf353
cites cdi_FETCH-LOGICAL-c519t-77ce4cde02f7321870a945301fe72db9cee228d989d572b68503c37eb47bcf353
container_end_page 429
container_issue 1
container_start_page 1
container_title BMC bioinformatics
container_volume 24
creator Chen, Linjun
Jing, Xiao-Yuan
Hao, Yaru
Liu, Wei
Zhu, Xiaoke
Han, Wei
description As an irreversible post-translational modification, protein carbonylation is closely related to aging and many diseases. Predicting protein carbonylation in affected patients is clinically significant because it can help clinicians choose appropriate therapeutic schemes. Because carbonylation sites can indicate a change or loss of protein function, integrating protein carbonylation site data is a promising approach to prediction, and several prediction methods based on such data have been proposed. However, most of these data are highly class-imbalanced: un-carbonylation sites greatly outnumber carbonylation sites. Existing methods have not addressed this issue adequately. In this work, we propose a novel two-way rebalancing strategy based on the attention technique and a generative adversarial network (Carsite_AGan) for identifying protein carbonylation sites. Specifically, Carsite_AGan introduces a novel attention-based undersampling method: the attention technique scores the importance of each sample, and the un-carbonylation sites with high importance values are selected. Meanwhile, Carsite_AGan uses a generative adversarial network-based oversampling method, in which a generator and a discriminator together produce high-feasibility carbonylation sites. Finally, we use a classifier such as a nonlinear support vector machine to identify protein carbonylation sites. Experimental results demonstrate that our approach significantly outperforms other resampling methods, and that resampling carbonylation data with it can significantly improve the identification of protein carbonylation sites.
doi_str_mv 10.1186/s12859-023-05551-2
format article
fullrecord <record><control><sourceid>gale_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_47d06a01a29f473384ef47c76069df02</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><galeid>A772743381</galeid><doaj_id>oai_doaj_org_article_47d06a01a29f473384ef47c76069df02</doaj_id><sourcerecordid>A772743381</sourcerecordid><originalsourceid>FETCH-LOGICAL-c519t-77ce4cde02f7321870a945301fe72db9cee228d989d572b68503c37eb47bcf353</originalsourceid><addsrcrecordid>eNptkt9rFDEQxxdRsJ7-Az4t-KIP2-bnJnk8irYHhUKrzyGbTJYce5ua5Kz73zfXE_VE8jDD5DOTfIdv07zH6Bxj2V9kTCRXHSK0Q5xz3JEXzRlmoiYY8Zd_5a-bNzlvEcJCIn7WXK_bOf6AqS2PsXs0S5tgMJOZbZjHNpdkCoxL62Nqg4O5BL8cLqxJQ5yXyZQQ5zaHAvlt88qbKcO7X3HVfPvy-evldXdze7W5XN90lmNVOiEsMOsAES8owVIgoxinCHsQxA3KAhAinZLKcUGGXnJELRUwMDFYTzldNZvjXBfNVj-ksDNp0dEE_VyIadQmlWAn0Ew41BuEDVGeCUolgxqt6FGvnK-7WjUfj7MeUvy-h1z0LmQLU9UPcZ81kVIpJQVjFf3wD7qN-zRXpZVSCPGeE_qHGk19P8w-1g3aw1C9FoIIVj-BK3X-H6oeB7tg4ww-1PpJw6eThsoU-FlGs89Zb-7vTllyZG2KOSfwv3eEkT54RR-9oqt-_ewVTegThO-tlA</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2890056523</pqid></control><display><type>article</type><title>A novel two-way rebalancing strategy for identifying carbonylation sites</title><source>Publicly Available Content Database</source><source>PubMed Central(OpenAccess)</source><creator>Chen, Linjun ; Jing, Xiao-Yuan ; Hao, Yaru ; Liu, Wei ; Zhu, Xiaoke ; Han, Wei</creator><creatorcontrib>Chen, Linjun ; Jing, Xiao-Yuan ; Hao, Yaru ; Liu, Wei ; Zhu, Xiaoke ; Han, Wei</creatorcontrib><description>As an irreversible post-translational modification, protein carbonylation is closely related to many diseases and aging. Protein carbonylation prediction for related patients is significant, which can help clinicians make appropriate therapeutic schemes. 
Because carbonylation sites can be used to indicate change or loss of protein function, integrating these protein carbonylation site data has been a promising method in prediction. Based on these protein carbonylation site data, some protein carbonylation prediction methods have been proposed. However, most data is highly class imbalanced, and the number of un-carbonylation sites greatly exceeds that of carbonylation sites. Unfortunately, existing methods have not addressed this issue adequately. In this work, we propose a novel two-way rebalancing strategy based on the attention technique and generative adversarial network (Carsite_AGan) for identifying protein carbonylation sites. Specifically, Carsite_AGan proposes a novel undersampling method based on attention technology that allows sites with high importance value to be selected from un-carbonylation sites. The attention technique can obtain the value of each sample's importance. In the meanwhile, Carsite_AGan designs a generative adversarial network-based oversampling method to generate high-feasibility carbonylation sites. The generative adversarial network can generate high-feasibility samples through its generator and discriminator. Finally, we use a classifier like a nonlinear support vector machine to identify protein carbonylation sites. Experimental results demonstrate that our approach significantly outperforms other resampling methods. 
Using our approach to resampling carbonylation data can significantly improve the effect of identifying protein carbonylation sites.</description><identifier>ISSN: 1471-2105</identifier><identifier>EISSN: 1471-2105</identifier><identifier>DOI: 10.1186/s12859-023-05551-2</identifier><language>eng</language><publisher>London: BioMed Central Ltd</publisher><subject>Aging ; Amino acids ; Analysis ; Attention technique ; Carbonylation ; Carbonyls ; Deep learning ; Disease ; Evaluation ; Feasibility ; Generative adversarial networks ; Health aspects ; Identifying protein carbonylation sites ; Imports ; Laws, regulations, etc ; Multiple sclerosis ; Post-translation ; Post-translational modification ; Predictions ; Protein carbonylation ; Proteins ; Rebalance ; Resampling ; Simulation ; Support vector machines</subject><ispartof>BMC bioinformatics, 2023-11, Vol.24 (1), p.1-429, Article 429</ispartof><rights>COPYRIGHT 2023 BioMed Central Ltd.</rights><rights>2023. This work is licensed under http://creativecommons.org/licenses/by/4.0/ (the “License”). 
Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c519t-77ce4cde02f7321870a945301fe72db9cee228d989d572b68503c37eb47bcf353</citedby><cites>FETCH-LOGICAL-c519t-77ce4cde02f7321870a945301fe72db9cee228d989d572b68503c37eb47bcf353</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://www.proquest.com/docview/2890056523?pq-origsite=primo$$EHTML$$P50$$Gproquest$$Hfree_for_read</linktohtml><link.rule.ids>314,776,780,25732,27903,27904,36991,36992,44569</link.rule.ids></links><search><creatorcontrib>Chen, Linjun</creatorcontrib><creatorcontrib>Jing, Xiao-Yuan</creatorcontrib><creatorcontrib>Hao, Yaru</creatorcontrib><creatorcontrib>Liu, Wei</creatorcontrib><creatorcontrib>Zhu, Xiaoke</creatorcontrib><creatorcontrib>Han, Wei</creatorcontrib><title>A novel two-way rebalancing strategy for identifying carbonylation sites</title><title>BMC bioinformatics</title><description>As an irreversible post-translational modification, protein carbonylation is closely related to many diseases and aging. Protein carbonylation prediction for related patients is significant, which can help clinicians make appropriate therapeutic schemes. Because carbonylation sites can be used to indicate change or loss of protein function, integrating these protein carbonylation site data has been a promising method in prediction. Based on these protein carbonylation site data, some protein carbonylation prediction methods have been proposed. However, most data is highly class imbalanced, and the number of un-carbonylation sites greatly exceeds that of carbonylation sites. Unfortunately, existing methods have not addressed this issue adequately. 
In this work, we propose a novel two-way rebalancing strategy based on the attention technique and generative adversarial network (Carsite_AGan) for identifying protein carbonylation sites. Specifically, Carsite_AGan proposes a novel undersampling method based on attention technology that allows sites with high importance value to be selected from un-carbonylation sites. The attention technique can obtain the value of each sample's importance. In the meanwhile, Carsite_AGan designs a generative adversarial network-based oversampling method to generate high-feasibility carbonylation sites. The generative adversarial network can generate high-feasibility samples through its generator and discriminator. Finally, we use a classifier like a nonlinear support vector machine to identify protein carbonylation sites. Experimental results demonstrate that our approach significantly outperforms other resampling methods. Using our approach to resampling carbonylation data can significantly improve the effect of identifying protein carbonylation sites.</description><subject>Aging</subject><subject>Amino acids</subject><subject>Analysis</subject><subject>Attention technique</subject><subject>Carbonylation</subject><subject>Carbonyls</subject><subject>Deep learning</subject><subject>Disease</subject><subject>Evaluation</subject><subject>Feasibility</subject><subject>Generative adversarial networks</subject><subject>Health aspects</subject><subject>Identifying protein carbonylation sites</subject><subject>Imports</subject><subject>Laws, regulations, etc</subject><subject>Multiple sclerosis</subject><subject>Post-translation</subject><subject>Post-translational modification</subject><subject>Predictions</subject><subject>Protein carbonylation</subject><subject>Proteins</subject><subject>Rebalance</subject><subject>Resampling</subject><subject>Simulation</subject><subject>Support vector 
machines</subject><issn>1471-2105</issn><issn>1471-2105</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>PIMPY</sourceid><sourceid>DOA</sourceid><recordid>eNptkt9rFDEQxxdRsJ7-Az4t-KIP2-bnJnk8irYHhUKrzyGbTJYce5ua5Kz73zfXE_VE8jDD5DOTfIdv07zH6Bxj2V9kTCRXHSK0Q5xz3JEXzRlmoiYY8Zd_5a-bNzlvEcJCIn7WXK_bOf6AqS2PsXs0S5tgMJOZbZjHNpdkCoxL62Nqg4O5BL8cLqxJQ5yXyZQQ5zaHAvlt88qbKcO7X3HVfPvy-evldXdze7W5XN90lmNVOiEsMOsAES8owVIgoxinCHsQxA3KAhAinZLKcUGGXnJELRUwMDFYTzldNZvjXBfNVj-ksDNp0dEE_VyIadQmlWAn0Ew41BuEDVGeCUolgxqt6FGvnK-7WjUfj7MeUvy-h1z0LmQLU9UPcZ81kVIpJQVjFf3wD7qN-zRXpZVSCPGeE_qHGk19P8w-1g3aw1C9FoIIVj-BK3X-H6oeB7tg4ww-1PpJw6eThsoU-FlGs89Zb-7vTllyZG2KOSfwv3eEkT54RR-9oqt-_ewVTegThO-tlA</recordid><startdate>20231113</startdate><enddate>20231113</enddate><creator>Chen, Linjun</creator><creator>Jing, Xiao-Yuan</creator><creator>Hao, Yaru</creator><creator>Liu, Wei</creator><creator>Zhu, Xiaoke</creator><creator>Han, Wei</creator><general>BioMed Central Ltd</general><general>BioMed 
Central</general><general>BMC</general><scope>AAYXX</scope><scope>CITATION</scope><scope>ISR</scope><scope>3V.</scope><scope>7QO</scope><scope>7SC</scope><scope>7X7</scope><scope>7XB</scope><scope>88E</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FH</scope><scope>8FI</scope><scope>8FJ</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AEUYN</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BBNVY</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>BHPHI</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FR3</scope><scope>FYUFA</scope><scope>GHDGH</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>K9.</scope><scope>L7M</scope><scope>LK8</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M0S</scope><scope>M1P</scope><scope>M7P</scope><scope>P5Z</scope><scope>P62</scope><scope>P64</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>7X8</scope><scope>DOA</scope></search><sort><creationdate>20231113</creationdate><title>A novel two-way rebalancing strategy for identifying carbonylation sites</title><author>Chen, Linjun ; Jing, Xiao-Yuan ; Hao, Yaru ; Liu, Wei ; Zhu, Xiaoke ; Han, Wei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c519t-77ce4cde02f7321870a945301fe72db9cee228d989d572b68503c37eb47bcf353</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Aging</topic><topic>Amino acids</topic><topic>Analysis</topic><topic>Attention technique</topic><topic>Carbonylation</topic><topic>Carbonyls</topic><topic>Deep learning</topic><topic>Disease</topic><topic>Evaluation</topic><topic>Feasibility</topic><topic>Generative adversarial networks</topic><topic>Health aspects</topic><topic>Identifying protein carbonylation sites</topic><topic>Imports</topic><topic>Laws, 
regulations, etc</topic><topic>Multiple sclerosis</topic><topic>Post-translation</topic><topic>Post-translational modification</topic><topic>Predictions</topic><topic>Protein carbonylation</topic><topic>Proteins</topic><topic>Rebalance</topic><topic>Resampling</topic><topic>Simulation</topic><topic>Support vector machines</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Linjun</creatorcontrib><creatorcontrib>Jing, Xiao-Yuan</creatorcontrib><creatorcontrib>Hao, Yaru</creatorcontrib><creatorcontrib>Liu, Wei</creatorcontrib><creatorcontrib>Zhu, Xiaoke</creatorcontrib><creatorcontrib>Han, Wei</creatorcontrib><collection>CrossRef</collection><collection>Gale In Context: Science</collection><collection>ProQuest Central (Corporate)</collection><collection>Biotechnology Research Abstracts</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest_Health &amp; Medical Collection</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Medical Database (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>Hospital Premium Collection</collection><collection>Hospital Premium Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest One Sustainability</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Database‎ (1962 - current)</collection><collection>ProQuest Central Essentials</collection><collection>Biological Science Collection</collection><collection>AUTh 
Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest Natural Science Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Engineering Research Database</collection><collection>Health Research Premium Collection</collection><collection>Health Research Premium Collection (Alumni)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer science database</collection><collection>ProQuest Health &amp; Medical Complete (Alumni)</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>ProQuest Biological Science Collection</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>Health &amp; Medical Collection (Alumni Edition)</collection><collection>PML(ProQuest Medical Library)</collection><collection>ProQuest Biological Science Journals</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>Biotechnology and BioEngineering Abstracts</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>MEDLINE - Academic</collection><collection>Directory of Open Access Journals(OpenAccess)</collection><jtitle>BMC bioinformatics</jtitle></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Chen, Linjun</au><au>Jing, Xiao-Yuan</au><au>Hao, Yaru</au><au>Liu, Wei</au><au>Zhu, Xiaoke</au><au>Han, Wei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A novel two-way rebalancing strategy for identifying carbonylation sites</atitle><jtitle>BMC bioinformatics</jtitle><date>2023-11-13</date><risdate>2023</risdate><volume>24</volume><issue>1</issue><spage>1</spage><epage>429</epage><pages>1-429</pages><artnum>429</artnum><issn>1471-2105</issn><eissn>1471-2105</eissn><abstract>As an irreversible post-translational modification, protein carbonylation is closely related to many diseases and aging. Protein carbonylation prediction for related patients is significant, which can help clinicians make appropriate therapeutic schemes. Because carbonylation sites can be used to indicate change or loss of protein function, integrating these protein carbonylation site data has been a promising method in prediction. Based on these protein carbonylation site data, some protein carbonylation prediction methods have been proposed. However, most data is highly class imbalanced, and the number of un-carbonylation sites greatly exceeds that of carbonylation sites. Unfortunately, existing methods have not addressed this issue adequately. In this work, we propose a novel two-way rebalancing strategy based on the attention technique and generative adversarial network (Carsite_AGan) for identifying protein carbonylation sites. Specifically, Carsite_AGan proposes a novel undersampling method based on attention technology that allows sites with high importance value to be selected from un-carbonylation sites. The attention technique can obtain the value of each sample's importance. In the meanwhile, Carsite_AGan designs a generative adversarial network-based oversampling method to generate high-feasibility carbonylation sites. 
The generative adversarial network can generate high-feasibility samples through its generator and discriminator. Finally, we use a classifier like a nonlinear support vector machine to identify protein carbonylation sites. Experimental results demonstrate that our approach significantly outperforms other resampling methods. Using our approach to resampling carbonylation data can significantly improve the effect of identifying protein carbonylation sites.</abstract><cop>London</cop><pub>BioMed Central Ltd</pub><doi>10.1186/s12859-023-05551-2</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1471-2105
ispartof BMC bioinformatics, 2023-11, Vol.24 (1), p.1-429, Article 429
issn 1471-2105
1471-2105
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_47d06a01a29f473384ef47c76069df02
source Publicly Available Content Database; PubMed Central(OpenAccess)
subjects Aging
Amino acids
Analysis
Attention technique
Carbonylation
Carbonyls
Deep learning
Disease
Evaluation
Feasibility
Generative adversarial networks
Health aspects
Identifying protein carbonylation sites
Imports
Laws, regulations, etc
Multiple sclerosis
Post-translation
Post-translational modification
Predictions
Protein carbonylation
Proteins
Rebalance
Resampling
Simulation
Support vector machines
title A novel two-way rebalancing strategy for identifying carbonylation sites
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-21T09%3A43%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20novel%20two-way%20rebalancing%20strategy%20for%20identifying%20carbonylation%20sites&rft.jtitle=BMC%20bioinformatics&rft.au=Chen,%20Linjun&rft.date=2023-11-13&rft.volume=24&rft.issue=1&rft.spage=1&rft.epage=429&rft.pages=1-429&rft.artnum=429&rft.issn=1471-2105&rft.eissn=1471-2105&rft_id=info:doi/10.1186/s12859-023-05551-2&rft_dat=%3Cgale_doaj_%3EA772743381%3C/gale_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c519t-77ce4cde02f7321870a945301fe72db9cee228d989d572b68503c37eb47bcf353%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2890056523&rft_id=info:pmid/&rft_galeid=A772743381&rfr_iscdi=true