
RDDL: A systematic ensemble pipeline tool that streamlines balancing training schemes to reduce the effects of data imbalance in rare-disease-related deep-learning applications


Bibliographic Details
Published in: Computational biology and chemistry 2023-10, Vol.106, p.107929-107929, Article 107929
Main Authors: Yang, Tzu-Hsien, Liao, Zhan-Yi, Yu, Yu-Huai, Hsia, Min
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c380t-7248f2fee0d2050fdf61530dc76420473a1cb2564e45f186ac707cb1d121eeff3
cites cdi_FETCH-LOGICAL-c380t-7248f2fee0d2050fdf61530dc76420473a1cb2564e45f186ac707cb1d121eeff3
container_end_page 107929
container_issue
container_start_page 107929
container_title Computational biology and chemistry
container_volume 106
creator Yang, Tzu-Hsien
Liao, Zhan-Yi
Yu, Yu-Huai
Hsia, Min
description Identifying low-prevalence (rare) diseases in their early stages is key to disease treatment. Deep learning techniques now provide promising tools for this purpose. Nevertheless, the low prevalence of rare diseases complicates the application of deep networks to disease identification because of the severe class-imbalance issue. Over the past decades, several balancing methods have been studied to handle data imbalance. However, none of these methods has been shown to guarantee superior performance over the others. This performance variation creates a need for a systematic pipeline with a comprehensive software tool for enhancing deep-learning applications in rare disease identification. We reviewed the existing balancing schemes and summarized them into a systematic deep ensemble pipeline, implemented in a tool called RDDL, for handling the data imbalance issue. Through two real case studies, we showed that rare disease identification could be improved with the RDDL pipeline tool by lessening the data imbalance problem during model training. The RDDL pipeline tool is available at https://github.com/cobisLab/RDDL/.
• The low prevalence of rare diseases complicates related deep network applications.
• Solving data imbalance in rare diseases requires considering all balancing schemes.
• A tool called RDDL was implemented to help users handle the issue systematically.
• RDDL applies to diverse rare disease data that bear different modalities.
doi_str_mv 10.1016/j.compbiolchem.2023.107929
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_2844090117</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><els_id>S1476927123001202</els_id><sourcerecordid>2844090117</sourcerecordid><originalsourceid>FETCH-LOGICAL-c380t-7248f2fee0d2050fdf61530dc76420473a1cb2564e45f186ac707cb1d121eeff3</originalsourceid><addsrcrecordid>eNqNUU1vFDEMHSEqWgp_AUWcuMySZD4y21vV5UtaqVIFErcokzhsVslkiLNI_Vf8RDJMqTj2ZMt-z372q6q3jG4YZf3740bHMI8uen2AsOGUN6Uhtnz7rLpgrejrLR--P3_MBTuvXiIeaQFS2r2ozhvRMcFpf1H9vtvt9lfkmuA9ZggqO01gQgijBzK7GbybgOQYPckHlQnmBCosRSSj8mrSbvpBclJuWhJcFJVWjiSBOelCPQABa0FnJNESo7IiLqxUIG4iSSWojUNQCHUCrzIYYgDm2oNKf6eqefZOF21xwlfVmVUe4fVDvKy-ffzw9eZzvb_99OXmel_rZqC5FrwdLLcA1HDaUWtsz7qGGi36ltNWNIrpkXd9C21n2dArLajQIzOMMyh6m8vq3Tp3TvHnCTDL4FCDL7ohnlDyoW3pljImCvRqheoUERNYOScXVLqXjMrFMXmU_zsmF8fk6lghv3nYcxoDmEfqP4sKYLcCoFz7y0GSqB2U5xmXylelie4pe_4AFcKyQg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2844090117</pqid></control><display><type>article</type><title>RDDL: A systematic ensemble pipeline tool that streamlines balancing training schemes to reduce the effects of data imbalance in rare-disease-related deep-learning applications</title><source>ScienceDirect Freedom Collection 2022-2024</source><creator>Yang, Tzu-Hsien ; Liao, Zhan-Yi ; Yu, Yu-Huai ; Hsia, Min</creator><creatorcontrib>Yang, Tzu-Hsien ; Liao, Zhan-Yi ; Yu, Yu-Huai ; Hsia, Min</creatorcontrib><description>Identifying lowly prevalent diseases, or rare diseases, in their early stages is key to disease treatment in the medical field. Deep learning techniques now provide promising tools for this purpose. Nevertheless, the low prevalence of rare diseases entangles the proper application of deep networks for disease identification due to the severe class-imbalance issue. In the past decades, some balancing methods have been studied to handle the data-imbalance issue. 
The bad news is that it is verified that none of these methods guarantees superior performance to others. This performance variation causes the need to formulate a systematic pipeline with a comprehensive software tool for enhancing deep-learning applications in rare disease identification. We reviewed the existing balancing schemes and summarized a systematic deep ensemble pipeline with a constructed tool called RDDL for handling the data imbalance issue. Through two real case studies, we showed that rare disease identification could be boosted with this systematic RDDL pipeline tool by lessening the data imbalance problem during model training. The RDDL pipeline tool is available at https://github.com/cobisLab/RDDL/. [Display omitted] •The low prevalence of rare diseases complicates related deep network applications.•Solving data imbalance in rare diseases requires considering all balancing schemes.•A tool called RDDL was implemented to help users handle the issue systematically.•RDDL applies to diverse rare disease data that bear different modalities.</description><identifier>ISSN: 1476-9271</identifier><identifier>EISSN: 1476-928X</identifier><identifier>DOI: 10.1016/j.compbiolchem.2023.107929</identifier><identifier>PMID: 37517206</identifier><language>eng</language><publisher>England: Elsevier Ltd</publisher><subject>Identification of lowly prevalent diseases ; Imbalanced data classification ; Model ensemble ; Rare disease</subject><ispartof>Computational biology and chemistry, 2023-10, Vol.106, p.107929-107929, Article 107929</ispartof><rights>2023 Elsevier Ltd</rights><rights>Copyright © 2023 Elsevier Ltd. 
All rights reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c380t-7248f2fee0d2050fdf61530dc76420473a1cb2564e45f186ac707cb1d121eeff3</citedby><cites>FETCH-LOGICAL-c380t-7248f2fee0d2050fdf61530dc76420473a1cb2564e45f186ac707cb1d121eeff3</cites><orcidid>0000-0001-9420-196X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37517206$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Yang, Tzu-Hsien</creatorcontrib><creatorcontrib>Liao, Zhan-Yi</creatorcontrib><creatorcontrib>Yu, Yu-Huai</creatorcontrib><creatorcontrib>Hsia, Min</creatorcontrib><title>RDDL: A systematic ensemble pipeline tool that streamlines balancing training schemes to reduce the effects of data imbalance in rare-disease-related deep-learning applications</title><title>Computational biology and chemistry</title><addtitle>Comput Biol Chem</addtitle><description>Identifying lowly prevalent diseases, or rare diseases, in their early stages is key to disease treatment in the medical field. Deep learning techniques now provide promising tools for this purpose. Nevertheless, the low prevalence of rare diseases entangles the proper application of deep networks for disease identification due to the severe class-imbalance issue. In the past decades, some balancing methods have been studied to handle the data-imbalance issue. The bad news is that it is verified that none of these methods guarantees superior performance to others. This performance variation causes the need to formulate a systematic pipeline with a comprehensive software tool for enhancing deep-learning applications in rare disease identification. 
We reviewed the existing balancing schemes and summarized a systematic deep ensemble pipeline with a constructed tool called RDDL for handling the data imbalance issue. Through two real case studies, we showed that rare disease identification could be boosted with this systematic RDDL pipeline tool by lessening the data imbalance problem during model training. The RDDL pipeline tool is available at https://github.com/cobisLab/RDDL/. [Display omitted] •The low prevalence of rare diseases complicates related deep network applications.•Solving data imbalance in rare diseases requires considering all balancing schemes.•A tool called RDDL was implemented to help users handle the issue systematically.•RDDL applies to diverse rare disease data that bear different modalities.</description><subject>Identification of lowly prevalent diseases</subject><subject>Imbalanced data classification</subject><subject>Model ensemble</subject><subject>Rare disease</subject><issn>1476-9271</issn><issn>1476-928X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><recordid>eNqNUU1vFDEMHSEqWgp_AUWcuMySZD4y21vV5UtaqVIFErcokzhsVslkiLNI_Vf8RDJMqTj2ZMt-z372q6q3jG4YZf3740bHMI8uen2AsOGUN6Uhtnz7rLpgrejrLR--P3_MBTuvXiIeaQFS2r2ozhvRMcFpf1H9vtvt9lfkmuA9ZggqO01gQgijBzK7GbybgOQYPckHlQnmBCosRSSj8mrSbvpBclJuWhJcFJVWjiSBOelCPQABa0FnJNESo7IiLqxUIG4iSSWojUNQCHUCrzIYYgDm2oNKf6eqefZOF21xwlfVmVUe4fVDvKy-ffzw9eZzvb_99OXmel_rZqC5FrwdLLcA1HDaUWtsz7qGGi36ltNWNIrpkXd9C21n2dArLajQIzOMMyh6m8vq3Tp3TvHnCTDL4FCDL7ohnlDyoW3pljImCvRqheoUERNYOScXVLqXjMrFMXmU_zsmF8fk6lghv3nYcxoDmEfqP4sKYLcCoFz7y0GSqB2U5xmXylelie4pe_4AFcKyQg</recordid><startdate>20231001</startdate><enddate>20231001</enddate><creator>Yang, Tzu-Hsien</creator><creator>Liao, Zhan-Yi</creator><creator>Yu, Yu-Huai</creator><creator>Hsia, Min</creator><general>Elsevier 
Ltd</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0001-9420-196X</orcidid></search><sort><creationdate>20231001</creationdate><title>RDDL: A systematic ensemble pipeline tool that streamlines balancing training schemes to reduce the effects of data imbalance in rare-disease-related deep-learning applications</title><author>Yang, Tzu-Hsien ; Liao, Zhan-Yi ; Yu, Yu-Huai ; Hsia, Min</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c380t-7248f2fee0d2050fdf61530dc76420473a1cb2564e45f186ac707cb1d121eeff3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>Identification of lowly prevalent diseases</topic><topic>Imbalanced data classification</topic><topic>Model ensemble</topic><topic>Rare disease</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Tzu-Hsien</creatorcontrib><creatorcontrib>Liao, Zhan-Yi</creatorcontrib><creatorcontrib>Yu, Yu-Huai</creatorcontrib><creatorcontrib>Hsia, Min</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><jtitle>Computational biology and chemistry</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Tzu-Hsien</au><au>Liao, Zhan-Yi</au><au>Yu, Yu-Huai</au><au>Hsia, Min</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>RDDL: A systematic ensemble pipeline tool that streamlines balancing training schemes to reduce the effects of data imbalance in rare-disease-related deep-learning applications</atitle><jtitle>Computational biology and chemistry</jtitle><addtitle>Comput Biol 
Chem</addtitle><date>2023-10-01</date><risdate>2023</risdate><volume>106</volume><spage>107929</spage><epage>107929</epage><pages>107929-107929</pages><artnum>107929</artnum><issn>1476-9271</issn><eissn>1476-928X</eissn><abstract>Identifying lowly prevalent diseases, or rare diseases, in their early stages is key to disease treatment in the medical field. Deep learning techniques now provide promising tools for this purpose. Nevertheless, the low prevalence of rare diseases entangles the proper application of deep networks for disease identification due to the severe class-imbalance issue. In the past decades, some balancing methods have been studied to handle the data-imbalance issue. The bad news is that it is verified that none of these methods guarantees superior performance to others. This performance variation causes the need to formulate a systematic pipeline with a comprehensive software tool for enhancing deep-learning applications in rare disease identification. We reviewed the existing balancing schemes and summarized a systematic deep ensemble pipeline with a constructed tool called RDDL for handling the data imbalance issue. Through two real case studies, we showed that rare disease identification could be boosted with this systematic RDDL pipeline tool by lessening the data imbalance problem during model training. The RDDL pipeline tool is available at https://github.com/cobisLab/RDDL/. [Display omitted] •The low prevalence of rare diseases complicates related deep network applications.•Solving data imbalance in rare diseases requires considering all balancing schemes.•A tool called RDDL was implemented to help users handle the issue systematically.•RDDL applies to diverse rare disease data that bear different modalities.</abstract><cop>England</cop><pub>Elsevier Ltd</pub><pmid>37517206</pmid><doi>10.1016/j.compbiolchem.2023.107929</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0001-9420-196X</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 1476-9271
ispartof Computational biology and chemistry, 2023-10, Vol.106, p.107929-107929, Article 107929
issn 1476-9271
1476-928X
language eng
recordid cdi_proquest_miscellaneous_2844090117
source ScienceDirect Freedom Collection 2022-2024
subjects Identification of lowly prevalent diseases
Imbalanced data classification
Model ensemble
Rare disease
title RDDL: A systematic ensemble pipeline tool that streamlines balancing training schemes to reduce the effects of data imbalance in rare-disease-related deep-learning applications
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T09%3A38%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RDDL:%20A%20systematic%20ensemble%20pipeline%20tool%20that%20streamlines%20balancing%20training%20schemes%20to%20reduce%20the%20effects%20of%20data%20imbalance%20in%20rare-disease-related%20deep-learning%20applications&rft.jtitle=Computational%20biology%20and%20chemistry&rft.au=Yang,%20Tzu-Hsien&rft.date=2023-10-01&rft.volume=106&rft.spage=107929&rft.epage=107929&rft.pages=107929-107929&rft.artnum=107929&rft.issn=1476-9271&rft.eissn=1476-928X&rft_id=info:doi/10.1016/j.compbiolchem.2023.107929&rft_dat=%3Cproquest_cross%3E2844090117%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c380t-7248f2fee0d2050fdf61530dc76420473a1cb2564e45f186ac707cb1d121eeff3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2844090117&rft_id=info:pmid/37517206&rfr_iscdi=true