OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions

Open chromatin regions are the genomic regions associated with basic cellular physiological activities, while chromatin accessibility is reported to affect gene expressions and functions. A basic computational problem is to efficiently estimate open chromatin regions, which could facilitate both gen...

Bibliographic Details
Published in: Frontiers in genetics 2023-06, Vol.14, p.1184744
Main Authors: Ren, Jiayi; Liu, Yuqian; Zhu, Xiaoyan; Wang, Xuwen; Li, Yifei; Liu, Yuxin; Hu, Wenqing; Zhang, Xuanping; Wang, Jiayin
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c469t-ee22c62e10de71de95090f9f61a4b0afba21e145ecde105fa44aca52f770ff5f3
cites cdi_FETCH-LOGICAL-c469t-ee22c62e10de71de95090f9f61a4b0afba21e145ecde105fa44aca52f770ff5f3
container_end_page
container_issue
container_start_page 1184744
container_title Frontiers in genetics
container_volume 14
creator Ren, Jiayi
Liu, Yuqian
Zhu, Xiaoyan
Wang, Xuwen
Li, Yifei
Liu, Yuxin
Hu, Wenqing
Zhang, Xuanping
Wang, Jiayin
description Open chromatin regions (OCRs) are the genomic regions associated with basic cellular physiological activities, while chromatin accessibility is reported to affect gene expression and function. A basic computational problem is to efficiently estimate open chromatin regions, which could facilitate both genomic and epigenetic studies. Currently, ATAC-seq and cfDNA-seq (plasma cell-free DNA sequencing) are two popular strategies to detect OCRs. As cfDNA-seq can obtain more biomarkers in one round of sequencing, it is considered more effective and convenient. However, in processing cfDNA-seq data, the dynamically variable chromatin accessibility makes it quite difficult to obtain training data with pure OCRs or non-OCRs, which leads to a noise problem for both feature-based and learning-based approaches. In this paper, we propose a learning-based OCR estimation approach with a noise-tolerance design. The proposed approach, named OCRFinder, incorporates an ensemble learning framework and a semi-supervised strategy to avoid potential overfitting to noisy labels, which are the false positives on OCRs and non-OCRs. Compared to different noise control strategies and state-of-the-art approaches, OCRFinder achieved higher accuracies and sensitivities in the experiments. In addition, OCRFinder also shows excellent performance in ATAC-seq or DNase-seq comparison experiments.
doi_str_mv 10.3389/fgene.2023.1184744
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_0c2a3d3b7d33405ab22e00fbbb4dd384</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_0c2a3d3b7d33405ab22e00fbbb4dd384</doaj_id><sourcerecordid>2827254779</sourcerecordid><originalsourceid>FETCH-LOGICAL-c469t-ee22c62e10de71de95090f9f61a4b0afba21e145ecde105fa44aca52f770ff5f3</originalsourceid><addsrcrecordid>eNpVkUtr3DAURk1oSUKaP5BF0bIbT_WyZXdTytC0gUCgNGtxLV15FGxpKnkK-ffVPBoSbfT67tFFp6puGF0J0fWf3YgBV5xysWKsk0rKs-qSta2sO8rZu1fri-o65ydahuyFEPK8uhBKcNE23WWFD-tftz5YTF8IkBB9xnqJEyYIBskMZuMDkgkhBR9GMuOyiZa4mAgYs0uw4PRMMC9-hmUfiFsMxGxSPOxJwtHHkD9U7x1MGa9P81X1ePv99_pnff_w42797b42su2XGpFz03Jk1KJiFvuG9tT1rmUgBwpuAM6QyQaNLZnGgZRgoOFOKepc48RVdXfk2ghPeptKV-lZR_D6cBDTqCEt3kyoqeEgrBiULV9CGxg4R0rdMAzSWtHJwvp6ZG13w4zWYFgSTG-gb2-C3-gx_tWM8rbooIXw6URI8c-ufJKefTY4TRAw7rLmHVe8kUr1JcqPUZNizgndyzuM6r1vffCt9771yXcp-vi6w5eS_3bFP6AFqls</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2827254779</pqid></control><display><type>article</type><title>OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions</title><source>PubMed Central</source><creator>Ren, Jiayi ; Liu, Yuqian ; Zhu, Xiaoyan ; Wang, Xuwen ; Li, Yifei ; Liu, Yuxin ; Hu, Wenqing ; Zhang, Xuanping ; Wang, Jiayin</creator><creatorcontrib>Ren, Jiayi ; Liu, Yuqian ; Zhu, Xiaoyan ; Wang, Xuwen ; Li, Yifei ; Liu, Yuxin ; Hu, Wenqing ; Zhang, Xuanping ; Wang, Jiayin</creatorcontrib><description>Open chromatin regions are the genomic regions associated with basic cellular physiological activities, while chromatin accessibility is reported to affect gene expressions and functions. A basic computational problem is to efficiently estimate open chromatin regions, which could facilitate both genomic and epigenetic studies. 
Currently, ATAC-seq and cfDNA-seq (plasma cell-free DNA sequencing) are two popular strategies to detect OCRs. As cfDNA-seq can obtain more biomarkers in one round of sequencing, it is considered more effective and convenient. However, in processing cfDNA-seq data, due to the dynamically variable chromatin accessibility, it is quite difficult to obtain the training data with pure OCRs or non-OCRs, and leads to a noise problem for either feature-based approaches or learning-based approaches. In this paper, we propose a learning-based OCR estimation approach with a noise-tolerance design. The proposed approach, named OCRFinder, incorporates the ideas of ensemble learning framework and semi-supervised strategy to avoid potential overfitting of noisy labels, which are the false positives on OCRs and non-OCRs. Compared to different noise control strategies and state-of-the-art approaches, OCRFinder achieved higher accuracies and sensitivities in the experiments. In addition, OCRFinder also has an excellent performance in ATAC-seq or DNase-seq comparison experiments.</description><identifier>ISSN: 1664-8021</identifier><identifier>EISSN: 1664-8021</identifier><identifier>DOI: 10.3389/fgene.2023.1184744</identifier><identifier>PMID: 37323658</identifier><language>eng</language><publisher>Switzerland: Frontiers Media S.A</publisher><subject>cell-free DNA - cfDNA ; chromatin accessibility ; Genetics ; noisy label learning ; open chromatin region ; sequencing data analyses</subject><ispartof>Frontiers in genetics, 2023-06, Vol.14, p.1184744</ispartof><rights>Copyright © 2023 Ren, Liu, Zhu, Wang, Li, Liu, Hu, Zhang and Wang.</rights><rights>Copyright © 2023 Ren, Liu, Zhu, Wang, Li, Liu, Hu, Zhang and Wang. 
2023 Ren, Liu, Zhu, Wang, Li, Liu, Hu, Zhang and Wang</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c469t-ee22c62e10de71de95090f9f61a4b0afba21e145ecde105fa44aca52f770ff5f3</citedby><cites>FETCH-LOGICAL-c469t-ee22c62e10de71de95090f9f61a4b0afba21e145ecde105fa44aca52f770ff5f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10267440/pdf/$$EPDF$$P50$$Gpubmedcentral$$Hfree_for_read</linktopdf><linktohtml>$$Uhttps://www.ncbi.nlm.nih.gov/pmc/articles/PMC10267440/$$EHTML$$P50$$Gpubmedcentral$$Hfree_for_read</linktohtml><link.rule.ids>230,314,723,776,780,881,27901,27902,53766,53768</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/37323658$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Ren, Jiayi</creatorcontrib><creatorcontrib>Liu, Yuqian</creatorcontrib><creatorcontrib>Zhu, Xiaoyan</creatorcontrib><creatorcontrib>Wang, Xuwen</creatorcontrib><creatorcontrib>Li, Yifei</creatorcontrib><creatorcontrib>Liu, Yuxin</creatorcontrib><creatorcontrib>Hu, Wenqing</creatorcontrib><creatorcontrib>Zhang, Xuanping</creatorcontrib><creatorcontrib>Wang, Jiayin</creatorcontrib><title>OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions</title><title>Frontiers in genetics</title><addtitle>Front Genet</addtitle><description>Open chromatin regions are the genomic regions associated with basic cellular physiological activities, while chromatin accessibility is reported to affect gene expressions and functions. A basic computational problem is to efficiently estimate open chromatin regions, which could facilitate both genomic and epigenetic studies. 
Currently, ATAC-seq and cfDNA-seq (plasma cell-free DNA sequencing) are two popular strategies to detect OCRs. As cfDNA-seq can obtain more biomarkers in one round of sequencing, it is considered more effective and convenient. However, in processing cfDNA-seq data, due to the dynamically variable chromatin accessibility, it is quite difficult to obtain the training data with pure OCRs or non-OCRs, and leads to a noise problem for either feature-based approaches or learning-based approaches. In this paper, we propose a learning-based OCR estimation approach with a noise-tolerance design. The proposed approach, named OCRFinder, incorporates the ideas of ensemble learning framework and semi-supervised strategy to avoid potential overfitting of noisy labels, which are the false positives on OCRs and non-OCRs. Compared to different noise control strategies and state-of-the-art approaches, OCRFinder achieved higher accuracies and sensitivities in the experiments. In addition, OCRFinder also has an excellent performance in ATAC-seq or DNase-seq comparison experiments.</description><subject>cell-free DNA - cfDNA</subject><subject>chromatin accessibility</subject><subject>Genetics</subject><subject>noisy label learning</subject><subject>open chromatin region</subject><subject>sequencing data 
analyses</subject><issn>1664-8021</issn><issn>1664-8021</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNpVkUtr3DAURk1oSUKaP5BF0bIbT_WyZXdTytC0gUCgNGtxLV15FGxpKnkK-ffVPBoSbfT67tFFp6puGF0J0fWf3YgBV5xysWKsk0rKs-qSta2sO8rZu1fri-o65ydahuyFEPK8uhBKcNE23WWFD-tftz5YTF8IkBB9xnqJEyYIBskMZuMDkgkhBR9GMuOyiZa4mAgYs0uw4PRMMC9-hmUfiFsMxGxSPOxJwtHHkD9U7x1MGa9P81X1ePv99_pnff_w42797b42su2XGpFz03Jk1KJiFvuG9tT1rmUgBwpuAM6QyQaNLZnGgZRgoOFOKepc48RVdXfk2ghPeptKV-lZR_D6cBDTqCEt3kyoqeEgrBiULV9CGxg4R0rdMAzSWtHJwvp6ZG13w4zWYFgSTG-gb2-C3-gx_tWM8rbooIXw6URI8c-ufJKefTY4TRAw7rLmHVe8kUr1JcqPUZNizgndyzuM6r1vffCt9771yXcp-vi6w5eS_3bFP6AFqls</recordid><startdate>20230601</startdate><enddate>20230601</enddate><creator>Ren, Jiayi</creator><creator>Liu, Yuqian</creator><creator>Zhu, Xiaoyan</creator><creator>Wang, Xuwen</creator><creator>Li, Yifei</creator><creator>Liu, Yuxin</creator><creator>Hu, Wenqing</creator><creator>Zhang, Xuanping</creator><creator>Wang, Jiayin</creator><general>Frontiers Media S.A</general><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>5PM</scope><scope>DOA</scope></search><sort><creationdate>20230601</creationdate><title>OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions</title><author>Ren, Jiayi ; Liu, Yuqian ; Zhu, Xiaoyan ; Wang, Xuwen ; Li, Yifei ; Liu, Yuxin ; Hu, Wenqing ; Zhang, Xuanping ; Wang, Jiayin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c469t-ee22c62e10de71de95090f9f61a4b0afba21e145ecde105fa44aca52f770ff5f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>cell-free DNA - cfDNA</topic><topic>chromatin accessibility</topic><topic>Genetics</topic><topic>noisy label learning</topic><topic>open chromatin region</topic><topic>sequencing data 
analyses</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Ren, Jiayi</creatorcontrib><creatorcontrib>Liu, Yuqian</creatorcontrib><creatorcontrib>Zhu, Xiaoyan</creatorcontrib><creatorcontrib>Wang, Xuwen</creatorcontrib><creatorcontrib>Li, Yifei</creatorcontrib><creatorcontrib>Liu, Yuxin</creatorcontrib><creatorcontrib>Hu, Wenqing</creatorcontrib><creatorcontrib>Zhang, Xuanping</creatorcontrib><creatorcontrib>Wang, Jiayin</creatorcontrib><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>PubMed Central (Full Participant titles)</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Frontiers in genetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Ren, Jiayi</au><au>Liu, Yuqian</au><au>Zhu, Xiaoyan</au><au>Wang, Xuwen</au><au>Li, Yifei</au><au>Liu, Yuxin</au><au>Hu, Wenqing</au><au>Zhang, Xuanping</au><au>Wang, Jiayin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions</atitle><jtitle>Frontiers in genetics</jtitle><addtitle>Front Genet</addtitle><date>2023-06-01</date><risdate>2023</risdate><volume>14</volume><spage>1184744</spage><pages>1184744-</pages><issn>1664-8021</issn><eissn>1664-8021</eissn><abstract>Open chromatin regions are the genomic regions associated with basic cellular physiological activities, while chromatin accessibility is reported to affect gene expressions and functions. A basic computational problem is to efficiently estimate open chromatin regions, which could facilitate both genomic and epigenetic studies. Currently, ATAC-seq and cfDNA-seq (plasma cell-free DNA sequencing) are two popular strategies to detect OCRs. 
As cfDNA-seq can obtain more biomarkers in one round of sequencing, it is considered more effective and convenient. However, in processing cfDNA-seq data, due to the dynamically variable chromatin accessibility, it is quite difficult to obtain the training data with pure OCRs or non-OCRs, and leads to a noise problem for either feature-based approaches or learning-based approaches. In this paper, we propose a learning-based OCR estimation approach with a noise-tolerance design. The proposed approach, named OCRFinder, incorporates the ideas of ensemble learning framework and semi-supervised strategy to avoid potential overfitting of noisy labels, which are the false positives on OCRs and non-OCRs. Compared to different noise control strategies and state-of-the-art approaches, OCRFinder achieved higher accuracies and sensitivities in the experiments. In addition, OCRFinder also has an excellent performance in ATAC-seq or DNase-seq comparison experiments.</abstract><cop>Switzerland</cop><pub>Frontiers Media S.A</pub><pmid>37323658</pmid><doi>10.3389/fgene.2023.1184744</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1664-8021
ispartof Frontiers in genetics, 2023-06, Vol.14, p.1184744
issn 1664-8021
1664-8021
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_0c2a3d3b7d33405ab22e00fbbb4dd384
source PubMed Central
subjects cell-free DNA - cfDNA
chromatin accessibility
Genetics
noisy label learning
open chromatin region
sequencing data analyses
title OCRFinder: a noise-tolerance machine learning method for accurately estimating open chromatin regions
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T04%3A56%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=OCRFinder:%20a%20noise-tolerance%20machine%20learning%20method%20for%20accurately%20estimating%20open%20chromatin%20regions&rft.jtitle=Frontiers%20in%20genetics&rft.au=Ren,%20Jiayi&rft.date=2023-06-01&rft.volume=14&rft.spage=1184744&rft.pages=1184744-&rft.issn=1664-8021&rft.eissn=1664-8021&rft_id=info:doi/10.3389/fgene.2023.1184744&rft_dat=%3Cproquest_doaj_%3E2827254779%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c469t-ee22c62e10de71de95090f9f61a4b0afba21e145ecde105fa44aca52f770ff5f3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2827254779&rft_id=info:pmid/37323658&rfr_iscdi=true