Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data

Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific bin...

Bibliographic Details
Published in: Genome biology 2024-10, Vol.25 (1), p.284-284, Article 284
Main Authors: Li, Xiaoting; Melo, Lucas A N; Bussemaker, Harmen J
Format: Article
Language: English
cited_by
cites cdi_FETCH-LOGICAL-c462t-b52d9161b8ed494fd58770e7b5ef231918f2f36aa843af36157f48a04d2ee3f63
container_end_page 284
container_issue 1
container_start_page 284
container_title Genome biology
container_volume 25
creator Li, Xiaoting
Melo, Lucas A N
Bussemaker, Harmen J
description Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. Our work provides new strategies for predicting the functional impact of non-coding variants.
doi_str_mv 10.1186/s13059-024-03424-2
format article
fullrecord <record><control><sourceid>proquest_doaj_</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_203b6acd35344567af1a6e1d3e57ec39</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_203b6acd35344567af1a6e1d3e57ec39</doaj_id><sourcerecordid>3123074040</sourcerecordid><originalsourceid>FETCH-LOGICAL-c462t-b52d9161b8ed494fd58770e7b5ef231918f2f36aa843af36157f48a04d2ee3f63</originalsourceid><addsrcrecordid>eNqNUU1v1DAUjBAVLYU_wAHlyCXgbyfH0haoVJULSNysF_t5cfHGi50ceuw_x80uK45c3huPZkbPmqZ5Q8l7Snv1oVBO5NARJjrCRZ3sWXNGhRadVuTH83_wafOylHtC6CCYetGc8kH0THNx1jx-xMn-3EL-FaZNC5NrxyVE9_S4urtoxzCtGLwPU5gf2m1yGEu7lJWNESN2ZYc2-GBX-4GDzZTKXLk5w1RsDrs5pKn1YOeUj7EOZnjVnHiIBV8f9nnz_dP1t8sv3e3XzzeXF7edFYrN3SiZG6iiY49ODMI72WtNUI8SPeN0oL1nniuAXnCogErtRQ9EOIbIveLnzc0-1yW4N7sc6qcfTIJgViLljYFcL45oGOGjAuu45EJIpcFTUEgdR6nR8qFmvdtn7XL6vWCZzTYUizHChGkphlMpaC81p_8hZZxoQQSpUraX2pxKyeiPV1Jinho3-8ZNbdysjRtWTW8P-cu4RXe0_K2Y_wHl5Ke1</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3123074040</pqid></control><display><type>article</type><title>Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data</title><source>Publicly Available Content Database</source><source>PubMed Central</source><creator>Li, Xiaoting ; Melo, Lucas A N ; Bussemaker, Harmen J</creator><creatorcontrib>Li, Xiaoting ; Melo, Lucas A N ; Bussemaker, Harmen J</creatorcontrib><description>Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. 
This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&amp;Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. 
Our work provides new strategies for predicting the functional impact of non-coding variants.</description><identifier>ISSN: 1474-760X</identifier><identifier>EISSN: 1474-760X</identifier><identifier>DOI: 10.1186/s13059-024-03424-2</identifier><identifier>PMID: 39482734</identifier><language>eng</language><publisher>England: BMC</publisher><subject>Allele-specific binding ; Alleles ; Benchmarking ; Binding Sites ; binomial distribution ; ChIP-seq, ChIP-exo, CUT&amp;Tag ; chromatin immunoprecipitation ; Chromatin Immunoprecipitation Sequencing ; CTCF, EBF1, PU.1/SPI1 ; DNA ; DNA - genetics ; DNA - metabolism ; DNA fragmentation ; Gene expression regulation ; genome ; heterozygosity ; Humans ; Non-coding variants ; prediction ; probability ; Protein Binding ; Transcription factors ; Transcription Factors - genetics ; Transcription Factors - metabolism</subject><ispartof>Genome biology, 2024-10, Vol.25 (1), p.284-284, Article 284</ispartof><rights>2024. The Author(s).</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c462t-b52d9161b8ed494fd58770e7b5ef231918f2f36aa843af36157f48a04d2ee3f63</cites><orcidid>0000-0002-7563-7554 ; 0000-0002-7274-5277 ; 0000-0003-4938-7587</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,777,781,27905,27906,36994</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/39482734$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Li, Xiaoting</creatorcontrib><creatorcontrib>Melo, Lucas A N</creatorcontrib><creatorcontrib>Bussemaker, Harmen J</creatorcontrib><title>Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data</title><title>Genome biology</title><addtitle>Genome 
Biol</addtitle><description>Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&amp;Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. 
Our work provides new strategies for predicting the functional impact of non-coding variants.</description><subject>Allele-specific binding</subject><subject>Alleles</subject><subject>Benchmarking</subject><subject>Binding Sites</subject><subject>binomial distribution</subject><subject>ChIP-seq, ChIP-exo, CUT&amp;Tag</subject><subject>chromatin immunoprecipitation</subject><subject>Chromatin Immunoprecipitation Sequencing</subject><subject>CTCF, EBF1, PU.1/SPI1</subject><subject>DNA</subject><subject>DNA - genetics</subject><subject>DNA - metabolism</subject><subject>DNA fragmentation</subject><subject>Gene expression regulation</subject><subject>genome</subject><subject>heterozygosity</subject><subject>Humans</subject><subject>Non-coding variants</subject><subject>prediction</subject><subject>probability</subject><subject>Protein Binding</subject><subject>Transcription factors</subject><subject>Transcription Factors - genetics</subject><subject>Transcription Factors - metabolism</subject><issn>1474-760X</issn><issn>1474-760X</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNqNUU1v1DAUjBAVLYU_wAHlyCXgbyfH0haoVJULSNysF_t5cfHGi50ceuw_x80uK45c3huPZkbPmqZ5Q8l7Snv1oVBO5NARJjrCRZ3sWXNGhRadVuTH83_wafOylHtC6CCYetGc8kH0THNx1jx-xMn-3EL-FaZNC5NrxyVE9_S4urtoxzCtGLwPU5gf2m1yGEu7lJWNESN2ZYc2-GBX-4GDzZTKXLk5w1RsDrs5pKn1YOeUj7EOZnjVnHiIBV8f9nnz_dP1t8sv3e3XzzeXF7edFYrN3SiZG6iiY49ODMI72WtNUI8SPeN0oL1nniuAXnCogErtRQ9EOIbIveLnzc0-1yW4N7sc6qcfTIJgViLljYFcL45oGOGjAuu45EJIpcFTUEgdR6nR8qFmvdtn7XL6vWCZzTYUizHChGkphlMpaC81p_8hZZxoQQSpUraX2pxKyeiPV1Jinho3-8ZNbdysjRtWTW8P-cu4RXe0_K2Y_wHl5Ke1</recordid><startdate>20241031</startdate><enddate>20241031</enddate><creator>Li, Xiaoting</creator><creator>Melo, Lucas A N</creator><creator>Bussemaker, Harmen 
J</creator><general>BMC</general><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7X8</scope><scope>7S9</scope><scope>L.6</scope><scope>DOA</scope><orcidid>https://orcid.org/0000-0002-7563-7554</orcidid><orcidid>https://orcid.org/0000-0002-7274-5277</orcidid><orcidid>https://orcid.org/0000-0003-4938-7587</orcidid></search><sort><creationdate>20241031</creationdate><title>Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data</title><author>Li, Xiaoting ; Melo, Lucas A N ; Bussemaker, Harmen J</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c462t-b52d9161b8ed494fd58770e7b5ef231918f2f36aa843af36157f48a04d2ee3f63</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Allele-specific binding</topic><topic>Alleles</topic><topic>Benchmarking</topic><topic>Binding Sites</topic><topic>binomial distribution</topic><topic>ChIP-seq, ChIP-exo, CUT&amp;Tag</topic><topic>chromatin immunoprecipitation</topic><topic>Chromatin Immunoprecipitation Sequencing</topic><topic>CTCF, EBF1, PU.1/SPI1</topic><topic>DNA</topic><topic>DNA - genetics</topic><topic>DNA - metabolism</topic><topic>DNA fragmentation</topic><topic>Gene expression regulation</topic><topic>genome</topic><topic>heterozygosity</topic><topic>Humans</topic><topic>Non-coding variants</topic><topic>prediction</topic><topic>probability</topic><topic>Protein Binding</topic><topic>Transcription factors</topic><topic>Transcription Factors - genetics</topic><topic>Transcription Factors - metabolism</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Li, Xiaoting</creatorcontrib><creatorcontrib>Melo, Lucas A N</creatorcontrib><creatorcontrib>Bussemaker, Harmen 
J</creatorcontrib><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>MEDLINE - Academic</collection><collection>AGRICOLA</collection><collection>AGRICOLA - Academic</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Genome biology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Li, Xiaoting</au><au>Melo, Lucas A N</au><au>Bussemaker, Harmen J</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data</atitle><jtitle>Genome biology</jtitle><addtitle>Genome Biol</addtitle><date>2024-10-31</date><risdate>2024</risdate><volume>25</volume><issue>1</issue><spage>284</spage><epage>284</epage><pages>284-284</pages><artnum>284</artnum><issn>1474-760X</issn><eissn>1474-760X</eissn><abstract>Transcription factors (TFs) bind to DNA in a highly sequence-specific manner. This specificity manifests itself in vivo as differences in TF occupancy between the two alleles at heterozygous loci. Genome-scale assays such as ChIP-seq currently are limited in their power to detect allele-specific binding (ASB) both in terms of read coverage and representation of individual variants in the cell lines used. This makes prediction of allelic differences in TF binding from sequence alone desirable, provided that the reliability of such predictions can be quantitatively assessed. We here propose methods for benchmarking sequence-to-affinity models for TF binding in terms of their ability to predict allelic imbalances in ChIP-seq counts. 
We use a likelihood function based on an over-dispersed binomial distribution to aggregate evidence for allelic preference across the genome without requiring statistical significance for individual variants. This allows us to systematically compare predictive performance when multiple binding models for the same TF are available. To facilitate the de novo inference of high-quality models from paired-end in vivo binding data such as ChIP-seq, ChIP-exo, and CUT&amp;Tag without read mapping or peak calling, we introduce an extensible reimplementation of our biophysically interpretable machine learning framework named PyProBound. Explicitly accounting for assay-specific bias in DNA fragmentation rate when training on ChIP-seq yields improved TF binding models. Moreover, we show how PyProBound can leverage our threshold-free ASB likelihood function to perform de novo motif discovery using allele-specific ChIP-seq counts. Our work provides new strategies for predicting the functional impact of non-coding variants.</abstract><cop>England</cop><pub>BMC</pub><pmid>39482734</pmid><doi>10.1186/s13059-024-03424-2</doi><tpages>1</tpages><orcidid>https://orcid.org/0000-0002-7563-7554</orcidid><orcidid>https://orcid.org/0000-0002-7274-5277</orcidid><orcidid>https://orcid.org/0000-0003-4938-7587</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1474-760X
ispartof Genome biology, 2024-10, Vol.25 (1), p.284-284, Article 284
issn 1474-760X
1474-760X
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_203b6acd35344567af1a6e1d3e57ec39
source Publicly Available Content Database; PubMed Central
subjects Allele-specific binding
Alleles
Benchmarking
Binding Sites
binomial distribution
ChIP-seq, ChIP-exo, CUT&Tag
chromatin immunoprecipitation
Chromatin Immunoprecipitation Sequencing
CTCF, EBF1, PU.1/SPI1
DNA
DNA - genetics
DNA - metabolism
DNA fragmentation
Gene expression regulation
genome
heterozygosity
Humans
Non-coding variants
prediction
probability
Protein Binding
Transcription factors
Transcription Factors - genetics
Transcription Factors - metabolism
title Benchmarking and building DNA binding affinity models using allele-specific and allele-agnostic transcription factor binding data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T20%3A48%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_doaj_&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Benchmarking%20and%20building%20DNA%20binding%20affinity%20models%20using%20allele-specific%20and%20allele-agnostic%20transcription%20factor%20binding%20data&rft.jtitle=Genome%20biology&rft.au=Li,%20Xiaoting&rft.date=2024-10-31&rft.volume=25&rft.issue=1&rft.spage=284&rft.epage=284&rft.pages=284-284&rft.artnum=284&rft.issn=1474-760X&rft.eissn=1474-760X&rft_id=info:doi/10.1186/s13059-024-03424-2&rft_dat=%3Cproquest_doaj_%3E3123074040%3C/proquest_doaj_%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c462t-b52d9161b8ed494fd58770e7b5ef231918f2f36aa843af36157f48a04d2ee3f63%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3123074040&rft_id=info:pmid/39482734&rfr_iscdi=true
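
Illustrative note: the description above mentions a likelihood function based on an over-dispersed binomial distribution that aggregates evidence for allelic preference across variants without per-variant significance thresholds. The record does not give the exact parameterization, so the following is only a minimal sketch under stated assumptions: a beta-binomial with mean `p` and overdispersion `rho`, and a predicted reference-allele fraction obtained as a logistic function of a model's log-affinity difference. All function and parameter names here are hypothetical and are not taken from PyProBound.

```python
import math


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_logpmf(k, n, p, rho):
    """Log-probability of k reference reads out of n under a beta-binomial.

    Assumed parameterization: mean p in (0, 1), overdispersion rho in (0, 1).
    As rho approaches 0 the distribution approaches a plain binomial.
    """
    a = p * (1.0 - rho) / rho
    b = (1.0 - p) * (1.0 - rho) / rho
    log_choose = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_choose + log_beta(k + a, n - k + b) - log_beta(a, b)


def predicted_ref_fraction(delta_log_affinity):
    """Assumed link: logistic of log(affinity_ref / affinity_alt)."""
    return 1.0 / (1.0 + math.exp(-delta_log_affinity))


def asb_log_likelihood(variants, rho):
    """Sum per-variant log-likelihoods; no significance cutoff is applied.

    variants: iterable of (ref_count, alt_count, delta_log_affinity).
    """
    total = 0.0
    for ref_count, alt_count, delta in variants:
        p = predicted_ref_fraction(delta)
        total += beta_binomial_logpmf(ref_count, ref_count + alt_count, p, rho)
    return total


if __name__ == "__main__":
    # Made-up counts and model predictions, for illustration only.
    toy = [(30, 10, 1.0), (5, 25, -1.5)]
    flipped = [(r, a, -d) for r, a, d in toy]
    print(asb_log_likelihood(toy, rho=0.05))
    print(asb_log_likelihood(flipped, rho=0.05))
```

Under these assumptions, two binding models for the same TF could be compared by evaluating this summed log-likelihood on the same set of heterozygous variants; the model with the higher value explains the observed allelic imbalance better.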