
Interpretable Machine Learning for Finding Intermediate-mass Black Holes

Definitive evidence that globular clusters (GCs) host intermediate-mass black holes (IMBHs) is elusive. Machine-learning (ML) models trained on GC simulations can in principle predict IMBH host candidates based on observable features. This approach has two limitations: first, an accurate ML model is...
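
The full abstract in this record says that candidates are judged not only by the classifier output but also by their similarity to the training data, measured by the likelihood of a kernel density estimation model and reported as an out-of-distribution flag. The following is a minimal sketch of that idea in plain Python, not the authors' implementation. The record does not state the features, kernel, bandwidth, or threshold used in the paper, so the Gaussian kernel, the `bandwidth` value, the 5% quantile cutoff, the toy two-dimensional data, and the function names are all assumptions made for illustration.

```python
import math
import random


def gaussian_kde_log_density(point, training, bandwidth):
    """Log-density of `point` under an isotropic Gaussian KDE built on `training`."""
    dim = len(point)
    log_norm = -0.5 * dim * math.log(2.0 * math.pi * bandwidth ** 2)
    log_terms = []
    for sample in training:
        sq_dist = sum((p - s) ** 2 for p, s in zip(point, sample))
        log_terms.append(log_norm - 0.5 * sq_dist / bandwidth ** 2)
    # Log-sum-exp for numerical stability, then average over the kernels.
    peak = max(log_terms)
    total = sum(math.exp(t - peak) for t in log_terms)
    return peak + math.log(total) - math.log(len(training))


def out_of_distribution_flag(point, training, bandwidth, quantile=0.05):
    """Flag `point` when its log-density falls below a low quantile of the
    log-densities of the training points themselves (hypothetical rule)."""
    scores = sorted(
        gaussian_kde_log_density(x, training, bandwidth) for x in training
    )
    threshold = scores[int(quantile * len(scores))]
    return gaussian_kde_log_density(point, training, bandwidth) < threshold


# Toy stand-in for simulated cluster features (assumption: 2 standardized features).
rng = random.Random(0)
simulated = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]

inside = out_of_distribution_flag((0.0, 0.0), simulated, bandwidth=0.5)
outside = out_of_distribution_flag((8.0, 8.0), simulated, bandwidth=0.5)
```

In this sketch a point near the bulk of the simulated sample is not flagged, while a point far from every simulated sample is. A prediction for a flagged object would be treated with caution, because the classifier is then working in extrapolation.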

Bibliographic Details
Published in: The Astrophysical journal 2024-04, Vol.965 (1), p.89
Main Authors: Pasquato, Mario; Trevisan, Piero; Askar, Abbas; Lemos, Pablo; Carenini, Gaia; Mapelli, Michela; Hezaveh, Yashar
Format: Article
Language: English
container_issue 1
container_start_page 89
container_title The Astrophysical journal
container_volume 965
creator Pasquato, Mario
Trevisan, Piero
Askar, Abbas
Lemos, Pablo
Carenini, Gaia
Mapelli, Michela
Hezaveh, Yashar
description Definitive evidence that globular clusters (GCs) host intermediate-mass black holes (IMBHs) is elusive. Machine-learning (ML) models trained on GC simulations can in principle predict IMBH host candidates based on observable features. This approach has two limitations: first, an accurate ML model is expected to be a black box due to complexity; second, despite our efforts to simulate GCs realistically, the simulation physics or initial conditions may fail to reflect reality fully. Therefore our training data may be biased, leading to a failure in generalization to observational data. Both the first issue—explainability/interpretability—and the second—out of distribution generalization and fairness—are active areas of research in ML. Here we employ techniques from these fields to address them: we use the anchors method to explain an Extreme Gradient Boosting (XGBoost) classifier; we also independently train a natively interpretable model using Certifiably Optimal RulE ListS (CORELS). The resulting model has a clear physical meaning, but loses some performance with respect to XGBoost. We evaluate potential candidates in real data based not only on classifier predictions but also on their similarity to the training data, measured by the likelihood of a kernel density estimation model. This measures the realism of our simulated data and mitigates the risk that our models may produce biased predictions by working in extrapolation. We apply our classifiers to real GCs, obtaining a predicted classification, a measure of the confidence of the prediction, an out-of-distribution flag, a local rule explaining the prediction of XGBoost, and a global rule from CORELS.
doi_str_mv 10.3847/1538-4357/ad2261
format article
publisher Philadelphia: The American Astronomical Society
general The American Astronomical Society
IOP Publishing
rights 2024. The Author(s). Published by the American Astronomical Society. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
creationdate 2024-04-01
tpages 15
toplevel peer_reviewed
online_resources
oa free_for_read
orcid https://orcid.org/0000-0001-9511-4649
https://orcid.org/0000-0001-9688-3458
https://orcid.org/0000-0001-8799-2548
https://orcid.org/0000-0003-3784-5245
https://orcid.org/0000-0002-4728-8473
https://orcid.org/0000-0002-8669-5733
collection Institute of Physics Open Access Journal Titles
IOPscience (Open Access)
CrossRef
Meteorological & Geoastrophysical Abstracts
Technology Research Database
Aerospace Database
Meteorological & Geoastrophysical Abstracts - Academic
Advanced Technologies Database with Aerospace
DOAJ Directory of Open Access Journals
fulltext fulltext
identifier ISSN: 0004-637X
ispartof The Astrophysical journal, 2024-04, Vol.965 (1), p.89
issn 0004-637X
1538-4357
language eng
recordid cdi_crossref_primary_10_3847_1538_4357_ad2261
source EZB-FREE-00999 freely available EZB journals
subjects Astrophysical black holes
Black holes
Classifiers
Globular clusters
Initial conditions
Intermediate-mass black holes
Machine learning
Physics
Predictions
Risk reduction
Simulation
Training
title Interpretable Machine Learning for Finding Intermediate-mass Black Holes
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-26T21%3A17%3A18IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Interpretable%20Machine%20Learning%20for%20Finding%20Intermediate-mass%20Black%20Holes&rft.jtitle=The%20Astrophysical%20journal&rft.au=Pasquato,%20Mario&rft.date=2024-04-01&rft.volume=965&rft.issue=1&rft.spage=89&rft.pages=89-&rft.issn=0004-637X&rft.eissn=1538-4357&rft_id=info:doi/10.3847/1538-4357/ad2261&rft_dat=%3Cproquest_cross%3E3035157681%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c398t-1eb62a942a67d26324e6cf90945d40c9afb6c9f261616b9ba848a0f6e56997843%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=3035157681&rft_id=info:pmid/&rfr_iscdi=true