Comparison of strategies for scalable causal discovery of latent variable models from mixed data

Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets are comprised of both continuous and categorical data (“Mixed Data”), and essential variables may be unobserved in this data due to the complex nature of biomedical phenomena. Causal inf...

Bibliographic Details
Published in: International journal of data science and analytics 2018, Vol.6 (1), p.33-45
Main Authors: Raghu, Vineet K., Ramsey, Joseph D., Morris, Alison, Manatakis, Dimitrios V., Spirtes, Peter, Chrysanthis, Panos K., Glymour, Clark, Benos, Panayiotis V.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c442t-493473a5b3998a1f1b380be6e8110a906c4fc93045449279d11c87cd8708a55e3
cites cdi_FETCH-LOGICAL-c442t-493473a5b3998a1f1b380be6e8110a906c4fc93045449279d11c87cd8708a55e3
container_end_page 45
container_issue 1
container_start_page 33
container_title International journal of data science and analytics
container_volume 6
creator Raghu, Vineet K.
Ramsey, Joseph D.
Morris, Alison
Manatakis, Dimitrios V.
Spirtes, Peter
Chrysanthis, Panos K.
Glymour, Clark
Benos, Panayiotis V.
description Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets comprise both continuous and categorical data (“Mixed Data”), and essential variables may be unobserved in these data due to the complex nature of biomedical phenomena. Causal inference algorithms can identify important relationships from biomedical data; however, handling the challenges of causal inference over mixed data with unmeasured confounders in a scalable way is still an open problem. Despite recent advances in causal discovery strategies that could potentially handle these challenges individually, no study currently exists that comprehensively compares these approaches in this setting. In this paper, we present a comparative study that addresses this problem by comparing the accuracy and efficiency of different strategies in large, mixed datasets with latent confounders. We experiment with two extensions of the Fast Causal Inference algorithm: a maximum probability search procedure we recently developed to identify causal orientations more accurately, and a strategy that quickly eliminates unlikely adjacencies in order to achieve scalability to high-dimensional data. We demonstrate that these methods significantly outperform the state of the art in the field by achieving both accurate edge orientations and tractable running time in simulation experiments on datasets with up to 500 variables. Finally, we demonstrate the usability of the best-performing approach on real data by applying it to a biomedical dataset of HIV-infected individuals.
doi_str_mv 10.1007/s41060-018-0104-3
format article
fullrecord <record><control><sourceid>pubmedcentral_cross</sourceid><recordid>TN_cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6096780</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>pubmedcentral_primary_oai_pubmedcentral_nih_gov_6096780</sourcerecordid><originalsourceid>FETCH-LOGICAL-c442t-493473a5b3998a1f1b380be6e8110a906c4fc93045449279d11c87cd8708a55e3</originalsourceid><addsrcrecordid>eNp9kM9KAzEQxoMotmgfwFteYHWyyWaTiyDFf1DwouAtzmazdcvupiTbYt_e1ErBi4dhBub7fcx8hFwxuGYA5U0UDCRkwFQqEBk_IdOcS5EJJtXpcS7eJ2QW4woAWCl5IdU5mXBgQuWQT8nH3PdrDG30A_UNjWPA0S1bF2njA40WO6w6Ry1uIna0bqP1Wxd2e22XlMNIt4n-0fS-dl3igu9p3365mtY44iU5a7CLbvbbL8jbw_3r_ClbvDw-z-8WmRUiHzOhuSg5FhXXWiFrWMUVVE46xRigBmlFYzUHUQih81LXjFlV2lqVoLAoHL8gtwff9abqXW3TaQE7sw5tj2FnPLbm72ZoP83Sb40ELUsFyYAdDGzwMQbXHFkGZp-4OSRuUuJmn7jhickPTEzaYemCWflNGNKf_0Df85WDWQ</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Comparison of strategies for scalable causal discovery of latent variable models from mixed data</title><source>Springer Link</source><creator>Raghu, Vineet K. ; Ramsey, Joseph D. ; Morris, Alison ; Manatakis, Dimitrios V. ; Sprites, Peter ; Chrysanthis, Panos K. ; Glymour, Clark ; Benos, Panayiotis V.</creator><creatorcontrib>Raghu, Vineet K. ; Ramsey, Joseph D. ; Morris, Alison ; Manatakis, Dimitrios V. ; Sprites, Peter ; Chrysanthis, Panos K. ; Glymour, Clark ; Benos, Panayiotis V.</creatorcontrib><description>Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets are comprised of both continuous and categorical data (“Mixed Data”), and essential variables may be unobserved in this data due to the complex nature of biomedical phenomena. 
Causal inference algorithms can identify important relationships from biomedical data; however, handling the challenges of causal inference over mixed data with unmeasured confounders in a scalable way is still an open problem. Despite recent advances into causal discovery strategies that could potentially handle these challenges; individually, no study currently exists that comprehensively compares these approaches in this setting. In this paper, we present a comparative study that addresses this problem by comparing the accuracy and efficiency of different strategies in large, mixed datasets with latent confounders. We experiment with two extensions of the Fast Causal Inference algorithm: a maximum probability search procedure we recently developed to identify causal orientations more accurately, and a strategy which quickly eliminates unlikely adjacencies in order to achieve scalability to high-dimensional data. We demonstrate that these methods significantly outperform the state of the art in the field by achieving both accurate edge orientations and tractable running time in simulation experiments on datasets with up to 500 variables. 
Finally, we demonstrate the usability of the best performing approach on real data by applying it to a biomedical dataset of HIV-infected individuals.</description><identifier>ISSN: 2364-415X</identifier><identifier>EISSN: 2364-4168</identifier><identifier>DOI: 10.1007/s41060-018-0104-3</identifier><identifier>PMID: 30148202</identifier><language>eng</language><publisher>Cham: Springer International Publishing</publisher><subject>Artificial Intelligence ; Business Information Systems ; Computational Biology/Bioinformatics ; Computer Science ; Data Mining and Knowledge Discovery ; Database Management ; Regular Paper</subject><ispartof>International journal of data science and analytics, 2018, Vol.6 (1), p.33-45</ispartof><rights>The Author(s) 2018</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c442t-493473a5b3998a1f1b380be6e8110a906c4fc93045449279d11c87cd8708a55e3</citedby><cites>FETCH-LOGICAL-c442t-493473a5b3998a1f1b380be6e8110a906c4fc93045449279d11c87cd8708a55e3</cites><orcidid>0000-0003-3524-3945</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>230,314,780,784,885,27922,27923</link.rule.ids></links><search><creatorcontrib>Raghu, Vineet K.</creatorcontrib><creatorcontrib>Ramsey, Joseph D.</creatorcontrib><creatorcontrib>Morris, Alison</creatorcontrib><creatorcontrib>Manatakis, Dimitrios V.</creatorcontrib><creatorcontrib>Sprites, Peter</creatorcontrib><creatorcontrib>Chrysanthis, Panos K.</creatorcontrib><creatorcontrib>Glymour, Clark</creatorcontrib><creatorcontrib>Benos, Panayiotis V.</creatorcontrib><title>Comparison of strategies for scalable causal discovery of latent variable models from mixed data</title><title>International journal of data science and analytics</title><addtitle>Int J Data Sci Anal</addtitle><description>Modern 
technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets are comprised of both continuous and categorical data (“Mixed Data”), and essential variables may be unobserved in this data due to the complex nature of biomedical phenomena. Causal inference algorithms can identify important relationships from biomedical data; however, handling the challenges of causal inference over mixed data with unmeasured confounders in a scalable way is still an open problem. Despite recent advances into causal discovery strategies that could potentially handle these challenges; individually, no study currently exists that comprehensively compares these approaches in this setting. In this paper, we present a comparative study that addresses this problem by comparing the accuracy and efficiency of different strategies in large, mixed datasets with latent confounders. We experiment with two extensions of the Fast Causal Inference algorithm: a maximum probability search procedure we recently developed to identify causal orientations more accurately, and a strategy which quickly eliminates unlikely adjacencies in order to achieve scalability to high-dimensional data. We demonstrate that these methods significantly outperform the state of the art in the field by achieving both accurate edge orientations and tractable running time in simulation experiments on datasets with up to 500 variables. 
Finally, we demonstrate the usability of the best performing approach on real data by applying it to a biomedical dataset of HIV-infected individuals.</description><subject>Artificial Intelligence</subject><subject>Business Information Systems</subject><subject>Computational Biology/Bioinformatics</subject><subject>Computer Science</subject><subject>Data Mining and Knowledge Discovery</subject><subject>Database Management</subject><subject>Regular Paper</subject><issn>2364-415X</issn><issn>2364-4168</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNp9kM9KAzEQxoMotmgfwFteYHWyyWaTiyDFf1DwouAtzmazdcvupiTbYt_e1ErBi4dhBub7fcx8hFwxuGYA5U0UDCRkwFQqEBk_IdOcS5EJJtXpcS7eJ2QW4woAWCl5IdU5mXBgQuWQT8nH3PdrDG30A_UNjWPA0S1bF2njA40WO6w6Ry1uIna0bqP1Wxd2e22XlMNIt4n-0fS-dl3igu9p3365mtY44iU5a7CLbvbbL8jbw_3r_ClbvDw-z-8WmRUiHzOhuSg5FhXXWiFrWMUVVE46xRigBmlFYzUHUQih81LXjFlV2lqVoLAoHL8gtwff9abqXW3TaQE7sw5tj2FnPLbm72ZoP83Sb40ELUsFyYAdDGzwMQbXHFkGZp-4OSRuUuJmn7jhickPTEzaYemCWflNGNKf_0Df85WDWQ</recordid><startdate>2018</startdate><enddate>2018</enddate><creator>Raghu, Vineet K.</creator><creator>Ramsey, Joseph D.</creator><creator>Morris, Alison</creator><creator>Manatakis, Dimitrios V.</creator><creator>Sprites, Peter</creator><creator>Chrysanthis, Panos K.</creator><creator>Glymour, Clark</creator><creator>Benos, Panayiotis V.</creator><general>Springer International Publishing</general><scope>C6C</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>5PM</scope><orcidid>https://orcid.org/0000-0003-3524-3945</orcidid></search><sort><creationdate>2018</creationdate><title>Comparison of strategies for scalable causal discovery of latent variable models from mixed data</title><author>Raghu, Vineet K. ; Ramsey, Joseph D. ; Morris, Alison ; Manatakis, Dimitrios V. ; Sprites, Peter ; Chrysanthis, Panos K. 
; Glymour, Clark ; Benos, Panayiotis V.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c442t-493473a5b3998a1f1b380be6e8110a906c4fc93045449279d11c87cd8708a55e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Artificial Intelligence</topic><topic>Business Information Systems</topic><topic>Computational Biology/Bioinformatics</topic><topic>Computer Science</topic><topic>Data Mining and Knowledge Discovery</topic><topic>Database Management</topic><topic>Regular Paper</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Raghu, Vineet K.</creatorcontrib><creatorcontrib>Ramsey, Joseph D.</creatorcontrib><creatorcontrib>Morris, Alison</creatorcontrib><creatorcontrib>Manatakis, Dimitrios V.</creatorcontrib><creatorcontrib>Sprites, Peter</creatorcontrib><creatorcontrib>Chrysanthis, Panos K.</creatorcontrib><creatorcontrib>Glymour, Clark</creatorcontrib><creatorcontrib>Benos, Panayiotis V.</creatorcontrib><collection>Springer_OA刊</collection><collection>CrossRef</collection><collection>PubMed Central (Full Participant titles)</collection><jtitle>International journal of data science and analytics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Raghu, Vineet K.</au><au>Ramsey, Joseph D.</au><au>Morris, Alison</au><au>Manatakis, Dimitrios V.</au><au>Sprites, Peter</au><au>Chrysanthis, Panos K.</au><au>Glymour, Clark</au><au>Benos, Panayiotis V.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Comparison of strategies for scalable causal discovery of latent variable models from mixed data</atitle><jtitle>International journal of data science and analytics</jtitle><stitle>Int J Data Sci 
Anal</stitle><date>2018</date><risdate>2018</risdate><volume>6</volume><issue>1</issue><spage>33</spage><epage>45</epage><pages>33-45</pages><issn>2364-415X</issn><eissn>2364-4168</eissn><abstract>Modern technologies allow large, complex biomedical datasets to be collected from patient cohorts. These datasets are comprised of both continuous and categorical data (“Mixed Data”), and essential variables may be unobserved in this data due to the complex nature of biomedical phenomena. Causal inference algorithms can identify important relationships from biomedical data; however, handling the challenges of causal inference over mixed data with unmeasured confounders in a scalable way is still an open problem. Despite recent advances into causal discovery strategies that could potentially handle these challenges; individually, no study currently exists that comprehensively compares these approaches in this setting. In this paper, we present a comparative study that addresses this problem by comparing the accuracy and efficiency of different strategies in large, mixed datasets with latent confounders. We experiment with two extensions of the Fast Causal Inference algorithm: a maximum probability search procedure we recently developed to identify causal orientations more accurately, and a strategy which quickly eliminates unlikely adjacencies in order to achieve scalability to high-dimensional data. We demonstrate that these methods significantly outperform the state of the art in the field by achieving both accurate edge orientations and tractable running time in simulation experiments on datasets with up to 500 variables. 
Finally, we demonstrate the usability of the best performing approach on real data by applying it to a biomedical dataset of HIV-infected individuals.</abstract><cop>Cham</cop><pub>Springer International Publishing</pub><pmid>30148202</pmid><doi>10.1007/s41060-018-0104-3</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0003-3524-3945</orcidid><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 2364-415X
ispartof International journal of data science and analytics, 2018, Vol.6 (1), p.33-45
issn 2364-415X
2364-4168
language eng
recordid cdi_pubmedcentral_primary_oai_pubmedcentral_nih_gov_6096780
source Springer Link
subjects Artificial Intelligence
Business Information Systems
Computational Biology/Bioinformatics
Computer Science
Data Mining and Knowledge Discovery
Database Management
Regular Paper
title Comparison of strategies for scalable causal discovery of latent variable models from mixed data
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T13%3A30%3A19IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-pubmedcentral_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Comparison%20of%20strategies%20for%20scalable%20causal%20discovery%20of%20latent%20variable%20models%20from%20mixed%20data&rft.jtitle=International%20journal%20of%20data%20science%20and%20analytics&rft.au=Raghu,%20Vineet%20K.&rft.date=2018&rft.volume=6&rft.issue=1&rft.spage=33&rft.epage=45&rft.pages=33-45&rft.issn=2364-415X&rft.eissn=2364-4168&rft_id=info:doi/10.1007/s41060-018-0104-3&rft_dat=%3Cpubmedcentral_cross%3Epubmedcentral_primary_oai_pubmedcentral_nih_gov_6096780%3C/pubmedcentral_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c442t-493473a5b3998a1f1b380be6e8110a906c4fc93045449279d11c87cd8708a55e3%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/30148202&rfr_iscdi=true