Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package

It is well known in the literature that the problem of learning the structure of Bayesian networks is very hard to tackle: Its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios. Efficient implementations of score-based...

Bibliographic Details
Published in: Journal of statistical software, 2017-03, Vol. 77 (2), p. 1-20
Main Author: Scutari, Marco
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c303t-eb7e7fef34a512b89736ddf4f314572a018e43f5198fa0f5a98799aa49ac18223
cites
container_end_page 20
container_issue 2
container_start_page 1
container_title Journal of statistical software
container_volume 77
creator Scutari, Marco
description It is well known in the literature that the problem of learning the structure of Bayesian networks is very hard to tackle: Its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios. Efficient implementations of score-based structure learning benefit from past and current research in optimization theory, which can be adapted to the task by using the network score as the objective function to maximize. This is not true for approaches based on conditional independence tests, called constraint-based learning algorithms. The only optimization in widespread use, backtracking, leverages the symmetries implied by the definitions of neighborhood and Markov blanket. In this paper we illustrate how backtracking is implemented in recent versions of the bnlearn R package, and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed. As an alternative, we describe a software architecture and framework that can be used to parallelize constraint-based structure learning algorithms (also implemented in bnlearn) and we demonstrate its performance using four reference networks and two real-world data sets from genetics and systems biology. We show that on modern multi-core or multiprocessor hardware parallel implementations are preferable over backtracking, which was developed when single-processor machines were the norm.
doi_str_mv 10.18637/jss.v077.i02
format article
fullrecord <record><control><sourceid>doaj_cross</sourceid><recordid>TN_cdi_doaj_primary_oai_doaj_org_article_d9cadf69eb4444f2a29048f8edb92ca0</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><doaj_id>oai_doaj_org_article_d9cadf69eb4444f2a29048f8edb92ca0</doaj_id><sourcerecordid>oai_doaj_org_article_d9cadf69eb4444f2a29048f8edb92ca0</sourcerecordid><originalsourceid>FETCH-LOGICAL-c303t-eb7e7fef34a512b89736ddf4f314572a018e43f5198fa0f5a98799aa49ac18223</originalsourceid><addsrcrecordid>eNpNkctO5DAQRSMEEs8le_9AGr8S2-ygBUxLLUA81lYlKTduEqdlGxCs5tMJMBpxN3VV0j2bUxTHjM6YroU6Wac0e6VKzTzlW8Ueq6QuVV3T7V99t9hPaU0pp9JUe8Xfc3jH5CGQa8xvY3wm8zGkHMGHXJ5Dwo7c5_jS5peIZIkQgw8rctavxujz05BOyS1E6HvsCYSO3GyyH_zHtFoMmx4HDBmyn4jEB5KfkDSh_4KQu2nXPsMKD4sdB33Co3_3oHi8vHiY_ymXN1eL-dmybAUVucRGoXLohISK8UYbJequc9IJJivFgTKNUriKGe2AugqMVsYASAMt05yLg2Lxw-1GWNtN9APEdzuCt9-PMa4sxOzbHm1nWuhcbbCRUxwHbqjUTmPXGN4CnVjlD6uNY0oR3X8eo_ZbhZ1U2C8VdlIhPgGHgYA8</addsrcrecordid><sourcetype>Open Website</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package</title><source>DOAJ Directory of Open Access Journals</source><creator>Scutari, Marco</creator><creatorcontrib>Scutari, Marco</creatorcontrib><description>It is well known in the literature that the problem of learning the structure of Bayesian networks is very hard to tackle: Its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios. Efficient implementations of score-based structure learning benefit from past and current research in optimization theory, which can be adapted to the task by using the network score as the objective function to maximize. This is not true for approaches based on conditional independence tests, called constraint-based learning algorithms. 
The only optimization in widespread use, backtracking, leverages the symmetries implied by the definitions of neighborhood and Markov blanket. In this paper we illustrate how backtracking is implemented in recent versions of the bnlearn R package, and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed. As an alternative, we describe a software architecture and framework that can be used to parallelize constraint-based structure learning algorithms (also implemented in bnlearn) and we demonstrate its performance using four reference networks and two real-world data sets from genetics and systems biology. We show that on modern multi-core or multiprocessor hardware parallel implementations are preferable over backtracking, which was developed when single-processor machines were the norm.</description><identifier>ISSN: 1548-7660</identifier><identifier>EISSN: 1548-7660</identifier><identifier>DOI: 10.18637/jss.v077.i02</identifier><language>eng</language><publisher>Foundation for Open Access Statistics</publisher><subject>Bayesian networks ; parallel programming ; structure learning</subject><ispartof>Journal of statistical software, 2017-03, Vol.77 (2), p.1-20</ispartof><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c303t-eb7e7fef34a512b89736ddf4f314572a018e43f5198fa0f5a98799aa49ac18223</citedby></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,864,2102,27924,27925</link.rule.ids></links><search><creatorcontrib>Scutari, Marco</creatorcontrib><title>Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package</title><title>Journal of statistical software</title><description>It is well known in the literature that the problem of 
learning the structure of Bayesian networks is very hard to tackle: Its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios. Efficient implementations of score-based structure learning benefit from past and current research in optimization theory, which can be adapted to the task by using the network score as the objective function to maximize. This is not true for approaches based on conditional independence tests, called constraint-based learning algorithms. The only optimization in widespread use, backtracking, leverages the symmetries implied by the definitions of neighborhood and Markov blanket. In this paper we illustrate how backtracking is implemented in recent versions of the bnlearn R package, and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed. As an alternative, we describe a software architecture and framework that can be used to parallelize constraint-based structure learning algorithms (also implemented in bnlearn) and we demonstrate its performance using four reference networks and two real-world data sets from genetics and systems biology. 
We show that on modern multi-core or multiprocessor hardware parallel implementations are preferable over backtracking, which was developed when single-processor machines were the norm.</description><subject>Bayesian networks</subject><subject>parallel programming</subject><subject>structure learning</subject><issn>1548-7660</issn><issn>1548-7660</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2017</creationdate><recordtype>article</recordtype><sourceid>DOA</sourceid><recordid>eNpNkctO5DAQRSMEEs8le_9AGr8S2-ygBUxLLUA81lYlKTduEqdlGxCs5tMJMBpxN3VV0j2bUxTHjM6YroU6Wac0e6VKzTzlW8Ueq6QuVV3T7V99t9hPaU0pp9JUe8Xfc3jH5CGQa8xvY3wm8zGkHMGHXJ5Dwo7c5_jS5peIZIkQgw8rctavxujz05BOyS1E6HvsCYSO3GyyH_zHtFoMmx4HDBmyn4jEB5KfkDSh_4KQu2nXPsMKD4sdB33Co3_3oHi8vHiY_ymXN1eL-dmybAUVucRGoXLohISK8UYbJequc9IJJivFgTKNUriKGe2AugqMVsYASAMt05yLg2Lxw-1GWNtN9APEdzuCt9-PMa4sxOzbHm1nWuhcbbCRUxwHbqjUTmPXGN4CnVjlD6uNY0oR3X8eo_ZbhZ1U2C8VdlIhPgGHgYA8</recordid><startdate>20170301</startdate><enddate>20170301</enddate><creator>Scutari, Marco</creator><general>Foundation for Open Access Statistics</general><scope>AAYXX</scope><scope>CITATION</scope><scope>DOA</scope></search><sort><creationdate>20170301</creationdate><title>Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package</title><author>Scutari, Marco</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c303t-eb7e7fef34a512b89736ddf4f314572a018e43f5198fa0f5a98799aa49ac18223</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2017</creationdate><topic>Bayesian networks</topic><topic>parallel programming</topic><topic>structure learning</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Scutari, Marco</creatorcontrib><collection>CrossRef</collection><collection>DOAJ Directory of Open Access Journals</collection><jtitle>Journal of 
statistical software</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Scutari, Marco</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package</atitle><jtitle>Journal of statistical software</jtitle><date>2017-03-01</date><risdate>2017</risdate><volume>77</volume><issue>2</issue><spage>1</spage><epage>20</epage><pages>1-20</pages><issn>1548-7660</issn><eissn>1548-7660</eissn><abstract>It is well known in the literature that the problem of learning the structure of Bayesian networks is very hard to tackle: Its computational complexity is super-exponential in the number of nodes in the worst case and polynomial in most real-world scenarios. Efficient implementations of score-based structure learning benefit from past and current research in optimization theory, which can be adapted to the task by using the network score as the objective function to maximize. This is not true for approaches based on conditional independence tests, called constraint-based learning algorithms. The only optimization in widespread use, backtracking, leverages the symmetries implied by the definitions of neighborhood and Markov blanket. In this paper we illustrate how backtracking is implemented in recent versions of the bnlearn R package, and how it degrades the stability of Bayesian network structure learning for little gain in terms of speed. As an alternative, we describe a software architecture and framework that can be used to parallelize constraint-based structure learning algorithms (also implemented in bnlearn) and we demonstrate its performance using four reference networks and two real-world data sets from genetics and systems biology. 
We show that on modern multi-core or multiprocessor hardware parallel implementations are preferable over backtracking, which was developed when single-processor machines were the norm.</abstract><pub>Foundation for Open Access Statistics</pub><doi>10.18637/jss.v077.i02</doi><tpages>20</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1548-7660
ispartof Journal of statistical software, 2017-03, Vol.77 (2), p.1-20
issn 1548-7660
1548-7660
language eng
recordid cdi_doaj_primary_oai_doaj_org_article_d9cadf69eb4444f2a29048f8edb92ca0
source DOAJ Directory of Open Access Journals
subjects Bayesian networks
parallel programming
structure learning
title Bayesian Network Constraint-Based Structure Learning Algorithms: Parallel and Optimized Implementations in the bnlearn R Package
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T07%3A59%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-doaj_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Bayesian%20Network%20Constraint-Based%20Structure%20Learning%20Algorithms:%20Parallel%20and%20Optimized%20Implementations%20in%20the%20bnlearn%20R%20Package&rft.jtitle=Journal%20of%20statistical%20software&rft.au=Scutari,%20Marco&rft.date=2017-03-01&rft.volume=77&rft.issue=2&rft.spage=1&rft.epage=20&rft.pages=1-20&rft.issn=1548-7660&rft.eissn=1548-7660&rft_id=info:doi/10.18637/jss.v077.i02&rft_dat=%3Cdoaj_cross%3Eoai_doaj_org_article_d9cadf69eb4444f2a29048f8edb92ca0%3C/doaj_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c303t-eb7e7fef34a512b89736ddf4f314572a018e43f5198fa0f5a98799aa49ac18223%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true