Accurate Markov Boundary Discovery for Causal Feature Selection

Causal feature selection, which discovers a Markov boundary (MB) of the class attribute, has attracted much attention in recent years. The MB of the class attribute implies local causal relations between the class attribute and the features, and thus leads to more interpretable and robust prediction mod...

Bibliographic Details
Published in:	IEEE transactions on cybernetics, 2020-12, Vol.50 (12), p.4983-4996
Main Authors:	Wu, Xingyu; Jiang, Bingbing; Yu, Kui; Miao, Chunyan; Chen, Huanhuan
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Bayes methods; Bayesian analysis; Bayesian network (BN); Benchmark testing; causal feature selection; Computer science; Cybernetics; Datasets; Feature extraction; Feature selection; Markov boundary (MB); Markov processes; PCMasking; Pipeline design; Prediction algorithms; Prediction models

cited_by	cdi_FETCH-LOGICAL-c415t-2c966db8b92c52bc824fdbd37049974f7d111d946103e244d04e3d0de2ebf2253
cites	cdi_FETCH-LOGICAL-c415t-2c966db8b92c52bc824fdbd37049974f7d111d946103e244d04e3d0de2ebf2253
container_end_page	4996
container_issue	12
container_start_page	4983
container_title	IEEE transactions on cybernetics
container_volume	50
creator	Wu, Xingyu; Jiang, Bingbing; Yu, Kui; Miao, Chunyan; Chen, Huanhuan
description	Causal feature selection, which discovers a Markov boundary (MB) of the class attribute, has attracted much attention in recent years. The MB of the class attribute implies local causal relations between the class attribute and the features, and thus leads to more interpretable and robust prediction models than the features selected by traditional feature selection algorithms. Many causal feature selection methods have been proposed, and almost all of them employ conditional independence (CI) tests to identify MBs. However, many real-world datasets may suffer from incorrect CI tests due to noise or small sample sizes, which lowers the MB discovery accuracy of existing algorithms. To tackle this issue, in this article, we first introduce the new concept of PCMasking to explain one type of incorrect CI test in MB discovery, and then propose a cross-check and complement MB discovery (CCMB) algorithm to repair this type of incorrect CI test for accurate MB discovery. To improve the efficiency of CCMB, we further design a pipeline machine-based CCMB (PM-CCMB) algorithm. Experiments on benchmark Bayesian network datasets demonstrate that both CCMB and PM-CCMB achieve significant improvements in MB discovery accuracy compared with existing methods, and that PM-CCMB further improves computational efficiency. An empirical study on real-world datasets validates the effectiveness of CCMB and PM-CCMB against state-of-the-art causal and traditional feature selection algorithms.
doi_str_mv	10.1109/TCYB.2019.2940509
format	article
fullrecord	<record><control><sourceid>proquest_pubme</sourceid><recordid>TN_cdi_pubmed_primary_31634853</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>8877773</ieee_id><sourcerecordid>2468752830</sourcerecordid><originalsourceid>FETCH-LOGICAL-c415t-2c966db8b92c52bc824fdbd37049974f7d111d946103e244d04e3d0de2ebf2253</originalsourceid><addsrcrecordid>eNpdkE1LAzEQQIMottT-ABFkwYuX1nztJjlJW60KFQ_Wg6ewm8xC63ZTk92C_94srT04lxkyb4bJQ-iS4DEhWN0tZ5_TMcVEjaniOMXqBPUpyeSIUpGeHutM9NAwhDWOIeOTkueox0jGuExZH91PjGl93kDymvsvt0umrq1t7n-Sh1UwbgexKp1PZnkb8iqZQ960HpJ3qMA0K1dfoLMyrwIMD3mAPuaPy9nzaPH29DKbLEaGk7QZUaOyzBayUNSktDCS8tIWlgnMlRK8FJYQYhXPCGZAObeYA7PYAoWipDRlA3S737v17ruF0OhNvA-qKq_BtUFThoVgBKcdevMPXbvW1_E6TXkmRUolw5Eie8p4F4KHUm_9ahM_rgnWnWDdCdadYH0QHGeuD5vbYgP2OPGnMwJXe2AFAMe2lCIGY79Np3zI</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2468752830</pqid></control><display><type>article</type><title>Accurate Markov Boundary Discovery for Causal Feature Selection</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Wu, Xingyu ; Jiang, Bingbing ; Yu, Kui ; Miao, chunyan ; Chen, Huanhuan</creator><creatorcontrib>Wu, Xingyu ; Jiang, Bingbing ; Yu, Kui ; Miao, chunyan ; Chen, Huanhuan</creatorcontrib><description>Causal feature selection has achieved much attention in recent years, which discovers a Markov boundary (MB) of the class attribute. The MB of the class attribute implies local causal relations between the class attribute and the features, thus leading to more interpretable and robust prediction models than the features selected by the traditional feature selection algorithms. Many causal feature selection methods have been proposed, and almost all of them employ conditional independence (CI) tests to identify MBs. 
However, many datasets from real-world applications may suffer from incorrect CI tests due to noise or small-sized samples, resulting in lower MB discovery accuracy for these existing algorithms. To tackle this issue, in this article, we first introduce a new concept of PCMasking to explain a type of incorrect CI tests in the MB discovery, then propose a cross-check and complement MB discovery (CCMB) algorithm to repair this type of incorrect CI tests for accurate MB discovery. To improve the efficiency of CCMB, we further design a pipeline machine-based CCMB (PM-CCMB) algorithm. Using benchmark Bayesian network datasets, the experiments demonstrate that both CCMB and PM-CCMB achieve significant improvements on the MB discovery accuracy compared with the existing methods, and PM-CCMB further improves the computational efficiency. The empirical study in the real-world datasets validates the effectiveness of CCMB and PM-CCMB against the state-of-the-art causal and traditional feature selection algorithms.</description><identifier>ISSN: 2168-2267</identifier><identifier>EISSN: 2168-2275</identifier><identifier>DOI: 10.1109/TCYB.2019.2940509</identifier><identifier>PMID: 31634853</identifier><identifier>CODEN: ITCEB8</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Accuracy ; Algorithms ; Bayes methods ; Bayesian analysis ; Bayesian network (BN) ; Benchmark testing ; causal feature selection ; Computer science ; Cybernetics ; Datasets ; Feature extraction ; Feature selection ; Markov boundary (MB) ; Markov processes ; PCMasking ; Pipeline design ; Prediction algorithms ; Prediction models</subject><ispartof>IEEE transactions on cybernetics, 2020-12, Vol.50 (12), p.4983-4996</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2020</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c415t-2c966db8b92c52bc824fdbd37049974f7d111d946103e244d04e3d0de2ebf2253</citedby><cites>FETCH-LOGICAL-c415t-2c966db8b92c52bc824fdbd37049974f7d111d946103e244d04e3d0de2ebf2253</cites><orcidid>0000-0002-0300-3448 ; 0000-0002-3918-384X ; 0000-0003-2442-4572 ; 0000-0002-8204-6197 ; 0000-0003-2217-6202</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/8877773$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,54796</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/31634853$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Wu, Xingyu</creatorcontrib><creatorcontrib>Jiang, Bingbing</creatorcontrib><creatorcontrib>Yu, Kui</creatorcontrib><creatorcontrib>Miao, chunyan</creatorcontrib><creatorcontrib>Chen, Huanhuan</creatorcontrib><title>Accurate Markov Boundary Discovery for Causal Feature Selection</title><title>IEEE transactions on cybernetics</title><addtitle>TCYB</addtitle><addtitle>IEEE Trans Cybern</addtitle><description>Causal feature selection has achieved much attention in recent years, which discovers a Markov boundary (MB) of the class attribute. The MB of the class attribute implies local causal relations between the class attribute and the features, thus leading to more interpretable and robust prediction models than the features selected by the traditional feature selection algorithms. Many causal feature selection methods have been proposed, and almost all of them employ conditional independence (CI) tests to identify MBs. 
However, many datasets from real-world applications may suffer from incorrect CI tests due to noise or small-sized samples, resulting in lower MB discovery accuracy for these existing algorithms. To tackle this issue, in this article, we first introduce a new concept of PCMasking to explain a type of incorrect CI tests in the MB discovery, then propose a cross-check and complement MB discovery (CCMB) algorithm to repair this type of incorrect CI tests for accurate MB discovery. To improve the efficiency of CCMB, we further design a pipeline machine-based CCMB (PM-CCMB) algorithm. Using benchmark Bayesian network datasets, the experiments demonstrate that both CCMB and PM-CCMB achieve significant improvements on the MB discovery accuracy compared with the existing methods, and PM-CCMB further improves the computational efficiency. The empirical study in the real-world datasets validates the effectiveness of CCMB and PM-CCMB against the state-of-the-art causal and traditional feature selection algorithms.</description><subject>Accuracy</subject><subject>Algorithms</subject><subject>Bayes methods</subject><subject>Bayesian analysis</subject><subject>Bayesian network (BN)</subject><subject>Benchmark testing</subject><subject>causal feature selection</subject><subject>Computer science</subject><subject>Cybernetics</subject><subject>Datasets</subject><subject>Feature extraction</subject><subject>Feature selection</subject><subject>Markov boundary (MB)</subject><subject>Markov processes</subject><subject>PCMasking</subject><subject>Pipeline design</subject><subject>Prediction algorithms</subject><subject>Prediction 
models</subject><issn>2168-2267</issn><issn>2168-2275</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2020</creationdate><recordtype>article</recordtype><recordid>eNpdkE1LAzEQQIMottT-ABFkwYuX1nztJjlJW60KFQ_Wg6ewm8xC63ZTk92C_94srT04lxkyb4bJQ-iS4DEhWN0tZ5_TMcVEjaniOMXqBPUpyeSIUpGeHutM9NAwhDWOIeOTkueox0jGuExZH91PjGl93kDymvsvt0umrq1t7n-Sh1UwbgexKp1PZnkb8iqZQ960HpJ3qMA0K1dfoLMyrwIMD3mAPuaPy9nzaPH29DKbLEaGk7QZUaOyzBayUNSktDCS8tIWlgnMlRK8FJYQYhXPCGZAObeYA7PYAoWipDRlA3S737v17ruF0OhNvA-qKq_BtUFThoVgBKcdevMPXbvW1_E6TXkmRUolw5Eie8p4F4KHUm_9ahM_rgnWnWDdCdadYH0QHGeuD5vbYgP2OPGnMwJXe2AFAMe2lCIGY79Np3zI</recordid><startdate>20201201</startdate><enddate>20201201</enddate><creator>Wu, Xingyu</creator><creator>Jiang, Bingbing</creator><creator>Yu, Kui</creator><creator>Miao, chunyan</creator><creator>Chen, Huanhuan</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-0300-3448</orcidid><orcidid>https://orcid.org/0000-0002-3918-384X</orcidid><orcidid>https://orcid.org/0000-0003-2442-4572</orcidid><orcidid>https://orcid.org/0000-0002-8204-6197</orcidid><orcidid>https://orcid.org/0000-0003-2217-6202</orcidid></search><sort><creationdate>20201201</creationdate><title>Accurate Markov Boundary Discovery for Causal Feature Selection</title><author>Wu, Xingyu ; Jiang, Bingbing ; Yu, Kui ; Miao, chunyan ; Chen, 
Huanhuan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c415t-2c966db8b92c52bc824fdbd37049974f7d111d946103e244d04e3d0de2ebf2253</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2020</creationdate><topic>Accuracy</topic><topic>Algorithms</topic><topic>Bayes methods</topic><topic>Bayesian analysis</topic><topic>Bayesian network (BN)</topic><topic>Benchmark testing</topic><topic>causal feature selection</topic><topic>Computer science</topic><topic>Cybernetics</topic><topic>Datasets</topic><topic>Feature extraction</topic><topic>Feature selection</topic><topic>Markov boundary (MB)</topic><topic>Markov processes</topic><topic>PCMasking</topic><topic>Pipeline design</topic><topic>Prediction algorithms</topic><topic>Prediction models</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Wu, Xingyu</creatorcontrib><creatorcontrib>Jiang, Bingbing</creatorcontrib><creatorcontrib>Yu, Kui</creatorcontrib><creatorcontrib>Miao, chunyan</creatorcontrib><creatorcontrib>Chen, Huanhuan</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts 
Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on cybernetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Wu, Xingyu</au><au>Jiang, Bingbing</au><au>Yu, Kui</au><au>Miao, chunyan</au><au>Chen, Huanhuan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Accurate Markov Boundary Discovery for Causal Feature Selection</atitle><jtitle>IEEE transactions on cybernetics</jtitle><stitle>TCYB</stitle><addtitle>IEEE Trans Cybern</addtitle><date>2020-12-01</date><risdate>2020</risdate><volume>50</volume><issue>12</issue><spage>4983</spage><epage>4996</epage><pages>4983-4996</pages><issn>2168-2267</issn><eissn>2168-2275</eissn><coden>ITCEB8</coden><abstract>Causal feature selection has achieved much attention in recent years, which discovers a Markov boundary (MB) of the class attribute. The MB of the class attribute implies local causal relations between the class attribute and the features, thus leading to more interpretable and robust prediction models than the features selected by the traditional feature selection algorithms. Many causal feature selection methods have been proposed, and almost all of them employ conditional independence (CI) tests to identify MBs. However, many datasets from real-world applications may suffer from incorrect CI tests due to noise or small-sized samples, resulting in lower MB discovery accuracy for these existing algorithms. To tackle this issue, in this article, we first introduce a new concept of PCMasking to explain a type of incorrect CI tests in the MB discovery, then propose a cross-check and complement MB discovery (CCMB) algorithm to repair this type of incorrect CI tests for accurate MB discovery. To improve the efficiency of CCMB, we further design a pipeline machine-based CCMB (PM-CCMB) algorithm. 
Using benchmark Bayesian network datasets, the experiments demonstrate that both CCMB and PM-CCMB achieve significant improvements on the MB discovery accuracy compared with the existing methods, and PM-CCMB further improves the computational efficiency. The empirical study in the real-world datasets validates the effectiveness of CCMB and PM-CCMB against the state-of-the-art causal and traditional feature selection algorithms.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>31634853</pmid><doi>10.1109/TCYB.2019.2940509</doi><tpages>14</tpages><orcidid>https://orcid.org/0000-0002-0300-3448</orcidid><orcidid>https://orcid.org/0000-0002-3918-384X</orcidid><orcidid>https://orcid.org/0000-0003-2442-4572</orcidid><orcidid>https://orcid.org/0000-0002-8204-6197</orcidid><orcidid>https://orcid.org/0000-0003-2217-6202</orcidid></addata></record>
fulltext	fulltext
identifier	ISSN: 2168-2267
ispartof	IEEE transactions on cybernetics, 2020-12, Vol.50 (12), p.4983-4996
issn	2168-2267 2168-2275
language	eng
recordid	cdi_pubmed_primary_31634853
source	IEEE Electronic Library (IEL) Journals
subjects	Accuracy; Algorithms; Bayes methods; Bayesian analysis; Bayesian network (BN); Benchmark testing; causal feature selection; Computer science; Cybernetics; Datasets; Feature extraction; Feature selection; Markov boundary (MB); Markov processes; PCMasking; Pipeline design; Prediction algorithms; Prediction models
title	Accurate Markov Boundary Discovery for Causal Feature Selection
url	http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T05%3A32%3A02IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_pubme&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Accurate%20Markov%20Boundary%20Discovery%20for%20Causal%20Feature%20Selection&rft.jtitle=IEEE%20transactions%20on%20cybernetics&rft.au=Wu,%20Xingyu&rft.date=2020-12-01&rft.volume=50&rft.issue=12&rft.spage=4983&rft.epage=4996&rft.pages=4983-4996&rft.issn=2168-2267&rft.eissn=2168-2275&rft.coden=ITCEB8&rft_id=info:doi/10.1109/TCYB.2019.2940509&rft_dat=%3Cproquest_pubme%3E2468752830%3C/proquest_pubme%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c415t-2c966db8b92c52bc824fdbd37049974f7d111d946103e244d04e3d0de2ebf2253%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2468752830&rft_id=info:pmid/31634853&rft_ieee_id=8877773&rfr_iscdi=true
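
The description in this record refers to the Markov boundary (MB) of the class attribute without showing what such a set looks like. As a minimal illustration, and not the CCMB or PM-CCMB algorithm from the article, the sketch below reads the MB directly off a Bayesian network whose structure is already known: under the usual faithfulness assumption, the MB of a node consists of its parents, its children, and the other parents of its children (spouses). The function name `markov_boundary` and the toy graph `toy_dag` are hypothetical and introduced only for this example. The algorithms in the article must instead recover this set from data using CI tests.

```python
from typing import Dict, Set


def markov_boundary(parents: Dict[str, Set[str]], target: str) -> Set[str]:
    """Return parents, children, and spouses of `target` in a known DAG.

    `parents` maps every node to the set of its parent nodes.
    Assumes the graph is a DAG and that the distribution is faithful to it.
    """
    # Direct causes of the target.
    pa = set(parents.get(target, set()))
    # Direct effects: nodes that list the target among their parents.
    ch = {node for node, ps in parents.items() if target in ps}
    # Spouses: other parents of the target's children.
    sp = {p for child in ch for p in parents[child]} - {target}
    return pa | ch | sp


# Hypothetical toy network: A -> T -> C <- B, and C -> D.
toy_dag = {
    "A": set(),
    "B": set(),
    "T": {"A"},
    "C": {"T", "B"},
    "D": {"C"},
}

mb = markov_boundary(toy_dag, "T")
print(sorted(mb))
```

In this toy graph, D is a descendant of T but not a child, so it is left out of the MB, while B enters only as a spouse through the shared child C.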