Wallenius Bayes

This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other cho...
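The "destructive choice" process summarized above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' implementation: the function names `destructive_choice` and `conditional_probs`, the item labels, and the weights are all hypothetical. Each draw picks one of the remaining items with probability proportional to its weight, then removes it from the choice set. The second helper shows the property the summary mentions: removing an item leaves the ratio between the probabilities of the other items unchanged.

```python
import random


def destructive_choice(weights, n_draws, rng):
    """Draw n_draws distinct items, one at a time.

    Each draw picks among the items still available, with probability
    proportional to the item's weight; the picked item is then removed.
    `weights` maps item -> positive weight (hypothetical example input).
    """
    remaining = dict(weights)  # copy so the caller's dict is untouched
    chosen = []
    for _ in range(n_draws):
        items = list(remaining)
        item_weights = [remaining[i] for i in items]
        pick = rng.choices(items, weights=item_weights, k=1)[0]
        chosen.append(pick)
        del remaining[pick]  # the choice leaves the choice set
    return chosen


def conditional_probs(weights, removed=()):
    """Probability of each still-available item being picked next."""
    remaining = {i: w for i, w in weights.items() if i not in removed}
    total = sum(remaining.values())
    return {i: w / total for i, w in remaining.items()}


if __name__ == "__main__":
    example = {"a": 3.0, "b": 2.0, "c": 1.0}  # assumed weights
    print(destructive_choice(example, 2, random.Random(0)))
    before = conditional_probs(example)
    after = conditional_probs(example, removed={"a"})
    # The b-to-c ratio is the same before and after "a" is removed.
    print(before["b"] / before["c"], after["b"] / after["c"])
```

With every item available exactly once, this sequential weighted draw without replacement is the kind of sampling scheme that a Wallenius-type non-central hypergeometric distribution describes; the paper itself should be consulted for the actual event model and its estimation.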

Bibliographic Details
Published in: Machine learning 2018-06, Vol.107 (6), p.1013-1037
Main Authors: Junqué de Fortuny, Enric ; Martens, David ; Provost, Foster
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c359t-4b20dbd36d5572d920eb492096f3d5681a0f3a47927cc68be8c1122606c691b73
cites cdi_FETCH-LOGICAL-c359t-4b20dbd36d5572d920eb492096f3d5681a0f3a47927cc68be8c1122606c691b73
container_end_page 1037
container_issue 6
container_start_page 1013
container_title Machine learning
container_volume 107
creator Junqué de Fortuny, Enric
Martens, David
Provost, Foster
description This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes are each suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that, for data where we would expect the features to be generated via destructive choice behavior, Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data-generating process is not based on destructive choice.
doi_str_mv 10.1007/s10994-018-5699-z
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2010929887</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2010929887</sourcerecordid><originalsourceid>FETCH-LOGICAL-c359t-4b20dbd36d5572d920eb492096f3d5681a0f3a47927cc68be8c1122606c691b73</originalsourceid><addsrcrecordid>eNp1jzFPwzAQhS0EEqEgZjYkZsOdHZ_tESoKSJVYQIyW4zioVWiK3QztrydRkJhY7i3ve6ePsSuEWwTQdxnB2pIDGq7IWn44YgUqLTkoUsesAGMUJxTqlJ3lvAYAQYYKdvnh2zZuVn2-fvD7mM_ZSePbHC9-c8beF49v82e-fH16md8veZDK7nhZCairWlKtlBa1FRCrcriWGlkrMuihkb7UVugQyFTRBEQhCCiQxUrLGbuZdrep--5j3rl116fN8NIJGFyENWZs4dQKqcs5xcZt0-rLp71DcKO3m7zd4O1Gb3cYGDExeehuPmP6W_4f-gG6o1g-</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2010929887</pqid></control><display><type>article</type><title>Wallenius Bayes</title><source>Springer Nature</source><creator>Junqué de Fortuny, Enric ; Martens, David ; Provost, Foster</creator><creatorcontrib>Junqué de Fortuny, Enric ; Martens, David ; Provost, Foster</creatorcontrib><description>This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963 ). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes each are suboptimal for such data. 
We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data where we would expect the features to be generated via destructive choice behavior Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we also show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data generating process is not based on destructive choice.</description><identifier>ISSN: 0885-6125</identifier><identifier>EISSN: 1573-0565</identifier><identifier>DOI: 10.1007/s10994-018-5699-z</identifier><language>eng</language><publisher>New York: Springer US</publisher><subject>Artificial Intelligence ; Bayesian analysis ; Computer Science ; Control ; Human behavior ; Mathematical models ; Mechatronics ; Natural Language Processing (NLP) ; Robotics ; Simulation and Modeling ; Text editing</subject><ispartof>Machine learning, 2018-06, Vol.107 (6), p.1013-1037</ispartof><rights>The Author(s) 2018</rights><rights>Machine Learning is a copyright of Springer, (2018). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c359t-4b20dbd36d5572d920eb492096f3d5681a0f3a47927cc68be8c1122606c691b73</citedby><cites>FETCH-LOGICAL-c359t-4b20dbd36d5572d920eb492096f3d5681a0f3a47927cc68be8c1122606c691b73</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>314,780,784,27924,27925</link.rule.ids></links><search><creatorcontrib>Junqué de Fortuny, Enric</creatorcontrib><creatorcontrib>Martens, David</creatorcontrib><creatorcontrib>Provost, Foster</creatorcontrib><title>Wallenius Bayes</title><title>Machine learning</title><addtitle>Mach Learn</addtitle><description>This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963 ). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes each are suboptimal for such data. 
We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data where we would expect the features to be generated via destructive choice behavior Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we also show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data generating process is not based on destructive choice.</description><subject>Artificial Intelligence</subject><subject>Bayesian analysis</subject><subject>Computer Science</subject><subject>Control</subject><subject>Human behavior</subject><subject>Mathematical models</subject><subject>Mechatronics</subject><subject>Natural Language Processing (NLP)</subject><subject>Robotics</subject><subject>Simulation and Modeling</subject><subject>Text editing</subject><issn>0885-6125</issn><issn>1573-0565</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2018</creationdate><recordtype>article</recordtype><recordid>eNp1jzFPwzAQhS0EEqEgZjYkZsOdHZ_tESoKSJVYQIyW4zioVWiK3QztrydRkJhY7i3ve6ePsSuEWwTQdxnB2pIDGq7IWn44YgUqLTkoUsesAGMUJxTqlJ3lvAYAQYYKdvnh2zZuVn2-fvD7mM_ZSePbHC9-c8beF49v82e-fH16md8veZDK7nhZCairWlKtlBa1FRCrcriWGlkrMuihkb7UVugQyFTRBEQhCCiQxUrLGbuZdrep--5j3rl116fN8NIJGFyENWZs4dQKqcs5xcZt0-rLp71DcKO3m7zd4O1Gb3cYGDExeehuPmP6W_4f-gG6o1g-</recordid><startdate>20180601</startdate><enddate>20180601</enddate><creator>Junqué de Fortuny, Enric</creator><creator>Martens, David</creator><creator>Provost, Foster</creator><general>Springer US</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>3V.</scope><scope>7SC</scope><scope>7XB</scope><scope>88I</scope><scope>8AL</scope><scope>8AO</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K7-</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0N</scope><scope>M2P</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>Q9U</scope></search><sort><creationdate>20180601</creationdate><title>Wallenius Bayes</title><author>Junqué de Fortuny, Enric ; Martens, David ; Provost, Foster</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c359t-4b20dbd36d5572d920eb492096f3d5681a0f3a47927cc68be8c1122606c691b73</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2018</creationdate><topic>Artificial Intelligence</topic><topic>Bayesian analysis</topic><topic>Computer Science</topic><topic>Control</topic><topic>Human behavior</topic><topic>Mathematical models</topic><topic>Mechatronics</topic><topic>Natural Language Processing (NLP)</topic><topic>Robotics</topic><topic>Simulation and Modeling</topic><topic>Text editing</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Junqué de Fortuny, Enric</creatorcontrib><creatorcontrib>Martens, David</creatorcontrib><creatorcontrib>Provost, Foster</creatorcontrib><collection>CrossRef</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>Science Database (Alumni Edition)</collection><collection>Computing 
Database (Alumni Edition)</collection><collection>ProQuest Pharma Collection</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>AUTh Library subscriptions: ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>Computer Science Database</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>Computing Database</collection><collection>ProQuest Science Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>ProQuest Central Basic</collection><jtitle>Machine learning</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Junqué de Fortuny, Enric</au><au>Martens, David</au><au>Provost, 
Foster</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Wallenius Bayes</atitle><jtitle>Machine learning</jtitle><stitle>Mach Learn</stitle><date>2018-06-01</date><risdate>2018</risdate><volume>107</volume><issue>6</issue><spage>1013</spage><epage>1037</epage><pages>1013-1037</pages><issn>0885-6125</issn><eissn>1573-0565</eissn><abstract>This paper introduces a new event model appropriate for classifying (binary) data generated by a “destructive choice” process, such as certain human behavior. In such a process, making a choice removes that choice from future consideration yet does not influence the relative probability of other choices in the choice set. The proposed Wallenius event model is based on a somewhat forgotten non-central hypergeometric distribution introduced by Wallenius (Biased sampling: the non-central hypergeometric probability distribution. Ph.D. thesis, Stanford University, 1963 ). We discuss its relationship with models of how human choice behavior is generated, highlighting a key (simple) mathematical property. We use this background to describe specifically why traditional multivariate Bernoulli naive Bayes and multinomial naive Bayes each are suboptimal for such data. We then present an implementation of naive Bayes based on the Wallenius event model, and show experimentally that for data where we would expect the features to be generated via destructive choice behavior Wallenius Bayes indeed outperforms the traditional versions of naive Bayes for prediction based on these features. Furthermore, we also show that it is competitive with non-naive methods (in particular, support-vector machines). In contrast, we also show that Wallenius Bayes underperforms when the data generating process is not based on destructive choice.</abstract><cop>New York</cop><pub>Springer US</pub><doi>10.1007/s10994-018-5699-z</doi><tpages>25</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 0885-6125
ispartof Machine learning, 2018-06, Vol.107 (6), p.1013-1037
issn 0885-6125
1573-0565
language eng
recordid cdi_proquest_journals_2010929887
source Springer Nature
subjects Artificial Intelligence
Bayesian analysis
Computer Science
Control
Human behavior
Mathematical models
Mechatronics
Natural Language Processing (NLP)
Robotics
Simulation and Modeling
Text editing
title Wallenius Bayes
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T08%3A11%3A49IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Wallenius%20Bayes&rft.jtitle=Machine%20learning&rft.au=Junqu%C3%A9%20de%20Fortuny,%20Enric&rft.date=2018-06-01&rft.volume=107&rft.issue=6&rft.spage=1013&rft.epage=1037&rft.pages=1013-1037&rft.issn=0885-6125&rft.eissn=1573-0565&rft_id=info:doi/10.1007/s10994-018-5699-z&rft_dat=%3Cproquest_cross%3E2010929887%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c359t-4b20dbd36d5572d920eb492096f3d5681a0f3a47927cc68be8c1122606c691b73%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=2010929887&rft_id=info:pmid/&rfr_iscdi=true