
Highly scalable and robust rule learner: performance evaluation and comparison


Bibliographic Details
Published in: IEEE transactions on cybernetics 2006-02, Vol.36 (1), p.32-53
Main Authors: Kurgan, L.A., Cios, K.J., Dick, S.
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33
cites cdi_FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33
container_end_page 53
container_issue 1
container_start_page 32
container_title IEEE transactions on cybernetics
container_volume 36
creator Kurgan, L.A.
Cios, K.J.
Dick, S.
description Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.
doi_str_mv 10.1109/TSMCB.2005.852983
format article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_28907273</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1580617</ieee_id><sourcerecordid>2343140671</sourcerecordid><originalsourceid>FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33</originalsourceid><addsrcrecordid>eNqNkU1v1DAQhi1ERUvhByAkFPWAuGTx-Nvc6AooUoED5WxNkgmkSuKtvUHqv8fbXakSB6h8GFvzzKuRH8ZeAF8BcP_26vuX9flKcK5XTgvv5CN2Al5BzZUXj8udO1krBf6YPc35mnPuubdP2DEYZZw2-oR9vRh-_hpvq9ziiM1IFc5dlWKz5G2VlvIeCdNM6V21odTHNOHcUkW_cVxwO8T5jm_jtME05Dg_Y0c9jpmeH-op-_Hxw9X6or789unz-v1l3SqntrWXxjVdR6oD6wRK9L12Xdcq9AiibxDReJROKEMOhedQjm1IA5ima6Q8Za_3uZsUbxbK2zANuaVxxJnikoOxRglh-X9B4Ty3wsoHgFxyB7qAb_4JgrFQ9nbCFPTsL_Q6LmkuHxOc0bZEwm5D2ENtijkn6sMmDROm2wA87DSHO81hpznsNZeZV4fgpZmou584eC3Ayz0wENF9WztuwMo_r6mqxA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>865728010</pqid></control><display><type>article</type><title>Highly scalable and robust rule learner: performance evaluation and comparison</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Kurgan, L.A. ; Cios, K.J. ; Dick, S.</creator><creatorcontrib>Kurgan, L.A. ; Cios, K.J. ; Dick, S.</creatorcontrib><description>Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. 
In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.</description><identifier>ISSN: 1083-4419</identifier><identifier>ISSN: 2168-2267</identifier><identifier>EISSN: 1941-0492</identifier><identifier>EISSN: 2168-2275</identifier><identifier>DOI: 10.1109/TSMCB.2005.852983</identifier><identifier>PMID: 16468565</identifier><identifier>CODEN: ITSCFI</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Algorithms ; Artificial Intelligence ; Bioinformatics ; Business ; Companies ; Complexity ; Computer science ; Construction ; Data mining ; Database Management Systems ; Databases, Factual ; DataSqueezer ; Decision support systems ; Decision Support Techniques ; Decision trees ; Drugs ; Extraction ; Information Storage and Retrieval - methods ; Intelligence ; machine learning ; Missing data ; Production ; Real time systems ; Robustness ; rule induction ; rule learner ; State of the art ; Studies</subject><ispartof>IEEE transactions on cybernetics, 2006-02, Vol.36 (1), p.32-53</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33</citedby><cites>FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1580617$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27922,27923,54794</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16468565$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kurgan, L.A.</creatorcontrib><creatorcontrib>Cios, K.J.</creatorcontrib><creatorcontrib>Dick, S.</creatorcontrib><title>Highly scalable and robust rule learner: performance evaluation and comparison</title><title>IEEE transactions on cybernetics</title><addtitle>TSMCB</addtitle><addtitle>IEEE Trans Syst Man Cybern B Cybern</addtitle><description>Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. 
The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Bioinformatics</subject><subject>Business</subject><subject>Companies</subject><subject>Complexity</subject><subject>Computer science</subject><subject>Construction</subject><subject>Data mining</subject><subject>Database Management Systems</subject><subject>Databases, Factual</subject><subject>DataSqueezer</subject><subject>Decision support systems</subject><subject>Decision Support Techniques</subject><subject>Decision trees</subject><subject>Drugs</subject><subject>Extraction</subject><subject>Information Storage and Retrieval - methods</subject><subject>Intelligence</subject><subject>machine learning</subject><subject>Missing data</subject><subject>Production</subject><subject>Real time systems</subject><subject>Robustness</subject><subject>rule induction</subject><subject>rule learner</subject><subject>State of the 
art</subject><subject>Studies</subject><issn>1083-4419</issn><issn>2168-2267</issn><issn>1941-0492</issn><issn>2168-2275</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><recordid>eNqNkU1v1DAQhi1ERUvhByAkFPWAuGTx-Nvc6AooUoED5WxNkgmkSuKtvUHqv8fbXakSB6h8GFvzzKuRH8ZeAF8BcP_26vuX9flKcK5XTgvv5CN2Al5BzZUXj8udO1krBf6YPc35mnPuubdP2DEYZZw2-oR9vRh-_hpvq9ziiM1IFc5dlWKz5G2VlvIeCdNM6V21odTHNOHcUkW_cVxwO8T5jm_jtME05Dg_Y0c9jpmeH-op-_Hxw9X6or789unz-v1l3SqntrWXxjVdR6oD6wRK9L12Xdcq9AiibxDReJROKEMOhedQjm1IA5ima6Q8Za_3uZsUbxbK2zANuaVxxJnikoOxRglh-X9B4Ty3wsoHgFxyB7qAb_4JgrFQ9nbCFPTsL_Q6LmkuHxOc0bZEwm5D2ENtijkn6sMmDROm2wA87DSHO81hpznsNZeZV4fgpZmou584eC3Ayz0wENF9WztuwMo_r6mqxA</recordid><startdate>20060201</startdate><enddate>20060201</enddate><creator>Kurgan, L.A.</creator><creator>Cios, K.J.</creator><creator>Dick, S.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope></search><sort><creationdate>20060201</creationdate><title>Highly scalable and robust rule learner: performance evaluation and comparison</title><author>Kurgan, L.A. ; Cios, K.J. 
; Dick, S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Bioinformatics</topic><topic>Business</topic><topic>Companies</topic><topic>Complexity</topic><topic>Computer science</topic><topic>Construction</topic><topic>Data mining</topic><topic>Database Management Systems</topic><topic>Databases, Factual</topic><topic>DataSqueezer</topic><topic>Decision support systems</topic><topic>Decision Support Techniques</topic><topic>Decision trees</topic><topic>Drugs</topic><topic>Extraction</topic><topic>Information Storage and Retrieval - methods</topic><topic>Intelligence</topic><topic>machine learning</topic><topic>Missing data</topic><topic>Production</topic><topic>Real time systems</topic><topic>Robustness</topic><topic>rule induction</topic><topic>rule learner</topic><topic>State of the art</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kurgan, L.A.</creatorcontrib><creatorcontrib>Cios, K.J.</creatorcontrib><creatorcontrib>Dick, S.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Mechanical &amp; Transportation Engineering 
Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology &amp; Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on cybernetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kurgan, L.A.</au><au>Cios, K.J.</au><au>Dick, S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Highly scalable and robust rule learner: performance evaluation and comparison</atitle><jtitle>IEEE transactions on cybernetics</jtitle><stitle>TSMCB</stitle><addtitle>IEEE Trans Syst Man Cybern B Cybern</addtitle><date>2006-02-01</date><risdate>2006</risdate><volume>36</volume><issue>1</issue><spage>32</spage><epage>53</epage><pages>32-53</pages><issn>1083-4419</issn><issn>2168-2267</issn><eissn>1941-0492</eissn><eissn>2168-2275</eissn><coden>ITSCFI</coden><abstract>Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. 
In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>16468565</pmid><doi>10.1109/TSMCB.2005.852983</doi><tpages>22</tpages><oa>free_for_read</oa></addata></record>
fulltext fulltext
identifier ISSN: 1083-4419
ispartof IEEE transactions on cybernetics, 2006-02, Vol.36 (1), p.32-53
issn 1083-4419
2168-2267
1941-0492
2168-2275
language eng
recordid cdi_proquest_miscellaneous_28907273
source IEEE Electronic Library (IEL) Journals
subjects Algorithms
Artificial Intelligence
Bioinformatics
Business
Companies
Complexity
Computer science
Construction
Data mining
Database Management Systems
Databases, Factual
DataSqueezer
Decision support systems
Decision Support Techniques
Decision trees
Drugs
Extraction
Information Storage and Retrieval - methods
Intelligence
machine learning
Missing data
Production
Real time systems
Robustness
rule induction
rule learner
State of the art
Studies
title Highly scalable and robust rule learner: performance evaluation and comparison
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T10%3A36%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Highly%20scalable%20and%20robust%20rule%20learner:%20performance%20evaluation%20and%20comparison&rft.jtitle=IEEE%20transactions%20on%20cybernetics&rft.au=Kurgan,%20L.A.&rft.date=2006-02-01&rft.volume=36&rft.issue=1&rft.spage=32&rft.epage=53&rft.pages=32-53&rft.issn=1083-4419&rft.eissn=1941-0492&rft.coden=ITSCFI&rft_id=info:doi/10.1109/TSMCB.2005.852983&rft_dat=%3Cproquest_cross%3E2343140671%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=865728010&rft_id=info:pmid/16468565&rft_ieee_id=1580617&rfr_iscdi=true