Highly scalable and robust rule learner: performance evaluation and comparison
Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data minin...
Published in: | IEEE transactions on cybernetics 2006-02, Vol.36 (1), p.32-53 |
---|---|
Main Authors: | Kurgan, L.A.; Cios, K.J.; Dick, S. |
Format: | Article |
Language: | English |
Subjects: | |
cited_by | cdi_FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33 |
---|---|
cites | cdi_FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33 |
container_end_page | 53 |
container_issue | 1 |
container_start_page | 32 |
container_title | IEEE transactions on cybernetics |
container_volume | 36 |
creator | Kurgan, L.A.; Cios, K.J.; Dick, S. |
description | Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data. |
doi_str_mv | 10.1109/TSMCB.2005.852983 |
format | article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_28907273</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>1580617</ieee_id><sourcerecordid>2343140671</sourcerecordid><originalsourceid>FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33</originalsourceid><addsrcrecordid>eNqNkU1v1DAQhi1ERUvhByAkFPWAuGTx-Nvc6AooUoED5WxNkgmkSuKtvUHqv8fbXakSB6h8GFvzzKuRH8ZeAF8BcP_26vuX9flKcK5XTgvv5CN2Al5BzZUXj8udO1krBf6YPc35mnPuubdP2DEYZZw2-oR9vRh-_hpvq9ziiM1IFc5dlWKz5G2VlvIeCdNM6V21odTHNOHcUkW_cVxwO8T5jm_jtME05Dg_Y0c9jpmeH-op-_Hxw9X6or789unz-v1l3SqntrWXxjVdR6oD6wRK9L12Xdcq9AiibxDReJROKEMOhedQjm1IA5ima6Q8Za_3uZsUbxbK2zANuaVxxJnikoOxRglh-X9B4Ty3wsoHgFxyB7qAb_4JgrFQ9nbCFPTsL_Q6LmkuHxOc0bZEwm5D2ENtijkn6sMmDROm2wA87DSHO81hpznsNZeZV4fgpZmou584eC3Ayz0wENF9WztuwMo_r6mqxA</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>865728010</pqid></control><display><type>article</type><title>Highly scalable and robust rule learner: performance evaluation and comparison</title><source>IEEE Electronic Library (IEL) Journals</source><creator>Kurgan, L.A. ; Cios, K.J. ; Dick, S.</creator><creatorcontrib>Kurgan, L.A. ; Cios, K.J. ; Dick, S.</creatorcontrib><description>Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. 
In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.</description><identifier>ISSN: 1083-4419</identifier><identifier>ISSN: 2168-2267</identifier><identifier>EISSN: 1941-0492</identifier><identifier>EISSN: 2168-2275</identifier><identifier>DOI: 10.1109/TSMCB.2005.852983</identifier><identifier>PMID: 16468565</identifier><identifier>CODEN: ITSCFI</identifier><language>eng</language><publisher>United States: IEEE</publisher><subject>Algorithms ; Artificial Intelligence ; Bioinformatics ; Business ; Companies ; Complexity ; Computer science ; Construction ; Data mining ; Database Management Systems ; Databases, Factual ; DataSqueezer ; Decision support systems ; Decision Support Techniques ; Decision trees ; Drugs ; Extraction ; Information Storage and Retrieval - methods ; Intelligence ; machine learning ; Missing data ; Production ; Real time systems ; Robustness ; rule induction ; rule learner ; State of the art ; Studies</subject><ispartof>IEEE transactions on cybernetics, 2006-02, Vol.36 (1), p.32-53</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2006</rights><lds50>peer_reviewed</lds50><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33</citedby><cites>FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/1580617$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,780,784,27922,27923,54794</link.rule.ids><backlink>$$Uhttps://www.ncbi.nlm.nih.gov/pubmed/16468565$$D View this record in MEDLINE/PubMed$$Hfree_for_read</backlink></links><search><creatorcontrib>Kurgan, L.A.</creatorcontrib><creatorcontrib>Cios, K.J.</creatorcontrib><creatorcontrib>Dick, S.</creatorcontrib><title>Highly scalable and robust rule learner: performance evaluation and comparison</title><title>IEEE transactions on cybernetics</title><addtitle>TSMCB</addtitle><addtitle>IEEE Trans Syst Man Cybern B Cybern</addtitle><description>Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. In spite of its relative simplicity, DataSqueezer is a very effective learner. 
The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Bioinformatics</subject><subject>Business</subject><subject>Companies</subject><subject>Complexity</subject><subject>Computer science</subject><subject>Construction</subject><subject>Data mining</subject><subject>Database Management Systems</subject><subject>Databases, Factual</subject><subject>DataSqueezer</subject><subject>Decision support systems</subject><subject>Decision Support Techniques</subject><subject>Decision trees</subject><subject>Drugs</subject><subject>Extraction</subject><subject>Information Storage and Retrieval - methods</subject><subject>Intelligence</subject><subject>machine learning</subject><subject>Missing data</subject><subject>Production</subject><subject>Real time systems</subject><subject>Robustness</subject><subject>rule induction</subject><subject>rule learner</subject><subject>State of the 
art</subject><subject>Studies</subject><issn>1083-4419</issn><issn>2168-2267</issn><issn>1941-0492</issn><issn>2168-2275</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2006</creationdate><recordtype>article</recordtype><recordid>eNqNkU1v1DAQhi1ERUvhByAkFPWAuGTx-Nvc6AooUoED5WxNkgmkSuKtvUHqv8fbXakSB6h8GFvzzKuRH8ZeAF8BcP_26vuX9flKcK5XTgvv5CN2Al5BzZUXj8udO1krBf6YPc35mnPuubdP2DEYZZw2-oR9vRh-_hpvq9ziiM1IFc5dlWKz5G2VlvIeCdNM6V21odTHNOHcUkW_cVxwO8T5jm_jtME05Dg_Y0c9jpmeH-op-_Hxw9X6or789unz-v1l3SqntrWXxjVdR6oD6wRK9L12Xdcq9AiibxDReJROKEMOhedQjm1IA5ima6Q8Za_3uZsUbxbK2zANuaVxxJnikoOxRglh-X9B4Ty3wsoHgFxyB7qAb_4JgrFQ9nbCFPTsL_Q6LmkuHxOc0bZEwm5D2ENtijkn6sMmDROm2wA87DSHO81hpznsNZeZV4fgpZmou584eC3Ayz0wENF9WztuwMo_r6mqxA</recordid><startdate>20060201</startdate><enddate>20060201</enddate><creator>Kurgan, L.A.</creator><creator>Cios, K.J.</creator><creator>Dick, S.</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>CGR</scope><scope>CUY</scope><scope>CVF</scope><scope>ECM</scope><scope>EIF</scope><scope>NPM</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>7TB</scope><scope>8FD</scope><scope>F28</scope><scope>FR3</scope><scope>H8D</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope></search><sort><creationdate>20060201</creationdate><title>Highly scalable and robust rule learner: performance evaluation and comparison</title><author>Kurgan, L.A. ; Cios, K.J. 
; Dick, S.</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2006</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Bioinformatics</topic><topic>Business</topic><topic>Companies</topic><topic>Complexity</topic><topic>Computer science</topic><topic>Construction</topic><topic>Data mining</topic><topic>Database Management Systems</topic><topic>Databases, Factual</topic><topic>DataSqueezer</topic><topic>Decision support systems</topic><topic>Decision Support Techniques</topic><topic>Decision trees</topic><topic>Drugs</topic><topic>Extraction</topic><topic>Information Storage and Retrieval - methods</topic><topic>Intelligence</topic><topic>machine learning</topic><topic>Missing data</topic><topic>Production</topic><topic>Real time systems</topic><topic>Robustness</topic><topic>rule induction</topic><topic>rule learner</topic><topic>State of the art</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Kurgan, L.A.</creatorcontrib><creatorcontrib>Cios, K.J.</creatorcontrib><creatorcontrib>Dick, S.</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>Medline</collection><collection>MEDLINE</collection><collection>MEDLINE (Ovid)</collection><collection>MEDLINE</collection><collection>MEDLINE</collection><collection>PubMed</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Mechanical & Transportation Engineering 
Abstracts</collection><collection>Technology Research Database</collection><collection>ANTE: Abstracts in New Technology & Engineering</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on cybernetics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Kurgan, L.A.</au><au>Cios, K.J.</au><au>Dick, S.</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Highly scalable and robust rule learner: performance evaluation and comparison</atitle><jtitle>IEEE transactions on cybernetics</jtitle><stitle>TSMCB</stitle><addtitle>IEEE Trans Syst Man Cybern B Cybern</addtitle><date>2006-02-01</date><risdate>2006</risdate><volume>36</volume><issue>1</issue><spage>32</spage><epage>53</epage><pages>32-53</pages><issn>1083-4419</issn><issn>2168-2267</issn><eissn>1941-0492</eissn><eissn>2168-2275</eissn><coden>ITSCFI</coden><abstract>Business intelligence and bioinformatics applications increasingly require the mining of datasets consisting of millions of data points, or crafting real-time enterprise-level decision support systems for large corporations and drug companies. In all cases, there needs to be an underlying data mining system, and this mining system must be highly scalable. To this end, we describe a new rule learner called DataSqueezer. The learner belongs to the family of inductive supervised rule extraction algorithms. DataSqueezer is a simple, greedy, rule builder that generates a set of production rules from labeled input data. 
In spite of its relative simplicity, DataSqueezer is a very effective learner. The rules generated by the algorithm are compact, comprehensible, and have accuracy comparable to rules generated by other state-of-the-art rule extraction algorithms. The main advantages of DataSqueezer are very high efficiency, and missing data resistance. DataSqueezer exhibits log-linear asymptotic complexity with the number of training examples, and it is faster than other state-of-the-art rule learners. The learner is also robust to large quantities of missing data, as verified by extensive experimental comparison with the other learners. DataSqueezer is thus well suited to modern data mining and business intelligence tasks, which commonly involve huge datasets with a large fraction of missing data.</abstract><cop>United States</cop><pub>IEEE</pub><pmid>16468565</pmid><doi>10.1109/TSMCB.2005.852983</doi><tpages>22</tpages><oa>free_for_read</oa></addata></record> |
fulltext | fulltext |
identifier | ISSN: 1083-4419 |
ispartof | IEEE transactions on cybernetics, 2006-02, Vol.36 (1), p.32-53 |
issn | 1083-4419; 2168-2267; 1941-0492; 2168-2275 |
language | eng |
recordid | cdi_proquest_miscellaneous_28907273 |
source | IEEE Electronic Library (IEL) Journals |
subjects | Algorithms; Artificial Intelligence; Bioinformatics; Business; Companies; Complexity; Computer science; Construction; Data mining; Database Management Systems; Databases, Factual; DataSqueezer; Decision support systems; Decision Support Techniques; Decision trees; Drugs; Extraction; Information Storage and Retrieval - methods; Intelligence; machine learning; Missing data; Production; Real time systems; Robustness; rule induction; rule learner; State of the art; Studies |
title | Highly scalable and robust rule learner: performance evaluation and comparison |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T10%3A36%3A43IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Highly%20scalable%20and%20robust%20rule%20learner:%20performance%20evaluation%20and%20comparison&rft.jtitle=IEEE%20transactions%20on%20cybernetics&rft.au=Kurgan,%20L.A.&rft.date=2006-02-01&rft.volume=36&rft.issue=1&rft.spage=32&rft.epage=53&rft.pages=32-53&rft.issn=1083-4419&rft.eissn=1941-0492&rft.coden=ITSCFI&rft_id=info:doi/10.1109/TSMCB.2005.852983&rft_dat=%3Cproquest_cross%3E2343140671%3C/proquest_cross%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c484t-9368bdde4d1782a3a9f58ddc4a9a12fbaaa69a38246e8a29010107be5116bdb33%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=865728010&rft_id=info:pmid/16468565&rft_ieee_id=1580617&rfr_iscdi=true |