A new classification of datasets for frequent itemsets
The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely...
Published in: | Journal of intelligent information systems 2010-02, Vol.34 (1), p.1-19 |
---|---|
Main Authors: | Flouvat, Frédéric; De Marchi, Fabien; Petit, Jean-Marc |
Format: | Article |
Language: | English |
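Subjects: | Algorithms; Artificial Intelligence; Classification; Computer Science; Data mining; Data Structures and Information Theory; Datasets; Information Storage and Retrieval; IT in Business; Natural Language Processing (NLP); Studies |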
cited_by | cdi_FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93 |
---|---|
cites | cdi_FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93 |
container_end_page | 19 |
container_issue | 1 |
container_start_page | 1 |
container_title | Journal of intelligent information systems |
container_volume | 34 |
creator | Flouvat, Frédéric; De Marchi, Fabien; Petit, Jean-Marc |
description | The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the distribution of frequent itemsets with respect to itemsets size together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets invariant w.r.t. minsup variations and robust to explain efficiency of several implementations. |
doi_str_mv | 10.1007/s10844-008-0077-0 |
format | article |
fullrecord | <record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_01381427v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1944424631</sourcerecordid><originalsourceid>FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93</originalsourceid><addsrcrecordid>eNp1kEFLAzEQhYMoWKs_wNviRTysTjbJJjmWolYoeNFzSLMT3bLd1GSr-O9NWVEQPAwDw_dm3jxCzilcUwB5kygozksAlUvKEg7IhArJSllLcUgmoCtRag3VMTlJaQ0AWtUwIfWs6PGjcJ1NqfWts0Mb-iL4orGDTTikwodY-IhvO-yHoh1ws5-ekiNvu4Rn331Knu9un-aLcvl4_zCfLUvHFB1KdMxZKTRXUgiBTqrGN7WoGGewss5np5bxFWrnvOSauwaVkJ7WTa2QOc2m5Grc-2o7s43txsZPE2xrFrOl2c-A5kO8ku80s5cju40hu02D2bTJYdfZHsMuGclZTRkTPJMXf8h12MU-P2IqAKpEpVmG6Ai5GFKK6H_uUzD7zM2YucmZm33mBrKmGjUps_0Lxt_F_4u-ANl3gik</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200185293</pqid></control><display><type>article</type><title>A new classification of datasets for frequent itemsets</title><source>ABI/INFORM Global</source><source>Springer Nature</source><creator>Flouvat, Frédéric ; De Marchi, Fabien ; Petit, Jean-Marc</creator><creatorcontrib>Flouvat, Frédéric ; De Marchi, Fabien ; Petit, Jean-Marc</creatorcontrib><description>The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the
distribution
of frequent itemsets with respect to itemsets size together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets invariant w.r.t. minsup variations and robust to explain efficiency of several implementations.</description><identifier>ISSN: 0925-9902</identifier><identifier>EISSN: 1573-7675</identifier><identifier>DOI: 10.1007/s10844-008-0077-0</identifier><language>eng</language><publisher>Boston: Springer US</publisher><subject>Algorithms ; Artificial Intelligence ; Classification ; Computer Science ; Data mining ; Data Structures and Information Theory ; Datasets ; Information Storage and Retrieval ; IT in Business ; Natural Language Processing (NLP) ; Studies</subject><ispartof>Journal of intelligent information systems, 2010-02, Vol.34 (1), p.1-19</ispartof><rights>Springer Science+Business Media, LLC 2009</rights><rights>Springer Science+Business Media, LLC 2010</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93</citedby><cites>FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93</cites><orcidid>0000-0001-7288-0498 ; 
0000-0002-0015-745X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/200185293/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/200185293?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>230,314,780,784,885,11688,27924,27925,36060,36061,44363,74895</link.rule.ids><backlink>$$Uhttps://hal.science/hal-01381427$$DView record in HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Flouvat, Frédéric</creatorcontrib><creatorcontrib>De Marchi, Fabien</creatorcontrib><creatorcontrib>Petit, Jean-Marc</creatorcontrib><title>A new classification of datasets for frequent itemsets</title><title>Journal of intelligent information systems</title><addtitle>J Intell Inf Syst</addtitle><description>The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the
distribution
of frequent itemsets with respect to itemsets size together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets invariant w.r.t. minsup variations and robust to explain efficiency of several implementations.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Classification</subject><subject>Computer Science</subject><subject>Data mining</subject><subject>Data Structures and Information Theory</subject><subject>Datasets</subject><subject>Information Storage and Retrieval</subject><subject>IT in Business</subject><subject>Natural Language Processing (NLP)</subject><subject>Studies</subject><issn>0925-9902</issn><issn>1573-7675</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp1kEFLAzEQhYMoWKs_wNviRTysTjbJJjmWolYoeNFzSLMT3bLd1GSr-O9NWVEQPAwDw_dm3jxCzilcUwB5kygozksAlUvKEg7IhArJSllLcUgmoCtRag3VMTlJaQ0AWtUwIfWs6PGjcJ1NqfWts0Mb-iL4orGDTTikwodY-IhvO-yHoh1ws5-ekiNvu4Rn331Knu9un-aLcvl4_zCfLUvHFB1KdMxZKTRXUgiBTqrGN7WoGGewss5np5bxFWrnvOSauwaVkJ7WTa2QOc2m5Grc-2o7s43txsZPE2xrFrOl2c-A5kO8ku80s5cju40hu02D2bTJYdfZHsMuGclZTRkTPJMXf8h12MU-P2IqAKpEpVmG6Ai5GFKK6H_uUzD7zM2YucmZm33mBrKmGjUps_0Lxt_F_4u-ANl3gik</recordid><startdate>20100201</startdate><enddate>20100201</enddate><creator>Flouvat, Frédéric</creator><creator>De Marchi, Fabien</creator><creator>Petit, Jean-Marc</creator><general>Springer US</general><general>Springer Nature B.V</general><general>Springer 
Verlag</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0U~</scope><scope>1-H</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L.0</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0001-7288-0498</orcidid><orcidid>https://orcid.org/0000-0002-0015-745X</orcidid></search><sort><creationdate>20100201</creationdate><title>A new classification of datasets for frequent itemsets</title><author>Flouvat, Frédéric ; De Marchi, Fabien ; Petit, Jean-Marc</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Classification</topic><topic>Computer Science</topic><topic>Data mining</topic><topic>Data Structures and Information Theory</topic><topic>Datasets</topic><topic>Information Storage and Retrieval</topic><topic>IT in Business</topic><topic>Natural Language Processing (NLP)</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Flouvat, 
Frédéric</creatorcontrib><creatorcontrib>De Marchi, Fabien</creatorcontrib><creatorcontrib>Petit, Jean-Marc</creatorcontrib><collection>CrossRef</collection><collection>Global News & ABI/Inform Professional</collection><collection>Trade PRO</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection (ProQuest)</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Professional 
Standard</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>ProQuest advanced technologies & aerospace journals</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>Journal of intelligent information systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Flouvat, Frédéric</au><au>De Marchi, Fabien</au><au>Petit, Jean-Marc</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A new classification of datasets for frequent itemsets</atitle><jtitle>Journal of intelligent information systems</jtitle><stitle>J Intell Inf Syst</stitle><date>2010-02-01</date><risdate>2010</risdate><volume>34</volume><issue>1</issue><spage>1</spage><epage>19</epage><pages>1-19</pages><issn>0925-9902</issn><eissn>1573-7675</eissn><abstract>The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. 
In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the
distribution
of frequent itemsets with respect to itemsets size together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets invariant w.r.t. minsup variations and robust to explain efficiency of several implementations.</abstract><cop>Boston</cop><pub>Springer US</pub><doi>10.1007/s10844-008-0077-0</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0001-7288-0498</orcidid><orcidid>https://orcid.org/0000-0002-0015-745X</orcidid></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0925-9902 |
ispartof | Journal of intelligent information systems, 2010-02, Vol.34 (1), p.1-19 |
issn | 0925-9902; 1573-7675 |
language | eng |
recordid | cdi_hal_primary_oai_HAL_hal_01381427v1 |
source | ABI/INFORM Global; Springer Nature |
subjects | Algorithms; Artificial Intelligence; Classification; Computer Science; Data mining; Data Structures and Information Theory; Datasets; Information Storage and Retrieval; IT in Business; Natural Language Processing (NLP); Studies |
title | A new classification of datasets for frequent itemsets |
url | http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T21%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20new%20classification%20of%20datasets%20for%20frequent%20itemsets&rft.jtitle=Journal%20of%20intelligent%20information%20systems&rft.au=Flouvat,%20Fr%C3%A9d%C3%A9ric&rft.date=2010-02-01&rft.volume=34&rft.issue=1&rft.spage=1&rft.epage=19&rft.pages=1-19&rft.issn=0925-9902&rft.eissn=1573-7675&rft_id=info:doi/10.1007/s10844-008-0077-0&rft_dat=%3Cproquest_hal_p%3E1944424631%3C/proquest_hal_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=200185293&rft_id=info:pmid/&rfr_iscdi=true |