
A new classification of datasets for frequent itemsets

The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely...

Published in: Journal of intelligent information systems 2010-02, Vol.34 (1), p.1-19
Main Authors: Flouvat, Frédéric; De Marchi, Fabien; Petit, Jean-Marc
Format: Article
Language: English
cited_by cdi_FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93
cites cdi_FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93
container_end_page 19
container_issue 1
container_start_page 1
container_title Journal of intelligent information systems
container_volume 34
creator Flouvat, Frédéric
De Marchi, Fabien
Petit, Jean-Marc
description The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the distribution of frequent itemsets with respect to itemsets size together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets invariant w.r.t. minsup variations and robust to explain efficiency of several implementations.
doi_str_mv 10.1007/s10844-008-0077-0
format article
fullrecord <record><control><sourceid>proquest_hal_p</sourceid><recordid>TN_cdi_hal_primary_oai_HAL_hal_01381427v1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1944424631</sourcerecordid><originalsourceid>FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93</originalsourceid><addsrcrecordid>eNp1kEFLAzEQhYMoWKs_wNviRTysTjbJJjmWolYoeNFzSLMT3bLd1GSr-O9NWVEQPAwDw_dm3jxCzilcUwB5kygozksAlUvKEg7IhArJSllLcUgmoCtRag3VMTlJaQ0AWtUwIfWs6PGjcJ1NqfWts0Mb-iL4orGDTTikwodY-IhvO-yHoh1ws5-ekiNvu4Rn331Knu9un-aLcvl4_zCfLUvHFB1KdMxZKTRXUgiBTqrGN7WoGGewss5np5bxFWrnvOSauwaVkJ7WTa2QOc2m5Grc-2o7s43txsZPE2xrFrOl2c-A5kO8ku80s5cju40hu02D2bTJYdfZHsMuGclZTRkTPJMXf8h12MU-P2IqAKpEpVmG6Ai5GFKK6H_uUzD7zM2YucmZm33mBrKmGjUps_0Lxt_F_4u-ANl3gik</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>200185293</pqid></control><display><type>article</type><title>A new classification of datasets for frequent itemsets</title><source>ABI/INFORM Global</source><source>Springer Nature</source><creator>Flouvat, Frédéric ; De Marchi, Fabien ; Petit, Jean-Marc</creator><creatorcontrib>Flouvat, Frédéric ; De Marchi, Fabien ; Petit, Jean-Marc</creatorcontrib><description>The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the distribution of frequent itemsets with respect to itemsets size together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. 
For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets invariant w.r.t. minsup variations and robust to explain efficiency of several implementations.</description><identifier>ISSN: 0925-9902</identifier><identifier>EISSN: 1573-7675</identifier><identifier>DOI: 10.1007/s10844-008-0077-0</identifier><language>eng</language><publisher>Boston: Springer US</publisher><subject>Algorithms ; Artificial Intelligence ; Classification ; Computer Science ; Data mining ; Data Structures and Information Theory ; Datasets ; Information Storage and Retrieval ; IT in Business ; Natural Language Processing (NLP) ; Studies</subject><ispartof>Journal of intelligent information systems, 2010-02, Vol.34 (1), p.1-19</ispartof><rights>Springer Science+Business Media, LLC 2009</rights><rights>Springer Science+Business Media, LLC 2010</rights><rights>Distributed under a Creative Commons Attribution 4.0 International License</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93</citedby><cites>FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93</cites><orcidid>0000-0001-7288-0498 ; 0000-0002-0015-745X</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://www.proquest.com/docview/200185293/fulltextPDF?pq-origsite=primo$$EPDF$$P50$$Gproquest$$H</linktopdf><linktohtml>$$Uhttps://www.proquest.com/docview/200185293?pq-origsite=primo$$EHTML$$P50$$Gproquest$$H</linktohtml><link.rule.ids>230,314,780,784,885,11688,27924,27925,36060,36061,44363,74895</link.rule.ids><backlink>$$Uhttps://hal.science/hal-01381427$$DView record in 
HAL$$Hfree_for_read</backlink></links><search><creatorcontrib>Flouvat, Frédéric</creatorcontrib><creatorcontrib>De Marchi, Fabien</creatorcontrib><creatorcontrib>Petit, Jean-Marc</creatorcontrib><title>A new classification of datasets for frequent itemsets</title><title>Journal of intelligent information systems</title><addtitle>J Intell Inf Syst</addtitle><description>The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the distribution of frequent itemsets with respect to itemsets size together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets invariant w.r.t. 
minsup variations and robust to explain efficiency of several implementations.</description><subject>Algorithms</subject><subject>Artificial Intelligence</subject><subject>Classification</subject><subject>Computer Science</subject><subject>Data mining</subject><subject>Data Structures and Information Theory</subject><subject>Datasets</subject><subject>Information Storage and Retrieval</subject><subject>IT in Business</subject><subject>Natural Language Processing (NLP)</subject><subject>Studies</subject><issn>0925-9902</issn><issn>1573-7675</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2010</creationdate><recordtype>article</recordtype><sourceid>M0C</sourceid><recordid>eNp1kEFLAzEQhYMoWKs_wNviRTysTjbJJjmWolYoeNFzSLMT3bLd1GSr-O9NWVEQPAwDw_dm3jxCzilcUwB5kygozksAlUvKEg7IhArJSllLcUgmoCtRag3VMTlJaQ0AWtUwIfWs6PGjcJ1NqfWts0Mb-iL4orGDTTikwodY-IhvO-yHoh1ws5-ekiNvu4Rn331Knu9un-aLcvl4_zCfLUvHFB1KdMxZKTRXUgiBTqrGN7WoGGewss5np5bxFWrnvOSauwaVkJ7WTa2QOc2m5Grc-2o7s43txsZPE2xrFrOl2c-A5kO8ku80s5cju40hu02D2bTJYdfZHsMuGclZTRkTPJMXf8h12MU-P2IqAKpEpVmG6Ai5GFKK6H_uUzD7zM2YucmZm33mBrKmGjUps_0Lxt_F_4u-ANl3gik</recordid><startdate>20100201</startdate><enddate>20100201</enddate><creator>Flouvat, Frédéric</creator><creator>De Marchi, Fabien</creator><creator>Petit, Jean-Marc</creator><general>Springer US</general><general>Springer Nature B.V</general><general>Springer 
Verlag</general><scope>AAYXX</scope><scope>CITATION</scope><scope>0U~</scope><scope>1-H</scope><scope>3V.</scope><scope>7SC</scope><scope>7WY</scope><scope>7WZ</scope><scope>7XB</scope><scope>87Z</scope><scope>8AL</scope><scope>8FD</scope><scope>8FE</scope><scope>8FG</scope><scope>8FK</scope><scope>8FL</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BEZIV</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>FRNLG</scope><scope>F~G</scope><scope>GNUQQ</scope><scope>HCIFZ</scope><scope>JQ2</scope><scope>K60</scope><scope>K6~</scope><scope>K7-</scope><scope>L.-</scope><scope>L.0</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>M0C</scope><scope>M0N</scope><scope>P5Z</scope><scope>P62</scope><scope>PQBIZ</scope><scope>PQBZA</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>Q9U</scope><scope>1XC</scope><orcidid>https://orcid.org/0000-0001-7288-0498</orcidid><orcidid>https://orcid.org/0000-0002-0015-745X</orcidid></search><sort><creationdate>20100201</creationdate><title>A new classification of datasets for frequent itemsets</title><author>Flouvat, Frédéric ; De Marchi, Fabien ; Petit, Jean-Marc</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2010</creationdate><topic>Algorithms</topic><topic>Artificial Intelligence</topic><topic>Classification</topic><topic>Computer Science</topic><topic>Data mining</topic><topic>Data Structures and Information Theory</topic><topic>Datasets</topic><topic>Information Storage and Retrieval</topic><topic>IT in Business</topic><topic>Natural Language Processing (NLP)</topic><topic>Studies</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Flouvat, 
Frédéric</creatorcontrib><creatorcontrib>De Marchi, Fabien</creatorcontrib><creatorcontrib>Petit, Jean-Marc</creatorcontrib><collection>CrossRef</collection><collection>Global News &amp; ABI/Inform Professional</collection><collection>Trade PRO</collection><collection>ProQuest Central (Corporate)</collection><collection>Computer and Information Systems Abstracts</collection><collection>ABI/INFORM Collection (ProQuest)</collection><collection>ABI/INFORM Global (PDF only)</collection><collection>ProQuest Central (purchase pre-March 2016)</collection><collection>ABI/INFORM Global (Alumni Edition)</collection><collection>Computing Database (Alumni Edition)</collection><collection>Technology Research Database</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>ProQuest Central (Alumni) (purchase pre-March 2016)</collection><collection>ABI/INFORM Collection (Alumni Edition)</collection><collection>ProQuest Central (Alumni)</collection><collection>ProQuest Central</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Business Premium Collection</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>Business Premium Collection (Alumni)</collection><collection>ABI/INFORM Global (Corporate)</collection><collection>ProQuest Central Student</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Computer Science Collection</collection><collection>ProQuest Business Collection (Alumni Edition)</collection><collection>ProQuest Business Collection</collection><collection>Computer Science Database</collection><collection>ABI/INFORM Professional Advanced</collection><collection>ABI/INFORM Professional 
Standard</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>ABI/INFORM Global</collection><collection>Computing Database</collection><collection>ProQuest advanced technologies &amp; aerospace journals</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>One Business (ProQuest)</collection><collection>ProQuest One Business (Alumni)</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central Basic</collection><collection>Hyper Article en Ligne (HAL)</collection><jtitle>Journal of intelligent information systems</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Flouvat, Frédéric</au><au>De Marchi, Fabien</au><au>Petit, Jean-Marc</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>A new classification of datasets for frequent itemsets</atitle><jtitle>Journal of intelligent information systems</jtitle><stitle>J Intell Inf Syst</stitle><date>2010-02-01</date><risdate>2010</risdate><volume>34</volume><issue>1</issue><spage>1</spage><epage>19</epage><pages>1-19</pages><issn>0925-9902</issn><eissn>1573-7675</eissn><abstract>The discovery of frequent patterns is a famous problem in data mining. While plenty of algorithms have been proposed during the last decade, only a few contributions have tried to understand the influence of datasets on the algorithms behavior. Being able to explain why certain algorithms are likely to perform very well or very poorly on some datasets is still an open question. 
In this setting, we describe a thorough experimental study of datasets with respect to frequent itemsets. We study the distribution of frequent itemsets with respect to itemsets size together with the distribution of three concise representations: frequent closed, frequent free and frequent essential itemsets. For each of them, we also study the distribution of their positive and negative borders whenever possible. The main outcome of these experiments is a new classification of datasets invariant w.r.t. minsup variations and robust to explain efficiency of several implementations.</abstract><cop>Boston</cop><pub>Springer US</pub><doi>10.1007/s10844-008-0077-0</doi><tpages>19</tpages><orcidid>https://orcid.org/0000-0001-7288-0498</orcidid><orcidid>https://orcid.org/0000-0002-0015-745X</orcidid></addata></record>
fulltext fulltext
identifier ISSN: 0925-9902
ispartof Journal of intelligent information systems, 2010-02, Vol.34 (1), p.1-19
issn 0925-9902
1573-7675
language eng
recordid cdi_hal_primary_oai_HAL_hal_01381427v1
source ABI/INFORM Global; Springer Nature
subjects Algorithms
Artificial Intelligence
Classification
Computer Science
Data mining
Data Structures and Information Theory
Datasets
Information Storage and Retrieval
IT in Business
Natural Language Processing (NLP)
Studies
title A new classification of datasets for frequent itemsets
url http://sfxeu10.hosted.exlibrisgroup.com/loughborough?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-30T21%3A55%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_hal_p&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=A%20new%20classification%20of%20datasets%20for%20frequent%20itemsets&rft.jtitle=Journal%20of%20intelligent%20information%20systems&rft.au=Flouvat,%20Fr%C3%A9d%C3%A9ric&rft.date=2010-02-01&rft.volume=34&rft.issue=1&rft.spage=1&rft.epage=19&rft.pages=1-19&rft.issn=0925-9902&rft.eissn=1573-7675&rft_id=info:doi/10.1007/s10844-008-0077-0&rft_dat=%3Cproquest_hal_p%3E1944424631%3C/proquest_hal_p%3E%3Cgrp_id%3Ecdi_FETCH-LOGICAL-c381t-ec3ca759487555ec78dfd6523430bacf077a34be9ccf7494cde857f16d68e3c93%3C/grp_id%3E%3Coa%3E%3C/oa%3E%3Curl%3E%3C/url%3E&rft_id=info:oai/&rft_pqid=200185293&rft_id=info:pmid/&rfr_iscdi=true