PGLCM: efficient parallel mining of closed frequent gradual itemsets

Bibliographic Details
Published in:	Knowledge and information systems 2015-06, Vol.43 (3), p.497-527
Main Authors:	Do, Trong Dinh Thac; Termier, Alexandre; Laurent, Anne; Negrevergne, Benjamin; Omidvar-Tehrani, Behrooz; Amer-Yahia, Sihem
Format:	Article
Language:	English
Subjects:	Age; Algorithms; Analysis; Computer Science; Constants; Data mining; Data Mining and Knowledge Discovery; Database Management; Datasets; Deoxyribonucleic acid; Information Storage and Retrieval; Information systems; Information Systems and Communication Service; Information Systems Applications (incl. Internet); IT in Business; Microprocessors; Parallel processing; Pattern analysis; Pattern recognition; Processors; Regular Paper; State of the art; Studies

Description
Summary:	Numerical data (e.g., DNA microarray data, sensor data) pose a challenge for existing frequent pattern mining methods, which handle them poorly. In this context, gradual patterns have recently been proposed to extract covariations of attributes, such as “When X increases, Y decreases”. Some algorithms exist for mining frequent gradual patterns, but they do not scale to real-world databases. In this paper, we present GLCM, the first algorithm for mining closed frequent gradual patterns, which offers strong complexity guarantees: the mining time is linear in the number of closed frequent gradual itemsets. Our experimental study shows that GLCM is two orders of magnitude faster than the state of the art, with constant, low memory usage. We also present PGLCM, a parallelization of GLCM that exploits multicore processors and scales well on complex datasets. These are the first algorithms capable of mining large real-world datasets to discover gradual patterns.
ISSN:	0219-1377, 0219-3116
DOI:	10.1007/s10115-014-0749-8