Loading…

Efficient mining of top-k high utility itemsets through genetic algorithms

Mining high utility itemsets is an emerging and very active research area in data mining. The goal is to mine all itemsets with a utility value, in terms of importance to the user, no less than a predefined threshold value. Setting an appropriate threshold value is not trivial, requiring not only mu...

Full description

Saved in:
Bibliographic Details
Published in:Information sciences 2023-05, Vol.624, p.529-553
Main Authors: Luna, José María, Kiran, Rage Uday, Fournier-Viger, Philippe, Ventura, Sebastián
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Mining high utility itemsets is an emerging and very active research area in data mining. The goal is to mine all itemsets with a utility value, in terms of importance to the user, no less than a predefined threshold value. Setting an appropriate threshold value is not trivial, requiring not only multiple trials but also the know-how in the application field. The advantage of algorithms for mining top-k high utility itemsets is they do not require such a utility threshold, but they suffer from very long runtimes and large memory requirements when large input data is considered. We propose a new genetic algorithm for mining top-k high utility itemsets, named TKHUIM-GA (Top-K High Utility Itemset Mining through Genetic Algorithms). It guides the search process by considering the utility of each item to produce initial solutions and to combine solutions accordingly, reducing the runtime and memory consumption as a result. A highly efficient data representation is utilized to reduce memory usage and runtime. A key advantage of TKHUIM-GA is that it works on positive, negative, integer and real unit utility values unlike existing approaches. Experiments on popular benchmark datasets demonstrate the high performance of the proposal regarding the state-of-the-art algorithms.
ISSN:0020-0255
1872-6291
DOI:10.1016/j.ins.2022.12.092