Loading…

DIAFM: An Improved and Novel Approach for Incremental Frequent Itemset Mining

Traditional approaches to data mining are generally designed for small, centralized, and static datasets. However, when a dataset grows at an enormous rate, the algorithms become infeasible in terms of huge consumption of computational and I/O resources. Frequent itemset mining (FIM) is one of the k...

Full description

Saved in:
Bibliographic Details
Published in:Mathematics (Basel) 2024-12, Vol.12 (24), p.3930
Main Authors: Shaikh, Mohsin, Akram, Sabina, Khan, Jawad, Shah, Khalid, Lee, Youngmoon
Format: Article
Language:English
Subjects:
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Traditional approaches to data mining are generally designed for small, centralized, and static datasets. However, when a dataset grows at an enormous rate, the algorithms become infeasible in terms of huge consumption of computational and I/O resources. Frequent itemset mining (FIM) is one of the key algorithms in data mining and finds applications in a variety of domains; however, traditional algorithms do face problems in efficiently processing large and dynamic datasets. This research introduces a distributed incremental approximation frequent itemset mining (DIAFM) algorithm that tackles the mentioned challenges using shard-based approximation within the MapReduce framework. DIAFM minimizes the computational overhead of a program by reducing dataset scans, bypassing exact support checks, and incorporating shard-level error thresholds for an appropriate trade-off between efficiency and accuracy. Extensive experiments have demonstrated that DIAFM reduces runtime by 40–60% compared to traditional methods with losses in accuracy within 1–5%, even for datasets over 500,000 transactions. Its incremental nature ensures that new data increments are handled efficiently without needing to reprocess the entire dataset, making it particularly suitable for real-time, large-scale applications such as transaction analysis and IoT data streams. These results demonstrate the scalability, robustness, and practical applicability of DIAFM and establish it as a competitive and efficient solution for mining frequent itemsets in distributed, dynamic environments.
ISSN:2227-7390
DOI:10.3390/math12243930