WBIN-Tree: A Single Scan Based Complete, Compact and Abstract Tree for Discovering Rare and Frequent Itemset Using Parallel Technique
Published in: | IEEE access 2024, Vol.12, p.6281-6297 |
---|---|
Main Authors: | , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Data analytics is an integral part of strategic decision making in various fields, including but not limited to business, education and healthcare systems. Existing research focuses on the discovery of itemsets with rare antecedents and consequents or frequent antecedents and consequents. Analysis of associations among itemsets with rare antecedents and frequent consequents is equally important to gain valuable insights before making crucial decisions. Mining these itemsets from large datasets is a time- and resource-intensive process. Speeding up the mining process aids quick decision making, and hence the entire dataset needs to be stored in RAM. In this paper, a novel Weighted Binary Count Tree (WBIN-Tree) is proposed and implemented in CUDA to exploit the power of the GPU and discover rules with rare antecedents and frequent consequents using a parallel approach. WBIN-Tree stores the entire dataset in an abstract, complete and compact form in RAM using a single database scan. WBIN-Tree is compared with existing sequential and parallel algorithms by varying the data size and dimension. The performance evaluation showed promising results, with WBIN-Tree proving to be the most time- and space-efficient algorithm for storing an entire large dataset in RAM. However, depending on the size of the GPU, performance drops on datasets with large dimensions, which could be handled by processing the attributes in batches. Additionally, a case study on a breast cancer dataset is included to illustrate the importance of mining association rules with rare antecedents and frequent consequents. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3350737 |
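
To make the notion of a rule with a rare antecedent and a frequent consequent concrete, the following is a minimal sequential sketch in Python. It is not the WBIN-Tree and not the paper's CUDA implementation; it only counts single items and item pairs in one pass over the data and then filters rules A -> B by support and confidence. The function name `rare_to_frequent_rules`, the threshold parameters `rare_max`, `freq_min` and `min_conf`, and the toy transactions are all assumptions made for illustration.

```python
from collections import Counter
from itertools import permutations


def rare_to_frequent_rules(transactions, rare_max, freq_min, min_conf):
    """Return sorted (antecedent, consequent, confidence) tuples for
    single-item rules A -> B where A is rare and B is frequent.

    rare_max and freq_min are relative supports in [0, 1] (hypothetical
    parameter names, not taken from the paper).
    """
    n = len(transactions)
    item_count = Counter()
    pair_count = Counter()

    # Single pass over the data: count each item and each ordered item pair.
    for t in transactions:
        items = set(t)
        item_count.update(items)
        pair_count.update(permutations(sorted(items), 2))

    rules = []
    for (a, b), together in pair_count.items():
        sup_a = item_count[a] / n          # support of the antecedent
        sup_b = item_count[b] / n          # support of the consequent
        conf = together / item_count[a]    # confidence of A -> B
        if sup_a <= rare_max and sup_b >= freq_min and conf >= min_conf:
            rules.append((a, b, conf))
    return sorted(rules)


# Toy data (illustrative only): "saffron" is rare, "bread" and "milk" are frequent.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread", "milk", "saffron"},
    {"bread"},
    {"milk"},
]

rules = rare_to_frequent_rules(transactions, rare_max=0.25, freq_min=0.5, min_conf=0.8)
for antecedent, consequent, confidence in rules:
    print(f"{antecedent} -> {consequent} (confidence {confidence:.2f})")
```

In this toy data the rare item appears in one of five transactions, always together with both frequent items, so the only rules that pass the filter have the rare item as antecedent. A tree-based or GPU-parallel method such as the one the record describes would replace the brute-force pair counting used here.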