Finding the best diversity generation procedures for mining contrast patterns
Published in: | Expert systems with applications 2015-07, Vol.42 (11), p.4859-4866 |
---|---|
Main Authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | • Comparison of diversity generation procedures for mining contrast patterns. • Diversity calculated based on the number of total, unique, and minimal patterns. • Three new deterministic methods for generating diversity in decision trees. • Study of the influence of data type on the diversity and accuracy of the methods. • Random Forest and Bagging are the best procedures.
Most understandable classifiers are based on contrast patterns, which can be accurately mined from decision trees. Nevertheless, tree diversity must be ensured to mine a representative pattern collection. In this paper, we perform an experimental comparison of different diversity generation procedures. We compare the diversity generated by each procedure based on the number of total, unique, and minimal patterns extracted from the induced trees for different minimal support thresholds. This comparison, together with an accuracy and abstention experiment, shows that Random Forest and Bagging generate the most diverse and accurate pattern collections. Additionally, we study the influence of data type on the results, finding that Random Forest is best for categorical data and Bagging for numerical data. The comparison includes most known diversity generation procedures and three new deterministic procedures introduced here. These deterministic procedures outperform the existing deterministic method, but are still outperformed by random procedures. |
---|---|
ISSN: | 0957-4174 1873-6793 |
DOI: | 10.1016/j.eswa.2015.02.028 |
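
The summary measures diversity by counting the total, unique, and minimal patterns extracted from the induced trees. The sketch below illustrates one way such counts could be computed; it is not the authors' implementation. It assumes each pattern is a set of (attribute, condition) items and that a pattern is "minimal" when no other pattern in the collection is a proper subset of it. The function name `diversity_counts` and the example items are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representation: a pattern is a set of (attribute, condition) items.
Item = Tuple[str, str]
Pattern = FrozenSet[Item]


def diversity_counts(patterns_per_tree: List[List[Pattern]]) -> Dict[str, int]:
    """Count total, unique, and minimal patterns over a collection of trees.

    Assumption: a pattern is minimal if no other unique pattern is a
    proper subset of it (i.e. no more general pattern exists).
    """
    # Total: every pattern from every tree, duplicates included.
    all_patterns = [p for tree in patterns_per_tree for p in tree]
    # Unique: duplicates across trees collapse to one pattern.
    unique = set(all_patterns)
    # Minimal: keep patterns that have no proper sub-pattern in the collection.
    minimal = {p for p in unique if not any(q < p for q in unique)}
    return {
        "total": len(all_patterns),
        "unique": len(unique),
        "minimal": len(minimal),
    }


# Illustrative patterns from two hypothetical trees.
tree_a = [
    frozenset({("color", "= red")}),
    frozenset({("color", "= red"), ("size", "> 3")}),
]
tree_b = [
    frozenset({("color", "= red")}),
    frozenset({("shape", "= round")}),
]

counts = diversity_counts([tree_a, tree_b])
print(counts)
```

Under these assumptions, a procedure that induces more varied trees would yield higher unique and minimal counts relative to the total, since fewer of its patterns are repeats or specializations of one another.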