
Estimating the structural diversity introduced by decision forest algorithms : A probabilistic approach

Bibliographic Details
Published in:Knowledge-Based Systems 2024-02, Vol.286, p.111435, Article 111435
Main Authors: Ip, Ryan H.L., Bewong, Michael, Adnan, Md. Nasim, Islam, Md. Zahidul
Format: Article
Language:English
Description
Summary:Structurally diverse decision trees are important for knowledge discovery and classification/prediction accuracy. Over the years, researchers have devoted much effort to developing algorithms that increase diversity among the trees within an ensemble. While Kappa is commonly used to measure diversity among decision trees, it does not measure the ability of tree-building algorithms to introduce diversity. Further, Kappa does not consider the structural diversity among the trees. Instead, Kappa measures the diversity of the predictions made by the trees produced, and is therefore dependent on the datasets used. This paper presents a novel data-independent metric, called the R index, for measuring the diversity that a decision forest algorithm can introduce without building the entire decision forest. The proposed measure is applied to five well-known algorithms that involve bagging and random subspacing. An efficient practical approach for calculating the R index empirically, called R finder, is also proposed and implemented. Both R finder and Kappa were applied to thirty-two publicly available benchmark datasets under various algorithms to estimate the resulting diversity. The results indicate a generally strong negative correlation between R finder and Kappa, implying that R finder is effective at estimating the diversity of trees without the added computational cost of calculating Kappa.
ISSN:0950-7051, 1872-7409
DOI:10.1016/j.knosys.2024.111435
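The summary contrasts the R index with Kappa, which measures diversity from the trees' predictions on a dataset. As a rough illustration of that prediction-based approach, here is a minimal sketch assuming the common formulation: Cohen's Kappa computed for every pair of trees on a shared test set, then averaged. Lower mean Kappa indicates more diverse predictions. The function names `cohen_kappa` and `mean_pairwise_kappa` are hypothetical. This sketch does not implement the R index or R finder, which the summary does not define.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's Kappa between two equal-length sequences of predicted labels."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty prediction sequences of equal length")
    n = len(a)
    # Observed agreement: fraction of test points where both trees predict the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of each tree's label frequencies, summed over labels.
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[label] * cb[label] for label in ca) / (n * n)
    if expected == 1.0:
        # Both trees predict the same single label everywhere, so they agree fully.
        return 1.0
    return (observed - expected) / (1.0 - expected)


def mean_pairwise_kappa(predictions):
    """Average Kappa over all pairs of trees (hypothetical ensemble-level summary)."""
    pairs = list(combinations(predictions, 2))
    if not pairs:
        raise ValueError("need predictions from at least two trees")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Predictions of three hypothetical trees on the same four test points.
    trees = [[0, 1, 1, 0], [0, 1, 0, 0], [1, 1, 1, 0]]
    print(mean_pairwise_kappa(trees))
```

Because this value depends on the test points the trees are evaluated on, it illustrates the data dependence the summary attributes to Kappa. It also requires building the trees and predicting with them, which is the computational cost the summary says R finder avoids.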