The Effect of Instance-Space Partition on Significance

Bibliographic Details
Published in:	Machine learning 2001-03, Vol.42 (3), p.269-286
Main Authors:	Bradford, Jeffrey P; Brodley, Carla E
Format:	Article
Language:	English
Subjects:	Algorithms; Decision trees

Description
Summary:	This paper demonstrates experimentally that deciding which induction algorithm is more accurate from the results of a single partition of the instances into cross-validation folds may lead to statistically erroneous conclusions. Comparing two decision tree induction algorithms and one naive Bayes induction algorithm, we find situations in which one algorithm is judged more accurate at the p = 0.05 level with one partition of the training instances but the other algorithm is judged more accurate at the p = 0.05 level with an alternate partition. We recommend a new significance procedure that involves performing cross-validation using multiple instance-space partitions. Significance is determined by applying the paired Student t-test separately to the results from each cross-validation partition, averaging the resulting values, and converting this averaged value into a significance value.
ISSN:	0885-6125; 1573-0565
DOI:	10.1023/A:1007613918580
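
The procedure outlined in the summary can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that the value averaged across partitions is the paired t statistic, and that the average is referred to a Student t distribution with k - 1 degrees of freedom; the summary does not spell out either detail. The names `paired_t_statistic`, `t_two_sided_p`, `averaged_t_significance`, and the user-supplied `evaluate` callback are hypothetical. Only the standard library is used, so the p-value is obtained by numerically integrating the t density.

```python
import math
import random
from statistics import mean, stdev


def paired_t_statistic(diffs):
    """Paired Student t statistic for per-fold accuracy differences."""
    sd = stdev(diffs)
    if sd == 0.0:
        raise ValueError("t statistic undefined: zero variance in differences")
    return mean(diffs) / (sd / math.sqrt(len(diffs)))


def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t (Simpson's rule on the density)."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density over [0, |t|]
    return max(0.0, 1.0 - 2.0 * area)


def averaged_t_significance(evaluate, n_instances, k=10, n_partitions=10, seed=0):
    """Run k-fold cross-validation on several random partitions.

    `evaluate(train_idx, test_idx)` is a hypothetical callback returning
    (accuracy_a, accuracy_b) for the two algorithms on one fold.
    Returns (averaged t statistic, two-sided p-value).
    """
    rng = random.Random(seed)
    t_values = []
    for _ in range(n_partitions):
        idx = list(range(n_instances))
        rng.shuffle(idx)  # a fresh partition of the instances into folds
        folds = [idx[i::k] for i in range(k)]
        diffs = []
        for i, test in enumerate(folds):
            train = [j for f in folds[:i] + folds[i + 1:] for j in f]
            acc_a, acc_b = evaluate(train, test)
            diffs.append(acc_a - acc_b)
        t_values.append(paired_t_statistic(diffs))
    # Assumption: average the t statistics, then use k - 1 degrees of freedom.
    t_avg = mean(t_values)
    return t_avg, t_two_sided_p(t_avg, k - 1)
```

With real learners, `evaluate` would train both algorithms on `train_idx` and score them on `test_idx`. Comparing the returned p-value against 0.05 then gives one verdict for the whole set of partitions rather than a verdict that depends on a single partition.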