Decision Qualities of Bayes Factor and p Value-Based Hypothesis Testing
Published in: | Psychological Methods 2017-06, Vol.22 (2), p.340-360 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Summary: | The purpose of this article is to investigate the decision qualities of the Bayes factor (BF) method compared with the p value-based null hypothesis significance testing (NHST). The performance of the 2 methods is assessed in terms of the false- and true-positive rates, as well as the false-discovery rates and the posterior probabilities of the null hypothesis for 2 different models: an independent-samples t test and an analysis of variance (ANOVA) model with 2 random factors. Our simulation study results showed the following: (a) The common BF > 3 criterion is more conservative than the NHST α = .05 criterion, and it corresponds better with the α = .01 criterion. (b) An increasing sample size has a different effect on the false-positive rate and the false-discovery rate, depending on whether the BF or NHST approach is used. (c) When effect sizes are randomly sampled from the prior, power curves tend to be flat compared with when effect sizes are prespecified. (d) The larger the scale factor (or the wider the prior), the more conservative the inferential decision is. (e) The false-positive and true-positive rates of the BF method are very sensitive to the scale factor when the effect size is small. (f) While the posterior probabilities of the null hypothesis ideally follow from the BF value, they can be surprisingly high using NHST. In general, these findings were consistent independent of which of the 2 different models was used.
Translational Abstract
The traditional statistical tests rely on the p value, the probability of the data given that the null hypothesis, H0, is true instead of the alternative hypothesis, H1. An alternative approach is to compare the probabilities of H0 and H1 being true given the data. The BF expresses how much the data are in favor of H1 compared with H0. Both a small p value and a large BF value can be used to decide in favor of H1. In the current article, based on simulation studies we compared the 2 types of criteria, p and BF, for 2 different types of data and using 2 sets of decision criteria: p < .05 or p < .01 and BF > 1 or BF > 3. The following was found: (a) The commonly used BF > 3 criterion works well but is more conservative than p < .05 and about as strict as p < .01. (b) The BF > 1 criterion is not sufficiently robust because it is too sensitive to the specifics of H1. (c) Using the BF, the proportion of mistaken decisions in favor of H1 goes to zero, but this is not true when using p. (d) The p value is biased a |
---|---|
ISSN: | 1082-989X 1939-1463 |
DOI: | 10.1037/met0000140 |
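
The summary relates the BF to the posterior probability of H0 and contrasts false-positive with false-discovery rates. The arithmetic behind those two quantities can be sketched as follows. This is a minimal illustration, not the article's simulation code: the function names are hypothetical, the prior probability of H0 defaults to an assumed 0.5, and the false-discovery rate uses the standard textbook formula, which may differ from how the authors computed it.

```python
def posterior_prob_h0(bf10: float, prior_prob_h0: float = 0.5) -> float:
    """Posterior probability of H0 given a Bayes factor BF10 (H1 over H0).

    Assumption: H0 and H1 are the only hypotheses, with prior P(H0) = prior_prob_h0.
    """
    if bf10 <= 0:
        raise ValueError("bf10 must be positive")
    if not 0 < prior_prob_h0 < 1:
        raise ValueError("prior_prob_h0 must lie strictly between 0 and 1")
    prior_odds_h1 = (1 - prior_prob_h0) / prior_prob_h0
    posterior_odds_h1 = bf10 * prior_odds_h1  # posterior odds = BF x prior odds
    return 1 / (1 + posterior_odds_h1)


def false_discovery_rate(alpha: float, power: float, prior_prob_h0: float = 0.5) -> float:
    """Share of decisions for H1 that are wrong, given alpha, power, and P(H0).

    Assumption: textbook definition, false positives / all positives.
    """
    if not (0 <= alpha <= 1 and 0 <= power <= 1 and 0 <= prior_prob_h0 <= 1):
        raise ValueError("alpha, power, and prior_prob_h0 must lie in [0, 1]")
    false_positives = alpha * prior_prob_h0
    true_positives = power * (1 - prior_prob_h0)
    if false_positives + true_positives == 0:
        raise ValueError("no positive decisions; the rate is undefined")
    return false_positives / (false_positives + true_positives)


if __name__ == "__main__":
    for bf in (1, 3, 10):
        print(f"BF10 = {bf}: P(H0 | data) = {posterior_prob_h0(bf):.3f}")
    for power in (0.2, 0.8):
        fdr = false_discovery_rate(alpha=0.05, power=power)
        print(f"alpha = .05, power = {power}: FDR = {fdr:.3f}")
```

Under equal prior odds, BF = 3 leaves a posterior probability of .25 for H0. With α = .05 and an assumed power of .2, one in five decisions for H1 is a false discovery, which illustrates how a fixed α does not by itself bound the posterior probability of H0.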