The Effect of the Raters' Marginal Distributions on Their Matched Agreement: A Rescaling Framework for Interpreting Kappa
Published in: | Multivariate behavioral research 2013-11, Vol.48 (6), p.923-952 |
---|---|
Main Authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Summary: | Cohen's κ measures the improvement in classification above chance level and it is the most popular measure of interjudge agreement. Yet, there is considerable confusion about its interpretation. Specifically, researchers often ignore the fact that the observed level of matched agreement is bounded from above and below and the bounds are a function of the particular marginal distributions of the table. We propose that these bounds should be used to rescale the components of κ (observed and expected agreement). Rescaling κ in this manner results in κ′, a measure that was originally proposed by Cohen (1960) and was largely ignored in both research and practice. This measure provides a common scale for agreement measures of tables with different marginal distributions. It reaches the maximal value of 1 when the judges show the highest level of agreement possible, given their marginal disagreements. We conclude that κ′ should be used to measure the level of matched agreement contingent on a particular set of marginal distributions. The article provides a framework and a set of guidelines that facilitate comparisons between various types of agreement tables. We illustrate our points with simulations and real data from two studies: one involving judges' ratings of baseball players and one involving ratings of essays in high-stakes tests. |
---|---|
ISSN: | 0027-3171, 1532-7906 |
DOI: | 10.1080/00273171.2013.830064 |
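
The summary above describes rescaling κ by the bounds that the marginal distributions impose on agreement. A minimal sketch of that idea follows, assuming κ′ corresponds to the ratio κ/κ_max from Cohen (1960), where the largest agreement attainable with fixed marginals is the sum over categories of the smaller of the two raters' marginal proportions. Only the upper bound is used here; the article's own formulation, which also mentions a lower bound, may differ in detail. The function name `kappa_and_rescaled` and the example counts are hypothetical and are not taken from the article.

```python
def kappa_and_rescaled(table):
    """Return (kappa, kappa_max, kappa_prime) for a square table of counts.

    Rows are rater A's categories, columns are rater B's categories.
    kappa_prime is taken to be kappa / kappa_max (an assumption; see lead-in).
    """
    k = len(table)
    if any(len(r) != k for r in table):
        raise ValueError("agreement table must be square")
    total = float(sum(sum(r) for r in table))
    if total <= 0:
        raise ValueError("agreement table must contain observations")

    # Marginal proportions for each rater.
    row = [sum(table[i]) / total for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / total for j in range(k)]

    # Observed agreement: proportion of cases on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / total
    # Agreement expected by chance, given the marginals.
    p_e = sum(row[i] * col[i] for i in range(k))
    # Largest observed agreement the marginals allow.
    p_max = sum(min(row[i], col[i]) for i in range(k))

    if p_e >= 1.0 or p_max <= p_e:
        raise ValueError("kappa or its rescaled form is undefined for this table")

    kappa = (p_o - p_e) / (1.0 - p_e)
    kappa_max = (p_max - p_e) / (1.0 - p_e)
    kappa_prime = (p_o - p_e) / (p_max - p_e)
    return kappa, kappa_max, kappa_prime


# Hypothetical 2x2 table: the raters' marginals differ (50/50 vs. 40/60),
# so perfect agreement is impossible even though no cell could be improved.
example = [[40, 10],
           [0, 50]]
kappa, kappa_max, kappa_prime = kappa_and_rescaled(example)
print(round(kappa, 3), round(kappa_max, 3), round(kappa_prime, 3))
```

In this illustrative table, κ falls short of 1 only because the marginals differ, while the rescaled value reaches its ceiling. That matches the summary's statement that κ′ equals 1 when the judges agree as much as their marginal disagreements permit.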