Loading…

Using Linkage Sets to Improve Connectedness in Rater Response Model Estimation

Using item‐response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular ratin...

Full description

Saved in:
Bibliographic Details
Published in:Journal of educational measurement 2023-09, Vol.60 (3), p.428-454
Main Authors: Casabianca, Jodi M., Donoghue, John R., Shin, Hyo Jeong, Chao, Szu‐Fu, Choi, Ikkyu
Format: Article
Language:English
Subjects:
Citations: Items that this one cites
Items that cite this one
Online Access:Get full text
Tags: Add Tag
No Tags, Be the first to tag this record!
Description
Summary:Using item‐response theory to model rater effects provides an alternative solution for rater monitoring and diagnosis, compared to using standard performance metrics. In order to fit such models, the ratings data must be sufficiently connected in order to estimate rater effects. Due to popular rating designs used in large‐scale testing scenarios, there tends to be a large proportion of missing data, yielding sparse matrices and estimation issues. In this article, we explore the impact of different types of connectedness, or linkage, brought about by using a linkage set—a collection of responses scored by most or all raters. We also explore the impact of the properties and composition of the linkage set, the different connectedness yielded from different rating designs, and the role of scores from automated scoring engines. In designing monitoring systems using the rater response version of the generalized partial credit model, the study results suggest use of a linkage set, especially a large one that is comprised of responses representing the full score scale. Results also show that a double‐human‐scoring design provides more connectedness than a design with one human and an automated scoring engine. Furthermore, scores from automated scoring engines do not provide adequate connectedness. We discuss considerations for operational implementation and further study.
ISSN:0022-0655
1745-3984
DOI:10.1111/jedm.12360