
Impact of Rater Training on Residents Technical Skill Assessments: A Randomized Trial


Bibliographic Details
Published in: Journal of Surgical Education 2022-11, Vol. 79 (6), p. e225-e234
Main Authors: Jogerst, Kristen M., Park, Yoon Soo, Anteby, Roi, Sinyard, Robert, Coe, Taylor M., Cassidy, Douglas, McKinley, Sophia K., Petrusa, Emil, Phitayakorn, Roy, Mohapatra, Abhisekh, Gee, Denise W.
Format: Article
Language: English
Description
Summary:
• Rater training encourages wider use of a technical skill scoring scale
• Rater training has a modest impact on the validity of technical skill ratings
• Consensus scoring has better rating agreement than individual video review

The ACS/APDS Resident Skills Curriculum's Objective Structured Assessment of Technical Skills (OSATS) consists of task-specific checklists and a global rating scale (GRS) completed by raters. Prior work demonstrated a need for rater training. This study evaluates the impact of a rater-training curriculum on scoring discrimination, consistency, and validity for handsewn bowel anastomosis (HBA) and vascular anastomosis (VA).

A rater-training video model was developed, which included a GRS orientation and anchoring performances representing the range of potential scores. Faculty raters were randomized to rater training or no rater training and were asked to score videos of resident HBA/VA. Consensus scores were assigned to each video using a modified Delphi process (Gold Score). Trained and untrained scores were analyzed for discrimination and score spread and compared to the Gold Score for relative agreement.

Eight general and eight vascular surgery faculty were randomized to score 24 HBA/VA videos. Rater training increased rater discrimination and decreased rating scale shrinkage for both VA (mean trained score: 2.83, variance 1.88; mean untrained score: 3.1, variance 1.14, p = 0.007) and HBA (mean trained score: 3.52, variance 1.44; mean untrained score: 3.42, variance 0.96, p = 0.033). On validity analyses, a comparison of each rater group with the Gold Score revealed a moderate training impact for VA (trained κ = 0.65 vs untrained κ = 0.57) and no impact for HBA (R1 κ = 0.71 vs R2 κ = 0.73).

A rater-training curriculum improved raters' ability to differentiate performance levels and use a wider range of the scoring scale. However, despite rater training, there was persistent disagreement between faculty GRS scores, with no groups reaching the agreement threshold for formative assessment. If technical skill exams are incorporated into high-stakes assessments, consensus ratings via a standard-setting process are likely a more valid option than individual faculty ratings.
ISSN: 1931-7204, 1878-7452
DOI: 10.1016/j.jsurg.2022.09.013