Interrater reliability of grading strength of evidence varies with the complexity of the evidence in systematic reviews

Bibliographic Details
Published in: Journal of Clinical Epidemiology, 2013-10, Vol. 66 (10), p. 1105-1117.e1
Main Authors: Berkman, Nancy D; Lohr, Kathleen N; Morgan, Laura C; Kuo, Tzy-Mey; Morton, Sally C
Format: Article
Language:English
Description

Objectives: To examine the consistency (interrater reliability) of applying guidance for grading strength of evidence in systematic reviews for the Agency for Healthcare Research and Quality Evidence-based Practice Center program.

Study Design and Setting: Using data from two systematic reviews, the authors tested the main components of the approach: (1) scoring evidence on the four required domains (risk of bias, consistency, directness, and precision) separately for randomized controlled trials (RCTs) and observational studies, and (2) developing an overall strength of evidence grade given the scores for each of these domains.

Results: Conclusions about overall strength of evidence reached by experienced systematic reviewers based on the same evidence can differ greatly, especially for complex bodies of evidence. Current instructions may be sufficient for straightforward quantitative evaluations that use meta-analysis to summarize RCT findings. In contrast, agreement suffered when evaluations did not lend themselves to meta-analysis and reviewers needed to rely on their own qualitative judgment. Three areas raised particular concern: (1) evidence from a combination of RCTs and observational studies, (2) outcomes measured in differing ways, and (3) evidence that appeared to show no differences in outcomes.

Conclusion: Interrater reliability was highly variable both for scoring strength of evidence domains and for combining scores to reach overall strength of evidence grades. Future research can help establish improved methods for evaluating these complex bodies of evidence.
ISSN: 0895-4356; 1878-5921
DOI: 10.1016/j.jclinepi.2013.06.002