
Updated Guidelines on Selecting an Intraclass Correlation Coefficient for Interrater Reliability, With Applications to Incomplete Observational Designs

Bibliographic Details
Published in: Psychological methods, 2024-10, Vol. 29 (5), p. 967-979
Main Authors: ten Hove, Debby; Jorgensen, Terrence D.; van der Ark, L. Andries
Format: Article
Language: English
Description: Several intraclass correlation coefficients (ICCs) are available to assess the interrater reliability (IRR) of observational measurements. Selecting an ICC is complicated, and existing guidelines have three major limitations. First, they do not discuss incomplete designs, in which raters partially vary across subjects. Second, they provide no coherent perspective on the error variance in an ICC, clouding the choice between the available coefficients. Third, the distinction between fixed or random raters is often misunderstood. Based on generalizability theory (GT), we provide updated guidelines on selecting an ICC for IRR, which are applicable to both complete and incomplete observational designs. We challenge conventional wisdom about ICCs for IRR by claiming that raters should seldom (if ever) be considered fixed. Also, we clarify how to interpret ICCs in the case of unbalanced and incomplete designs. We explain four choices a researcher needs to make when selecting an ICC for IRR, and guide researchers through these choices by means of a flowchart, which we apply to three empirical examples from clinical and developmental domains. In the Discussion, we provide guidance in reporting, interpreting, and estimating ICCs, and propose future directions for research into the ICCs for IRR.
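The abstract frames an ICC as the share of observed-score variance attributable to subjects rather than to raters or residual error. As a minimal illustration of that idea (not the GT-based procedure the article itself develops), the sketch below computes the familiar two-way random-effects, absolute-agreement, single-rater coefficient, often labeled ICC(2,1), from the mean squares of a complete subjects × raters table. The score matrix is illustrative data and the function name is my own.

```python
import numpy as np

def icc_2_1(scores):
    """Two-way random-effects, absolute-agreement, single-rater ICC
    for a complete n_subjects x k_raters score matrix."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    grand = scores.mean()
    # Mean squares from the two-way ANOVA decomposition
    ms_subj = k * ((scores.mean(axis=1) - grand) ** 2).sum() / (n - 1)
    ms_rater = n * ((scores.mean(axis=0) - grand) ** 2).sum() / (k - 1)
    ss_err = (((scores - grand) ** 2).sum()
              - (n - 1) * ms_subj - (k - 1) * ms_rater)
    ms_err = ss_err / ((n - 1) * (k - 1))
    # Rater variance counts as error: absolute agreement, random raters
    return (ms_subj - ms_err) / (
        ms_subj + (k - 1) * ms_err + k * (ms_rater - ms_err) / n)

# Illustrative complete design: 6 subjects each rated by the same 4 raters
ratings = [[9, 2, 5, 8],
           [6, 1, 3, 2],
           [8, 4, 6, 8],
           [7, 1, 2, 6],
           [10, 5, 6, 9],
           [6, 2, 4, 7]]
print(round(icc_2_1(ratings), 2))  # -> 0.29
```

Dropping the rater-variance term from the denominator would instead yield a consistency coefficient, one instance of the error-variance choices the guidelines walk through.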
DOI: 10.1037/met0000516
ISSN: 1082-989X
EISSN: 1939-1463
PMID: 36048052
Publisher: American Psychological Association
Source: EBSCOhost APA PsycARTICLES
Subjects: Experimental Design; Human; Interrater Reliability; Measurement; Observation Methods; Statistical Correlation; Theories