Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies

O'Leary, Shaun, Lund, Marte, Ytre-Hauge, Tore Johan, Holm, Sigrid Reiesen, Naess, Kaja, Dalland, Lars Nagelsted and McPhail, Steven M. (2013) Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies. Physiotherapy, 100(1): 27-35. doi:10.1016/j.physio.2013.08.002

Author O'Leary, Shaun
Lund, Marte
Ytre-Hauge, Tore Johan
Holm, Sigrid Reiesen
Naess, Kaja
Dalland, Lars Nagelsted
McPhail, Steven M.
Title Pitfalls in the use of kappa when interpreting agreement between multiple raters in reliability studies
Journal name Physiotherapy
ISSN 0031-9406
Publication date 2013
Year available 2013
Sub-type Article (original research)
DOI 10.1016/j.physio.2013.08.002
Open Access Status DOI
Volume 100
Issue 1
Start page 27
End page 35
Total pages 9
Place of publication London, United Kingdom
Publisher Elsevier Ltd
Collection year 2014
Language eng
Subject 3612 Physical Therapy, Sports Therapy and Rehabilitation
Abstract Objective: To compare different reliability coefficients (exact agreement, and variations of kappa: generalised kappa, Cohen's kappa, and the prevalence-adjusted bias-adjusted kappa (PABAK)) for four physiotherapists conducting visual assessments of scapulae. Design: Inter-therapist reliability study. Setting: Research laboratory. Participants: 30 individuals with no history of neck or shoulder pain and no obvious postural abnormalities were recruited. Main outcome measures: Ratings of scapular posture were recorded in multiple biomechanical planes under four test conditions (at rest, and during three isometric conditions) by four physiotherapists. Results: The magnitude of discrepancy between the two therapist pairs was 0.04 to 0.76 for Cohen's kappa and 0.00 to 0.86 for PABAK. In comparison, the generalised kappa provided a score between the two paired kappa coefficients. The differences between the mean generalised kappa and the mean Cohen's kappa (0.02), and between the mean generalised kappa and the mean PABAK (0.02), were negligible, but within individual planes and conditions the difference between the generalised kappa and the paired kappas was substantial: 0.02 to 0.57 for Cohen's kappa and 0.02 to 0.63 for PABAK. Conclusions: Calculating coefficients for therapist pairs alone may result in inconsistent findings. In contrast, the generalised kappa provided a coefficient close to the mean of the paired kappa coefficients. These findings support the assertion that generalised kappa may give a better representation of reliability between three or more raters, and that reliability studies calculating agreement between only two raters should be interpreted with caution. However, generalised kappa may mask more extreme cases of agreement (or disagreement) that paired comparisons may reveal.
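
The coefficients compared in the abstract follow standard formulations: Cohen's kappa for one rater pair is (p_o - p_e) / (1 - p_e), PABAK replaces the chance-agreement term with its fixed value 1/k for k categories, and the generalised (Fleiss) kappa pools agreement across all raters. The sketch below shows how these are commonly computed; the function names and the example ratings are illustrative and are not taken from the study's data.

import numpy as np

def cohens_kappa(r1, r2):
    # Cohen's kappa for one rater pair: (p_o - p_e) / (1 - p_e), where
    # p_e is chance agreement from each rater's marginal frequencies.
    r1, r2 = np.asarray(r1), np.asarray(r2)
    cats = np.union1d(r1, r2)
    p_o = np.mean(r1 == r2)
    p_e = sum(np.mean(r1 == c) * np.mean(r2 == c) for c in cats)
    return (p_o - p_e) / (1 - p_e)

def pabak(r1, r2, k):
    # Prevalence-adjusted bias-adjusted kappa: (k * p_o - 1) / (k - 1)
    # for k rating categories (2 * p_o - 1 in the binary case).
    p_o = np.mean(np.asarray(r1) == np.asarray(r2))
    return (k * p_o - 1) / (k - 1)

def generalised_kappa(counts):
    # Fleiss' generalised kappa. counts[i, j] = number of raters who
    # assigned subject i to category j, with m raters per subject.
    counts = np.asarray(counts, dtype=float)
    n = counts.shape[0]
    m = counts.sum(axis=1)[0]
    p_j = counts.sum(axis=0) / (n * m)                       # category proportions
    P_i = (np.sum(counts ** 2, axis=1) - m) / (m * (m - 1))  # per-subject agreement
    p_e = np.sum(p_j ** 2)
    return (P_i.mean() - p_e) / (1 - p_e)

# Illustrative ratings for two of four raters over six subjects,
# three posture categories (0, 1, 2); not the study's data.
r1 = [0, 1, 2, 1, 0, 2]
r2 = [0, 1, 2, 0, 0, 2]
print(cohens_kappa(r1, r2), pabak(r1, r2, k=3))

# Counts matrix for all four raters on the same six subjects (illustrative).
counts = np.array([[4, 0, 0], [0, 3, 1], [0, 0, 4],
                   [2, 2, 0], [3, 1, 0], [0, 0, 4]])
print(generalised_kappa(counts))

Because PABAK fixes chance agreement at 1/k regardless of category prevalence, it can diverge sharply from Cohen's kappa when raters favour one category, which is one reason the study reports both.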
Keyword Agreement
Inter-therapist
Q-Index Code C1
Q-Index Status Confirmed Code
Institutional Status UQ

Document type: Journal Article
Sub-type: Article (original research)
Collections: UQ Centre for Clinical Research Publications
Official 2014 Collection
School of Health and Rehabilitation Sciences Publications
Citation counts: Thomson Reuters Web of Science: cited 4 times
Scopus: cited 5 times
Created: Sun, 02 Mar 2014, 00:06:38 EST by System User on behalf of UQ Centre for Clinical Research