American Journal of Epidemiology Advance Access originally published online on September 8, 2005
American Journal of Epidemiology 2005 162(9):919-920; doi:10.1093/aje/kwi288
Letter to the Editor
THE AUTHORS REPLY
Department of Research, Olmsted Medical Center, Rochester, MN 55904
The letter from Dr. Sheikh (1) about our article (2) raises some interesting issues that are important to address. The kappa statistic is most clearly useful when a single item is being evaluated, the number of observers is small, the number of possible responses is small, the number of cases observed is large, and agreement due to chance is an important concern. If the question being addressed is whether agreement is significantly better than chance, where "chance" is defined as multinomial independence, then the hypothesis test based on kappa is clear. Otherwise, interpretation is difficult and rather arbitrary (3–5).
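As a minimal illustration of the chance-corrected statistic discussed above, the Python sketch below computes Cohen's kappa for two raters scoring the same cases on a single nominal item. "Chance" agreement is taken to be the agreement expected if the two raters' classifications were independent, given each rater's marginal proportions. The function name `cohens_kappa` and the ratings are hypothetical, invented for illustration; they are not data from our study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters scoring the same cases on one nominal item.

    Chance agreement is the agreement expected if the two raters'
    classifications were independent, given each rater's marginal proportions.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty lists of ratings")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    p_obs = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected chance agreement: sum over categories of the product of the
    # two raters' marginal proportions (categories missing from rater_a add 0).
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    p_exp = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_exp == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical example: was cardiovascular disease considered at the visit?
rater_a = ["yes", "yes", "yes", "no", "no", "no", "no", "no", "no", "no"]
rater_b = ["yes", "yes", "no", "no", "no", "no", "no", "no", "no", "yes"]
kappa = cohens_kappa(rater_a, rater_b)
print(f"kappa = {kappa:.2f}")  # observed 0.80, chance 0.58 -> kappa about 0.52
```

Here the raters agree on 80% of cases, but because 58% agreement would be expected from the marginals alone, kappa is only about 0.52. Whether that value is "good" is exactly the interpretive question raised above.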
In our context, we needed to assess agreement across several observers, on items of widely varying response types. It is naïve to believe that misclassification does not occur when copying items from the medical record or applying standard algorithms or indices, and it is important to assess reliability for these items. Although kappa can be computed for agreement on a subject's gender, stochastic independence is really not an issue. It is not at all clear what probability model should be used for chance agreement on date of birth, and combining agreement on gender and date of birth into a single weighted kappa would seem to muddy things considerably. Further, we consider it important to distinguish between "demographic" items such as gender, date of birth, or date of a physician visit and "judgment" items, in particular whether cardiovascular disease was considered at the visit. Although kappa is certainly relevant to this last item, the issues of interpretation remain, and we did not think that presenting a separate weighted kappa value would have furthered the discussion.
The point of view we have taken is more common in the quality assurance literature, where relatively straightforward, easily interpreted statistics are used to evaluate how well one is doing, and the critical issues are whether that is good enough and how to do better (6, 7). It may be useful to use separate terms for "agreement" (where failure to agree is frequent and agreement by chance is of concern) and "reliability" (where agreement is high and each disagreement may represent a system or process failure that may be correctable). Although kappa statistics and similar measures are often of value, we continue to believe that, in our context, they are not helpful.
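The quality-assurance view of reliability can be sketched as a simple proportion of cases on which all abstractors recorded the same value, together with a list of the disagreeing cases so that each can be traced back to a possible process failure. This is a minimal sketch under stated assumptions: the function `item_agreement`, the case IDs, and the dates are hypothetical, and the sketch is not presented as the procedure used in our article. Because it needs only an equality test, the same function applies to gender codes, dates of birth, or yes/no judgments, without requiring any probability model for chance agreement.

```python
from datetime import date


def item_agreement(abstractions):
    """Proportion of cases on which every abstractor recorded the same value.

    `abstractions` (hypothetical structure) maps a case ID to the list of
    values recorded for one item by each abstractor. Values only need to
    support `==`, so dates, codes, and free text all work.
    Returns (proportion_agreeing, sorted list of case IDs with a disagreement).
    """
    if not abstractions:
        raise ValueError("no cases to compare")
    disagreements = sorted(
        case_id
        for case_id, values in abstractions.items()
        # A case disagrees if any abstractor's value differs from the first.
        if any(v != values[0] for v in values[1:])
    )
    return 1 - len(disagreements) / len(abstractions), disagreements


# Hypothetical date-of-birth abstractions by three abstractors.
dob = {
    "case01": [date(1948, 3, 2)] * 3,
    "case02": [date(1951, 7, 19), date(1951, 7, 19), date(1951, 9, 17)],
    "case03": [date(1939, 11, 30)] * 3,
    "case04": [date(1960, 1, 5)] * 3,
}
prop, to_review = item_agreement(dob)
print(prop, to_review)  # 0.75 ['case02']
```

The output is meant to be read the way a quality-assurance report would be: an easily interpreted rate, plus the specific records (here, one likely transposed date) whose disagreement may point to a correctable step in the abstraction process.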
ACKNOWLEDGMENTS
Conflict of interest: none declared.
References
- Sheikh K. Re: "Interrater reliability: completing the methods description in medical records review studies." (Letter). Am J Epidemiol 2005;162:919.
- Yawn BP, Wollan P. Interrater reliability: completing the methods description in medical records review studies. Am J Epidemiol 2005;161:974–7.
- Bishop YMM, Fienberg SE, Holland PW. Discrete multivariate analysis: theory and practice. Cambridge, MA: MIT Press, 1975.
- Fleiss JL, ed. Statistical methods for rates and proportions: the measurement of interrater agreement. New York, NY: John Wiley & Sons, 1981.
- Landis JR, Koch GG. A review of statistical methods in the analysis of data arising from observer reliability studies, part 1. Statistica Neerlandica 1975;29:101–25.
- Rosander AC. The quest for quality in services. Milwaukee, WI: American Society for Quality Control, 1989.
- Juran JM. The quality trilogy: a universal approach to managing for quality. Quality Progress 1986 Aug;19:19–24.