inter-rater reliability

This note last modified March 24, 2025

Got raters? How much do they agree?

Fleiss' Kappa for most cases (two or more raters, nominal codes); Krippendorff's Alpha if there is missing data. There are others.
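
A minimal Python sketch of both, assuming the statsmodels and krippendorff packages and made-up example ratings:

```python
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
import krippendorff  # pip install krippendorff

# Rows = items, columns = raters; codes as integers, np.nan = missing rating.
ratings = np.array([
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 2],
    [0, 0, np.nan],  # one rater skipped this item
])

# Fleiss' kappa needs complete data: drop items with a missing rating, then
# turn the raw ratings into an items x categories count table.
complete = ratings[~np.isnan(ratings).any(axis=1)].astype(int)
counts, _ = aggregate_raters(complete)
print("Fleiss' kappa:", fleiss_kappa(counts))

# Krippendorff's alpha tolerates the missing cell; it expects raters as rows.
print("Krippendorff's alpha:",
      krippendorff.alpha(reliability_data=ratings.T,
                         level_of_measurement="nominal"))
```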

Carey et al. 1996?

I’m doing a project where we’re using Cohen’s Kappa and summing the codes: Coder 1 says [A, A, B, C] and Coder 2 says [A, B, C, D], so we sum Coder 1 to A2, B1, C1 and Coder 2 to A1, B1, C1, D1, and run the sums through Cohen’s Kappa.
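
For comparison, the standard (per-item) use of Cohen’s Kappa pairs the two coders’ labels item by item rather than summing counts; a minimal sketch with sklearn, not the summed-counts variant above:

```python
from sklearn.metrics import cohen_kappa_score

# Standard per-item Cohen's kappa: both coders label the same ordered items.
coder1 = ["A", "A", "B", "C"]
coder2 = ["A", "B", "C", "D"]

# Observed agreement is 1/4 and chance agreement also works out to 1/4,
# so kappa is 0.0 here even though the coders match on the first item.
print(cohen_kappa_score(coder1, coder2))
```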

Kendall’s W (coefficient of concordance, for rankings)?
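
I don’t know of a ready-made Kendall’s W in scipy, but it is short to compute from within-rater ranks; a rough sketch (no tie correction), with made-up scores:

```python
import numpy as np
from scipy.stats import rankdata

def kendalls_w(ratings):
    """Kendall's W (coefficient of concordance), without a tie correction.

    ratings: array-like of shape (n_raters, n_items); each row is one
    rater's scores, which get converted to ranks within that rater.
    """
    ranks = np.apply_along_axis(rankdata, 1, np.asarray(ratings, dtype=float))
    m, n = ranks.shape                               # m raters, n items
    rank_sums = ranks.sum(axis=0)                    # R_i, summed over raters
    s = ((rank_sums - rank_sums.mean()) ** 2).sum()  # spread of the rank sums
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Example: three raters scoring four items on a 1-5 scale.
print(kendalls_w([[5, 3, 4, 1], [4, 2, 5, 1], [5, 1, 4, 2]]))
```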

Saldaña, The Coding Manual for Qualitative Researchers, pg 35 or 58, says 80-90% agreement is good.

For Cohen’s Kappa, Landis and Koch 1977, “The measurement of observer agreement for categorical data” (Biometrics), gives benchmarks for interpreting agreement: below 0 poor, 0.00-0.20 slight, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 substantial, 0.81-1.00 almost perfect.

For multiple codes per item, do IRR on the primary code only (the first code in the list?).
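
If “primary code” means the first code each coder listed for an item, a minimal sketch (hypothetical data):

```python
from sklearn.metrics import cohen_kappa_score

# Each item can carry several codes; keep only the first (primary) code
# per item per coder and compute agreement on that alone.
coder1_codes = [["A", "B"], ["C"], ["B", "D"], ["A"]]
coder2_codes = [["A"], ["C", "D"], ["D"], ["A", "B"]]

primary1 = [codes[0] for codes in coder1_codes]   # ["A", "C", "B", "A"]
primary2 = [codes[0] for codes in coder2_codes]   # ["A", "C", "D", "A"]
print(cohen_kappa_score(primary1, primary2))
```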