inter-rater reliability

This note last modified February 18, 2022

Got raters? How much do they agree?

Fleiss' Kappa for most cases, Krippendorff's Alpha for missing data. There are others.
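A minimal sketch of Fleiss' Kappa in Python, assuming the statsmodels package and made-up ratings for illustration:

```python
# Sketch: Fleiss' Kappa for several raters, assuming statsmodels is installed.
import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

# Hypothetical data: rows are subjects, columns are raters, values are category codes.
ratings = np.array([
    [0, 0, 1],
    [1, 1, 1],
    [2, 2, 1],
    [0, 1, 0],
])

# Collapse subject-by-rater codes into a subject-by-category count table.
table, _categories = aggregate_raters(ratings)

print(fleiss_kappa(table, method="fleiss"))
```

For the missing-data case, the third-party krippendorff package is one option for Krippendorff's Alpha; as far as I remember it takes a raters-by-units matrix with NaN for missing ratings.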

Carey et al. 1996?

I’m doing a project where we’re using Cohen’s Kappa on summed codes: Coder 1 says [A, A, B, C] and Coder 2 says [A, B, C, D], so we tally Coder 1 as A: 2, B: 1, C: 1 and Coder 2 as A: 1, B: 1, C: 1, D: 1, then run those counts through Cohen’s Kappa.
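For comparison, the standard item-level Cohen’s Kappa pairs the two coders’ labels item by item rather than summing them; a minimal sketch assuming scikit-learn:

```python
# Sketch: item-level Cohen's Kappa for two coders, assuming scikit-learn is installed.
from sklearn.metrics import cohen_kappa_score

# Hypothetical paired codes: one label per item from each coder, in the same order.
coder1 = ["A", "A", "B", "C"]
coder2 = ["A", "B", "C", "D"]

# Chance-corrected agreement on individual items.
print(cohen_kappa_score(coder1, coder2))
```

Collapsing to per-coder category totals instead (A: 2, B: 1, C: 1 vs A: 1, B: 1, C: 1, D: 1) measures whether the coders used the categories at similar rates, not whether they agreed on particular items.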

Kendall’s W?

The Coding Manual for Qualitative Researchers (Saldaña), pg (35|58), says 80-90% agreement is good.
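A quick sketch of that simple percent-agreement figure (no chance correction), assuming paired codes per item:

```python
# Sketch: simple percent agreement between two coders (no chance correction).
coder1 = ["A", "A", "B", "C"]
coder2 = ["A", "B", "C", "D"]

agreement = sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)
print(f"{agreement:.0%}")  # fraction of items on which the coders match
```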

For Cohen’s Kappa, Landis and Koch 1977, “The measurement of observer agreement for categorical data” (Biometrics), gives benchmark ranges for interpreting kappa values (e.g., 0.61-0.80 as substantial agreement).