inter-rater reliability
Got raters? How much do they agree?
Fleiss' Kappa for most cases, Krippendorff's Alpha for missing data. There are others.
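A minimal sketch of computing both, assuming the statsmodels and krippendorff packages are installed; the data shapes and values here are made up for illustration:

    import numpy as np
    from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa
    import krippendorff

    # Hypothetical data: 5 items coded by 3 raters, categories coded 0-2.
    ratings = np.array([
        [0, 0, 0],
        [1, 1, 2],
        [2, 2, 2],
        [0, 1, 0],
        [1, 1, 1],
    ])

    # Fleiss' kappa expects every item to be rated by the same number of raters;
    # aggregate_raters converts items x raters into an items x categories count table.
    table, _ = aggregate_raters(ratings)
    print("Fleiss' kappa:", fleiss_kappa(table))

    # Krippendorff's alpha takes raters x items and tolerates np.nan for missing ratings.
    with_missing = np.array([
        [0, 1, 2, 0, 1],
        [0, 1, 2, 1, 1],
        [0, 2, 2, np.nan, 1],
    ], dtype=float)
    print("Krippendorff's alpha:",
          krippendorff.alpha(reliability_data=with_missing, level_of_measurement="nominal"))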
Carey et al. 1996?
I'm doing a project where we are using Cohen's Kappa and summing the codes: Coder 1 says [A,A,B,C] and Coder 2 says [A,B,C,D], so we sum Coder 1 to A2, B1, C1 and Coder 2 to A1, B1, C1, D1, then run the summed counts through Cohen's Kappa.
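For comparison, a minimal sketch of the standard item-level Cohen's Kappa on the same toy lists, assuming scikit-learn is available; this treats the two lists as paired ratings of the same four items rather than summed counts:

    from sklearn.metrics import cohen_kappa_score

    coder1 = ["A", "A", "B", "C"]
    coder2 = ["A", "B", "C", "D"]
    # Prints 0.0 here: the coders agree only on item 1, which is exactly chance level
    # given their marginal code frequencies.
    print(cohen_kappa_score(coder1, coder2))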
The Coding Manual for Qualitative Researchers (Saldaña), pg (35|58), says 80-90% agreement is good.
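Simple percent agreement is just the share of items where the coders chose the same code; a quick sketch on the toy lists above:

    coder1 = ["A", "A", "B", "C"]
    coder2 = ["A", "B", "C", "D"]
    agreement = sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)
    print(f"{agreement:.0%}")  # 25% here, well below the 80-90% target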
For Cohen's Kappa, Landis and Koch (1977), "The Measurement of Observer Agreement for Categorical Data" (Biometrics), gives benchmark ranges for interpreting agreement values.
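Their commonly cited benchmark ranges, as a small lookup helper (the cutoffs are from the paper; the function name is just for this note):

    def landis_koch_label(kappa: float) -> str:
        # Landis and Koch (1977) interpretation of kappa values.
        if kappa < 0.00:
            return "poor"
        if kappa <= 0.20:
            return "slight"
        if kappa <= 0.40:
            return "fair"
        if kappa <= 0.60:
            return "moderate"
        if kappa <= 0.80:
            return "substantial"
        return "almost perfect"

    print(landis_koch_label(0.72))  # substantial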
When items get multiple codes, do IRR based on the primary code only (the first code in the list?).
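A minimal sketch of that primary-code approach, assuming each coder gives an ordered list of codes per item (the multi-code data here is made up) and scikit-learn is available:

    from sklearn.metrics import cohen_kappa_score

    coder1_codes = [["A", "B"], ["C"], ["B", "D"], ["A"]]
    coder2_codes = [["A"], ["C", "A"], ["D"], ["B", "A"]]

    # Keep only each coder's first (primary) code per item, then run an
    # item-level agreement statistic on those.
    primary1 = [codes[0] for codes in coder1_codes]
    primary2 = [codes[0] for codes in coder2_codes]
    print(cohen_kappa_score(primary1, primary2))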