Maxwell's chi-square statistic tests for overall disagreement between the two raters. McNemar's generalized statistic tests for asymmetry in the distribution of subjects on which the raters disagree, i.e. whether there is more disagreement over some response categories than over others.

This procedure builds a classification table from raw data in the spreadsheet for two observers and calculates an inter-rater agreement statistic (kappa) to evaluate the agreement between two classifications on ordinal or nominal scales.

A number of statistics have been used to measure inter-rater and intra-rater reliability. A partial list includes percent agreement, Cohen's kappa (for two raters), Fleiss' kappa (an adaptation of Cohen's kappa for three or more raters), the contingency coefficient, Pearson's r and Spearman's rho, the intraclass correlation coefficient, the concordance correlation coefficient, and Krippendorff's alpha (useful when there are multiple raters and multiple possible ratings). Correlation coefficients such as Pearson's r can be a poor reflection of agreement between raters, leading to extreme over- or underestimation of the true level of rater agreement (6). This document considers only two of the most common measures, percent agreement and Cohen's kappa.

As a rule of thumb, a kappa below 0.2 indicates poor agreement, and a kappa above 0.8 indicates very good agreement beyond chance.

Cohen's kappa coefficient (κ) is a statistic used to measure inter-rater reliability (and also intra-rater reliability) for qualitative (categorical) items.
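To make the difference between percent agreement and Cohen's kappa concrete, the following is a minimal sketch in plain Python. It assumes the usual definition of kappa, (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's marginal label frequencies. The function names and the ten example ratings are hypothetical and chosen only for illustration.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same label."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(rater_a)
    p_o = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: product of the two raters' marginal proportions,
    # summed over categories (a Counter returns 0 for unseen labels).
    p_e = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical ratings: 4 yes/yes, 3 no/no, 2 yes/no, 1 no/yes.
rater_a = ["yes"] * 4 + ["no"] * 3 + ["yes"] * 2 + ["no"] * 1
rater_b = ["yes"] * 4 + ["no"] * 3 + ["no"] * 2 + ["yes"] * 1

print(f"{percent_agreement(rater_a, rater_b):.2f}")  # 0.70
print(f"{cohen_kappa(rater_a, rater_b):.2f}")        # 0.40
```

In this made-up example the raters agree on 70% of items, but half of that agreement would be expected by chance alone, so kappa is noticeably lower than the raw percentage.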

[1] Cohen's kappa is generally regarded as a more robust measure than simple percent agreement, since it takes into account the possibility of agreement occurring by chance. There is controversy surrounding Cohen's kappa because of the difficulty of interpreting indices of agreement. Some researchers have suggested that it is conceptually simpler to evaluate disagreement between items. [2] For more details, see Restrictions.

An example of the use of Fleiss' kappa: suppose fourteen psychiatrists are asked to assess ten patients. Each psychiatrist assigns each patient one of five possible diagnoses. These ratings are compiled into a matrix, and Fleiss' kappa can be computed from this matrix (see example below) to show the degree of agreement between the psychiatrists above the level of agreement expected by chance.

Disagreement over any category and asymmetry of disagreement (2 raters)

If statistical significance is not a useful guide, what magnitude of kappa reflects adequate agreement? Guidelines would be helpful, but factors other than agreement can influence the magnitude of kappa, which makes it problematic to interpret a given value.
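The Fleiss' kappa scenario described above (fourteen raters, ten subjects, five categories) can be sketched as follows. This is a minimal plain-Python illustration that assumes the standard formulation: each row of the matrix holds, for one subject, the number of raters who chose each category, and every subject is rated by the same number of raters. The function name is hypothetical, and the counts are illustrative values chosen for this sketch, not data reported in this document.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters assigning subject i to category j."""
    n_subjects = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every subject must be rated by the same number of raters")
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Proportion of all assignments that fell into each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-subject agreement: share of rater pairs that agree on that subject.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_subjects      # mean observed agreement
    p_e = sum(p * p for p in p_j)      # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)


# Illustrative matrix: 10 patients (rows) x 5 diagnoses (columns), 14 raters each.
ratings = [
    [0, 0, 0, 0, 14],
    [0, 2, 6, 4, 2],
    [0, 0, 3, 5, 6],
    [0, 3, 9, 2, 0],
    [2, 2, 8, 1, 1],
    [7, 7, 0, 0, 0],
    [3, 2, 6, 3, 0],
    [2, 5, 3, 2, 2],
    [6, 5, 2, 1, 0],
    [0, 2, 2, 3, 7],
]

kappa = fleiss_kappa(ratings)
print(f"{kappa:.2f}")  # 0.21
```

Only the first patient in this made-up matrix receives a unanimous diagnosis, so the resulting kappa sits near the lower end of the scale despite some rows showing clear majorities.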

As Sim and Wright have noted, two important factors are prevalence (whether the codes are equiprobable or vary in their probabilities) and bias (whether the marginal probabilities are similar or different for the two observers).
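The effect of prevalence on kappa can be illustrated with a small sketch. It assumes a 2x2 table for two raters and two categories, and uses commonly cited definitions that the text itself does not spell out: a prevalence index of |a - d| / n and a bias index of |b - c| / n, where a and d are the agreement cells and b and c are the disagreement cells. The function name and both tables are hypothetical.

```python
def two_by_two_summary(a, b, c, d):
    """Kappa, prevalence index and bias index for a 2x2 agreement table.

    a: both raters say "yes"      b: rater 1 "yes", rater 2 "no"
    c: rater 1 "no", rater 2 "yes"  d: both raters say "no"
    """
    n = a + b + c + d
    p_o = (a + d) / n
    # Chance agreement from the row and column totals of the table.
    p_e = ((a + b) * (a + c) + (c + d) * (b + d)) / (n * n)
    kappa = (p_o - p_e) / (1 - p_e)
    prevalence_index = abs(a - d) / n   # assumed definition, see lead-in
    bias_index = abs(b - c) / n         # assumed definition, see lead-in
    return {
        "p_o": p_o,
        "kappa": kappa,
        "prevalence_index": prevalence_index,
        "bias_index": bias_index,
    }


# Two hypothetical tables with the same observed agreement (90%).
balanced = two_by_two_summary(45, 5, 5, 45)   # both codes equally common
skewed = two_by_two_summary(80, 5, 5, 10)     # one code dominates

print(f"{balanced['kappa']:.2f}", f"{balanced['prevalence_index']:.2f}")  # 0.80 0.00
print(f"{skewed['kappa']:.2f}", f"{skewed['prevalence_index']:.2f}")      # 0.61 0.70
```

Both tables show the raters agreeing on 90 of 100 items, yet kappa drops from about 0.80 to about 0.61 when one code dominates, which is why a fixed threshold for "adequate" kappa is hard to defend.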