Cohen’s Kappa Calculator

Calculate Cohen's Kappa using either two arrays of ratings or a confusion matrix.

Understanding Cohen's Kappa

💡 Cohen's Kappa (\(\kappa\)) is a statistical measure of inter-rater agreement for qualitative (categorical) items. It accounts for both the observed agreement and the agreement expected by chance, making it more robust than a simple percentage agreement.

Formula for Cohen's Kappa

The formula for Cohen's Kappa is given by:

\[ \kappa = \frac{P_o - P_e}{1 - P_e} \] where:
  • \(P_o\): Observed agreement, the proportion of cases where both raters agree.
  • \(P_e\): Expected agreement by chance, calculated based on the marginal totals of the rating categories.
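As a quick sanity check, the formula can be applied directly once \(P_o\) and \(P_e\) are known. Below is a minimal Python sketch, using the values derived in the worked example later in this guide:

# Apply the Kappa formula to known Po and Pe
# (values taken from the worked example below)
p_o = 0.875    # observed agreement
p_e = 0.21875  # expected agreement by chance

kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.84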

Key Concepts

  • Perfect Agreement (\(\kappa = 1\)): When the raters are in complete agreement.
  • No Agreement (\(\kappa = 0\)): When the agreement is equivalent to chance.
  • Negative Agreement (\(\kappa < 0\)): When the agreement is worse than chance.

Note: Cohen's Kappa assumes that the ratings are independent and the categories are mutually exclusive.

Interpreting Kappa Values

The following ranges are commonly used to interpret Cohen's Kappa values:

  • \( > 0.80 \): Almost perfect agreement
  • \( 0.61 - 0.80 \): Substantial agreement
  • \( 0.41 - 0.60 \): Moderate agreement
  • \( 0.21 - 0.40 \): Fair agreement
  • \( \leq 0.20 \): Poor agreement

Step-by-Step Calculation of Cohen's Kappa

💡 Let's calculate Cohen's Kappa using the two arrays:

  • Rater 1: [1, 2, 2, 3, 3, 4, 4, 5]
  • Rater 2: [1, 2, 3, 3, 3, 4, 4, 5]

Step 1: Create the Confusion Matrix

The confusion matrix counts how often each pair of ratings occurred. For the given arrays:

Rater 1 / Rater 2    1    2    3    4    5    Total
1                    1    0    0    0    0    1
2                    0    1    1    0    0    2
3                    0    0    2    0    0    2
4                    0    0    0    2    0    2
5                    0    0    0    0    1    1
Total                1    1    3    2    1    8
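The same matrix can be built programmatically. Here is a minimal Python sketch (assuming numpy is available; the ratings are the example arrays above):

import numpy as np

rater1 = [1, 2, 2, 3, 3, 4, 4, 5]
rater2 = [1, 2, 3, 3, 3, 4, 4, 5]

# All categories used by either rater, in sorted order
categories = sorted(set(rater1) | set(rater2))
index = {c: i for i, c in enumerate(categories)}

# Count how often each (Rater 1, Rater 2) pair of ratings occurs
conf_matrix = np.zeros((len(categories), len(categories)), dtype=int)
for a, b in zip(rater1, rater2):
    conf_matrix[index[a], index[b]] += 1

print(conf_matrix)
# [[1 0 0 0 0]
#  [0 1 1 0 0]
#  [0 0 2 0 0]
#  [0 0 0 2 0]
#  [0 0 0 0 1]]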

Step 2: Calculate Observed Agreement (\(P_o\))

The observed agreement is the proportion of times both raters agreed. This is the sum of the diagonal elements (agreements) divided by the total number of ratings.

  • Diagonal elements: \(1 + 1 + 2 + 2 + 1 = 7\)
  • Total ratings: \(8\)
  • \[ P_o = \frac{\text{Diagonal Sum}}{\text{Total}} = \frac{7}{8} = 0.875 \]
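Continuing the numpy sketch from Step 1, this is a one-liner:

# Observed agreement: sum of the diagonal (agreements) over the total ratings
p_o = np.trace(conf_matrix) / conf_matrix.sum()
print(p_o)  # 0.875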

Step 3: Calculate Expected Agreement (\(P_e\))

The expected agreement is the probability that the raters would agree by chance alone: for each category, multiply the proportion of ratings Rater 1 assigned to it (row total divided by the grand total) by the proportion Rater 2 assigned to it (column total divided by the grand total), then sum these products over all categories.

  • Category 1: \(P(1) = \frac{1}{8} \times \frac{1}{8} = 0.015625\)
  • Category 2: \(P(2) = \frac{2}{8} \times \frac{1}{8} = 0.03125\)
  • Category 3: \(P(3) = \frac{2}{8} \times \frac{3}{8} = 0.09375\)
  • Category 4: \(P(4) = \frac{2}{8} \times \frac{2}{8} = 0.0625\)
  • Category 5: \(P(5) = \frac{1}{8} \times \frac{1}{8} = 0.015625\)
  • \[ P_e = 0.015625 + 0.03125 + 0.09375 + 0.0625 + 0.015625 = 0.21875 \]
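In code, this is the sum of the products of the row and column proportions (continuing the sketch from the previous steps):

# Expected agreement: for each category, (row proportion) * (column proportion)
row_props = conf_matrix.sum(axis=1) / conf_matrix.sum()
col_props = conf_matrix.sum(axis=0) / conf_matrix.sum()
p_e = np.sum(row_props * col_props)
print(p_e)  # 0.21875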

Step 4: Calculate Cohen's Kappa

Using the formula: \[ \kappa = \frac{P_o - P_e}{1 - P_e} \] Substituting the values:

\[ \kappa = \frac{0.875 - 0.21875}{1 - 0.21875} = \frac{0.65625}{0.78125} \approx 0.84 \]
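The same result, continuing the sketch:

# Cohen's Kappa from the observed and expected agreement computed above
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 2))  # 0.84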

Step 5: Interpret the Result

Based on the calculated Kappa value of \(0.84\), we can interpret the agreement as: Almost perfect agreement.

Real-Life Applications

Cohen's Kappa is widely used in various fields to assess inter-rater reliability:

  • Healthcare: Assessing the consistency of diagnoses between doctors.
  • Education: Measuring agreement in grading assignments or exams.
  • Market Research: Evaluating the consistency of customer feedback classifications.
  • Psychology: Determining agreement in categorizing behavioral observations.

Factors Affecting Cohen's Kappa

  • Prevalence of Categories: Kappa is sensitive to imbalances in category frequencies.
  • Number of Categories: More categories reduce the expected chance agreement (\(P_e\)), which tends to raise Kappa for the same observed agreement.
  • Marginal Distributions: Unequal distributions of ratings between raters can affect Kappa.

Limitations of Cohen's Kappa

  • Prevalence Paradox: When one category is far more prevalent than the others, Kappa can be low (or even negative) despite high observed agreement (see the example after this list).
  • Simplistic Assumptions: Assumes raters are equally reliable, which may not always be true.
  • Category Independence: Assumes categories are mutually exclusive and exhaustive.
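To see the prevalence paradox in action, here is a small illustration with a made-up, highly imbalanced two-category confusion matrix: the raters agree on 90 out of 100 items, yet Kappa is close to zero. The counts are hypothetical and chosen only to demonstrate the effect:

import numpy as np

# Hypothetical confusion matrix: both raters assign the common category
# to 90 items, they disagree on 10, and they never agree on the rare category.
conf = np.array([[90, 5],
                 [ 5, 0]])

total = conf.sum()
p_o = np.trace(conf) / total                                  # 0.90
p_e = np.sum(conf.sum(axis=1) * conf.sum(axis=0)) / total**2  # 0.905
kappa = (p_o - p_e) / (1 - p_e)
print(round(kappa, 3))  # -0.053: below chance despite 90% raw agreement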

Reducing Bias in Kappa Calculations

To address potential biases and limitations:

  • Ensure balanced categories to reduce the prevalence effect.
  • Use weighted Kappa for ordinal data to account for the degree of disagreement (see the sketch after this list).
  • Conduct sensitivity analyses to explore the effect of category imbalances.
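For the weighted-Kappa suggestion above, scikit-learn's cohen_kappa_score accepts a weights argument ('linear' or 'quadratic'). A minimal sketch using the ordinal ratings from the worked example:

from sklearn.metrics import cohen_kappa_score

rater1 = [1, 2, 2, 3, 3, 4, 4, 5]
rater2 = [1, 2, 3, 3, 3, 4, 4, 5]

# Quadratic weights penalise large disagreements more heavily than adjacent ones
weighted_kappa = cohen_kappa_score(rater1, rater2, weights="quadratic")
print(f"Weighted Cohen's Kappa: {weighted_kappa:.5f}")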

Python Implementation

Python Code for Cohen's Kappa Using Two Arrays
from sklearn.metrics import cohen_kappa_score

# Example data: Ratings by two raters
rater1 = [1, 2, 2, 3, 3, 4, 4, 5]
rater2 = [1, 2, 3, 3, 3, 4, 4, 5]

# Calculate Cohen's Kappa
kappa = cohen_kappa_score(rater1, rater2)
print(f"Cohen's Kappa: {kappa:.5f}")

# Interpretation
if kappa > 0.8:
    print("Almost perfect agreement.")
elif kappa > 0.6:
    print("Substantial agreement.")
elif kappa > 0.4:
    print("Moderate agreement.")
elif kappa > 0.2:
    print("Fair agreement.")
else:
    print("Poor agreement.")
        
Python Function for Cohen's Kappa from Confusion Matrix
import numpy as np

def calculate_kappa_from_matrix(conf_matrix):
    """
    Calculate Cohen's Kappa from a confusion matrix.

    Parameters:
        conf_matrix (numpy.ndarray): Confusion matrix where rows represent Rater 1 categories
                                     and columns represent Rater 2 categories.

    Returns:
        float: Cohen's Kappa score.
    """
    # Total number of observations
    total = np.sum(conf_matrix)

    # Observed agreement (Po)
    observed_agreement = np.trace(conf_matrix) / total

    # Expected agreement (Pe)
    row_totals = np.sum(conf_matrix, axis=1) / total
    col_totals = np.sum(conf_matrix, axis=0) / total
    expected_agreement = np.sum(row_totals * col_totals)

    # Calculate kappa
    kappa = (observed_agreement - expected_agreement) / (1 - expected_agreement)
    return kappa


# Example confusion matrix
confusion_matrix = np.array([
    [1, 0, 0, 0, 0],  # Rater 1: Category 1
    [0, 1, 1, 0, 0],  # Rater 1: Category 2
    [0, 0, 2, 0, 0],  # Rater 1: Category 3
    [0, 0, 0, 2, 0],  # Rater 1: Category 4
    [0, 0, 0, 0, 1]   # Rater 1: Category 5
])

# Calculate Cohen's Kappa
kappa_score = calculate_kappa_from_matrix(confusion_matrix)
print(f"Cohen's Kappa: {kappa_score:.5f}")
# Interpretation
if kappa_score > 0.8:
    print("Almost perfect agreement.")
elif kappa_score > 0.6:
    print("Substantial agreement.")
elif kappa_score > 0.4:
    print("Moderate agreement.")
elif kappa_score > 0.2:
    print("Fair agreement.")
else:
    print("Poor agreement.")

R Implementation

R Function for Cohen's Kappa from Confusion Matrix
# Function to calculate Cohen's Kappa
calculate_kappa <- function(observed_matrix) {
    # Calculate total observations
    total <- sum(observed_matrix)

    # Calculate row and column marginals (proportions)
    row_totals <- rowSums(observed_matrix) / total
    col_totals <- colSums(observed_matrix) / total

    # Calculate observed agreement
    observed_agreement <- sum(diag(observed_matrix)) / total

    # Calculate expected agreement
    expected_agreement <- sum(row_totals * col_totals)

    # Calculate Cohen's Kappa
    kappa <- (observed_agreement - expected_agreement) / (1 - expected_agreement)

    return(kappa)
}

# Example usage
# Confusion matrix (rows = Rater 1, columns = Rater 2)
observed_matrix <- matrix(c(
    1, 0, 0, 0, 0,  # Rater 1: Category 1
    0, 1, 1, 0, 0,  # Rater 1: Category 2
    0, 0, 2, 0, 0,  # Rater 1: Category 3
    0, 0, 0, 2, 0,  # Rater 1: Category 4
    0, 0, 0, 0, 1   # Rater 1: Category 5
), nrow = 5, byrow = TRUE)


# Calculate Cohen's Kappa
kappa <- calculate_kappa(observed_matrix)
cat(sprintf("Cohen's Kappa: %.5f\n", kappa))

# Interpretation
if (kappa > 0.8) {
    cat("Almost perfect agreement.\n")
} else if (kappa > 0.6) {
    cat("Substantial agreement.\n")
} else if (kappa > 0.4) {
    cat("Moderate agreement.\n")
} else if (kappa > 0.2) {
    cat("Fair agreement.\n")
} else {
    cat("Poor agreement.\n")
}
R Function for Cohen's Kappa Using Two Arrays
# Function to calculate Cohen's Kappa from two rater arrays
calculate_kappa <- function(rater1, rater2) {
    # Check if the input arrays are of the same length
    if (length(rater1) != length(rater2)) {
        stop("The two arrays must have the same length.")
    }

    # Generate the confusion matrix
    confusion_matrix <- table(rater1, rater2)

    # Total number of observations
    total <- sum(confusion_matrix)

    # Observed agreement (Po)
    observed_agreement <- sum(diag(confusion_matrix)) / total

    # Expected agreement (Pe)
    row_totals <- rowSums(confusion_matrix) / total
    col_totals <- colSums(confusion_matrix) / total
    expected_agreement <- sum(row_totals * col_totals)

    # Calculate Cohen's Kappa
    kappa <- (observed_agreement - expected_agreement) / (1 - expected_agreement)

    return(kappa)
}

# Example data: Ratings by two raters
rater1 <- c(1, 2, 2, 3, 3, 4, 4, 5)
rater2 <- c(1, 2, 3, 3, 3, 4, 4, 5)

# Calculate Cohen's Kappa
kappa_score <- calculate_kappa(rater1, rater2)
cat(sprintf("Cohen's Kappa: %.5f\n", kappa_score))

# Interpretation
if (kappa_score > 0.8) {
    cat("Almost perfect agreement.\n")
} else if (kappa_score > 0.6) {
    cat("Substantial agreement.\n")
} else if (kappa_score > 0.4) {
    cat("Moderate agreement.\n")
} else if (kappa_score > 0.2) {
    cat("Fair agreement.\n")
} else {
    cat("Poor agreement.\n")
}

JavaScript Implementation

JavaScript Function for Cohen's Kappa From Confusion Matrix
function calculateKappa(matrix) {
    const total = matrix.flat().reduce((sum, val) => sum + val, 0);

    // Calculate row and column marginals (proportions)
    const rowTotals = matrix.map(row => row.reduce((sum, val) => sum + val, 0) / total);
    const colTotals = matrix[0].map((_, colIndex) =>
        matrix.reduce((sum, row) => sum + row[colIndex], 0) / total
    );

    // Calculate observed agreement
    const observedAgreement = matrix.reduce(
        (sum, row, rowIndex) => sum + row[rowIndex] / total,
        0
    );

    // Calculate expected agreement
    const expectedAgreement = rowTotals.reduce(
        (sum, rowProp, rowIndex) => sum + rowProp * colTotals[rowIndex],
        0
    );

    // Calculate Cohen's Kappa
    return (observedAgreement - expectedAgreement) / (1 - expectedAgreement);
}

// Example usage
const observedMatrix = [
    [1, 0, 0, 0, 0], // Rater 1: Category 1
    [0, 1, 1, 0, 0], // Rater 1: Category 2
    [0, 0, 2, 0, 0], // Rater 1: Category 3
    [0, 0, 0, 2, 0], // Rater 1: Category 4
    [0, 0, 0, 0, 1]  // Rater 1: Category 5
];


const kappa = calculateKappa(observedMatrix);
console.log(`Cohen's Kappa: ${kappa.toFixed(5)}`);

// Interpretation
if (kappa > 0.8) {
    console.log("Almost perfect agreement.");
} else if (kappa > 0.6) {
    console.log("Substantial agreement.");
} else if (kappa > 0.4) {
    console.log("Moderate agreement.");
} else if (kappa > 0.2) {
    console.log("Fair agreement.");
} else {
    console.log("Poor agreement.");
}
JavaScript Function for Cohen's Kappa Using Two Arrays

/**
 * Calculate Cohen's Kappa from two arrays of ratings.
 * @param {Array} rater1 - Ratings by Rater 1.
 * @param {Array} rater2 - Ratings by Rater 2.
 * @returns {number} - Cohen's Kappa score.
 */
function calculateKappa(rater1, rater2) {
    if (rater1.length !== rater2.length) {
        throw new Error("The two arrays must have the same length.");
    }

    // Generate the confusion matrix
    const uniqueLabels = Array.from(new Set(rater1.concat(rater2))).sort();
    const matrix = Array(uniqueLabels.length).fill(0).map(() => Array(uniqueLabels.length).fill(0));
    const labelIndex = Object.fromEntries(uniqueLabels.map((label, i) => [label, i]));

    rater1.forEach((label, i) => {
        matrix[labelIndex[label]][labelIndex[rater2[i]]]++;
    });

    // Total number of observations
    const total = rater1.length;

    // Observed agreement (Po)
    const observedAgreement = matrix.reduce((sum, row, i) => sum + row[i], 0) / total;

    // Expected agreement (Pe)
    const rowTotals = matrix.map(row => row.reduce((sum, val) => sum + val, 0) / total);
    const colTotals = matrix[0].map((_, colIndex) =>
        matrix.reduce((sum, row) => sum + row[colIndex], 0) / total
    );
    const expectedAgreement = rowTotals.reduce(
        (sum, rowProp, i) => sum + rowProp * colTotals[i], 0
    );

    // Calculate Cohen's Kappa
    const kappa = (observedAgreement - expectedAgreement) / (1 - expectedAgreement);
    return kappa;
}

// Example data: Ratings by two raters
const rater1 = [1, 2, 2, 3, 3, 4, 4, 5];
const rater2 = [1, 2, 3, 3, 3, 4, 4, 5];

// Calculate Cohen's Kappa
const kappaScore = calculateKappa(rater1, rater2);
console.log(`Cohen's Kappa: ${kappaScore.toFixed(5)}`);

// Interpretation
if (kappaScore > 0.8) {
    console.log("Almost perfect agreement.");
} else if (kappaScore > 0.6) {
    console.log("Substantial agreement.");
} else if (kappaScore > 0.4) {
    console.log("Moderate agreement.");
} else if (kappaScore > 0.2) {
    console.log("Fair agreement.");
} else {
    console.log("Poor agreement.");
}

Attribution

If you found this guide helpful, feel free to link back to this post for attribution and share it with others!


Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.