Kruskal-Wallis Test Calculator


Understanding the Kruskal-Wallis Test

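The Kruskal-Wallis test is a non-parametric statistical test used to determine whether there are statistically significant differences between two or more independent groups; it is most often applied to three or more. Unlike one-way ANOVA, it does not assume normally distributed data, which makes it useful for ordinal data or for continuous data that does not meet the assumptions of parametric tests. It is often described as a comparison of medians, but that reading holds only when the groups' distributions have a similar shape; in general, it tests whether the groups come from the same distribution.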

Key Characteristics of the Kruskal-Wallis Test

  • Non-parametric: The Kruskal-Wallis test is the non-parametric alternative to the one-way ANOVA, and it compares the ranks of the data rather than the actual data values.
  • No normality assumption: Since it’s based on ranks, the test does not assume that the data is normally distributed, making it robust for non-normal or ordinal data.
  • Distribution testing: The test evaluates whether the samples originate from the same distribution.
  • Degrees of freedom: Calculated as the number of groups minus one.
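Because the test works on ranks, any strictly increasing transformation of the data (for example, taking the logarithm of positive values) leaves the ranks, and therefore H, unchanged. A minimal Python sketch of the pooled ranking step, assuming tied values receive the average of the positions they occupy (the same convention the JavaScript implementation later in this article uses); the function name `midranks` is our own:

```python
import math

def midranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

data = [10, 15, 14, 18, 20, 20, 21]
print(midranks(data))                          # [1.0, 3.0, 2.0, 4.0, 5.5, 5.5, 7.0]
print(midranks([math.log(v) for v in data]))   # identical ranks after a log transform
```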

Formula for the Kruskal-Wallis Test

The Kruskal-Wallis H-statistic is calculated as follows:

\[ H = \frac{12}{N(N+1)} \sum_{i=1}^k \frac{R_i^2}{n_i} - 3(N + 1) \]

where:

  • \( N \): Total number of observations across all groups
  • \( k \): Number of groups
  • \( R_i \): Sum of ranks for group \( i \)
  • \( n_i \): Sample size of group \( i \)
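To make the formula concrete, here is a minimal Python sketch that ranks the pooled data, forms each \( R_i \), and evaluates H for the example data used in the code examples of this article. It also applies the standard tie correction, dividing H by \( 1 - \sum_j (t_j^3 - t_j)/(N^3 - N) \), where \( t_j \) is the size of the j-th group of tied values; this is the same correction the JavaScript code applies. The function name `kruskal_h` is our own:

```python
def kruskal_h(groups, correct_ties=True):
    """Kruskal-Wallis H from the rank-sum formula, optionally tie-corrected."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)
    # Map each distinct value to its average (mid) rank in the pooled sample
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j + 1 < n_total and pooled[j + 1] == pooled[i]:
            j += 1
        t = j - i + 1  # size of this run of tied values (1 if no tie)
        rank_of[pooled[i]] = (i + j) / 2 + 1
        tie_term += t**3 - t
        i = j + 1
    # H = 12 / (N(N+1)) * sum(R_i^2 / n_i) - 3(N+1)
    s = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * s - 3 * (n_total + 1)
    if correct_ties and tie_term:
        h /= 1 - tie_term / (n_total**3 - n_total)
    return h

groups = [[10, 15, 14, 18, 20], [12, 17, 16, 19, 21], [20, 25, 23, 21, 24]]
print(round(kruskal_h(groups, correct_ties=False), 4))  # 8.255
print(round(kruskal_h(groups), 4))                      # 8.2846
```

For this data the rank sums are 24.5, 32.5 and 63, so \( \sum R_i^2/n_i = 1125.1 \) and H = 0.05 × 1125.1 − 48 = 8.255 before the correction. The two tied pairs (20 and 21) give a correction factor of 1 − 12/3360, raising H to about 8.285.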

Under the null hypothesis, the H-statistic approximately follows a chi-square distribution with \( k - 1 \) degrees of freedom; the approximation is usually considered adequate when each group has at least five observations. If the calculated H-statistic is greater than the critical value from the chi-square distribution for the chosen significance level, we reject the null hypothesis that the groups come from the same distribution.
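With three groups (k = 3, so 2 degrees of freedom), the chi-square distribution has a closed-form upper tail, \( P(\chi^2_2 > h) = e^{-h/2} \), so both the critical value and the p-value can be computed by hand. A minimal Python sketch for that special case only; other degrees of freedom need a general chi-square routine such as the incomplete-gamma approach in the JavaScript code. The helper names are our own, and `h` is the tie-corrected H for the article's example data:

```python
import math

def chi2_df2_sf(h):
    """P(X > h) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-h / 2)

def chi2_df2_critical(alpha):
    """Critical value c with P(X > c) = alpha, for 2 degrees of freedom."""
    return -2 * math.log(alpha)

h = 8.2846  # tie-corrected H for the three example groups (k = 3)
alpha = 0.05
crit = chi2_df2_critical(alpha)
print(f"critical value: {crit:.4f}")   # critical value: 5.9915
print(f"p-value: {chi2_df2_sf(h):.4f}")
print("reject H0" if h > crit else "fail to reject H0")  # reject H0
```

Comparing H with the critical value and comparing the p-value with α always lead to the same decision, because the survival function decreases as H grows.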

Interpreting the Results

The null hypothesis for the Kruskal-Wallis test is that all groups come from the same distribution; when the distributions have a similar shape, this amounts to saying that their medians are equal. If the p-value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis, suggesting that at least one group differs significantly from at least one other. The test does not identify which groups differ; that requires a follow-up (post-hoc) comparison. Otherwise, we fail to reject the null hypothesis: the data provide no statistically significant evidence of a difference between the groups.
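The decision rule can be written as a small helper. A minimal Python sketch, where `interpret` and its message strings are our own illustrative choices rather than part of any library:

```python
def interpret(p_value, alpha=0.05):
    """Map a Kruskal-Wallis p-value to a reject / fail-to-reject decision."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must be between 0 and 1")
    if p_value < alpha:
        return "Reject H0: at least one group differs from the others."
    return "Fail to reject H0: no statistically significant difference detected."

print(interpret(0.0159))              # Reject H0: ...
print(interpret(0.0159, alpha=0.01))  # Fail to reject H0: ...
```

Note that a p-value exactly equal to α is not rejected, matching the strict "less than" wording above.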

Programmatically Calculating the Kruskal-Wallis Test

Below are examples of how to perform the Kruskal-Wallis test in different programming languages.

1. Using JavaScript

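The jStat library for JavaScript does not include a Kruskal-Wallis test, so the example below is a from-scratch implementation with no external libraries. It ranks the pooled data (averaging the ranks of tied values), computes H with a tie correction, and derives the p-value from the chi-square distribution.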

function kruskalWallis(groups) {
  // Combine all values and store group indices
  const allValues = [];
  const groupIndices = [];
  groups.forEach((group, groupIndex) => {
    group.forEach(value => {
      allValues.push(value);
      groupIndices.push(groupIndex);
    });
  });

  // Calculate ranks
  const sortedPairs = allValues.map((value, index) => ({ value, index }))
    .sort((a, b) => a.value - b.value);

  const ranks = new Array(allValues.length).fill(0);
  let i = 0;
  while (i < sortedPairs.length) {
    let j = i;
    // Find ties
    while (j < sortedPairs.length - 1 && sortedPairs[j].value === sortedPairs[j + 1].value) {
      j++;
    }
    // Assign average rank for ties
    const avgRank = (i + j) / 2 + 1;
    for (let k = i; k <= j; k++) {
      ranks[sortedPairs[k].index] = avgRank;
    }
    i = j + 1;
  }

  // Calculate group rank sums and sizes
  const k = groups.length;
  const n = allValues.length;
  const groupRankSums = new Array(k).fill(0);
  const groupSizes = new Array(k).fill(0);

  ranks.forEach((rank, index) => {
    const groupIndex = groupIndices[index];
    groupRankSums[groupIndex] += rank;
    groupSizes[groupIndex]++;
  });

  // Calculate H-statistic
  let h = 12 / (n * (n + 1));
  h *= groupRankSums.reduce((sum, rankSum, i) =>
    sum + Math.pow(rankSum, 2) / groupSizes[i], 0);
  h -= 3 * (n + 1);

  // Calculate correction factor for ties
  let tieCorrection = 0;
  let currentTieCount = 1;
  for (let i = 1; i < sortedPairs.length; i++) {
    if (sortedPairs[i].value === sortedPairs[i-1].value) {
      currentTieCount++;
    } else {
      if (currentTieCount > 1) {
        tieCorrection += Math.pow(currentTieCount, 3) - currentTieCount;
      }
      currentTieCount = 1;
    }
  }
  // Check last group of ties
  if (currentTieCount > 1) {
    tieCorrection += Math.pow(currentTieCount, 3) - currentTieCount;
  }

  // Apply tie correction
  if (tieCorrection > 0) {
    h = h / (1 - tieCorrection / (n * (n * n - 1)));
  }

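  // p-value from the chi-square distribution with k - 1 degrees of freedom:
  // the chi-square CDF equals the regularized lower incomplete gamma P(df/2, H/2)
  const df = k - 1;
  const cdf = gammaCDF(h / 2, df / 2);

  return {
    statistic: h,
    pValue: 1 - cdf  // upper-tail probability
  };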
}

// Gamma function via the Lanczos approximation (g = 7), with reflection for z < 0.5
function gamma(z) {
  const p = [676.5203681218851, -1259.1392167224028, 771.32342877765313,
    -176.61502916214059, 12.507343278686905, -0.13857109526572012,
    9.9843695780195716e-6, 1.5056327351493116e-7];

  if (z < 0.5) {
    return Math.PI / (Math.sin(Math.PI * z) * gamma(1 - z));
  }

  z -= 1;
  let x = 0.99999999999980993;
  for (let i = 0; i < p.length; i++) {
    x += p[i] / (z + i + 1);
  }

  const t = z + p.length - 0.5;
  return Math.sqrt(2 * Math.PI) * Math.pow(t, z + 0.5) * Math.exp(-t) * x;
}

// Lower incomplete gamma function using series expansion
function lowerGamma(s, x) {
  if (x <= 0) return 0;

  let sum = 0;
  let term = 1 / s;
  let n = 1;
  const maxIterations = 1000;
  const epsilon = 1e-10;

  while (Math.abs(term) > epsilon && n < maxIterations) {
    sum += term;
    term *= x / (s + n);
    n++;
  }

  return Math.pow(x, s) * Math.exp(-x) * sum;
}

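// Regularized lower incomplete gamma P(a, x) = lowerGamma(a, x) / gamma(a),
// i.e. the CDF of a Gamma(a, 1) variable evaluated at x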
function gammaCDF(x, a) {
  if (x <= 0) return 0;
  return lowerGamma(a, x) / gamma(a);
}

// Example usage
const group1 = [10, 15, 14, 18, 20];
const group2 = [12, 17, 16, 19, 21];
const group3 = [20, 25, 23, 21, 24];

const groups = [group1, group2, group3];
const result = kruskalWallis(groups);

console.log(`H-statistic: ${result.statistic.toFixed(5)}`);
console.log(`p-value: ${result.pValue.toFixed(5)}`);
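The JavaScript p-value rests on the regularized lower incomplete gamma function: for a statistic with \( k - 1 \) degrees of freedom, the chi-square CDF is \( P\big((k-1)/2,\, H/2\big) \), which is what `gammaCDF(h / 2, df / 2)` evaluates. A hedged way to sanity-check that series is to port it to Python, using the standard library's `math.gamma` in place of the Lanczos approximation, and compare it against the closed form \( e^{-H/2} \) that holds for 2 degrees of freedom. The names `lower_gamma` and `chi2_sf` are our own:

```python
import math

def lower_gamma(s, x, eps=1e-10, max_iter=1000):
    """Lower incomplete gamma via the same series as the JavaScript lowerGamma."""
    if x <= 0:
        return 0.0
    total, term, n = 0.0, 1.0 / s, 1
    while abs(term) > eps and n < max_iter:
        total += term
        term *= x / (s + n)
        n += 1
    return x**s * math.exp(-x) * total

def chi2_sf(h, df):
    """Upper-tail p-value 1 - P(df/2, h/2), mirroring gammaCDF in the JS code."""
    if h <= 0:
        return 1.0
    return 1.0 - lower_gamma(df / 2, h / 2) / math.gamma(df / 2)

# For df = 2 the exact answer is exp(-h/2), which gives a quick check
for h in (0.5, 2.0, 8.2846):
    print(h, chi2_sf(h, 2), math.exp(-h / 2))
```

The series converges quickly for the moderate H values typical of this test; for very large H, the cancellation in 1 − CDF loses relative precision, and a continued-fraction evaluation of the upper tail would be the usual remedy.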

2. Using Python

In Python, you can use the scipy.stats.kruskal function to perform the Kruskal-Wallis test:

from scipy.stats import kruskal

# Define data for each group
group1 = [10, 15, 14, 18, 20]
group2 = [12, 17, 16, 19, 21]
group3 = [20, 25, 23, 21, 24]

# Perform Kruskal-Wallis Test
stat, p_value = kruskal(group1, group2, group3)

print(f"H-statistic: {stat:.5f}, p-value: {p_value:.5f}")

3. Using R

In R, you can use the kruskal.test function for the Kruskal-Wallis test:

# Define data for each group
group1 <- c(10, 15, 14, 18, 20)
group2 <- c(12, 17, 16, 19, 21)
group3 <- c(20, 25, 23, 21, 24)

# Perform Kruskal-Wallis Test
kruskal.test(list(group1, group2, group3))


Senior Advisor, Data Science

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.