Kruskal-Wallis Test Results
Understanding the Kruskal-Wallis Test
The Kruskal-Wallis test is a non-parametric statistical test used to determine whether three or more independent groups differ significantly. Strictly, it tests whether the samples come from the same distribution; when the group distributions have similar shapes, a significant result can be read as a difference between the group medians. Unlike one-way ANOVA, the Kruskal-Wallis test does not assume normally distributed data, which makes it useful for ordinal data or for continuous data that does not meet the assumptions of parametric tests.
Key Characteristics of the Kruskal-Wallis Test
- Non-parametric: The Kruskal-Wallis test is the non-parametric alternative to the one-way ANOVA, and it compares the ranks of the data rather than the actual data values.
- No normality assumption: Since it’s based on ranks, the test does not assume that the data is normally distributed, making it robust for non-normal or ordinal data.
- Distribution testing: The test evaluates whether the samples originate from the same distribution.
- Degrees of freedom: Calculated as the number of groups minus one.
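To make the rank-based idea concrete, here is a minimal sketch of how pooled observations could be ranked, with tied values sharing their average rank. The function name `average_ranks` and the sample data are hypothetical, chosen only for illustration.

```python
def average_ranks(values):
    """Return 1-based ranks for values, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda idx: values[idx])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        # Sorted positions start..end hold ranks start+1..end+1; use their mean
        avg = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = avg
        start = end + 1
    return ranks


# Hypothetical data: three small groups, pooled into one list before ranking
groups = [[7, 3, 9], [3, 12, 5], [15, 9, 11]]
pooled = [v for g in groups for v in g]
ranks = average_ranks(pooled)
df = len(groups) - 1  # degrees of freedom: number of groups minus one
print(ranks)
print(f"degrees of freedom: {df}")
```

The two 3s share rank 1.5 and the two 9s share rank 5.5, so the ranks still sum to N(N + 1)/2 = 45 for the nine pooled values.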
Formula for the Kruskal-Wallis Test
The Kruskal-Wallis H-statistic is calculated as follows:
\[ H = \frac{12}{N(N + 1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N + 1) \]
where:
- \( N \): Total number of observations across all groups
- \( k \): Number of groups
- \( R_i \): Sum of ranks for group \( i \)
- \( n_i \): Sample size of group \( i \)
Under the null hypothesis, the H-statistic approximately follows a chi-square distribution with \( k - 1 \) degrees of freedom; the approximation is generally considered adequate when each group has at least about five observations. If the calculated H-statistic is greater than the critical value from the chi-square distribution table for the chosen significance level, we reject the null hypothesis that the groups come from the same distribution.
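As a rough illustration of the calculation, the sketch below computes H in plain Python for the three example groups used in the code sections of this article. The function name `kruskal_h` is hypothetical. The sketch also applies the standard tie correction, dividing H by \( 1 - \sum (t^3 - t) / (N^3 - N) \), where \( t \) is the size of each group of tied values. The critical value 5.991 is the tabulated chi-square value for 2 degrees of freedom at the 0.05 level.

```python
def kruskal_h(groups):
    """Return (H, tie-corrected H) for a list of groups of numbers."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    # Map each distinct value to its average rank and accumulate tie sizes
    rank_of = {}
    tie_sum = 0  # sum of (t^3 - t) over runs of t equal values
    start = 0
    while start < n_total:
        end = start
        while end + 1 < n_total and pooled[end + 1] == pooled[start]:
            end += 1
        rank_of[pooled[start]] = (start + end) / 2 + 1
        t = end - start + 1
        tie_sum += t**3 - t
        start = end + 1

    # H = 12 / (N (N + 1)) * sum(R_i^2 / n_i) - 3 (N + 1)
    rank_term = sum(sum(rank_of[v] for v in g) ** 2 / len(g) for g in groups)
    h = 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

    # Tie correction; the divisor is zero if every observation is identical
    h_corrected = h / (1 - tie_sum / (n_total**3 - n_total))
    return h, h_corrected


groups = [[10, 15, 14, 18, 20], [12, 17, 16, 19, 21], [20, 25, 23, 21, 24]]
h, h_corrected = kruskal_h(groups)
CRITICAL_DF2_ALPHA05 = 5.991  # chi-square table value for df = 2, alpha = 0.05
print(f"H = {h:.3f}, tie-corrected H = {h_corrected:.3f}")
print("reject H0" if h_corrected > CRITICAL_DF2_ALPHA05 else "fail to reject H0")
```

For this data the rank sums are 24.5, 32.5 and 63, giving H of about 8.255 before the tie correction and about 8.285 after it. Both exceed 5.991, so the null hypothesis would be rejected at the 0.05 level.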
Interpreting the Results
The null hypothesis for the Kruskal-Wallis test states that all groups come from the same distribution, which is often summarized as the group medians being equal. If the p-value is less than the chosen significance level (e.g., 0.05), we reject the null hypothesis, suggesting that at least one group differs significantly from the others; the test does not say which one. Otherwise, we fail to reject the null hypothesis, meaning the data do not provide sufficient evidence of a difference between the groups.
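The decision rule can be sketched in a few lines of Python. To stay within the standard library, this sketch uses the closed-form upper tail of the chi-square distribution, which exists only for an even number of degrees of freedom (for example, three groups give df = 2). The function names `chi2_sf_even_df` and `decide` are hypothetical, and the value 8.28 is just an illustrative H-statistic.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the tail is exp(-x/2) * sum over j < m of (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive even df")
    if x <= 0:
        return 1.0
    half = x / 2
    return math.exp(-half) * sum(half**j / math.factorial(j) for j in range(df // 2))


def decide(h, k, alpha=0.05):
    """Return (p_value, reject_null) for statistic h computed from k groups."""
    p_value = chi2_sf_even_df(h, k - 1)
    return p_value, p_value < alpha


# Illustrative statistic for k = 3 groups (df = 2)
p_value, reject = decide(8.28, k=3)
print(f"p-value: {p_value:.4f}")
print("reject the null hypothesis" if reject else "fail to reject the null hypothesis")
```

For an odd number of degrees of freedom the tail probability needs the incomplete gamma function, which is what the library routines shown later in this article handle.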
Programmatically Calculating the Kruskal-Wallis Test
Below are examples of how to perform the Kruskal-Wallis test in different programming languages.
1. Using JavaScript
The jStat library does not include a Kruskal-Wallis test, so the example below implements the test from scratch in JavaScript with no external libraries.
function kruskalWallis(groups) {
  // Combine all values and store group indices
  const allValues = [];
  const groupIndices = [];
  groups.forEach((group, groupIndex) => {
    group.forEach(value => {
      allValues.push(value);
      groupIndices.push(groupIndex);
    });
  });
  // Calculate ranks
  const sortedPairs = allValues.map((value, index) => ({ value, index }))
    .sort((a, b) => a.value - b.value);
  const ranks = new Array(allValues.length).fill(0);
  let i = 0;
  while (i < sortedPairs.length) {
    let j = i;
    // Find ties
    while (j < sortedPairs.length - 1 && sortedPairs[j].value === sortedPairs[j + 1].value) {
      j++;
    }
    // Assign average rank for ties
    const avgRank = (i + j) / 2 + 1;
    for (let k = i; k <= j; k++) {
      ranks[sortedPairs[k].index] = avgRank;
    }
    i = j + 1;
  }
  // Calculate group rank sums and sizes
  const k = groups.length;
  const n = allValues.length;
  const groupRankSums = new Array(k).fill(0);
  const groupSizes = new Array(k).fill(0);
  ranks.forEach((rank, index) => {
    const groupIndex = groupIndices[index];
    groupRankSums[groupIndex] += rank;
    groupSizes[groupIndex]++;
  });
  // Calculate H-statistic
  let h = 12 / (n * (n + 1));
  h *= groupRankSums.reduce((sum, rankSum, i) =>
    sum + Math.pow(rankSum, 2) / groupSizes[i], 0);
  h -= 3 * (n + 1);
  // Calculate correction factor for ties
  let tieCorrection = 0;
  let currentTieCount = 1;
  for (let i = 1; i < sortedPairs.length; i++) {
    if (sortedPairs[i].value === sortedPairs[i - 1].value) {
      currentTieCount++;
    } else {
      if (currentTieCount > 1) {
        tieCorrection += Math.pow(currentTieCount, 3) - currentTieCount;
      }
      currentTieCount = 1;
    }
  }
  // Check last group of ties
  if (currentTieCount > 1) {
    tieCorrection += Math.pow(currentTieCount, 3) - currentTieCount;
  }
  // Apply tie correction
  if (tieCorrection > 0) {
    h = h / (1 - tieCorrection / (n * (n * n - 1)));
  }
  // Improved p-value calculation using gamma function approximation
  const df = k - 1;
  const pValue = gammaCDF(h / 2, df / 2);
  return {
    statistic: h,
    pValue: 1 - pValue
  };
}
// Gamma function approximation
function gamma(z) {
  const p = [676.5203681218851, -1259.1392167224028, 771.32342877765313,
    -176.61502916214059, 12.507343278686905, -0.13857109526572012,
    9.9843695780195716e-6, 1.5056327351493116e-7];
  if (z < 0.5) {
    return Math.PI / (Math.sin(Math.PI * z) * gamma(1 - z));
  }
  z -= 1;
  let x = 0.99999999999980993;
  for (let i = 0; i < p.length; i++) {
    x += p[i] / (z + i + 1);
  }
  const t = z + p.length - 0.5;
  return Math.sqrt(2 * Math.PI) * Math.pow(t, z + 0.5) * Math.exp(-t) * x;
}
// Lower incomplete gamma function using series expansion
function lowerGamma(s, x) {
  if (x <= 0) return 0;
  let sum = 0;
  let term = 1 / s;
  let n = 1;
  const maxIterations = 1000;
  const epsilon = 1e-10;
  while (Math.abs(term) > epsilon && n < maxIterations) {
    sum += term;
    term *= x / (s + n);
    n++;
  }
  return Math.pow(x, s) * Math.exp(-x) * sum;
}
// Gamma CDF
function gammaCDF(x, a) {
  if (x <= 0) return 0;
  return lowerGamma(a, x) / gamma(a);
}
// Example usage
const group1 = [10, 15, 14, 18, 20];
const group2 = [12, 17, 16, 19, 21];
const group3 = [20, 25, 23, 21, 24];
const groups = [group1, group2, group3];
const result = kruskalWallis(groups);
console.log(`H-statistic: ${result.statistic.toFixed(5)}`);
console.log(`p-value: ${result.pValue.toFixed(5)}`);
2. Using Python
In Python, you can use the scipy.stats.kruskal function to perform the Kruskal-Wallis test:
from scipy.stats import kruskal
# Define data for each group
group1 = [10, 15, 14, 18, 20]
group2 = [12, 17, 16, 19, 21]
group3 = [20, 25, 23, 21, 24]
# Perform Kruskal-Wallis Test
stat, p_value = kruskal(group1, group2, group3)
print(f"H-statistic: {stat:.5f}, p-value: {p_value:.5f}")
3. Using R
In R, you can use the kruskal.test function for the Kruskal-Wallis test:
# Define data for each group
group1 <- c(10, 15, 14, 18, 20)
group2 <- c(12, 17, 16, 19, 21)
group3 <- c(20, 25, 23, 21, 24)
# Perform Kruskal-Wallis Test
kruskal.test(list(group1, group2, group3))
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.