G-Test of Goodness of Fit Calculator

Enter observed and expected values for each category, then click "Calculate" to see the G-test statistic and p-value.

Understanding the G-Test of Goodness of Fit

What is the G-Test of Goodness of Fit?

The G-Test of Goodness of Fit is a statistical test used to determine if a categorical variable’s observed distribution significantly differs from an expected (or hypothesized) distribution. It’s particularly useful for comparing observed frequencies to expected frequencies across multiple categories.

Why Use the G-Test?

  • Assess Goodness of Fit: The G-Test helps determine if observed frequencies align with expected frequencies based on a hypothesized model.
  • Flexible with Multiple Categories: The test itself works with any number of categories; this calculator accepts up to 10, allowing detailed comparisons across many groups.

How the G-Test Statistic is Calculated

The G-test statistic (G) is calculated using the formula:

\( G = 2 \sum O \ln \left( \frac{O}{E} \right) \)

  • \( O \): Observed frequency in each category
  • \( E \): Expected frequency in each category

The G-test statistic measures the divergence between observed and expected frequencies, where larger values indicate a greater difference between the distributions.
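
As a minimal sketch (assuming Python and purely illustrative counts), G can be computed directly from this formula:

import math

# Illustrative observed and expected counts for three categories
observed = [45, 30, 25]
expected = [40, 33, 27]

# G = 2 * sum over categories of O * ln(O / E)
g = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
print(g)

With three categories, the degrees of freedom used in the p-value step below would be 3 - 1 = 2.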

Calculating the P-Value

To interpret the G-test statistic, we calculate a p-value based on the chi-square distribution, using degrees of freedom equal to the number of categories minus one:

\( p = 1 - F_{\chi^{2}}( G, \text{df}) \)

  • \( G \): The computed G-test statistic
  • \( \text{df} \): Degrees of freedom, equal to the number of categories minus one
  • \( F_{\chi^{2}} \): The cumulative distribution function of the chi-square distribution

The p-value helps determine whether the difference between observed and expected values is statistically significant.

Interpretation

A small p-value (typically \( p < 0.05 \)) suggests strong evidence against the null hypothesis, indicating that the observed distribution significantly differs from the expected distribution. A large p-value suggests that any differences may be due to random variation, supporting the null hypothesis of no significant difference.
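
For example, with 2 degrees of freedom the chi-square CDF simplifies to \( 1 - e^{-G/2} \), so a G-test statistic of 2.3808 (the value used in the code examples below) gives \( p = e^{-2.3808/2} \approx 0.30 \). Since this is well above 0.05, we would not reject the null hypothesis.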

Calculating the Chi-Square CDF and P-Value Programmatically

To calculate the p-value for a G-test statistic, evaluate the chi-square cumulative distribution function (CDF) at the G value and subtract the result from 1. This gives the probability that a chi-square random variable with the appropriate degrees of freedom takes a value at least as large as the observed G-test statistic.

Python (using SciPy)

In Python, the scipy.stats library can be used to calculate the chi-square CDF for the G-test:

from scipy.stats import chi2

# Parameters
g_statistic = 2.3808  # Replace with your G value
degrees_of_freedom = 2  # Adjust based on the number of categories - 1

# Calculate p-value (1 - CDF)
p_value = 1 - chi2.cdf(g_statistic, degrees_of_freedom)

This returns the p-value for the G-test statistic.
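
As a side note, SciPy also offers shortcuts: chi2.sf(g_statistic, degrees_of_freedom) returns the upper-tail probability (1 - CDF) directly, and scipy.stats.power_divergence computes both the G statistic and the p-value from raw counts when lambda_ is set to "log-likelihood". A minimal sketch with purely illustrative counts:

from scipy.stats import power_divergence

# Observed and expected counts (illustrative); lambda_="log-likelihood" selects the G-test
observed = [45, 30, 25]
expected = [40, 33, 27]

g_statistic, p_value = power_divergence(f_obs=observed, f_exp=expected, lambda_="log-likelihood")
print(g_statistic, p_value)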

JavaScript (using jStat)

In JavaScript, the jStat library can calculate the chi-square CDF, and you subtract it from 1 to get the p-value:

// Define the G-test statistic and degrees of freedom
const gStatistic = 2.3808; // Replace with your G value
const degreesOfFreedom = 2; // Adjust based on the number of categories - 1

// Calculate p-value (1 - CDF)
const pValue = 1 - jStat.chisquare.cdf(gStatistic, degreesOfFreedom);

This provides the p-value based on the G-test statistic and degrees of freedom.
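
Note that jStat is not built into JavaScript: it must be loaded first (for example from a CDN or as an installed package) so that jStat.chisquare.cdf is available in your environment.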

R

In R, use the pchisq function to find the CDF, then subtract it from 1 to get the p-value:

# Parameters
g_statistic <- 2.3808  # Replace with your G value
degrees_of_freedom <- 2  # Adjust based on the number of categories - 1

# Calculate p-value (1 - CDF)
p_value <- 1 - pchisq(g_statistic, degrees_of_freedom)

This returns the p-value for the G-test in R.
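
Equivalently, pchisq(g_statistic, degrees_of_freedom, lower.tail = FALSE) returns the upper-tail probability directly, avoiding the subtraction and remaining accurate when the p-value is extremely small.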

Using these methods, you can calculate the right-tail p-value from the chi-square CDF, helping you evaluate the statistical significance of the observed differences.

Senior Advisor, Data Science | [email protected]

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.