| | Group 1 | Group 2 |
|---|---|---|
| Outcome A | a | b |
| Outcome B | c | d |
Calculating the P-Value for Fisher's Exact Test
Step 1: Calculate the Probability of the Observed Table
The probability \( P \) of the observed table is calculated as follows:
\( P = \frac{(a + b)! \times (c + d)! \times (a + c)! \times (b + d)!}{a! \times b! \times c! \times d! \times (a + b + c + d)!} \)
- \( a, b, c, d \): The counts in each cell of the 2x2 contingency table.
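- \( ! \): The factorial, \( n! = n \times (n - 1) \times \cdots \times 1 \) (with \( 0! = 1 \)), which counts the number of ways to arrange \( n \) items.
This is the probability of observing exactly this table under the null hypothesis of independence, given that the row totals and column totals are fixed at their observed values.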
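The Step 1 formula can be checked with a minimal Python sketch. It uses only the standard library (`math.factorial`, `math.comb`, `fractions.Fraction`) so the arithmetic is exact; the function names and the small table `[[3, 1], [1, 3]]` are illustrative choices of ours. The second function writes the same expression with binomial coefficients, which is the usual hypergeometric form.

```python
from fractions import Fraction
from math import comb, factorial


def table_probability(a: int, b: int, c: int, d: int) -> Fraction:
    """Exact probability of the 2x2 table [[a, b], [c, d]] given its margins (factorial form)."""
    n = a + b + c + d
    numerator = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
    denominator = factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(n)
    return Fraction(numerator, denominator)


def table_probability_comb(a: int, b: int, c: int, d: int) -> Fraction:
    """Same quantity as a hypergeometric probability: C(a+b, a) * C(c+d, c) / C(n, a+c)."""
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(a + b + c + d, a + c))


p = table_probability(3, 1, 1, 3)
print(p)  # prints 8/35
```

Keeping the result as a `Fraction` avoids overflow or rounding in intermediate factorials; convert with `float(p)` only for display.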
Step 2: Identify Extreme Tables
To calculate the p-value, we need to consider all possible tables with the same row and column totals that are as extreme as, or more extreme than, the observed table.
For the two-tailed p-value, we sum the probabilities of all such tables whose probability is less than or equal to \( P \), the probability of the observed table. For a one-tailed p-value, we sum in one direction only: either the tables in which \( a \) is at least its observed value, or those in which it is at most its observed value, depending on the alternative hypothesis.
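With the row and column totals fixed, the top-left count \( a \) determines the other three cells, so the candidate tables can be listed by looping over its allowed range. A minimal sketch, assuming the same illustrative table `[[3, 1], [1, 3]]`; the helper names are our own:

```python
from fractions import Fraction
from math import comb


def tables_with_same_margins(a: int, b: int, c: int, d: int):
    """Yield every 2x2 table (a', b', c', d') with the same row and column totals as [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    # a' must keep all four cells non-negative
    for a_new in range(max(0, col1 - row2), min(row1, col1) + 1):
        b_new = row1 - a_new   # row 1 total is preserved
        c_new = col1 - a_new   # column 1 total is preserved
        d_new = row2 - c_new   # row 2 (and hence column 2) total is preserved
        yield a_new, b_new, c_new, d_new


def table_probability(a: int, b: int, c: int, d: int) -> Fraction:
    # Step 1 formula in binomial-coefficient form (exact)
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(a + b + c + d, a + c))


observed = (3, 1, 1, 3)
p_obs = table_probability(*observed)
for table in tables_with_same_margins(*observed):
    p = table_probability(*table)
    marker = "<= P_obs" if p <= p_obs else ""
    print(table, p, marker)
```

The probabilities of all listed tables sum to 1, which is a useful sanity check on the enumeration.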
Step 3: Sum the Probabilities of Extreme Tables
The p-value is the sum of probabilities for all extreme tables:
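Two-Tailed p-value: \( p = \sum_{i \,:\, P_i \leq P} P_i \), where the sum runs over every table \( i \) with the observed row and column totals
One-Tailed p-value: \( p = \sum_{a' \geq a} P(a') \), or \( p = \sum_{a' \leq a} P(a') \) for the opposite direction, where \( P(a') \) is the probability of the table whose top-left cell is \( a' \)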
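The two sums above can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation; `fisher_p_values` is a hypothetical helper name, and the table `[[3, 1], [1, 3]]` is again only an example. Exact `Fraction` arithmetic means ties with \( P \) are compared exactly; floating-point implementations typically compare with a small relative tolerance instead.

```python
from fractions import Fraction
from math import comb


def table_probability(a: int, b: int, c: int, d: int) -> Fraction:
    # Step 1 formula in binomial-coefficient form (exact)
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(a + b + c + d, a + c))


def fisher_p_values(a: int, b: int, c: int, d: int):
    """Return (p_greater, p_less, p_two_sided) for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    p_obs = table_probability(a, b, c, d)
    p_greater = p_less = p_two = Fraction(0)
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_probability(x, row1 - x, col1 - x, row2 - col1 + x)
        if x >= a:          # one direction: top-left count at least the observed value
            p_greater += p
        if x <= a:          # other direction: top-left count at most the observed value
            p_less += p
        if p <= p_obs:      # two-tailed: every table no more probable than the observed one
            p_two += p
    return p_greater, p_less, p_two


p_greater, p_less, p_two = fisher_p_values(3, 1, 1, 3)
print(p_greater, p_less, p_two)  # prints 17/70 69/70 17/35
```

Note that the two-tailed sum here is not simply twice a one-tailed sum; it follows the "probability at most \( P \)" rule, which is what the formula above describes.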
Example Calculation for Fisher's Exact Test
Observed Table
Consider the following observed table where we want to test for an association between two categorical variables:
| | Group 1 | Group 2 |
|---|---|---|
| Outcome A | a = 10 | b = 5 |
| Outcome B | c = 3 | d = 8 |
Step 1: Calculate the Probability of the Observed Table
Using the formula for probability \( P \) of the observed table:
\( P = \frac{(a + b)! \times (c + d)! \times (a + c)! \times (b + d)!}{a! \times b! \times c! \times d! \times (a + b + c + d)!} \)
For our table values, we have \( a = 10 \), \( b = 5 \), \( c = 3 \), and \( d = 8 \). Calculating the individual terms:
- \( (a + b)! = (10 + 5)! = 15! \)
- \( (c + d)! = (3 + 8)! = 11! \)
- \( (a + c)! = (10 + 3)! = 13! \)
- \( (b + d)! = (5 + 8)! = 13! \)
- \( (a + b + c + d)! = (10 + 5 + 3 + 8)! = 26! \)
Substitute these values into the formula:
\( P = \frac{15! \times 11! \times 13! \times 13!}{10! \times 5! \times 3! \times 8! \times 26!} \)
Evaluating this expression gives \( P_{\text{observed}} \approx 0.0476 \), the probability of obtaining exactly this table under the null hypothesis, given the observed row and column totals.
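The substitution above can be reproduced directly with Python's standard library. A minimal sketch: the variable names are ours, and `Fraction` keeps the ratio of very large factorials exact before converting to a decimal.

```python
from fractions import Fraction
from math import factorial

# Observed table from the example: [[a, b], [c, d]] = [[10, 5], [3, 8]]
a, b, c, d = 10, 5, 3, 8
n = a + b + c + d  # 26

numerator = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)  # 15! * 11! * 13! * 13!
denominator = factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(n)  # 10! * 5! * 3! * 8! * 26!
p_observed = Fraction(numerator, denominator)

print(p_observed)                   # prints 14157/297160
print(round(float(p_observed), 4))  # prints 0.0476
```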
Step 2: Identify Extreme Tables
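Next, we identify tables that are as extreme as, or more extreme than, the observed table. Because the row totals (15 and 11) and column totals (13 and 13) are fixed, the value of \( a \) determines the whole table. Under the null hypothesis the expected value of \( a \) is \( (a + b)(a + c)/n = 15 \times 13 / 26 = 7.5 \), so the observed \( a = 10 \) lies above it. The tables that deviate at least as far in that direction are those with \( a = 10, 11, 12, 13 \), where 13 is the largest value the margins allow.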
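The reasoning above (expected value of \( a \), the range the margins allow, and the tables in the upper direction) can be sketched as follows. This is an illustrative calculation only; the variable names are our own.

```python
from fractions import Fraction

a, b, c, d = 10, 5, 3, 8
row1, row2, col1 = a + b, c + d, a + c  # 15, 11, 13
n = row1 + row2                         # 26

expected_a = Fraction(row1 * col1, n)   # 15 * 13 / 26 = 15/2
a_min = max(0, col1 - row2)             # smallest a the margins allow
a_max = min(row1, col1)                 # largest a the margins allow

# Tables with a at least as far above the expected value as the observed a = 10
upper_tail = [(x, row1 - x, col1 - x, row2 - col1 + x) for x in range(a, a_max + 1)]

print(float(expected_a), a_min, a_max)  # prints 7.5 2 13
for table in upper_tail:
    print(table)  # (a, b, c, d), each with row totals 15, 11 and column totals 13, 13
```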
Step 3: Calculate Probabilities of Extreme Tables
We calculate the probability \( P_{\text{extreme}} \) for each identified extreme table using the same formula from Step 1.
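For each extreme table, we substitute its cell counts into the probability formula. Because the row and column totals must stay fixed, increasing \( a \) by one decreases \( b \) and \( c \) by one and increases \( d \) by one. For instance, the next table in the same direction has \( a = 11 \), \( b = 4 \), \( c = 2 \), and \( d = 9 \), so its probability is:
\( P_{a=11} = \frac{15! \times 11! \times 13! \times 13!}{11! \times 4! \times 2! \times 9! \times 26!} \approx 0.0072 \)
In general, for an extreme table with cells \( a', b', c', d' \) and the same margins:
\( P_{\text{extreme}} = \frac{(a + b)! \times (c + d)! \times (a + c)! \times (b + d)!}{a'! \times b'! \times c'! \times d'! \times (a + b + c + d)!} \)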
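A minimal sketch of the \( a = 11 \) calculation, keeping the row and column totals of the observed table fixed. The helper `prob_with_fixed_margins` is a hypothetical name for illustration; it derives \( b, c, d \) from the new \( a \) so the margins cannot drift.

```python
from fractions import Fraction
from math import factorial

# Margins of the observed table [[10, 5], [3, 8]]; these stay fixed for every extreme table
row1, row2, col1, col2 = 15, 11, 13, 13
n = row1 + row2  # 26


def prob_with_fixed_margins(a_new: int) -> Fraction:
    """Probability of the table whose top-left cell is a_new, with the observed margins."""
    b_new = row1 - a_new
    c_new = col1 - a_new
    d_new = row2 - c_new
    numerator = factorial(row1) * factorial(row2) * factorial(col1) * factorial(col2)
    denominator = (factorial(a_new) * factorial(b_new) * factorial(c_new)
                   * factorial(d_new) * factorial(n))
    return Fraction(numerator, denominator)


p11 = prob_with_fixed_margins(11)  # table [[11, 4], [2, 9]]
print(round(float(p11), 4))        # prints 0.0072
```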
Step 4: Calculate the One-Tailed and Two-Tailed P-Values
After calculating probabilities for all relevant extreme tables:
- One-Tailed P-Value: Sum the probabilities of the observed table and all tables more extreme in one direction.
- Two-Tailed P-Value: Sum the probabilities of the observed table and all other tables whose probability is less than or equal to \( P_{\text{observed}} \), regardless of direction.
For example:
- One-Tailed p-value: \( p_{\text{one-tailed}} = P_{\text{observed}} + \sum P_{\text{one-direction extreme tables}} \)
- Two-Tailed p-value: \( p_{\text{two-tailed}} = P_{\text{observed}} + \sum P_{\text{both-direction extreme tables}} \)
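Putting the steps together for the example table, a minimal standard-library sketch might look like this. The one-tailed sum takes the upper direction (\( a' \geq 10 \)), matching the direction identified earlier; the names are our own, and the significance level \( \alpha = 0.05 \) is the example value used in the interpretation.

```python
from fractions import Fraction
from math import comb


def table_probability(a: int, b: int, c: int, d: int) -> Fraction:
    # Step 1 formula in binomial-coefficient form (exact)
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(a + b + c + d, a + c))


a, b, c, d = 10, 5, 3, 8
row1, row2, col1 = a + b, c + d, a + c
p_observed = table_probability(a, b, c, d)

# Probability of every table with the observed margins, keyed by its top-left count
probs = {
    x: table_probability(x, row1 - x, col1 - x, row2 - col1 + x)
    for x in range(max(0, col1 - row2), min(row1, col1) + 1)
}

p_one_tailed = sum(p for x, p in probs.items() if x >= a)         # tables with a' >= 10
p_two_tailed = sum(p for p in probs.values() if p <= p_observed)  # tables with P' <= P_observed

alpha = 0.05
print(round(float(p_one_tailed), 4), round(float(p_two_tailed), 4))  # prints 0.0554 0.1107
print("reject H0" if p_two_tailed < alpha else "fail to reject H0")  # prints fail to reject H0
```

For this table the two-tailed p-value (about 0.111) is above \( \alpha = 0.05 \), so the association is not statistically significant at that level; the one-tailed p-value (about 0.055) is also just above 0.05. If SciPy is installed, `scipy.stats.fisher_exact([[10, 5], [3, 8]])` (two-sided by default) and the same call with `alternative='greater'` offer an independent cross-check.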
Interpretation
If the two-tailed p-value is smaller than your chosen significance level (for example, \( \alpha = 0.05 \)), the association between the groups and outcomes is statistically significant at that level. In that case we reject the null hypothesis of independence, because a table at least as extreme as the observed one would be unlikely if the two variables were unrelated.
Further Reading
To learn how to implement Fisher’s Exact Test in R, see: Fisher’s Exact Test in R: Independence Test for a Small Sample
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.