How to Perform One Sample Z-Test in R with Practical Examples

by | Programming, R, Statistics

Introduction to One-Sample Z-Tests

The one-sample Z-test is a statistical test used to determine whether the mean of a sample differs significantly from a known or hypothesized population mean. Unlike the t-test, which estimates the standard deviation from the sample and is commonly used with smaller samples, the Z-test assumes that the population standard deviation is known. It is also applied to large samples (usually n > 30), where the Central Limit Theorem makes the sampling distribution of the mean approximately normal.

When to Use a One-Sample Z-Test

The one-sample Z-test is suitable when:

  • The sample size is large (typically n > 30).
  • The population standard deviation (\( \sigma \)) is known.
  • You want to compare the sample mean to a known population mean.

Formula for the One-Sample Z-Test

The Z-test calculates a z-score that tells us how many standard errors (\( \sigma / \sqrt{n} \), the standard deviation of the sampling distribution of the mean) the sample mean lies from the hypothesized population mean. The formula is:

\[ Z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}} \]

Where:

  • \( \bar{x} \) = sample mean
  • \( \mu \) = population mean (hypothesized mean)
  • \( \sigma \) = population standard deviation
  • \( n \) = sample size
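The formula translates directly into a small R function. This is a minimal sketch; the function name `z_statistic()` and its argument names are our own choices for illustration, not part of base R.

```r
# Hypothetical helper (not a base R function): one-sample z statistic
# from summary statistics.
#   x_bar = sample mean, mu = hypothesized population mean,
#   sigma = known population SD, n = sample size
z_statistic <- function(x_bar, mu, sigma, n) {
  standard_error <- sigma / sqrt(n)  # SD of the sampling distribution of the mean
  (x_bar - mu) / standard_error
}

# Worked example with the summary values used later in this article
z_statistic(x_bar = 72, mu = 70, sigma = 10, n = 50)
# [1] 1.414214
```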

Calculating the One-Sample Z-Test in R

Base R has no built-in function for the one-sample Z-test comparable to t.test() for the t-test. However, we can calculate it manually by computing the z-score and then using pnorm() to find the p-value for that z-score.
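The manual steps can be bundled into a reusable wrapper. The sketch below is an assumption-laden illustration: `one_sample_z_test()` is a hypothetical name, the `alternative` argument simply mimics the style of t.test(), and missing values are dropped before computing the mean. It is run on the test-score data from the example later in this article.

```r
# Hypothetical wrapper (not a base R function): one-sample Z-test on raw data.
#   x = numeric vector, mu = hypothesized mean, sigma = known population SD
one_sample_z_test <- function(x, mu, sigma,
                              alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)   # defaults to "two.sided"
  x <- x[!is.na(x)]                       # drop missing values
  n <- length(x)
  z <- (mean(x) - mu) / (sigma / sqrt(n))
  p <- switch(alternative,
              two.sided = 2 * pnorm(-abs(z)),
              less      = pnorm(z),
              greater   = pnorm(z, lower.tail = FALSE))
  list(statistic = z, p.value = p, n = n, alternative = alternative)
}

scores <- c(68, 72, 71, 69, 70, 73, 68, 74, 69, 70, 67, 71, 72, 75, 70,
            68, 69, 73, 70, 72, 71, 68, 74, 70, 69, 71, 73, 67, 74, 70)

result <- one_sample_z_test(scores, mu = 70, sigma = 10)
result$statistic  # [1] 0.3286335
result$p.value    # [1] 0.7424327
```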

Calculating the p-value from the Z-score

Once we have the Z-score, we can calculate the p-value to interpret the result:

For a two-tailed test, the p-value is given by:

\[ p = 2 \times (1 - \text{pnorm}(|Z|)) \]

This formula calculates the probability, under the null hypothesis, of observing a Z-score at least as extreme as the calculated one in either direction (positive or negative).
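The same two-tailed p-value can be written in more than one way with pnorm(), and one-tailed versions follow from the `lower.tail` argument. This sketch uses the z-score from the summary-statistics example below; the one-tailed lines only apply if your alternative hypothesis is directional, which this article's examples do not assume.

```r
# z-score from the summary-statistics example (sample mean 72, n = 50)
z <- (72 - 70) / (10 / sqrt(50))

# Two-tailed p-value, written exactly as in the formula above
p_two_a <- 2 * (1 - pnorm(abs(z)))

# Equivalent form that avoids subtracting from 1
p_two_b <- 2 * pnorm(-abs(z))

# One-tailed p-values for directional alternatives
p_upper <- pnorm(z, lower.tail = FALSE)  # H1: mean > 70
p_lower <- pnorm(z)                      # H1: mean < 70

c(two_a = p_two_a, two_b = p_two_b, upper = p_upper, lower = p_lower)
```

The second form is preferable for large |Z|: for example, with Z = 10, `1 - pnorm(10)` rounds to exactly 0 in double precision, whereas `pnorm(-10)` still returns a tiny positive number.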

Example Calculation Using an Array of Values

In some cases, we might want to calculate the Z-test using individual sample data values rather than summary statistics. Let’s walk through an example where we have an array of individual sample values.

Suppose we have test scores from 30 students, and we want to see if their average score differs from the national average of 70. We also know the population standard deviation (\( \sigma \)) is 10.

Sample Data

Here’s our sample data:

# Sample data: test scores of 30 students
scores <- c(68, 72, 71, 69, 70, 73, 68, 74, 69, 70, 67, 71, 72, 75, 70, 68, 69, 73, 70, 72, 71, 68, 74, 70, 69, 71, 73, 67, 74, 70)

# Population mean
population_mean <- 70

# Population standard deviation
population_sd <- 10

# Calculate sample mean
sample_mean <- mean(scores)

# Calculate sample size
sample_size <- length(scores)

# Calculate the Z-score
z_score <- (sample_mean - population_mean) / (population_sd / sqrt(sample_size))
z_score

# Calculate the p-value for a two-tailed test
p_value <- 2 * (1 - pnorm(abs(z_score)))
p_value

Running this code prints the Z-score followed by the p-value:

[1] 0.3286335
[1] 0.7424327
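For comparison, base R's t.test() can be run on the same scores. Unlike the Z-test, it estimates the standard deviation from the sample instead of using the assumed \( \sigma \) = 10. The sample standard deviation of these 30 scores is only about 2.22, so the t statistic (about 1.48) is much larger than the Z-score of 0.33, although its p-value is still above 0.05. This illustrates how strongly the Z-test result depends on the assumed population standard deviation.

```r
scores <- c(68, 72, 71, 69, 70, 73, 68, 74, 69, 70, 67, 71, 72, 75, 70,
            68, 69, 73, 70, 72, 71, 68, 74, 70, 69, 71, 73, 67, 74, 70)

# Sample standard deviation (n - 1 in the denominator)
sd(scores)

# One-sample t-test against the same hypothesized mean of 70
t_result <- t.test(scores, mu = 70)
t_result$statistic   # t statistic, 29 degrees of freedom
t_result$p.value     # two-sided p-value
```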

Example Calculation Using Summary Statistics

Let’s say you want to test whether the average score of a sample of 50 students, which is 72, is significantly different from the national average score of 70, assuming the national standard deviation is 10.


# Given values
sample_mean <- 72
population_mean <- 70
population_sd <- 10
sample_size <- 50

# Calculate the z-score
z_score <- (sample_mean - population_mean) / (population_sd / sqrt(sample_size))
z_score

# Calculate the p-value for a two-tailed test
p_value <- 2 * (1 - pnorm(abs(z_score)))
p_value

Running this code prints the z-score followed by the p-value:

[1] 1.414214
[1] 0.1572992

Here’s what each part of the code does:

  • We calculate the z-score as \( \frac{72 - 70}{10 / \sqrt{50}} \approx 1.414 \), indicating that the sample mean is 1.414 standard errors above the population mean.
  • We then use the pnorm() function in R to find the probability of observing a z-score at least this extreme, in either direction, if the null hypothesis were true.
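Because the denominator of the z-score is \( \sigma / \sqrt{n} \), the same 2-point difference produces a larger z-score as the sample grows. In this sketch only n = 50 comes from the example above; n = 10 and n = 200 are illustrative values we added to show the effect.

```r
mu0   <- 70               # hypothesized population mean
x_bar <- 72               # sample mean, held fixed
sigma <- 10               # known population SD
n     <- c(10, 50, 200)   # 10 and 200 are illustrative additions

z <- (x_bar - mu0) / (sigma / sqrt(n))  # vectorized over n
p <- 2 * pnorm(-abs(z))                 # two-tailed p-values

data.frame(n = n, z = round(z, 3), p_value = round(p, 4))
```

Quadrupling the sample from 50 to 200 doubles the z-score (since \( \sqrt{200/50} = 2 \)) to about 2.83, and the p-value falls below 0.01.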

Interpreting the Results

In the summary-statistics example, we obtained a z-score of 1.414 and a p-value of 0.1573. Here’s how to interpret these results:

  • z-score: A z-score of 1.414 tells us that the sample mean is 1.414 standard errors (\( \sigma / \sqrt{n} \)) above the hypothesized population mean.
  • p-value: The p-value of 0.1573 is greater than the common significance level of 0.05, so we would fail to reject the null hypothesis. This indicates that the sample mean is not significantly different from the population mean at the 5% significance level.

In this context, our findings suggest that the sample does not provide strong enough evidence to conclude that the average score differs from the national average.
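An equivalent way to reach the same decision is to compare |Z| with the two-tailed critical value \( z_{1-\alpha/2} \). This sketch assumes the common \( \alpha \) = 0.05 used above and reuses the summary-statistics numbers; rejecting when |Z| exceeds the critical value is the same rule as rejecting when p < \( \alpha \).

```r
alpha   <- 0.05
z_score <- (72 - 70) / (10 / sqrt(50))   # summary-statistics example

# Two-tailed critical value: reject H0 if |z| exceeds it
z_critical <- qnorm(1 - alpha / 2)        # about 1.96

reject_h0 <- abs(z_score) > z_critical
p_value   <- 2 * pnorm(-abs(z_score))

c(z_critical = z_critical, z_score = z_score, p_value = p_value)
reject_h0
# [1] FALSE
```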

Assumptions and Limitations of the One-Sample Z-Test

The one-sample Z-test is a widely used statistical test, but it comes with several key assumptions and limitations that must be considered for accurate results:

Assumptions

  • Known Population Standard Deviation (\( \sigma \)): The Z-test assumes that the population standard deviation is known. If \( \sigma \) is unknown, a one-sample t-test is generally more appropriate.
  • Normal Distribution: The test assumes that the sample data are drawn from a population with a normal distribution. However, for larger sample sizes (typically n > 30), the Central Limit Theorem makes the sampling distribution of the mean approximately normal, so the test remains approximately valid even if the data are not perfectly normal.
  • Independence: The data points in the sample must be independent of each other. This means that each observation should not influence any other observation in the sample.
  • Random Sampling: The sample data should be collected through a random sampling process to ensure that the sample is representative of the population.
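One way to check the Central Limit Theorem argument for a particular setting is a quick simulation: generate many samples under the null hypothesis and count how often the Z-test rejects. The choices below are assumptions made for illustration: strongly right-skewed Exponential(rate = 1) data (true mean 1, true SD 1), n = 50, 10,000 replications, and an arbitrary seed. If the normal approximation is adequate, the empirical Type I error rate should land close to \( \alpha \) = 0.05.

```r
set.seed(123)           # arbitrary seed for reproducibility
n_sims <- 10000         # number of simulated samples
n      <- 50            # sample size per simulated sample
mu0    <- 1             # true mean of Exponential(rate = 1), so H0 is true
sigma  <- 1             # true (known) SD of Exponential(rate = 1)
alpha  <- 0.05

# Two-tailed Z-test p-value for each simulated skewed sample
p_values <- replicate(n_sims, {
  x <- rexp(n, rate = 1)
  z <- (mean(x) - mu0) / (sigma / sqrt(n))
  2 * pnorm(-abs(z))
})

# Empirical Type I error rate (proportion of false rejections)
type1_rate <- mean(p_values < alpha)
type1_rate
```

Repeating the experiment with a smaller n (say 5) is an easy way to see where the approximation starts to break down for a given distribution.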

Limitations

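  • Sample Size Requirements: The rule of thumb of n > 30 matters mainly when the data are not normal or when \( \sigma \) has to be estimated from the sample. For smaller samples without a known \( \sigma \), the one-sample t-test is preferred, because the t-distribution accounts for the extra uncertainty introduced by estimating the standard deviation from the sample.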
  • Sensitivity to Outliers: The Z-test can be sensitive to outliers, especially if the sample size is small. Outliers can skew the sample mean, leading to inaccurate results.
  • Dependence on Normality: Although the Z-test can tolerate slight deviations from normality with large samples, non-normal data in small samples can lead to misleading results. For non-normal distributions, alternative non-parametric tests may be more appropriate.
  • Fixed Significance Level: The Z-test typically uses a fixed significance level (e.g., 0.05) for hypothesis testing. This binary decision rule (reject or fail to reject) can oversimplify complex data relationships, potentially leading to Type I or Type II errors.
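The outlier limitation is easy to demonstrate on the 30 test scores from earlier. In this sketch the last score (70) is replaced by a hypothetical data-entry error of 170; that value is invented purely for illustration. One bad entry shifts the sample mean by 100/30, or about 3.3 points, which is enough to push the Z-score past the 5% critical value.

```r
scores <- c(68, 72, 71, 69, 70, 73, 68, 74, 69, 70, 67, 71, 72, 75, 70,
            68, 69, 73, 70, 72, 71, 68, 74, 70, 69, 71, 73, 67, 74, 70)

# Hypothetical helper: z-score against mu = 70 with known sigma = 10
z_from <- function(x, mu = 70, sigma = 10) {
  (mean(x) - mu) / (sigma / sqrt(length(x)))
}

scores_outlier <- scores
scores_outlier[30] <- 170   # hypothetical data-entry error (70 typed as 170)

z_clean   <- z_from(scores)
z_outlier <- z_from(scores_outlier)
p_outlier <- 2 * pnorm(-abs(z_outlier))

c(z_clean = z_clean, z_outlier = z_outlier, p_outlier = p_outlier)
```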

Understanding these assumptions and limitations is essential for accurate application and interpretation of the Z-test. When any of these assumptions are violated, consider using alternative statistical methods, such as the one-sample t-test or non-parametric tests, to ensure robust and reliable conclusions.

Conclusion

The one-sample Z-test is a valuable statistical tool for determining whether a sample mean differs significantly from a known population mean, especially when the population standard deviation is known and the sample size is large. The test computes a Z-score, which indicates how many standard errors the sample mean lies from the hypothesized mean, and then converts that Z-score into a p-value, which helps determine whether the difference is statistically significant.

Although R doesn’t have a built-in Z-test function, calculating it manually is straightforward and effective for statistical decision-making. By following the process outlined here, you can easily use R to test your data against known benchmarks and make informed conclusions about its statistical significance.

Try the One-Sample Z-Test Calculator

For an easy way to calculate the Z-test and determine the p-value, you can check out our One-Sample Z-Test Calculator on the Research Scientist Pod.

Senior Advisor, Data Science

Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.
