Understanding the Prediction Interval in Linear Regression
The prediction interval provides a range within which a future observation is likely to fall, based on the current linear regression model. Unlike a confidence interval, which estimates the range for the mean response, a prediction interval accounts for individual variability, making it wider.
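To make the difference concrete, here is a minimal Python sketch (using the same example data and the same NumPy/SciPy tools as the Python section further down) that computes the half-widths of both intervals at the same predictor value; the extra \( 1 + \) term under the square root is what makes the prediction interval wider:
import numpy as np
from scipy.stats import t
# Same example data as the code sections below
x = np.array([10, 12, 15, 18, 20, 25, 28, 30, 32, 35])
y = np.array([35, 40, 45, 50, 53, 60, 65, 68, 70, 75])
x0, level = 22, 0.95
n = len(x)
b1 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)  # slope
b0 = y.mean() - b1 * x.mean()  # intercept
s = np.sqrt(np.sum((y - (b0 + b1 * x)) ** 2) / (n - 2))  # residual standard error
t_crit = t.ppf(1 - (1 - level) / 2, n - 2)
# Term shared by both intervals (grows as x0 moves away from the mean of x)
h = 1 / n + (x0 - x.mean()) ** 2 / np.sum((x - x.mean()) ** 2)
ci_half = t_crit * s * np.sqrt(h)  # half-width of the confidence interval for the mean response
pi_half = t_crit * s * np.sqrt(1 + h)  # half-width of the prediction interval for a new observation
print(f"CI half-width: {ci_half:.2f}, PI half-width: {pi_half:.2f}")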
Key Components of the Prediction Interval
- Regression Equation: The formula \( \hat{y} = b_0 + b_1 x \) estimates the expected value of \( y \) based on \( x \). Here, \( b_0 \) is the intercept, and \( b_1 \) is the slope.
- Confidence Level: The probability, commonly set at 95%, that the prediction interval will contain the true value of a future observation.
- t-Score: The critical value from the t-distribution, determined by the confidence level and the degrees of freedom, that scales the width of the interval (see the short sketch after this list).
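For intuition, the following short Python sketch (a minimal illustration, not part of the calculator itself) prints two-tailed critical t-scores from SciPy for a few confidence levels and degrees of freedom; raising the confidence level or shrinking the sample both increase the t-score and therefore widen the interval:
from scipy.stats import t
# Two-tailed critical t-scores: higher confidence or fewer degrees of freedom => larger t
for level in (0.90, 0.95, 0.99):
    for df in (5, 8, 30):
        t_crit = t.ppf(1 - (1 - level) / 2, df)
        print(f"confidence = {level:.2f}, df = {df:2d}: t = {t_crit:.3f}")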
Formula for the Prediction Interval
The prediction interval for a given predictor value \( x \) is calculated as:
\[ \hat{y} \pm t \cdot s \cdot \sqrt{1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}} \]
where:
- \( \hat{y} \): Predicted value based on the regression equation.
- \( t \): t-score for the specified confidence level and degrees of freedom.
- \( s \): Standard error of the regression.
- \( n \): Sample size.
- \( \bar{x} \): Mean of the predictor values.
- \( x_i \): Individual predictor values.
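As a quick worked example (values rounded, so treat them as approximate), plugging the sample data used in the code sections below (\( n = 10 \), \( x = 22 \), 95% confidence) into this formula gives \( b_1 \approx 1.550 \), \( b_0 \approx 21.214 \), \( \hat{y} \approx 55.32 \), \( s \approx 0.849 \), \( \bar{x} = 22.5 \), \( \sum (x_i - \bar{x})^2 = 688.5 \), and \( t_{0.975,\,8} \approx 2.306 \), so:
\[ 55.32 \pm 2.306 \times 0.849 \times \sqrt{1 + \frac{1}{10} + \frac{(22 - 22.5)^2}{688.5}} \approx 55.32 \pm 2.05 \]
That is a 95% prediction interval of roughly \( [53.27, 57.38] \), which should match the output of the three code snippets that follow.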
Programmatically Calculating the Prediction Interval
Below are examples of calculating the prediction interval in JavaScript, Python, and R, each using the same example inputs:
1. Using JavaScript (with jStat)
In JavaScript, the jStat library can be used to calculate the critical t-score for the prediction interval:
// Define inputs
const xValues = [10, 12, 15, 18, 20, 25, 28, 30, 32, 35];
const yValues = [35, 40, 45, 50, 53, 60, 65, 68, 70, 75];
const individualValue = 22;
const confidenceLevel = 0.95;
const n = xValues.length;
const xMean = jStat.mean(xValues);
const yMean = jStat.mean(yValues);
// Calculate b1 (slope) and b0 (intercept)
const b1 = jStat.covariance(xValues, yValues) / jStat.variance(xValues, true);
const b0 = yMean - b1 * xMean;
// Predicted value (y-hat)
const yHat = b0 + b1 * individualValue;
// Calculate residual sum of squares and standard error
let rss = 0;
for (let i = 0; i < n; i++) {
  rss += Math.pow(yValues[i] - (b0 + b1 * xValues[i]), 2);
}
const s = Math.sqrt(rss / (n - 2));
// t-score
const tScore = jStat.studentt.inv(1 - (1 - confidenceLevel) / 2, n - 2);
// Prediction interval
const marginError = tScore * s * Math.sqrt(1 + (1 / n) + Math.pow(individualValue - xMean, 2) / jStat.sum(xValues.map(x => Math.pow(x - xMean, 2))));
const lowerBound = yHat - marginError;
const upperBound = yHat + marginError;
console.log(`Prediction Interval: [${lowerBound.toFixed(2)}, ${upperBound.toFixed(2)}]`);
2. Using Python (with NumPy and SciPy)
In Python, NumPy and SciPy's t distribution can be used to calculate the prediction interval as follows:
import numpy as np
from scipy.stats import t
# Define inputs
x_values = np.array([10, 12, 15, 18, 20, 25, 28, 30, 32, 35])
y_values = np.array([35, 40, 45, 50, 53, 60, 65, 68, 70, 75])
individual_value = 22
confidence_level = 0.95
n = len(x_values)
x_mean = np.mean(x_values)
y_mean = np.mean(y_values)
# Calculate b1 (slope) and b0 (intercept)
b1 = np.cov(x_values, y_values, bias=True)[0, 1] / np.var(x_values, ddof=0)
b0 = y_mean - b1 * x_mean
# Predicted value (y-hat)
y_hat = b0 + b1 * individual_value
# Calculate residual sum of squares and standard error
rss = np.sum((y_values - (b0 + b1 * x_values)) ** 2)
s = np.sqrt(rss / (n - 2))
# t-score
t_score = t.ppf(1 - (1 - confidence_level) / 2, n - 2)
# Prediction interval
margin_error = t_score * s * np.sqrt(1 + (1 / n) + ((individual_value - x_mean) ** 2 / np.sum((x_values - x_mean) ** 2)))
lower_bound = y_hat - margin_error
upper_bound = y_hat + margin_error
print(f"Prediction Interval: [{lower_bound:.2f}, {upper_bound:.2f}]")
3. Using R
In R, base functions from the stats package (cov(), var(), qt()) can be used to compute the prediction interval step by step; the same interval is also available directly from predict() with interval = "prediction" on a fitted lm() model, but the manual version below mirrors the other examples:
# Define inputs
x_values <- c(10, 12, 15, 18, 20, 25, 28, 30, 32, 35)
y_values <- c(35, 40, 45, 50, 53, 60, 65, 68, 70, 75)
individual_value <- 22
confidence_level <- 0.95
n <- length(x_values)
x_mean <- mean(x_values)
y_mean <- mean(y_values)
# Calculate b1 (slope) and b0 (intercept)
b1 <- cov(x_values, y_values) / var(x_values)
b0 <- y_mean - b1 * x_mean
# Predicted value (y-hat)
y_hat <- b0 + b1 * individual_value
# Residual sum of squares and standard error
rss <- sum((y_values - (b0 + b1 * x_values))^2)
s <- sqrt(rss / (n - 2))
# t-score
t_score <- qt(1 - (1 - confidence_level) / 2, n - 2)
# Prediction interval
margin_error <- t_score * s * sqrt(1 + (1 / n) + ((individual_value - x_mean)^2 / sum((x_values - x_mean)^2)))
lower_bound <- y_hat - margin_error
upper_bound <- y_hat + margin_error
cat("Prediction Interval:", round(lower_bound, 2), "to", round(upper_bound, 2), "\n")