Prediction Interval and Confidence Interval

As it has multiple approaches to explain concepts, I’ll utilize a few comparisons.

High level explanation

Confidence intervals tell you about how well you have determined the mean. Assume that the data really are randomly sampled from a Gaussian distribution. If you do this many times, and calculate a confidence interval of the mean from each sample, you’d expect about 95 % of those intervals to include the true value of the population mean. The key point is that the confidence interval tells you about the likely location of the true population parameter.

Prediction intervals tell you where you can expect to see the next data point sampled. Assume that the data really are randomly sampled from a Gaussian distribution. Collect a sample of data and calculate a prediction interval. Then sample one more value from the population. If you do this many times, you’d expect that next value to lie within that prediction interval in 95% of the samples.The key point is that the prediction interval tells you about the distribution of values, not the uncertainty in determining the population mean.

Prediction intervals must account for both the uncertainty in knowing the value of the population mean, plus data scatter (variance). So a prediction interval is always wider than a confidence interval.

The difference between a prediction interval and a confidence interval is the standard error.

The standard error for a confidence interval on the mean takes into account the uncertainty due to sampling. The line you computed from your sample will be different from the line that would have been computed if you had the entire population, the standard error takes this uncertainty into account.

The standard error for a prediction interval on an individual observation takes into account the uncertainty due to sampling like above, but also takes into account the variability of the individuals around the predicted mean. The standard error for the prediction interval will be wider than for the confidence interval and hence the prediction interval will be wider than the confidence interval.

Mathematical Explanation

Let’s consider a simple linear regression model:

\[y = \alpha + \beta x + \epsilon\]

Where y is the response variable, x is the explanatory variable, α and β are the model parameters, and ε is the random error term. This error term is assumed to be normally distributed with a mean of 0 and a variance of σ².

Since we typically don’t know the true error variance σ², we must estimate it from our sample data. We use the Mean Squared Error (MSE) for this, which represents the average of the squared differences between the observed and predicted values.

Confidence Interval for the Mean Response

A confidence interval provides a range for the average value of y for a given x. The formula is:

\[\hat{y} \pm t_{n-2, \alpha/2} \cdot \sqrt{MSE \left( \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum(x_i - \bar{x})^2} \right)}\]

The term inside the square root represents the variance of our prediction for the mean response. It’s composed of two parts:

  • Uncertainty in the intercept (α): The 1/n term reflects the variability in estimating the model’s intercept.
  • Uncertainty in the slope (β): The second term, (x - x̄)² / Σ(xᵢ - x̄)², accounts for the variability in estimating the slope. This term increases as x moves further from the data’s center (), making predictions less certain at the extremes.

Prediction Interval for a New Observation

A prediction interval provides a range for a single future observation. It must account for the same uncertainty as the confidence interval, plus the inherent variability of an individual data point. The formula is:

\[\hat{y} \pm t_{n-2, \alpha/2} \cdot \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x - \bar{x})^2}{\sum(x_i - \bar{x})^2} \right)}\]

Notice the extra 1 inside the square root. This accounts for the irreducible error (σ², estimated by MSE) of a single observation. This additional source of variance is why the prediction interval is always wider than the confidence interval.

Why Prediction Intervals Can Be Too Narrow

While the formulas above provide a solid foundation for understanding prediction intervals, they highlight a critical limitation, especially in complex models like those used for time series forecasting.

As explained in Forecasting: Principles and Practice, almost all prediction intervals from time series models are too narrow. This happens because they fail to account for all sources of uncertainty.

There are at least four sources of uncertainty in forecasting:

  1. The random error term (ε): This is the inherent, irreducible randomness in the data generating process. The standard prediction interval formula does account for this.
  2. The parameter estimates: The model parameters (like α and β in our linear regression example) are estimated from data and are not the true values. This introduces uncertainty.
  3. The choice of model: We choose a model (e.g., linear regression, ETS, ARIMA) based on the historical data, but this choice itself is a source of uncertainty. A different model might be better.
  4. The continuation of the historical data generating process: We assume the underlying process that generated the past data will remain the same in the future. This assumption can be wrong.

Standard prediction intervals, like the one we derived for linear regression, generally only account for the first source of uncertainty, and sometimes the second. They rarely account for model uncertainty or the possibility of the data-generating process changing.

This is a primary reason why we need more advanced techniques like bootstrapping and conformal prediction. These methods are designed to create more robust prediction intervals that better capture the total uncertainty, leading to more reliable forecasts.

Tags:

Updated: