Most people are familiar with the idea of a “margin of error” for political polls. These polls usually try to provide an answer that is accurate within ±3 percentage points. For example, when a candidate is estimated to win an election by 9 percentage points with a margin of error of 3, the margin by which they will win is estimated to fall between 6 and 12 percentage points. In statistics we refer to this range of values as the confidence interval, which provides a measure of our degree of uncertainty about how close our estimate is to the population parameter. The larger the confidence interval, the greater our uncertainty.
We saw in the previous section that with sufficient sample size, the sampling distribution of the mean is normally distributed, and that the standard error describes the standard deviation of this sampling distribution. Using this knowledge, we can ask: What is the range of values within which we would expect to capture 95% of all estimates of the mean? To answer this, we can use the normal distribution, for which we know the values between which we expect 95% of all sample means to fall. Specifically, we use the quantile function for the normal distribution (qnorm() in R) to determine the values below which 2.5% and 97.5% of the distribution fall. We choose these points because we want to find the 95% of values in the center of the distribution, so we need to cut off 2.5% on each end in order to end up with 95% in the middle. Figure 12.3 shows that this occurs for $Z = \pm 1.96$.
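The cutoffs can be checked numerically. The following is a minimal sketch in Python rather than R, assuming only the standard library: `NormalDist().inv_cdf` plays the role that `qnorm()` plays in R, and the variable names are illustrative.

```python
from statistics import NormalDist

# Standard normal distribution (mean 0, standard deviation 1)
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Quantiles that cut off 2.5% in each tail, leaving 95% in the middle
lower_cutoff = std_normal.inv_cdf(0.025)
upper_cutoff = std_normal.inv_cdf(0.975)

print(f"lower cutoff: {lower_cutoff:.2f}")  # lower cutoff: -1.96
print(f"upper cutoff: {upper_cutoff:.2f}")  # upper cutoff: 1.96
```

Because the normal distribution is symmetric, the two cutoffs differ only in sign.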
Using these cutoffs, we can create a confidence interval for the estimate of the mean:

$$CI_{95\%} = \bar{X} \pm 1.96 \times SEM$$
Let’s compute the confidence interval for the NHANES height data.
| Sample mean | SEM | Lower bound of CI | Upper bound of CI |
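The same four quantities can be computed for any sample. The sketch below is illustrative only: the NHANES data are not reproduced here, so it uses a synthetic sample of heights (the mean of 168 cm, standard deviation of 10 cm, and sample size of 250 are assumptions chosen for the example), and `confidence_interval` is a hypothetical helper name.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def confidence_interval(sample, level=0.95):
    """Return (sample mean, SEM, lower bound, upper bound), normal approximation."""
    n = len(sample)
    sample_mean = mean(sample)
    sem = stdev(sample) / math.sqrt(n)  # standard error of the mean
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    return sample_mean, sem, sample_mean - z * sem, sample_mean + z * sem


# Synthetic stand-in for a sample of adult heights in cm (assumed values, NOT NHANES data)
rng = random.Random(1)
heights = [rng.gauss(168.0, 10.0) for _ in range(250)]

sample_mean, sem, lower, upper = confidence_interval(heights)
print(f"mean={sample_mean:.2f}  SEM={sem:.2f}  95% CI=({lower:.2f}, {upper:.2f})")
```

The interval is symmetric around the sample mean, and its width shrinks as the sample size grows because the SEM is divided by the square root of n.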
Confidence intervals are notoriously confusing, primarily because they don’t mean what we would hope they mean. It seems natural to think that the 95% confidence interval tells us that there is a 95% chance that the population mean falls within the interval. However, as we will see throughout the course, concepts in statistics often don’t mean what we think they should mean. In the case of confidence intervals, we can’t interpret them in this way because the population parameter has a fixed value: it either is or isn’t in the interval. The proper interpretation of the 95% confidence interval is that the procedure used to construct it will, across repeated samples, produce an interval that contains the true population mean 95% of the time. We can confirm this by obtaining a large number of samples from the NHANES data and counting how often the interval contains the true population mean. The proportion of confidence intervals containing the true mean is 0.94. This confirms that the confidence interval does indeed capture the population mean about 95% of the time.
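The repeated-sampling interpretation can be sketched as a small simulation. This is not the analysis behind the figure reported above: it assumes a synthetic population of heights in place of the NHANES data, and the function name `coverage`, the population parameters, the sample size, and the number of repetitions are all illustrative choices.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def coverage(population, sample_size=250, n_samples=1000, level=0.95, seed=0):
    """Fraction of repeated-sample confidence intervals that contain the population mean."""
    rng = random.Random(seed)
    true_mean = mean(population)  # fixed value; it does not change between samples
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    hits = 0
    for _ in range(n_samples):
        sample = rng.sample(population, sample_size)  # draw without replacement
        sample_mean = mean(sample)
        sem = stdev(sample) / math.sqrt(sample_size)
        if sample_mean - z * sem <= true_mean <= sample_mean + z * sem:
            hits += 1
    return hits / n_samples


# Synthetic "population" of heights in cm (assumed values, NOT NHANES data)
pop_rng = random.Random(42)
population = [pop_rng.gauss(168.0, 10.0) for _ in range(5000)]

proportion = coverage(population)
print(f"proportion of intervals containing the true mean: {proportion:.3f}")
```

The population mean stays fixed while the interval moves from sample to sample, which is why the 95% describes the procedure rather than any single interval.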