# 7.1: Large Sample Estimation of a Population Mean


Learning Objectives

• To become familiar with the concept of an interval estimate of the population mean.
• To understand how to apply formulas for a confidence interval for a population mean.

The Central Limit Theorem says that, for large samples (samples of size $$n \ge 30$$), when viewed as a random variable the sample mean $$\overline{X}$$ is normally distributed with mean $$\mu_{ \overline{X}}=\mu$$ and standard deviation $$\sigma_{\overline{X}}=\frac{\sigma}{\sqrt{n}}$$. The Empirical Rule says that we must go about two standard deviations from the mean to capture $$95\%$$ of the values of $$\overline{X}$$ generated by sample after sample. A more precise distance based on the normality of $$\overline{X}$$ is $$1.960$$ standard deviations, which is $$E=\frac{1.960 \sigma}{\sqrt{n}}$$.

The key idea in the construction of the $$95\%$$ confidence interval is this, as illustrated in Figure $$\PageIndex{1}$$, because in sample after sample $$95\%$$ of the values of $$\overline{X}$$ lie in the interval $$[\mu -E,\mu +E]$$, if we adjoin to each side of the point estimate $$x-a$$ “wing” of length $$E$$, $$95\%$$ of the intervals formed by the winged dots contain $$\mu$$. The $$95\%$$ confidence interval is thus $$\bar{x}\pm 1.960\frac{\sigma }{\sqrt{n}}$$. For a different level of confidence, say $$90\%$$ or $$99\%$$, the number $$1.960$$ will change, but the idea is the same.

Figure $$\PageIndex{2}$$ shows the intervals generated by a computer simulation of drawing $$40$$ samples from a normally distributed population and constructing the $$95\%$$ confidence interval for each one. We expect that about $$(0.05)(40)=2$$ of the intervals so constructed would fail to contain the population mean $$\mu$$, and in this simulation two of the intervals, shown in red, do.

It is standard practice to identify the level of confidence in terms of the area $$α$$ in the two tails of the distribution of $$\overline{X}$$ when the middle part specified by the level of confidence is taken out. This is shown in Figure $$\PageIndex{3}$$, drawn for the general situation, and in Figure $$\PageIndex{4}$$, drawn for $$95\%$$ confidence.

Remember from Section 5.4 that the $$z$$-value that cuts off a right tail of area $$c$$ is denoted $$z_c$$. Thus the number $$1.960$$ in the example is $$z_{.025}$$, which is $$z_{\frac{\alpha }{2}}$$ for $$\alpha =1-0.95=0.05$$.

For $$95\%$$ confidence the area in each tail is $$\alpha /2=0.025$$.

The level of confidence can be any number between $$0$$ and $$100\%$$, but the most common values are probably $$90\%$$ $$(\alpha =0.10)$$, $$95\%$$ $$(\alpha =0.05)$$, and $$99\%$$ $$(\alpha =0.01)$$.

Thus in general for a $$100(1-\alpha )\%$$ confidence interval, $$E=z_{\alpha /2}(\sigma /\sqrt{n})$$, so the formula for the confidence interval is $$\bar{x}\pm z_{\alpha /2}(\sigma /\sqrt{n})$$. While sometimes the population standard deviation $$\sigma$$ is known, typically it is not. If not, for $$n\geq 30$$ it is generally safe to approximate $$\sigma$$ by the sample standard deviation $$s$$.

Large Sample $$100(1-\alpha )\%$$ Confidence Interval for a Population Mean

• If $$\sigma$$ is known: $\bar{x}\pm z_{\alpha /2}\left ( \dfrac{\sigma }{\sqrt{n}} \right )$
• If $$\sigma$$ is unknown: $\bar{x}\pm z_{\alpha /2}\left ( \dfrac{s}{\sqrt{n}} \right )$

A sample is considered large when $$n\geq 30$$.

As mentioned earlier, the number

$E=z_{\alpha /2}\left ( \frac{\sigma }{\sqrt{n}} \right )$

or

$E=z_{\alpha /2}\left ( \frac{s}{\sqrt{n}} \right )$

is called the margin of error of the estimate.

Example $$\PageIndex{1}$$

Find the number $$z_{\alpha /2}$$ needed in construction of a confidence interval:

1. when the level of confidence is $$90\%$$;
2. when the level of confidence is $$99\%$$.

using the tables in Figure $$\PageIndex{5}$$ below.

Solution:

1. For confidence level $$90\%$$, $$\alpha =1-0.90=0.10$$, so $$z_{\alpha /2}=z_{0.05}$$. Since the area under the standard normal curve to the right of $$z_{0.05}$$ is $$0.05$$, the area to the left of $$z_{0.05}$$ is $$0.95$$. We search for the area $$0.9500$$ in Figure $$\PageIndex{5}$$. The closest entries in the table are $$0.9495$$ and $$0.9505$$, corresponding to $$z$$-values $$1.64$$ and $$1.65$$. Since $$0.95$$ is halfway between $$0.9495$$ and $$0.9505$$ we use the average $$1.645$$ of the $$z$$-values for $$z_{0.05}$$.
2. For confidence level $$99\%$$, $$\alpha =1-0.99=0.01$$, so $$z_{\alpha /2}=z_{0.005}$$. Since the area under the standard normal curve to the right of $$z_{0.005}$$ is $$0.005$$, the area to the left of $$z_{0.005}$$ is $$0.9950$$. We search for the area $$0.9950$$ in Figure $$\PageIndex{5}$$. The closest entries in the table are $$0.9949$$ and $$0.9951$$, corresponding to $$z$$-values $$2.57$$ and $$2.58$$. Since $$0.995$$ is halfway between $$0.9949$$ and $$0.9951$$ we use the average $$2.575$$ of the $$z$$-values for $$z_{0.005}$$.

Example $$\PageIndex{2}$$

Use Figure $$\PageIndex{6}$$ below to find the number $$z_{\alpha /2}$$ needed in construction of a confidence interval:

1. when the level of confidence is $$90\%$$;
2. when the level of confidence is $$99\%$$.

Solution:

1. In the next section we will learn about a continuous random variable that has a probability distribution called the Student $$t$$-distribution. Figure $$\PageIndex{6}$$ gives the value $$t_c$$ that cuts off a right tail of area $$c$$ for different values of $$c$$. The last line of that table, the one whose heading is the symbol $$\infty$$ for infinity and $$[z]$$, gives the corresponding $$z$$-value $$z_c$$ that cuts off a right tail of the same area $$c$$. In particular, $$z_{0.05}$$ is the number in that row and in the column with the heading $$t_{0.05}$$. We read off directly that $$z_{0.05}=1.645$$.
2. In Figure $$\PageIndex{6}$$ $$z_{0.005}$$ is the number in the last row and in the column headed $$t_{0.005}$$, namely $$2.576$$.

Figure $$\PageIndex{6}$$ can be used to find $$z_c$$ only for those values of $$c$$ for which there is a column with the heading $$t_c$$ appearing in the table; otherwise we must use Figure $$\PageIndex{5}$$ in reverse. But when it can be done it is both faster and more accurate to use the last line of Figure $$\PageIndex{6}$$ to find $$z_c$$ than it is to do so using Figure $$\PageIndex{5}$$ in reverse.

Example $$\PageIndex{3}$$

A sample of size $$49$$ has sample mean $$35$$ and sample standard deviation $$14$$. Construct a $$98\%$$ confidence interval for the population mean using this information. Interpret its meaning.

Solution:

For confidence level $$98\%$$, $$\alpha =1-0.98=0.02$$, so $$z_{\alpha /2}=z_{0.01}$$. From Figure $$\PageIndex{6}$$ we read directly that $$z_{0.01}=2.326$$.Thus

$\bar{x}\pm z_{\alpha /2}\frac{s}{\sqrt{n}}=35\pm 2.326\left ( \frac{14}{\sqrt{49}} \right )=35\pm 4.652\approx 35\pm 4.7$

We are $$98\%$$ confident that the population mean $$\mu$$ lies in the interval $$[30.3,39.7]$$, in the sense that in repeated sampling $$98\%$$ of all intervals constructed from the sample data in this manner will contain $$\mu$$.

Example $$\PageIndex{4}$$

A random sample of $$120$$ students from a large university yields mean GPA $$2.71$$ with sample standard deviation $$0.51$$. Construct a $$90\%$$ confidence interval for the mean GPA of all students at the university.

Solution:

For confidence level $$90\%$$, $$\alpha =1-0.90=0.10$$, so $$z_{\alpha /2}=z_{0.05}$$. From Figure $$\PageIndex{6}$$ we read directly that $$z_{0.05}=1.645$$. Since $$n=120$$, $$\bar{x}=2.71$$, and $$s=0.51$$,

$\bar{x}\pm z_{\alpha /2}\frac{s}{\sqrt{n}}=2.71\pm 1.645\left ( \frac{0.51}{\sqrt{120}} \right )=2.71\pm 0.0766$

One may be $$90\%$$ confident that the true average GPA of all students at the university is contained in the interval $$(2.71-0.08,2.71+0.08)=(2.63,2.79)$$.

Key Takeaway

• A confidence interval for a population mean is an estimate of the population mean together with an indication of reliability.
• There are different formulas for a confidence interval based on the sample size and whether or not the population standard deviation is known.
• The confidence intervals are constructed entirely from the sample data (or sample data and the population standard deviation, when it is known).

## Contributor

• Anonymous

This page titled 7.1: Large Sample Estimation of a Population Mean is shared under a CC BY-NC-SA 3.0 license and was authored, remixed, and/or curated by via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.