
8.2: A Confidence Interval When the Population Standard Deviation Is Known or Large Sample Size


    A confidence interval for a population mean, when the population standard deviation is known, is based on the conclusion of the Central Limit Theorem that the sampling distribution of the sample means follows an approximately normal distribution.

    Calculating the Confidence Interval

    Consider the standardizing formula for the sampling distribution developed in the discussion of the Central Limit Theorem:

    \[Z_1=\dfrac{\bar{X}-\mu_{\bar{X}}}{\sigma_{\bar{X}}}=\dfrac{\bar{X}-\mu}{\sigma / \sqrt{n}}\]

    Notice that \(\mu\) is substituted for \(\mu_{\bar{X}}\) because we know from the Central Limit Theorem that the expected value of \(\bar{X}\) is \(\mu\), and \(\sigma_{\bar{X}}\) is replaced with \(\sigma / \sqrt{n}\), also from the Central Limit Theorem.

    In this formula we know \(\bar{X}\), \(\sigma_{\bar{X}}\), and \(n\), the sample size. (In actuality we do not know the population standard deviation, but we do have a point estimate for it, \(s\), from the sample we took. More on this later.) What we do not know is \(\mu\) or \(Z_1\). We can solve for either one of these in terms of the other. Solving for \(\mu\) in terms of \(Z_1\) gives:

    \[\mu=\bar{X} \pm Z_1 \sigma / \sqrt{n}\]

    Remembering that the Central Limit Theorem tells us that the distribution of the \(\bar{X}\)'s, the sampling distribution for means, is approximately normal, and that the normal distribution is symmetrical, we can rearrange terms thus:

    \[\bar{X}-Z_\alpha(\sigma / \sqrt{n}) \leq \mu \leq \bar{X}+Z_\alpha(\sigma / \sqrt{n})\]

    This is the formula for a confidence interval for the mean of a population. Notice that \(Z_\alpha\) has been substituted for \(Z_1\) in this equation. This is where a choice must be made by the statistician. The analyst must decide the level of confidence they wish to impose on the confidence interval. \(\alpha\) is the probability that the interval will not contain the true population mean. The confidence level is defined as \((1-\alpha)\). \(Z_\alpha\) is the number of standard deviations \(\bar{X}\) lies from the mean with a certain probability. If we choose \(Z_\alpha=1.96\) we are asking for the \(95 \%\) confidence interval because we are setting the probability that the true mean lies within the range at 0.95. If we set \(Z_\alpha\) at about 1.645 (the table value falls between 1.64 and 1.65) we are asking for the \(90 \%\) confidence interval because we have set the probability at 0.90. These numbers can be verified by consulting the Standard Normal table. Divide either 0.95 or 0.90 in half and find that probability inside the body of the table. Then read on the top and left margins the number of standard deviations it takes to get this level of probability.
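
    The table lookup described above can also be reproduced numerically. The following is a minimal sketch, assuming Python's standard-library `statistics.NormalDist`; the helper name `z_critical` is ours for illustration and does not come from the text.

```python
from statistics import NormalDist

def z_critical(confidence_level: float) -> float:
    """Two-sided critical value: the z-score leaving alpha/2 in each tail."""
    alpha = 1.0 - confidence_level
    # inv_cdf returns the z-score with the given area to its LEFT,
    # so we ask for the point with area 1 - alpha/2 below it.
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)

for cl in (0.90, 0.95, 0.99):
    print(f"{cl:.0%} confidence: z = {z_critical(cl):.3f}")
```

    The printed values should agree with the table to three decimals: roughly 1.645, 1.960, and 2.576.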

    In reality, we can set whatever level of confidence we desire simply by changing the \(Z_\alpha\) value in the formula. It is the analyst's choice. Common convention in Economics and most social sciences sets confidence intervals at either 90, 95, or 99 percent levels. Levels less than \(90 \%\) are considered of little value. The level of confidence of a particular interval estimate is denoted by \((1-\alpha)\).

    A good way to see the development of a confidence interval is to graphically depict the solution to a problem requesting a confidence interval. This is presented in Figure 8.2 for the example in the introduction concerning the number of downloads from Apple Music. That case was for a 95% confidence interval, but other levels of confidence could have just as easily been chosen depending on the need of the analyst. However, the level of confidence MUST be pre-set and not subject to revision as a result of the calculations.

    Figure \(\PageIndex{1}\)

    \[\begin{array}{c}
    \mu=\bar{X} \pm Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right) \\
    =2 \pm 1.96(0.1) \\
    =2 \pm 0.196 \\
    1.804 \leq \mu \leq 2.196
    \end{array}\]

    For this example, let's say we know that the actual population mean number of Apple Music downloads is 2.1. The true population mean falls within the range of the 95% confidence interval. There is absolutely nothing to guarantee that this will happen. Further, if the true mean falls outside of the interval we will never know it. We must always remember that we will never ever know the true mean. Statistics simply allows us, with a given level of probability (confidence), to say that the true mean is within the range calculated. This is what was called in the introduction, the "level of ignorance admitted".
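
    One way to see what the confidence level does and does not promise is to simulate many samples and count how often the interval captures the true mean. The sketch below is illustrative only: the population values (\(\mu=2.1\), \(\sigma=1\), \(n=100\), chosen so that \(\sigma/\sqrt{n}=0.1\)) are assumptions, since the text gives only the standard error, and the function name `coverage` is hypothetical.

```python
import random
from statistics import NormalDist, fmean

def coverage(mu, sigma, n, confidence_level, trials, seed=0):
    """Fraction of simulated z-intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / n ** 0.5
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n from the assumed normal population
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = fmean(sample)
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

# Assumed population: mu = 2.1, sigma = 1, n = 100 (so sigma/sqrt(n) = 0.1)
rate = coverage(mu=2.1, sigma=1.0, n=100, confidence_level=0.95, trials=2000)
print(f"Share of intervals containing mu: {rate:.3f}")
```

    The share should land near 0.95, not at 1: any single interval either contains the true mean or misses it, and in practice we cannot tell which.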

    Changing the Confidence Level or Sample Size

    Here again is the formula for a confidence interval for an unknown population mean assuming we know the population standard deviation:

    \[\bar{X}-Z_\alpha(\sigma / \sqrt{n}) \leq \mu \leq \bar{X}+Z_\alpha(\sigma / \sqrt{n})\]

    It is clear that the confidence interval is driven by two things: the chosen level of confidence, which determines \(Z_\alpha\), and the standard deviation of the sampling distribution. The standard deviation of the sampling distribution is in turn affected by two things: the standard deviation of the population and the sample size we chose for our data. Here we wish to examine the effects of the two choices we have made, the confidence level and the sample size, on the calculated confidence interval.

    For a moment we should ask just what we desire in a confidence interval. Our goal was to estimate the population mean from a sample. We have forsaken the hope that we will ever find the true population mean, and population standard deviation for that matter, for any case except where we have an extremely small population and the cost of gathering the data of interest is very small. In all other cases we must rely on samples. With the Central Limit Theorem we have the tools to provide a meaningful confidence interval with a given level of confidence, meaning a known probability of being wrong.

    By meaningful confidence interval we mean one that is useful. Imagine that you are asked for a confidence interval for the ages of your classmates. You have taken a sample and find a mean of 19.8 years. You wish to be very confident so you report an interval between 9.8 years and 29.8 years. This interval would certainly contain the true population mean and have a very high confidence level. However, it hardly qualifies as meaningful. The very best confidence interval is narrow while having high confidence. There is a natural tension between these two goals. The higher the level of confidence, the wider the confidence interval, as in the case of the students' ages above. We can see this tension in the equation for the confidence interval.

    \[\mu=\bar{x} \pm Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right)\]

    The confidence interval will increase in width as \(Z_\alpha\) increases, and \(Z_\alpha\) increases as the level of confidence increases. There is a tradeoff between the level of confidence and the width of the interval. Looking at the formula again, we see that the sample size also plays an important role in the width of the confidence interval. The sample size, \(n\), shows up in the denominator of the standard deviation of the sampling distribution. As the sample size increases, the standard deviation of the sampling distribution decreases, and thus the width of the confidence interval decreases, holding the level of confidence constant. Again we see the importance of having large samples for our analysis, although we then face a second constraint, the cost of gathering data.
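
    To make the tradeoff concrete, the width of the interval can be written out directly from the formula. The symbol \(W\) is introduced here only for illustration:

    \[W=\left(\bar{x}+Z_\alpha \dfrac{\sigma}{\sqrt{n}}\right)-\left(\bar{x}-Z_\alpha \dfrac{\sigma}{\sqrt{n}}\right)=2 Z_\alpha \dfrac{\sigma}{\sqrt{n}}\]

    The sample mean cancels, so the width depends only on \(Z_\alpha\), \(\sigma\), and \(n\). Because \(n\) enters through \(\sqrt{n}\), quadrupling the sample size halves the width, while moving from \(Z_\alpha=1.645\) to \(Z_\alpha=1.96\) widens the interval by a factor of about \(1.96 / 1.645 \approx 1.19\).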

    Exercise \(\PageIndex{1}\)

    Suppose we are interested in the mean scores on an exam. A random sample of 36 scores is taken and gives a sample mean (sample mean score) of 68 \((\bar{X}=68)\). In this example we have the unusual knowledge that the population standard deviation is 3 points. Do not count on knowing the population parameters outside of textbook examples. Find a confidence interval estimate for the population mean exam score (the mean score on all exams).

    Problem

    Find a 90% confidence interval for the true (population) mean of statistics exam scores.

    Figure \(\PageIndex{1}\)
    Answer

    The solution is shown step by step:

    The formula for a confidence interval for an unknown population mean assuming we know the population standard deviation is:

    \[\bar{X}-Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right) \leq \mu \leq \bar{X}+Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right)\]

    For a \(90 \%\) confidence interval, visualize an area of 0.90 centered under the normal curve (see Figure \(\PageIndex{1}\)). The remaining area for the two tails of the normal distribution is then 0.10, which indicates that the area in each tail is one-half of 0.10, which is 0.05. The \(z\)-score that cuts off an area of 0.05 in the left tail is \(-1.645\); by symmetry, the \(z\)-score that cuts off 0.05 in the right tail is 1.645, and this is the value used in the formula.

    In this example we are given that the population standard deviation \(\sigma=3\).

    We are also given that the sample size \(n=36\) and the sample mean \(\bar{X}=68\).

    Substituting these values in the confidence interval formula results in the following:

    \[\begin{aligned}
    \bar{X}-Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right) & \leq \mu \leq \bar{X}+Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right) \\
    68-1.645\left(\dfrac{3}{\sqrt{36}}\right) & \leq \mu \leq 68+1.645\left(\dfrac{3}{\sqrt{36}}\right) \\
    68-0.8225 & \leq \mu \leq 68+0.8225 \\
    67.1775 & \leq \mu \leq 68.8225
    \end{aligned}\]

    We estimate with \(90 \%\) confidence that the true population mean exam score for all statistics students is between 67.18 and 68.82.
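
    The arithmetic above can be checked with a few lines of code. This is a minimal sketch using only Python's standard library; the function name `z_interval` is ours and is not part of the text.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(x_bar, sigma, n, confidence_level):
    """Confidence interval for a mean when sigma is known."""
    # Critical value leaving (1 - CL)/2 in each tail
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin

# Exam-score example: sample mean 68, sigma 3, n = 36, 90% confidence
low, high = z_interval(x_bar=68, sigma=3, n=36, confidence_level=0.90)
print(f"{low:.2f} <= mu <= {high:.2f}")
```

    Rounded to two decimals this gives the same bounds as the hand calculation, 67.18 and 68.82. The unrounded values differ slightly in the fourth decimal because the code uses a more precise critical value than the tabled 1.645.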

    Try It \(\PageIndex{1}\)

    Suppose average pizza delivery times are normally distributed with an unknown population mean and a population standard deviation of six minutes. A random sample of 28 pizza delivery restaurants is taken and has a sample mean delivery time of 36 minutes.

    Find a 90% confidence interval estimate for the population mean delivery time.

    Exercise \(\PageIndex{2}\)

    Suppose we change the original problem in Exercise \(\PageIndex{1}\) by using a 95% confidence level. Find a 95% confidence interval for the true (population) mean statistics exam score.

    Answer
    Figure \(\PageIndex{1}\)

    \[\begin{array}{l}
    \mu=\bar{x} \pm Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right) \\
    \mu=68 \pm 1.96\left(\dfrac{3}{\sqrt{36}}\right) \\
    67.02 \leq \mu \leq 68.98
    \end{array}\]

    \[\sigma=3 ; n=36 ; \text { The confidence level is } 95 \%(C L=0.95) .\]

    \[C L=0.95 \text { so } \alpha=1-C L=1-0.95=0.05\]

    \[Z_{\dfrac{\alpha}{2}}=Z_{0.025}=1.96\]

    Notice that the plus/minus term in the equation is larger for a \(95 \%\) confidence level than for the \(90 \%\) level used in the original problem.
    Comparing the results: The \(90 \%\) confidence interval is (67.18, 68.82). The \(95 \%\) confidence interval is (67.02, 68.98). The \(95 \%\) confidence interval is wider. If you look at the graphs, because the area 0.95 is larger than the area 0.90, it makes sense that the \(95 \%\) confidence interval is wider. To be more confident that the confidence interval actually does contain the true value of the population mean for all statistics exam scores, the confidence interval necessarily needs to be wider. This demonstrates a very important principle of confidence intervals: there is a tradeoff between the level of confidence and the width of the interval. We want a narrow confidence interval, because very wide intervals provide little useful information. But we would also like to have a high level of confidence in our interval. For a fixed sample size, we cannot have both.

    Part (a) shows a normal distribution curve. A central region with area equal to 0.90 is shaded. Each unshaded tail of the curve has area equal to 0.05. Part (b) shows a normal distribution curve. A central region with area equal to 0.95 is shaded. Each unshaded tail of the curve has area equal to 0.025.
    Figure \(\PageIndex{3}\)

    Summary: Effect of Changing the Confidence Level

    • Increasing the confidence level makes the confidence interval wider.
    • Decreasing the confidence level makes the confidence interval narrower.

    And again here is the formula for a confidence interval for an unknown mean assuming we have the population standard deviation:

    \[\bar{X}-Z_\alpha(\sigma / \sqrt{n}) \leq \mu \leq \bar{X}+Z_\alpha(\sigma / \sqrt{n})\]

    The standard deviation of the sampling distribution was provided by the Central Limit Theorem as \(\sigma / \sqrt{n}\). While we infrequently get to choose the sample size it plays an important role in the confidence interval. Because the sample size is in the denominator of the equation, as \(n\) increases it causes the standard deviation of the sampling distribution to decrease and thus the width of the confidence interval to decrease. We have met this before as we reviewed the effects of sample size on the Central Limit Theorem. There we saw that as \(n\) increases the sampling distribution narrows until in the limit it collapses on the true population mean.

    Try It \(\PageIndex{2}\)

    Refer back to the pizza-delivery Try It \(\PageIndex{1}\) exercise. The population standard deviation is six minutes and the sample mean delivery time is 36 minutes. Use a sample size of 20. Find a 95% confidence interval estimate for the true mean pizza delivery time.

    Changing the Sample Size

    Exercise \(\PageIndex{3}\)

    Suppose we change the original problem in Exercise \(\PageIndex{1}\) to see what happens to the confidence interval if the sample size is changed.

    Problem

    Leave everything the same except the sample size. Use the original 90% confidence level.

    a. What happens to the confidence interval if we increase the sample size and use n = 100 instead of n = 36?

    b. What happens if we decrease the sample size to n = 25 instead of n = 36?

    Answer

    a.

    \[\begin{array}{l}
    \mu=\bar{x} \pm Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right) \\
    \mu=68 \pm 1.645\left(\dfrac{3}{\sqrt{100}}\right) \\
    67.5065 \leq \mu \leq 68.4935
    \end{array}\]

    If we increase the sample size \(n\) to 100, we decrease the width of the confidence interval relative to the original sample size of 36 observations.

    b.

    \[\begin{array}{l}
    \mu=\bar{x} \pm Z_\alpha\left(\dfrac{\sigma}{\sqrt{n}}\right) \\
    \mu=68 \pm 1.645\left(\dfrac{3}{\sqrt{25}}\right) \\
    67.013 \leq \mu \leq 68.987
    \end{array}\]

    If we decrease the sample size \(n\) to 25, we increase the width of the confidence interval by comparison to the original sample size of 36 observations.

    Summary: Effect of Changing the Sample Size

    • Increasing the sample size makes the confidence interval narrower.
    • Decreasing the sample size makes the confidence interval wider.
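
    The effect summarized above can be tabulated for the three sample sizes used in this exercise. This is a small sketch with Python's standard library; the variable names are ours, and the critical value is computed instead of read from the table, so the last digit may differ slightly from the hand calculations.

```python
from math import sqrt
from statistics import NormalDist

sigma, x_bar = 3, 68
z = NormalDist().inv_cdf(0.95)  # 90% confidence leaves 0.05 in each tail

widths = {}
for n in (25, 36, 100):
    margin = z * sigma / sqrt(n)
    widths[n] = 2 * margin  # full width of the interval
    print(f"n = {n:3d}: {x_bar - margin:.2f} to {x_bar + margin:.2f} "
          f"(width {widths[n]:.2f})")
```

    Going from \(n=25\) to \(n=100\) quadruples the sample size and halves the width, as the \(\sqrt{n}\) in the denominator implies.
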
    Try It \(\PageIndex{3}\)

    Refer back to the pizza-delivery Try It \(\PageIndex{1}\) exercise. The mean delivery time is 36 minutes and the population standard deviation is six minutes. Assume the sample size is changed to 50 restaurants with the same sample mean. Find a 90% confidence interval estimate for the population mean delivery time.

    We have already seen this effect when we reviewed the effects of changing the size of the sample, \(n\), on the Central Limit Theorem. See the figures at the bottom of Chapter 7 to see this effect. There we saw that as the sample size increases, the standard deviation of the sampling distribution decreases. This is why we prefer the sample mean from a large sample to one from a small sample, all other things held constant.

    Thus far we assumed that we knew the population standard deviation. This will virtually never be the case. We will have the sample standard deviation, \(s\), however. This is a point estimate for the population standard deviation and can be substituted into the formula for confidence intervals for a mean under certain circumstances. We just saw the effect the sample size has on the width of confidence interval and the impact on the sampling distribution for our discussion of the Central Limit Theorem. We can invoke this to substitute the point estimate for the standard deviation if the sample size is large "enough". Simulation studies indicate that 30 observations or more will be sufficient to eliminate any meaningful bias in the estimated confidence interval.

    Exercise \(\PageIndex{4}\)

    Spring break can be a very expensive holiday. A sample of 80 students is surveyed, and the average amount spent by students on travel and beverages is $593.84. The sample standard deviation is approximately $369.34. Construct a 92% confidence interval for the population mean amount spent.

    Answer

    We begin with the confidence interval for a mean. We use the formula for a mean because the random variable is dollars spent and this is a continuous random variable. The point estimate for the population standard deviation, \(s\), has been substituted for the true population standard deviation because with 80 observations there is no concern for bias in the estimate of the confidence interval.

    \[\mu=\bar{x} \pm\left[Z_{(\alpha / 2)} \dfrac{s}{\sqrt{n}}\right]\]

    Substituting the values into the formula, we have:

    \[\mu=593.84 \pm\left[1.75 \dfrac{369.34}{\sqrt{80}}\right]\]

    \(Z_{(\alpha / 2)}\) is found on the standard normal table by looking up 0.46 (half of the 0.92 confidence level) in the body of the table and reading the number of standard deviations from the side and top of the table: 1.75. The solution for the interval is thus:

    \[\mu=593.84 \pm 72.2636=(521.58,666.10)\]

    \[\$ 521.58 \leq \mu \leq \$ 666.10\]
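
    The same calculation can be sketched in code, with the sample standard deviation \(s\) standing in for \(\sigma\). The 92% confidence level is the one implied by the table lookup of 0.46; the variable names are ours.

```python
from math import sqrt
from statistics import NormalDist

x_bar, s, n = 593.84, 369.34, 80
confidence_level = 0.92  # implied by looking up 0.46 = 0.92 / 2 in the table

# Critical value leaving (1 - CL)/2 = 0.04 in each tail
z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
margin = z * s / sqrt(n)  # s substitutes for sigma because n is large
low, high = x_bar - margin, x_bar + margin
print(f"z = {z:.4f}, margin = {margin:.2f}")
print(f"${low:.2f} <= mu <= ${high:.2f}")
```

    The computed critical value is slightly above the tabled 1.75, so the bounds differ from the hand calculation by a few cents. That gap comes from table rounding, not from a different method.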

    Figure \(\PageIndex{4}\)
    Try It \(\PageIndex{4}\)

    Chair prices vary over a wide range. The average cost of 25 chairs in a store is $100. The sample standard deviation is $50. Construct a 92% confidence interval for the population mean of the cost of chairs.

    Formula Review

    The general form for a confidence interval for a single population mean, known standard deviation, normal distribution is given by \(\bar{X}-Z_\alpha(\sigma / \sqrt{n}) \leq \mu \leq \bar{X}+Z_\alpha(\sigma / \sqrt{n})\). This formula is used when the population standard deviation is known.

    \(C L=\) confidence level, or the proportion of confidence intervals created that are expected to contain the true population parameter

    \(\alpha=1-C L=\) the proportion of confidence intervals that will not contain the population parameter

    \(z_{\dfrac{\alpha}{2}}=\) the \(z\)-score with the property that the area to the right of the \(z\)-score is \(\dfrac{\alpha}{2}\); this is the \(z\)-score used in the calculation, where \(\alpha=1-C L\).


    This page titled 8.2: A Confidence Interval When the Population Standard Deviation Is Known or Large Sample Size is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.