7.2: Confidence Intervals for a Population Proportion


    Imagine a psychology researcher is studying anxiety among college students. They survey a random sample of 200 students at several two-year colleges. The students complete the Generalized Anxiety Disorder 7-item scale (GAD-7), where a score of 15 or higher indicates "severe" levels of anxiety. In the sample of 200 students, 28 screen positive for severe anxiety, giving a point estimate of \(\hat{p} = 28/200 = 0.14\).

    The researcher wants to estimate the true proportion of all two-year college students who experience severe anxiety, not just the proportion in their sample. The researcher ensured that their sample is unbiased and representative, but they know that the sample proportion would vary somewhat from one sample to the next, simply due to randomness.

    The researcher computes a 95% confidence interval of \(0.09 \leq p \leq 0.19\), which gives an interval estimate for the true population proportion. This confidence interval is built using a technique with the following guarantee: if the study were repeated over and over, with a new independent random sample each time, then about 95% of the constructed confidence intervals would contain the true proportion. In simpler terms, the probability of selecting a random sample whose confidence interval contains the true proportion is 0.95. Our particular interval is therefore not guaranteed to contain the true population value; the 95% describes our confidence in the method used to calculate it.

    We often simply report that we are 95% confident that the interval \(0.09 \leq p \leq 0.19\) contains the true population proportion.

    This concept is exceedingly abstract! We will come back to the conceptual ideas after we actually practice the techniques to compute these for proportions.


    Definition: Confidence Level

    A confidence level is the probability that a random sample, drawn fairly and without bias, generates a confidence interval that contains the true population parameter. Common confidence levels are 0.90, 0.95, and 0.99.

    This confidence level is chosen, not calculated. We decide upon a confidence level, and then build our confidence interval in a particular way to achieve it. The method comes from the Central Limit Theorem, which tells us that for large samples the sampling distribution of the sample proportion is approximately normal.

    We define \(\alpha = 1 - \text{confidence level}\), where \(\alpha\) is the lowercase Greek letter alpha. For example, if our confidence level is 95%, then \(\alpha = 1 - 0.95 = 0.05\).

    Different research areas have different conventions and expectations for confidence levels. Additionally, we may be limited in the useful confidence level we can have based on the size of our sample.

    The Central Limit Theorem Returns

    Recall from the last chapter that the Central Limit Theorem tells us the sampling distribution of the sample proportion approaches a normal distribution as the sample size increases. If we theoretically knew the population proportion, we could use a normal distribution with mean \(p\) and standard deviation \(\sqrt{p(1-p)/n}\) to determine the sample proportions that lie at the boundaries of the middle 95%.

    However, since we do not know \(p\), we cannot do so. Instead, using the same logic, we can center the normal distribution at \(\hat{p}\). Since there is a 95% chance that our random sample yields a sample proportion within the middle 95% around the unknown population value, there is a 95% chance that an interval of the same width centered at \(\hat{p}\) covers the population value! This means we should find the boundaries of the middle 95% of a normal distribution centered at \(\hat{p}\) with standard deviation \(\sqrt{p(1-p)/n}\).

    You may notice, however, a big problem with this standard deviation: it relies upon knowing \(p\), the very value we are trying to estimate! The final piece of the puzzle is to approximate the standard deviation using our sample value: \(\sqrt{p(1-p)/n} \approx \sqrt{\hat{p}(1-\hat{p})/n}\). This estimated standard deviation is called the standard error.
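    To get a feel for why this plug-in approximation is reasonable, the short Python sketch below evaluates \(\sqrt{p(1-p)/n}\) for a few candidate values of \(p\) near the anxiety survey's \(\hat{p} = 0.14\). The candidate values and the function name `standard_error` are assumptions chosen for illustration; the true \(p\) is unknown.

```python
from math import sqrt


def standard_error(p: float, n: int) -> float:
    """Standard deviation of the sample proportion: sqrt(p(1 - p) / n)."""
    return sqrt(p * (1 - p) / n)


n = 200        # sample size from the anxiety survey
p_hat = 0.14   # observed sample proportion (28 / 200)

# Hypothetical "true" proportions near p_hat; these are assumptions for
# illustration only, since the real p is unknown.
for p in (0.10, 0.12, 0.14, 0.16, 0.18):
    print(f"p = {p:.2f}  ->  standard deviation = {standard_error(p, n):.4f}")

print(f"plug-in estimate using p_hat: {standard_error(p_hat, n):.4f}")
```

    Across these candidates the standard deviation only moves between roughly 0.021 and 0.027, so substituting \(\hat{p}\) for \(p\) changes the result very little when the sample is reasonably large.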

    Critical Values

    A critical value is a \(z\) or \(t\) score that separates regions of a distribution. In our case, it specifies where the boundaries of the confidence interval lie. We often denote the critical value as \(z^*\), or more specifically \(z_{\alpha/2}\): the \(z\)-score with an area of \(\alpha/2\) to its right under the standard normal curve.

    It is conventional to use the \(z\)-score on the right of the distribution. If we are interested in the middle 95%, then each tail contains 2.5% of the distribution. Therefore we would use the critical value corresponding to the 97.5th percentile of the standard normal distribution.
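    If a \(z\)-table is not at hand, the same lookup can be sketched in Python with the standard library's `statistics.NormalDist`. The helper name `critical_value` is our own choice for this illustration, not a standard function.

```python
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """Return z* such that the middle `confidence_level` of the standard
    normal distribution lies between -z* and +z*."""
    alpha = 1 - confidence_level
    # Each tail holds alpha / 2, so z* sits at the (1 - alpha / 2) quantile.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {critical_value(level):.3f}")
```

    This prints critical values of 1.645, 1.960, and 2.576 for the 90%, 95%, and 99% confidence levels.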

    In general, a \(z\)-score is defined as the number of standard deviations from the mean:

    \[ z = \frac{x - \mu}{\sigma}\]

    In our case, we are centering the sampling distribution at the sample proportion and using the standard error in place of the standard deviation. Therefore we have:

    \[ z^* = \frac{p - \hat{p}}{\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}}\]

    Since we are interested in knowing suggested bounds for \(p\), we can solve for it algebraically:

    \[p = \hat{p} + z^*\cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

    This would give us the upper bound, but due to the symmetry of the normal distribution, we can simply switch the \(+\) to a \(-\) to recover the lower bound. This is summarized below:

    Wald Confidence Interval for a Population Proportion

    A Wald confidence interval for a population proportion gives a range of values around our sample proportion, constructed so that the method captures the true population proportion \( p \) with probability equal to our confidence level. Our interval is given by:

    \[ \hat{p} - z^* \cdot \sqrt{ \frac{ \hat{p}(1 - \hat{p}) }{n} } < p < \hat{p} + z^* \cdot \sqrt{ \frac{ \hat{p}(1 - \hat{p}) }{n} } \]

    • \( \hat{p} \): sample proportion
    • \( n \): sample size
    • \( z^* \): critical value from the standard normal distribution for the desired confidence level

    It is important to note that this confidence interval assumes three crucial things:

    • Our sample was unbiased, representative, and random
    • Our sample size is sufficiently large
    • Our proportions are sufficiently far from extreme (0 or 1)

    These last two conditions can be checked by ensuring that:

    \[ n\hat{p} \geq 10 \quad \text{and}\quad n(1-\hat{p}) \geq 10\]
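    The formula and its conditions can be collected into one small Python function. This is a minimal sketch using only the standard library; the name `wald_interval` and the choice to raise an error when the conditions fail are assumptions made for this illustration.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, confidence_level: float = 0.95):
    """Wald confidence interval for a population proportion.

    Returns (lower, upper). Raises ValueError when the large-sample
    conditions n*p_hat >= 10 and n*(1 - p_hat) >= 10 are not met.
    """
    # n*p_hat is the number of successes; n*(1 - p_hat) is the number of failures.
    failures = n - successes
    if successes < 10 or failures < 10:
        raise ValueError("need at least 10 successes and 10 failures")

    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Anxiety survey from the start of the section: 28 of 200 students.
lower, upper = wald_interval(28, 200, confidence_level=0.95)
print(f"{lower:.2f} <= p <= {upper:.2f}")
```

    For the anxiety survey this prints `0.09 <= p <= 0.19`, matching the interval quoted at the start of the section.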


    Example 1: Voter Poll

    A random sample of 350 voters shows that 195 support Proposition A. Construct a 95% confidence interval for the true population proportion.

    First we calculate our statistic: \( \hat{p} = \frac{195}{350} \approx 0.557 \)

    To even consider constructing this interval, we need to check our conditions: \( n\hat{p} = 195 \) and \( n(1 - \hat{p}) = 155 \), so both are at least 10.

    Our standard error can be calculated as: \( \sqrt{ (0.557)(0.443) / 350 } \approx 0.0266 \)

    Since we are using a 95% confidence level, \(\alpha = 1 - 0.95 = 0.05\), and therefore we need the \(z\)-score at the 97.5th percentile, which leaves \(\alpha/2 = 0.025\) in the right tail. We can use technology or a \(z\)-table to find our critical value: \(z^*\approx 1.96\). It may be worth keeping some common critical values in your notes.

    Finally, we multiply the critical value by the standard error to get the margin \(1.96 \cdot 0.0266 \approx 0.052\), and calculate the confidence interval: \((0.557-0.052, 0.557+0.052) = (0.505, 0.609) \)

    To state this interval in words, we say: We are 95% confident that between 50.5% and 60.9% of all voters support Proposition A.


    Example 2: Collegiate AI Usage

    In a survey of 480 college students, it was found that 403 students reported using AI tools to help them with their classwork. Let us compute a 99% confidence interval around this statistic.

    First we calculate the sample proportion \(\hat{p} = 403/480 \approx 0.84\) and check the conditions for our confidence interval. Since \(n\hat{p} = 403\) and \(n(1-\hat{p}) = 77\) are both well above 10, the Wald method is appropriate. We next compute the standard error:

    \[SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{0.84\cdot 0.16}{480}} \approx 0.017\]

    This problem has proceeded like the previous one, but now we want a 99% confidence interval. This means that \(\alpha = 1 - 0.99 = 0.01\). In intuitive terms, we are looking for the middle 99% of the distribution, so \(\alpha\) represents the remaining 1%, split evenly between the two tails. In order to use our \(z\)-table or spreadsheet formulas, we need the right endpoint, where 0.5% of the distribution will be to the right and 99.5% to the left. Therefore we find the \(z\)-score corresponding to the 99.5th percentile of the standard normal, which is \(z^* \approx 2.576\). Our final confidence interval is calculated as:

    \[0.84 \pm (2.576)\cdot (0.017) \rightarrow 0.80 \leq p \leq 0.88\]
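    The same steps can be traced in Python, one line per step. This is only a sketch with variable names of our own choosing; it carries the unrounded \(\hat{p}\) through the calculation, which still rounds to the interval above.

```python
from math import sqrt
from statistics import NormalDist

successes, n = 403, 480
confidence_level = 0.99

p_hat = successes / n                          # about 0.8396
standard_error = sqrt(p_hat * (1 - p_hat) / n)

alpha = 1 - confidence_level                   # 0.01 in total
upper_percentile = 1 - alpha / 2               # 0.995: 0.5% in the right tail
z_star = NormalDist().inv_cdf(upper_percentile)

margin = z_star * standard_error
lower, upper = p_hat - margin, p_hat + margin
print(f"z* = {z_star:.3f}, SE = {standard_error:.3f}")
print(f"{lower:.2f} <= p <= {upper:.2f}")
```

    The output reports \(z^* = 2.576\) and a standard error of 0.017, followed by `0.80 <= p <= 0.88`.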


    Example 3: Defective Products

    Suppose that you are a quality control engineer and you inspect 160 items and find that 7 are defective.

    In this case, \(n\cdot \hat{p} = 7\) which is too small to effectively trust the Wald confidence interval method. We would have to use a more complicated technique for calculating this confidence interval that falls outside the scope of the course.
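    The condition check itself is easy to automate. The sketch below applies it to the three examples in this section; the function name `wald_conditions_met` is a hypothetical helper, not part of any library.

```python
def wald_conditions_met(successes: int, n: int, minimum: int = 10) -> bool:
    """Check n*p_hat >= minimum and n*(1 - p_hat) >= minimum.

    Because p_hat = successes / n, these products are simply the counts
    of successes and failures in the sample.
    """
    failures = n - successes
    return successes >= minimum and failures >= minimum


examples = {
    "Voter poll": (195, 350),
    "Collegiate AI usage": (403, 480),
    "Defective products": (7, 160),
}
for name, (successes, n) in examples.items():
    print(f"{name}: conditions met = {wald_conditions_met(successes, n)}")
```

    Only the defective products example fails the check, because it has just 7 "successes" (defective items).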


    Summary

    A point estimate \(\hat{p}\) gives our best single-value guess for a population proportion. A confidence interval gives us a range of values with a carefully quantified likelihood of covering the true value. Our formula is:

    \[\hat{p} \pm z^*\cdot \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\]

    We would convey this by saying for example: We are 95% confident that the true population proportion would be found inside our interval.

    It's important to remember that the probability statement is about our sampling, where the real randomness occurs. The population value is fixed but unknown, so we talk about the likelihood of capturing it, not the likelihood of where it is.
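    One way to see that the randomness lives in the sampling is to simulate it. The sketch below fixes a "true" proportion, draws many random samples, builds a Wald interval from each, and counts how often the interval captures the fixed value. The true proportion of 0.14, the sample size of 200, the number of trials, and the seed are all assumptions chosen for illustration.

```python
import random
from math import sqrt


def coverage_rate(p: float, n: int, trials: int, z_star: float = 1.96,
                  seed: int = 0) -> float:
    """Fraction of simulated Wald intervals that contain the true p."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Draw one random sample of size n and count the successes.
        successes = sum(rng.random() < p for _ in range(n))
        p_hat = successes / n
        margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p <= p_hat + margin:
            hits += 1
    return hits / trials


# Assumed true proportion of 0.14 and sample size 200, chosen to resemble
# the anxiety survey; in a real study p would be unknown.
rate = coverage_rate(p=0.14, n=200, trials=2000)
print(f"{rate:.1%} of the simulated intervals contained p")
```

    The reported fraction should land near 95%, though usually not exactly, both because of simulation noise and because the Wald interval relies on a normal approximation.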

    Up Next:

    We can do a similar estimation process to estimate a population mean using sample data. You'll see that in the next section.


    This page titled 7.2: Confidence Intervals for a Population Proportion is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Mathematics Department.
