
7.5: Confidence Interval for a Proportion


    Suppose you want to estimate the population proportion, p. For example, an administrator may want to know what proportion of students at your school smoke, or an insurance company may want to know what proportion of accidents are caused by teenage drivers who have not taken a drivers’ education class. Every time we collect data from a new sample, we would expect the estimate of the proportion to change slightly, so a single number says nothing about how far the estimate might be from p. A range of values built around the sample estimate accounts for this sampling variability and gives a better sense of where the population proportion falls. Such a range of plausible values for the population parameter is called an interval estimate or confidence interval.

    The sample proportion \(\hat{p}\) is the point estimate for p. The standard error of \(\hat{p}\) (the estimated standard deviation of its sampling distribution) is \(\sqrt{\left(\frac{\hat{p} \cdot \hat{q}}{n}\right)}\), where \(\hat{q}=1-\hat{p}\). The critical value \(z_{\alpha / 2}\) comes from the standard normal distribution, and the margin of error is \(\mathrm{E}=z_{\alpha / 2} \sqrt{\left(\frac{\hat{p} \cdot \hat{q}}{n}\right)}\). Some textbooks use \(\pi\) instead of p for the population proportion, and \(\bar{p}\) (pronounced “p-bar”) instead of \(\hat{p}\) for the sample proportion.
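The critical value \(z_{\alpha / 2}\) is often read from a table, but it can also be computed. The sketch below assumes Python 3.8 or later and uses the standard library's `statistics.NormalDist`; the function name `critical_value` is a hypothetical helper, not something defined in this text.

```python
from statistics import NormalDist


def critical_value(confidence_level: float) -> float:
    """Return z_{alpha/2} for a two-sided interval at the given confidence level."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be strictly between 0 and 1")
    alpha = 1 - confidence_level
    # z_{alpha/2} leaves an area of alpha/2 in the upper tail of the standard normal
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z = {critical_value(level):.3f}")
```

For the three common confidence levels this gives approximately 1.645, 1.960, and 2.576.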

    Choose a simple random sample of size n from a population having unknown population proportion p. The 100(1 – \(\alpha\))% confidence interval estimate for p is given by \(\hat{p} \pm z_{\alpha / 2} \sqrt{\left(\frac{\hat{p} \hat{q}}{n}\right)}\),

    where \(\hat{p}=\frac{x}{n}=\frac{\# \text { of successes }}{\# \text { of trials }}\) (read as “p hat”) is the sample proportion, and \(\hat{q}=1-\hat{p}\) is its complement.

    The above confidence interval can be expressed as an inequality or an interval of values.

    \(\hat{p}-z_{\alpha / 2} \sqrt{\left(\frac{\hat{p} \hat{q}}{n}\right)}<p<\hat{p}+z_{\alpha / 2} \sqrt{\left(\frac{\hat{p} \hat{q}}{n}\right)} \quad \text { or } \quad\left(\hat{p}-z_{\alpha / 2} \sqrt{\left(\frac{\hat{p} \hat{q}}{n}\right)}, \hat{p}+z_{\alpha / 2} \sqrt{\left(\frac{\hat{p} \hat{q}}{n}\right)}\right)\)
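As a minimal sketch, the interval formula translates directly into a few lines of Python. The function name `proportion_ci` and the data (40 successes in 100 trials) are hypothetical, chosen only for illustration; the critical value 1.96 corresponds to 95% confidence.

```python
from math import sqrt


def proportion_ci(x: int, n: int, z: float) -> tuple[float, float]:
    """Return (lower, upper) for p_hat +/- z * sqrt(p_hat * q_hat / n)."""
    p_hat = x / n          # sample proportion
    q_hat = 1 - p_hat      # complement
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 40 successes in 100 trials, 95% confidence (z = 1.96)
low, high = proportion_ci(x=40, n=100, z=1.96)
print(f"{low:.4f} < p < {high:.4f}")  # 0.3040 < p < 0.4960
```

Returning the two endpoints as a tuple mirrors the interval notation, and formatting them with `<` signs gives the inequality form.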

    Assumption: \(n \cdot \hat{p} \geq 10 \text { and } n \cdot \hat{q} \geq 10\)

    *This assumption must be checked before the confidence interval is used.

    This formula is derived from the normal approximation to the binomial distribution. Therefore, the conditions for a binomial distribution must also be met: a fixed number of independent trials, only two possible outcomes for each trial, and the same probability of success on every trial.
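Because \(\hat{p}=x/n\), the products \(n \cdot \hat{p}\) and \(n \cdot \hat{q}\) are simply the number of successes x and the number of failures n − x. A small sketch of the check follows; `normal_approximation_ok` is a hypothetical helper name and the counts are made up for illustration. The binomial conditions themselves (independence, constant probability) depend on how the data were collected and cannot be verified by code.

```python
def normal_approximation_ok(x: int, n: int, threshold: int = 10) -> bool:
    """Check n * p_hat >= threshold and n * q_hat >= threshold.

    Since p_hat = x / n, these products equal x and n - x.
    """
    if not 0 <= x <= n:
        raise ValueError("x must satisfy 0 <= x <= n")
    return x >= threshold and (n - x) >= threshold


print(normal_approximation_ok(40, 100))  # True: 40 successes, 60 failures
print(normal_approximation_ok(3, 50))    # False: only 3 successes
```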

    Steps for Calculating a Confidence Interval

    1. State the random variable and the parameter in words.

    x = number of successes

    p = proportion of successes

    2. State and check the assumptions for a confidence interval.

    a. A simple random sample of size n is taken.

    b. The conditions for the binomial distribution are satisfied.

    c. To use a normal curve for the sampling distribution of \(\hat{p}\), you need to show that \(n \cdot \hat{p} \geq 10 \text { and } n \cdot \hat{q} \geq 10\), where \(\hat{q}\) = 1 − \(\hat{p}\). If this requirement is met, then the sampling distribution of \(\hat{p}\) is well approximated by a normal curve. (Strictly speaking, the condition should be stated in terms of p, that is, \(n \cdot p \geq 10\) and \(n \cdot(1-p) \geq 10\). However, in a confidence interval you do not know p, so you must use \(\hat{p}\) in its place. Since \(n \cdot \hat{p}=x\) and \(n \cdot \hat{q}=n-x\), you just need to show that x ≥ 10 and n – x ≥ 10.)

    3. Compute the sample statistic \(\hat{p}=\frac{x}{n}\) and the confidence interval \(\hat{p} \pm z_\frac{\alpha}{2} \sqrt{\left(\frac{\hat{p} \hat{q}}{n}\right)}\).

    4. Statistical Interpretation: In general, this looks like:

    “We can be (1 – α)*100% confident that the interval \[\hat{p}-z_\frac{\alpha}{2} \sqrt{\left(\frac{\hat{p} \hat{q}}{n}\right)}<p<\hat{p}+z_\frac{\alpha}{2} \sqrt{\left(\frac{\hat{p} \hat{q}}{n}\right)}\] contains the population proportion.”

    5. Real World Interpretation: This is where you state, in the context of the problem, what interval contains the true proportion.
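The computational steps above can be sketched as a single function. This is only an illustration under stated assumptions: `interval_report` is a hypothetical name, the data are made up, and the critical value is computed with the standard library's `statistics.NormalDist` rather than read from a table. Stating the variable and parameter in words, and checking the random-sample and binomial conditions, still have to be done by the analyst.

```python
from math import sqrt
from statistics import NormalDist


def interval_report(x: int, n: int, confidence_level: float = 0.95) -> str:
    """Check the success/failure condition, compute the interval, and word it."""
    # Condition check: n * p_hat = x and n * q_hat = n - x must both be at least 10
    if x < 10 or n - x < 10:
        raise ValueError("need x >= 10 and n - x >= 10 for the normal approximation")

    # Sample statistic, critical value, and margin of error
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    low, high = p_hat - margin, p_hat + margin

    # Statistical interpretation
    return (f"We can be {confidence_level:.0%} confident that the interval "
            f"{low:.4f} < p < {high:.4f} contains the population proportion.")


# Hypothetical data: 40 successes in 100 trials
print(interval_report(x=40, n=100))
```

Raising an error when the condition fails, instead of returning an interval anyway, reflects the requirement that the assumption be checked before the formula is used.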

    A concern was raised in Australia that the percentage of deaths of indigenous Australian prisoners was higher than the percentage of deaths of nonindigenous Australian prisoners, which is 0.27%. Six years of data (1990–1995) were collected, and it was found that out of 14,495 indigenous Australian prisoners, 51 died (“Indigenous deaths in,” 1996). Find a 95% confidence interval for the proportion of indigenous Australian prisoners who died.

    Solution

    1. State the random variable and the parameter in words.

    x = number of indigenous Australian prisoners who die

    p = proportion of indigenous Australian prisoners who die

    2. State and check the assumptions for a confidence interval.

    a. A simple random sample of size n is required. The 14,495 indigenous Australian prisoners were not a random sample: the data cover all prisoners during these six years, but the six years were not picked at random. Unless there was something special about the six years that were chosen, the sample is probably representative. This assumption is probably met.

    b. There are 14,495 prisoners in this case, which is a fixed number of trials. The prisoners are all indigenous Australians, so you are not mixing indigenous Australian with nonindigenous Australian prisoners. There are only two outcomes: the prisoner either dies or does not. The chance of dying may not be the same for every prisoner, but if you treat all prisoners alike, then the probability may be close to constant. Thus, the assumptions for the binomial distribution are reasonably satisfied.

    c. In this case, x = 51 and n – x = 14,495 – 51 = 14,444. Both are greater than or equal to 10, so the sampling distribution of \(\hat{p}\) is approximately normal.

    3. Compute the sample statistic and the confidence interval.

    Sample Proportion: \(\hat{p}=\frac{x}{n}=\frac{51}{14495}=0.003518\)

    Critical Value: \(z_{\alpha / 2}=1.96\) for a 95% confidence level

    Margin of Error \(\mathrm{E}=z_{\alpha / 2} \sqrt{\left(\frac{\hat{p} \cdot \hat{q}}{n}\right)}=1.96 \sqrt{\left(\frac{0.003518(1-0.003518)}{14495}\right)}=0.000964\)

    Confidence Interval: \(\hat{p}-\mathrm{E}<p<\hat{p}+\mathrm{E}\)

    0.003518 – 0.000964 < p < 0.003518 + 0.000964

    0.002554 < p < 0.004482 or (0.002554, 0.004482)

    4. Statistical Interpretation: We can be 95% confident that the interval 0.002554 < p < 0.004482 contains the proportion of all indigenous Australian prisoners who died.

    5. Real World Interpretation: We can be 95% confident that the percentage of all indigenous Australian prisoners who died is between 0.26% and 0.45%.
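The arithmetic in this solution can be reproduced with a short script. It uses the values from the example (x = 51, n = 14,495, and the table value z = 1.96); the variable names are illustrative only.

```python
from math import sqrt

x, n = 51, 14495   # deaths and total prisoners from the example
z = 1.96           # critical value for 95% confidence, as used in the solution

p_hat = x / n
margin = z * sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"p_hat = {p_hat:.6f}")
print(f"E     = {margin:.6f}")
print(f"{low:.6f} < p < {high:.6f}")
print(f"{low:.2%} to {high:.2%}")
```

Rounded to six decimal places, the script gives the same sample proportion, margin of error, and endpoints as the hand calculation, and the last line expresses the endpoints as percentages.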


    This page titled 7.5: Confidence Interval for a Proportion is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Rachel Webb via source content that was edited to the style and standards of the LibreTexts platform.