7.4: Confidence Interval and Sample Size for the Proportion
- Compute confidence intervals for a population proportion using z-values based on data from a random sample.
- Estimate the range in which the true population proportion is likely to fall at a specified confidence level.
- Ensure the sample size is large enough to justify the normal approximation.
- Determine the minimum sample size needed to estimate a proportion with a specified confidence level and margin of error.
Introduction
A confidence interval for a proportion is a range of values used to estimate the true population proportion based on a sample. It provides an interval within which we expect the actual proportion to fall, with a certain level of confidence (such as 90%, 95%, or 99%). This type of interval is commonly used in surveys, polls, and studies involving categorical data, such as estimating the percentage of voters who support a candidate or the proportion of defective products in a shipment. The interval is calculated using the sample proportion, the sample size, and a critical value from the standard normal (z) distribution.
This section discusses how to estimate the population proportion, p. For example, a researcher may want to know what proportion of students at a local college smoke regularly. To estimate p, we use the sampling distribution of the sample proportion \(\widehat{p}\). For this sampling distribution to be approximately normal, the following inequalities must be satisfied.
\(n\cdot \widehat{p}\ge5\) and \(n\cdot \widehat{q}\ge5\)
- \(\widehat{p}\) is the proportion of successes. It is computed as \(\widehat{p} = \dfrac{X}{n}\).
- \(\widehat{q}\) is the proportion of failures. It is computed as \(\widehat{q} = 1 - \widehat{p}\).
- \(n\) is the sample size.
- \(X\) is the number of items in the sample that possess the characteristic being studied.
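The two conditions are easy to check in code. Because \(n\cdot\widehat{p} = X\) and \(n\cdot\widehat{q} = n - X\), the check only needs the counts. The following Python sketch is illustrative; `normal_approx_ok` is a hypothetical helper name, not a library function.

```python
def normal_approx_ok(x: int, n: int, minimum: int = 5) -> bool:
    """Check n*p_hat >= minimum and n*q_hat >= minimum.

    Since p_hat = x/n, n*p_hat equals x (successes) and n*q_hat equals
    n - x (failures), so the test uses the integer counts directly.
    """
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    successes = x        # equals n * p_hat
    failures = n - x     # equals n * q_hat
    return successes >= minimum and failures >= minimum


print(normal_approx_ok(9, 50))   # True  (9 successes, 41 failures)
print(normal_approx_ok(3, 50))   # False (only 3 successes)
```

Working with the counts rather than multiplying \(n\) by a floating-point \(\widehat{p}\) avoids rounding surprises when a count sits exactly at 5.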
Once these conditions are met, the sampling distribution of \(\widehat{p}\) is approximately normal with mean \(p\) and standard deviation \(\sqrt{\dfrac{pq}{n}}\), where \(q = 1 - p\). Because \(p\) and \(q\) are unknown, the confidence interval replaces them with the sample estimates \(\widehat{p}\) and \(\widehat{q}\). A process similar to the one described in a previous section then gives a confidence interval for p. The formula is provided below.
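The claims about the mean and standard deviation of \(\widehat{p}\) can be checked by simulation. In this Python sketch, the true proportion p = 0.18 and sample size n = 50 are assumed, illustrative values, and `simulate_p_hats` is a hypothetical helper name. It draws many samples, computes \(\widehat{p}\) for each one, and compares the mean and standard deviation of those values with \(p\) and \(\sqrt{pq/n}\).

```python
import math
import random
import statistics


def simulate_p_hats(p: float, n: int, reps: int, seed: int = 1) -> list[float]:
    """Draw `reps` samples of size n with success probability p; return each p_hat."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    p_hats = []
    for _ in range(reps):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hats.append(successes / n)
    return p_hats


p, n = 0.18, 50  # assumed population proportion and sample size (illustrative)
p_hats = simulate_p_hats(p, n, reps=20_000)
theory_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of p_hat: {statistics.fmean(p_hats):.4f}   (p = {p})")
print(f"sd of p_hat:   {statistics.pstdev(p_hats):.4f}   (sqrt(pq/n) = {theory_sd:.4f})")
```

With 20,000 repetitions the simulated mean and standard deviation should land very close to the theoretical values; a different seed changes the last digits but not the conclusion.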
Confidence Interval for One Population Proportion (1-Prop Interval)
\(\widehat{p}-Z_{\frac{\alpha}{2}}\cdot\sqrt{\dfrac{\widehat{p}\widehat{q}}{n}} < p < \widehat{p}+Z_{\frac{\alpha}{2}}\cdot\sqrt{\dfrac{\widehat{p}\widehat{q}}{n}}\)
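The interval formula translates directly into a few lines of code. In this Python sketch, `one_prop_interval` is a hypothetical helper name, and the critical value z is supplied by the caller (for example, 1.96 for 95% confidence, as read from Table A.1).

```python
import math


def one_prop_interval(x: int, n: int, z: float) -> tuple[float, float]:
    """Return (lower, upper) = p_hat -/+ z * sqrt(p_hat * q_hat / n).

    x: number of successes X, n: sample size, z: critical value Z_{alpha/2}.
    """
    p_hat = x / n                              # sample proportion of successes
    q_hat = 1 - p_hat                          # sample proportion of failures
    margin = z * math.sqrt(p_hat * q_hat / n)  # margin of error E
    return p_hat - margin, p_hat + margin


# Smoking example from this section: X = 9, n = 50, z = 1.96 for 95% confidence
lower, upper = one_prop_interval(9, 50, 1.96)
print(f"{lower:.3f} < p < {upper:.3f}")
```

The interval is symmetric about \(\widehat{p}\), so its midpoint is always the sample proportion and its half-width is the margin of error.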
Examples of Confidence Intervals for p
Suppose an educational researcher wishes to estimate the proportion of students who smoke regularly at a local college. She randomly samples 50 students and finds that 9 of them smoke regularly. To estimate the proportion, she will use a confidence interval with a confidence level of 95%. Compute the confidence interval using the formula.
Solution
Step 1) Compute the sample proportion \(\widehat{p}\).
\(\widehat{p} = \dfrac{9}{50} = 0.18\)
Step 2) Compute \(\widehat{q}\).
\(\widehat{q} = 1 - 0.18 = 0.82\)
Step 3) For a 95% confidence level, \(\alpha = 0.05\), so look up \(Z_{\frac{\alpha}{2}} = Z_{0.025}\) using Table A.1. The value is 1.96.
Step 4) Calculate the confidence interval.
\(\widehat{p}-Z_{\frac{\alpha}{2}}\cdot\sqrt{\dfrac{\widehat{p}\widehat{q}}{n}} < p < \widehat{p}+Z_{\frac{\alpha}{2}}\cdot\sqrt{\dfrac{\widehat{p}\widehat{q}}{n}}\)
\(0.18-1.96\cdot\sqrt{\dfrac{0.18\cdot0.82}{50}} < p < 0.18+1.96\cdot\sqrt{\dfrac{0.18\cdot0.82}{50}}\)
\(0.18-0.106 < p < 0.18+0.106\)
0.074 < p < 0.286
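Step 3 reads the critical value 1.96 from Table A.1. The same value can be computed from the inverse of the standard normal CDF, which is a convenient check on a table lookup. This Python sketch uses `statistics.NormalDist.inv_cdf` from the standard library (Python 3.8 or later); the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Return Z_{alpha/2}, the z-value with area alpha/2 to its right."""
    alpha = 1 - confidence
    # The area to the left of Z_{alpha/2} is 1 - alpha/2
    return NormalDist().inv_cdf(1 - alpha / 2)


# Critical values for common confidence levels (compare with Table A.1)
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: {z_critical(level):.3f}")
```

The printed values are 1.645, 1.960, and 2.576. Table entries are rounded, so small differences in later decimal places are expected when comparing with hand calculations.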
A concern was raised in Australia that the percentage of deaths of Aboriginal prisoners was higher than the percentage of deaths of non-Aboriginal prisoners, which is 0.27%. A sample of six years (1990-1995) of data was collected, and it was found that out of 14,495 Aboriginal prisoners, 51 died ("Indigenous deaths in," 1996). Find a 95% confidence interval for the proportion of Aboriginal prisoners who died.
- State the random variable and the parameter in words.
- State and check the assumptions for a confidence interval.
- Find the sample statistic and the confidence interval.
- Statistical Interpretation
- Real World Interpretation
Solution
1. x = number of Aboriginal prisoners who die
p = proportion of Aboriginal prisoners who die
2.
- The data should come from a simple random sample. Here, the 14,495 Aboriginal prisoners are not a random sample: the numbers are for all Aboriginal prisoners in these six years, and the six years were not picked at random. Unless there was something special about the chosen six years, the sample is probably representative, so this assumption is probably met.
- There are 14,495 prisoners in this case, and all of them are Aboriginal, so Aboriginal and non-Aboriginal prisoners are not mixed. There are only two outcomes: either the prisoner dies or does not. The probability of dying may not be the same for every prisoner, but if all prisoners are treated as alike, it should be close to constant. Thus the assumptions for the binomial distribution are reasonably satisfied.
- In this case, x = 51 and n - x = 14495 - 51 = 14444; both are greater than or equal to 5, so the sampling distribution of \(\hat{p}\) is approximately normal.
3. Compute Sample Proportions:
\(\widehat{p}=\dfrac{x}{n}=\dfrac{51}{14495} \approx 0.003518\)
\(\widehat{q}=1 - 0.003518 = 0.996482\)
Confidence Interval:
\(Z_{\frac{\alpha}{2}}=1.96\), since the confidence level is 95%. This value is found in Table A.1.
\(\widehat{p}-Z_{\frac{\alpha}{2}}\cdot\sqrt{\dfrac{\widehat{p}\widehat{q}}{n}} < p < \widehat{p}+Z_{\frac{\alpha}{2}}\cdot\sqrt{\dfrac{\widehat{p}\widehat{q}}{n}}\)
\(0.003518-1.96\cdot\sqrt{\dfrac{0.003518\cdot0.996482}{14495}} < p < 0.003518+1.96\cdot\sqrt{\dfrac{0.003518\cdot0.996482}{14495}}\)
\(0.003518-0.000964<p<0.003518+0.000964\)
\(0.002554<p<0.004482\)
4. We are 95% confident that the interval \(0.002554<p<0.004482\) contains the true proportion of Aboriginal prisoners who died.
5. The proportion of Aboriginal prisoners who died is between 0.0026 and 0.0045.
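The arithmetic in step 3 can be reproduced to confirm the endpoints. This Python sketch rounds \(\widehat{p}\) to six decimal places, as the hand solution does, and uses the sample size n = 14,495 in the denominator (not the failure count n - x = 14,444).

```python
import math

x, n, z = 51, 14_495, 1.96   # deaths, prisoners, critical value for 95% confidence

p_hat = round(x / n, 6)      # hand solution uses p_hat = 0.003518
q_hat = 1 - p_hat
margin = z * math.sqrt(p_hat * q_hat / n)   # n, not n - x, goes under the root

print(f"E = {margin:.6f}")
print(f"{p_hat - margin:.6f} < p < {p_hat + margin:.6f}")
```

The margin of error prints as 0.000964, matching the value subtracted and added in the hand calculation.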
Confidence intervals for p can also be computed using technology. The following example shows the process on the TI-83/84.
A researcher studying the effects of income levels on the breastfeeding of infants hypothesizes that countries where the income level is lower have a higher rate of infant breastfeeding than higher-income countries. It is known that in Germany, considered a high-income country by the World Bank, 22% of all babies are breastfed. In Tajikistan, considered a low-income country by the World Bank, researchers found that in a random sample of 500 new mothers, 125 were breastfeeding their infants. Find a 90% confidence interval for the proportion of mothers in low-income countries who breastfeed their infants.
- State your random variable and the parameter in words.
- State and check the assumptions for a confidence interval.
- Find the sample statistic and the confidence interval.
- Statistical Interpretation
- Real World Interpretation
Solution
1. x = number of women who breastfeed in a low-income country
p = proportion of women who breastfeed in a low-income country
2.
- As stated in the problem, a simple random sample of 500 new mothers in a low-income country was taken.
- There were 500 women in the study. The women are considered identical, though they probably have some differences. There are only two outcomes: either the woman breastfeeds or she does not. The probability of breastfeeding is probably not exactly the same for each woman, but it is probably not very different. Thus the assumptions for the binomial distribution are reasonably satisfied.
- x = 125 and n - x = 500 - 125 = 375 and both are greater than or equal to 5, so the sampling distribution of \(\hat{p}\) is well approximated by a normal curve.
3. On the TI-83/84: Go into the [STAT] menu. Move over to [TESTS] and choose [1-PropZInt]. Type in X, n, and the confidence level, then select [CALCULATE] and press enter.
Once you press Calculate, you will see the results in the figure below.
The answer is computed using the TI-83/84 calculator and is written below after rounding each endpoint to three decimal places.
0.218 < p < 0.282
4. We are 90% confident that the interval 0.218 < p < 0.282 contains the true proportion of women in low-income countries who breastfeed their infants.
5. The proportion of women in low-income countries who breastfeed their infants is between 0.218 and 0.282.
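Without a TI-83/84, the same interval can be computed from the formula. This Python sketch is a stand-in for [1-PropZInt], not the calculator's actual implementation; `one_prop_z_interval` is a hypothetical name. It uses the unrounded critical value from `statistics.NormalDist`, whereas a hand calculation with Table A.1 would use 1.645; with either value the endpoints agree to three decimal places.

```python
import math
from statistics import NormalDist


def one_prop_z_interval(x: int, n: int, confidence: float) -> tuple[float, float]:
    """Normal-approximation confidence interval for p from x successes in n trials."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # Z_{alpha/2}, not rounded
    p_hat = x / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)      # margin of error E
    return p_hat - margin, p_hat + margin


# Breastfeeding example: X = 125, n = 500, 90% confidence
low, high = one_prop_z_interval(125, 500, 0.90)
print(f"{low:.3f} < p < {high:.3f}")
```

Because \(\widehat{p} = 125/500 = 0.25\), the interval is centered at 0.25 with a margin of error of about 0.032.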
Minimum Sample Size
The minimum sample size needed to construct a confidence interval for p with a desired margin of error can also be computed. The formula comes from the margin of error, \(E=Z_{\frac{\alpha}{2}}\sqrt{\dfrac{\widehat{p}\widehat{q}}{n}}\). Squaring both sides gives \(E^2 = Z_{\frac{\alpha}{2}}^2\dfrac{\widehat{p}\widehat{q}}{n}\), and solving for n gives the formula below.
\(n=\widehat{p}\cdot\widehat{q}\left(\dfrac{Z_ \dfrac{\alpha}{2} }{E}\right)^2\)
- E is the margin of error.
- Round n up to the next whole number.
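The formula and the round-up rule can be wrapped in a small function. In this Python sketch, `min_sample_size` is a hypothetical name; `p_hat` is a prior estimate of the proportion and `z` is \(Z_{\frac{\alpha}{2}}\) from Table A.1. The small tolerance before rounding is an assumption of this sketch, added only to absorb floating-point error.

```python
import math


def min_sample_size(p_hat: float, z: float, margin: float) -> int:
    """Smallest whole n with Z_{alpha/2} * sqrt(p_hat * q_hat / n) <= margin.

    p_hat: prior estimate of the proportion, z: critical value Z_{alpha/2},
    margin: desired margin of error E as a decimal (e.g. 0.10 for 10%).
    """
    q_hat = 1 - p_hat
    n = p_hat * q_hat * (z / margin) ** 2
    # Always round UP. The tiny tolerance keeps a value such as
    # 100.00000000000001, produced by floating-point error, from becoming 101.
    return math.ceil(n - 1e-9)


print(min_sample_size(0.5, 1.96, 0.05))   # 385 (illustrative values)
```

Using `math.ceil` rather than ordinary rounding matters: rounding to the nearest whole number could produce a sample that is slightly too small to reach the target margin of error.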
A pollster wishes to construct a new confidence interval for the proportion of voters who are in favor of voting for a new proposition that reduces taxes on energy. Using previous data, she estimates the sample proportion to be 61.5%. What is the minimum sample size needed to construct an updated confidence interval for p with a confidence level of 90% and a margin of error of 10%?
Solution
Step 1) Write down the given information and look up \(Z_ \dfrac{\alpha}{2} \) in Table A.1.
- \(\widehat{p} = 0.615\)
- \(\widehat{q} = 1 - 0.615 = 0.385\)
- E = 0.10
- \(Z_ \dfrac{\alpha}{2} = 1.645\)
Step 2) Compute n.
\(n=\widehat{p}\cdot\widehat{q}\left(\dfrac{Z_ \dfrac{\alpha}{2} }{E}\right)^2\)
\(n=0.615\cdot0.385\left(\dfrac{1.645}{0.10}\right)^2\)
\(n=64.07190694\)
Step 3) Round n upwards to the next whole number.
\(n = 65\)
Step 4) State the result.
The minimum sample size needed is 65 people.
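Rounding up, rather than to the nearest whole number, is what guarantees the requested margin of error. This short Python check, using the example's values \(\widehat{p} = 0.615\) and \(Z_{\frac{\alpha}{2}} = 1.645\), evaluates E at n = 64 and n = 65; `margin_of_error` is a hypothetical helper name.

```python
import math

p_hat, z, target = 0.615, 1.645, 0.10   # values from the pollster example


def margin_of_error(n: int) -> float:
    """E = Z_{alpha/2} * sqrt(p_hat * q_hat / n) for this example's p_hat."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


for n in (64, 65):
    e = margin_of_error(n)
    print(n, round(e, 4), "meets target" if e <= target else "too wide")
```

With n = 64 the margin of error is about 0.1001, just above 0.10, while n = 65 brings it down to about 0.0993.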
Authors
"7.4: Confidence Interval and Sample Size for the Proportion" by Toros Berberyan, Tracy Nguyen, and Alfie Swan is licensed under CC BY-SA 4.0
Attributions
"8.2: One-Sample Interval for the Proportion - Statistics" by Kathryn Kozak is licensed under CC BY-SA 4.0
Exercises
- A university wants to estimate the proportion of its students who participate in organized sports. To do this, a random sample of 200 students is surveyed, and 62 of them report that they play at least one sport at the college level. Use this information to construct a 95% confidence interval to estimate the true proportion of students at the university who play sports and round to three decimal places.
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
MyOpenMath is a free online learning platform designed to support math instruction through automated homework, quizzes, and assessments. You must register for MyOpenMath and sign in to view the question.
- A researcher wants to estimate the proportion of community college students in California who work full-time. In a random sample of 350 students, 147 report that they are employed full-time. Construct a 99% confidence interval to estimate the true proportion of community college students in California who work full-time and round to three decimal places.
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A marketing company wants to estimate the proportion of people who purchase holiday gifts during the winter season. They surveyed a random sample of 500 adults, and 410 of them said they typically buy gifts for the holidays. Construct a 90% confidence interval to estimate the true proportion of people who purchase holiday gifts.
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A nutrition researcher wants to estimate the proportion of college students who are vegetarians with a 95% confidence level and a margin of error of 5%. What is the minimum sample size needed to ensure the estimate is sufficiently precise? It was determined that the prior estimate for the population proportion is 24%.
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A fitness researcher wants to estimate the proportion of adults who work out regularly (at least three times a week) with a 99% confidence level and a margin of error of 4%. Based on a previous study, it is estimated that 68% of adults work out regularly. What is the minimum sample size needed to ensure the estimate is sufficiently precise?
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- Answers
If you are an instructor and want the solutions to all the exercise questions for each section, please email Toros Berberyan.







