8.4: A Population Proportion

During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions or percentages. For example, a poll for a particular candidate running for president might show that the candidate has 40% of the vote within three percentage points (if the sample is large enough). Often, election polls are calculated with 95% confidence, so the pollsters would be 95% confident that the true proportion of voters who favor the candidate is between 0.37 and 0.43: \((0.40 - 0.03, 0.40 + 0.03)\).

Investors in the stock market are interested in the true proportion of stocks that go up and down each week. Businesses that sell personal computers are interested in the proportion of households in the United States that own personal computers. Confidence intervals can be calculated for the true proportion of stocks that go up or down each week and for the true proportion of households in the United States that own personal computers.

The procedure to find the confidence interval, the sample size, the error bound, and the confidence level for a proportion is similar to that for the population mean, but the formulas are different. How do you know you are dealing with a proportion problem? First, the underlying distribution is a binomial distribution. (There is no mention of a mean or average.) If \(X\) is a binomial random variable, then

\[X \sim B(n, p)\nonumber \]

where \(n\) is the number of trials and \(p\) is the probability of a success.

To form a proportion, take \(X\), the random variable for the number of successes, and divide it by \(n\), the number of trials (or the sample size). The random variable \(\hat{P} \) (read "P hat") is that proportion,

\[\hat{P} = \dfrac{X}{n}\nonumber \]

When \(n\) is large and \(p\) is not close to zero or one, we can use the normal distribution to approximate the binomial.

\[X \sim N(np, \sqrt{npq})\nonumber \]

If we divide the random variable, the mean, and the standard deviation by \(n\), we get a normal distribution of proportions with \(\hat{P} \), called the estimated proportion, as the random variable. (Recall that a proportion is the number of successes divided by \(n\).)

Using algebra to simplify:

\[\dfrac{\sqrt{npq}}{n} = \sqrt{\dfrac{pq}{n}}\nonumber \]
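The simplification above can be checked numerically. The following is a minimal sketch, not part of the original text: it computes the exact mean and standard deviation of \(X/n\) from the binomial probabilities and compares them with \(p\) and \(\sqrt{pq/n}\). The function name `proportion_mean_sd` and the values \(n = 50\), \(p = 0.3\) are hypothetical choices made only for illustration.

```python
import math

def proportion_mean_sd(n, p):
    """Exact mean and standard deviation of X/n when X ~ B(n, p)."""
    q = 1 - p
    # Binomial probabilities P(X = k) for k = 0, 1, ..., n
    pmf = [math.comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
    mean = sum((k / n) * w for k, w in enumerate(pmf))
    var = sum((k / n - mean) ** 2 * w for k, w in enumerate(pmf))
    return mean, math.sqrt(var)

n, p = 50, 0.3  # hypothetical values, chosen only for illustration
mean, sd = proportion_mean_sd(n, p)
print("mean of X/n:", mean, "compared with p:", p)
print("sd of X/n:", sd, "compared with sqrt(pq/n):", math.sqrt(p * (1 - p) / n))
```

The two pairs of numbers should agree up to floating-point rounding, which is what dividing the mean \(np\) and the standard deviation \(\sqrt{npq}\) by \(n\) predicts.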

\(\hat{P}\) follows a normal distribution for proportions:

\[\hat{P} \sim N\left(p, \sqrt{\dfrac{pq}{n}}\right)\nonumber \]

The confidence interval has the form

\[(\hat{p} - EBP, \hat{p} + EBP)\nonumber \]

where

\(EBP\) is the error bound for the proportion.
\(\hat{p} = \dfrac{x}{n}\)
\(\hat{p} =\) the estimated proportion of successes (\(\hat{p}\) is a point estimate for \(p\), the true proportion)
\(x =\) the number of successes
\(n =\) the size of the sample

The error bound (EBP) for a proportion is

\[EBP = \left(z_{\frac{\alpha}{2}}\right)\left(\sqrt{\dfrac{\hat{p}\hat{q}}{n}}\right)\nonumber \]

where \(\hat{q}\ = 1 - \hat{p}\).

This formula is similar to the error bound formula for a mean, except that the "appropriate standard deviation" is different. For a mean, when the population standard deviation is known, the appropriate standard deviation that we use is \(\dfrac{\sigma}{\sqrt{n}}\). For a proportion, the appropriate standard deviation is

\[\sqrt{\dfrac{pq}{n}}.\nonumber \]

However, in the error bound formula, we use

\[\sqrt{\dfrac{\hat{p}\hat{q}}{n}}\nonumber \]

as the standard deviation, instead of

\[\sqrt{\dfrac{pq}{n}}.\nonumber \]

In the error bound formula, the sample proportions \(\hat{p}\) and \(\hat{q}\) are estimates of the unknown population proportions \(p\) and \(q\); they are used because \(p\) and \(q\) are not known. The sample proportions are calculated from the data: \(\hat{p}\) is the estimated proportion of successes, and \(\hat{q}\) is the estimated proportion of failures.

The confidence interval can be used only if the number of successes \(n\hat{p}\) and the number of failures \(n\hat{q}\) are both greater than five.
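The pieces above (point estimate, error bound, and the check on successes and failures) can be collected into one short routine. This is a sketch under stated assumptions rather than a definitive implementation: the function name `proportion_ci` is hypothetical, the sample of 120 successes out of 400 is made up for illustration, and the critical value comes from Python's standard-library `statistics.NormalDist` instead of a table.

```python
import math
from statistics import NormalDist

def proportion_ci(x, n, confidence_level):
    """Confidence interval (p_hat - EBP, p_hat + EBP) for a proportion."""
    p_hat = x / n
    q_hat = 1 - p_hat
    # Condition from the text: successes and failures both greater than five
    if n * p_hat <= 5 or n * q_hat <= 5:
        raise ValueError("need more than five successes and five failures")
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)    # critical value z_{alpha/2}
    ebp = z * math.sqrt(p_hat * q_hat / n)     # error bound for the proportion
    return p_hat - ebp, p_hat + ebp

# Hypothetical sample: 120 successes in 400 trials, 95% confidence
low, high = proportion_ci(120, 400, 0.95)
print(f"({low:.4f}, {high:.4f})")
```

Raising an error when the condition fails is one possible design choice; a caller might instead prefer a warning.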

Normal Distribution of Proportions

For the normal distribution of proportions, the \(z\)-score formula is as follows.

If

\[\hat{P} \sim N\left(p, \sqrt{\dfrac{pq}{n}}\right)\nonumber \]

then the \(z\)-score formula is

\[z = \dfrac{\hat{p}-p}{\sqrt{\dfrac{pq}{n}}}\nonumber \]
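As a small illustration of the \(z\)-score formula, the sketch below standardizes a sample proportion against an assumed population proportion. The function name `z_score` and the numbers used (\(\hat{p} = 0.842\), \(p = 0.80\), \(n = 500\)) are hypothetical; in practice \(p\) is unknown, which is why the confidence interval uses \(\hat{p}\) and \(\hat{q}\) instead.

```python
import math

def z_score(p_hat, p, n):
    """z-score of a sample proportion p_hat when the true proportion is p."""
    q = 1 - p
    return (p_hat - p) / math.sqrt(p * q / n)

# Hypothetical values: sample proportion 0.842, assumed true proportion 0.80
z = z_score(0.842, 0.80, 500)
print(round(z, 3))
```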

Example \(\PageIndex{1}\)

Suppose that a market research firm is hired to estimate the percent of adults living in a large city who have cell phones. Five hundred randomly selected adult residents in this city are surveyed to determine whether they have cell phones. Of the 500 people surveyed, 421 responded yes - they own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of adult residents of this city who have cell phones.

Solution

Let \(X =\) the number of people in the sample who have cell phones. \(X\) is binomial.

\[X \sim B(500,\dfrac{421}{500}).\nonumber \]

To calculate the confidence interval, you must find \(\hat{p}\), \(\hat{q}\), and \(EBP\).

\(n = 500\)
\(x =\) the number of successes \(= 421\)

\[\hat{p} = \dfrac{x}{n} = \dfrac{421}{500} = 0.842\nonumber \]

\(\hat{p} = 0.842\) is the sample proportion; this is the point estimate of the population proportion.

\[\hat{q} = 1 - \hat{p} = 1 - 0.842 = 0.158\nonumber \]

Since \(CL = 0.95\), then

\[\alpha = 1 - CL = 1 - 0.95 = 0.05\nonumber \]

Use the Excel formula: \(=\text{NORM.S.INV}(1-0.05/2)=1.96\)
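For readers working outside Excel, the same critical value can be obtained in Python. This is a minimal sketch that assumes the standard-library `statistics.NormalDist` class (Python 3.8 or later); the helper name `z_critical` is hypothetical.

```python
from statistics import NormalDist

def z_critical(confidence_level):
    """Critical value z_{alpha/2} of the standard normal distribution."""
    alpha = 1 - confidence_level
    # inv_cdf is the inverse cumulative distribution function,
    # playing the role of NORM.S.INV for the standard normal
    return NormalDist().inv_cdf(1 - alpha / 2)

print(round(z_critical(0.95), 3))
```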

Then

\[z_{\dfrac{\alpha}{2}} = z_{0.025} = 1.96\nonumber \]

\[EBP = \left(z_{\dfrac{\alpha}{2}}\right)\sqrt{\dfrac{\hat{p}\hat{q}}{n}} = (1.96)\sqrt{\dfrac{(0.842)(0.158)}{500}} = 0.032\nonumber \]

\[\hat{p} - EBP = 0.842 - 0.032 = 0.810\nonumber \]

\[\hat{p} + EBP = 0.842 + 0.032 = 0.874\nonumber \]

The confidence interval for the true binomial population proportion is \((\hat{p} - EBP, \hat{p} + EBP) = (0.810, 0.874)\).

Interpretation

We estimate with 95% confidence that between 81% and 87.4% of all adult residents of this city have cell phones.

Explanation of 95% Confidence Level

Ninety-five percent of the confidence intervals constructed in this way would contain the true value for the population proportion of all adult residents of this city who have cell phones.
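The meaning of the confidence level can be illustrated by simulation. The sketch below is only an illustration under assumptions: it pretends the true proportion is known (a hypothetical \(p = 0.84\)), draws many samples of 500, builds a 95% interval from each, and reports the fraction of intervals that contain the assumed true value. The function name `coverage`, the seed, and the number of repetitions are arbitrary choices.

```python
import math
import random
from statistics import NormalDist

def coverage(true_p, n, confidence_level, repetitions, seed=0):
    """Fraction of simulated confidence intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    hits = 0
    for _ in range(repetitions):
        # One simulated survey: count successes among n independent trials
        x = sum(rng.random() < true_p for _ in range(n))
        p_hat = x / n
        ebp = z * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - ebp <= true_p <= p_hat + ebp:
            hits += 1
    return hits / repetitions

# Hypothetical true proportion 0.84; a real survey never knows this value
rate = coverage(true_p=0.84, n=500, confidence_level=0.95, repetitions=2000)
print(rate)
```

The printed fraction should land near 0.95, though not exactly on it, because the simulation is random and the normal distribution is only an approximation to the binomial.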

Exercise \(\PageIndex{1}\)

Suppose 250 randomly selected people are surveyed to determine if they own a tablet. Of the 250 surveyed, 98 reported owning a tablet. Using a 95% confidence level, compute a confidence interval estimate for the true proportion of people who own tablets.

Answer: (0.3315, 0.4525)

Example \(\PageIndex{2}\)

For a class project, a political science student at a large university wants to estimate the percent of students who are registered voters. He surveys 500 students and finds that 300 are registered voters. Compute a 90% confidence interval for the true percent of students who are registered voters, and interpret the confidence interval.

Solution

\(x = 300\) and
\(n = 500\)

\[\hat{p} = \dfrac{x}{n} = \dfrac{300}{500} = 0.600\nonumber \]

\[\hat{q} = 1 - \hat{p} = 1 - 0.600 = 0.400\nonumber \]

Since \(CL = 0.90\), then

\[\alpha = 1 - CL = 1 - 0.90 = 0.10\nonumber \]

Use the Excel equation: \(=\text{NORM.S.INV}(1-0.10/2)=1.645\)

\[z_{\dfrac{\alpha}{2}} = z_{0.05} = 1.645\nonumber \]

\[EBP = \left(z_{\dfrac{\alpha}{2}}\right)\sqrt{\dfrac{\hat{p} \hat{q}}{n}} = (1.645)\sqrt{\dfrac{(0.60)(0.40)}{500}} = 0.036\nonumber \]

\[\hat{p} - EBP = 0.60 - 0.036 = 0.564\nonumber \]

\[\hat{p} + EBP = 0.60 + 0.036 = 0.636\nonumber \]

The confidence interval for the true binomial population proportion is \((\hat{p} - EBP, \hat{p} + EBP) = (0.564, 0.636)\).

Interpretation

We estimate with 90% confidence that the true percent of all students that are registered voters is between 56.4% and 63.6%.
Alternate Wording: We estimate with 90% confidence that between 56.4% and 63.6% of ALL students are registered voters.

Explanation of 90% Confidence Level

Ninety percent of all confidence intervals constructed in this way contain the true value for the population percent of students that are registered voters.
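To see how the choice of confidence level affects the result, the sketch below recomputes the interval for the registered-voter data (\(x = 300\), \(n = 500\)) at three confidence levels. The 95% and 99% levels are not part of the example and are added only for comparison; the function name `interval_for_level` is hypothetical.

```python
import math
from statistics import NormalDist

def interval_for_level(x, n, confidence_level):
    """Confidence interval for a proportion at the given confidence level."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    ebp = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - ebp, p_hat + ebp

# Data from the example: 300 registered voters among 500 students surveyed
intervals = {cl: interval_for_level(300, 500, cl) for cl in (0.90, 0.95, 0.99)}
for cl, (low, high) in intervals.items():
    print(f"{cl:.0%}: ({low:.3f}, {high:.3f})")
```

A higher confidence level uses a larger critical value, so the interval widens while the point estimate stays at 0.600.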

Exercise \(\PageIndex{2}\)

A student polls her school to see if students in the school district are for or against the new legislation regarding school uniforms. She surveys 600 students and finds that 480 are against the new legislation.

a. Compute a 90% confidence interval for the true percent of students who are against the new legislation, and interpret the confidence interval.
b. In a sample of 300 students, 68% said they own an iPod and a smart phone. Compute a 97% confidence interval for the true percent of students who own an iPod and a smart phone.

Answer a: (0.7731, 0.8269); We estimate with 90% confidence that the true percent of all students in the district who are against the new legislation is between 77.31% and 82.69%.

Answer b

Sixty-eight percent (68%) of students own an iPod and a smart phone.

\[\hat{p} = 0.68\nonumber \]

\[\hat{q} = 1 - \hat{p} = 1 - 0.68 = 0.32\nonumber \]

Since \(CL = 0.97\), we know

\[\alpha = 1 - 0.97 = 0.03\nonumber \]

Use the Excel equation: \(=\text{NORM.S.INV}(1-0.03/2)=2.17\)

\[z_{0.015} = 2.17\nonumber \]

\[EBP = \left(z_{\dfrac{\alpha}{2}}\right)\sqrt{\dfrac{\hat{p} \hat{q}}{n}} = 2.17\sqrt{\dfrac{0.68(0.32)}{300}} \approx 0.0584\nonumber \]

\[\hat{p} - EBP = 0.68 - 0.0584 = 0.6216\nonumber \]

\[\hat{p} + EBP = 0.68 + 0.0584 = 0.7384\nonumber \]

We are 97% confident that the true proportion of all students who own an iPod and a smart phone is between 0.6216 and 0.7384.

Calculating the Sample Size \(n\)

If researchers desire a specific margin of error, then they can use the error bound formula to calculate the required sample size. The error bound formula for a population proportion is

\[EBP = \left(z_{\frac{\alpha}{2}}\right)\left(\sqrt{\dfrac{\hat{p}\hat{q}}{n}}\right)\nonumber \]

Solving for \(n\) gives you an equation for the sample size.

\[n = \dfrac{\left(z_{\frac{\alpha}{2}}\right)^{2}(\hat{p}\hat{q})}{EBP^{2}}\nonumber \]
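The intermediate algebra is not shown above; one way to fill it in is to square both sides of the error bound formula and then isolate \(n\):

\[EBP^{2} = \left(z_{\frac{\alpha}{2}}\right)^{2}\dfrac{\hat{p}\hat{q}}{n} \quad\Longrightarrow\quad n \cdot EBP^{2} = \left(z_{\frac{\alpha}{2}}\right)^{2}\hat{p}\hat{q} \quad\Longrightarrow\quad n = \dfrac{\left(z_{\frac{\alpha}{2}}\right)^{2}\hat{p}\hat{q}}{EBP^{2}}\nonumber \]

Because \(EBP\) appears squared in the denominator, cutting the error bound in half requires roughly four times as many observations.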

Example \(\PageIndex{5}\)

Suppose a mobile phone company wants to determine the current percentage of customers aged 50+ who use text messaging on their cell phones. How many customers aged 50+ should the company survey in order to be 90% confident that the estimated (sample) proportion is within three percentage points of the true population proportion of customers aged 50+ who use text messaging on their cell phones?

Answer

From the problem, we know that \(\mathbf{EBP = 0.03}\) (3% = 0.03) and \(z_{\frac{\alpha}{2}} = z_{0.05} = 1.645\) because the confidence level is 90%.

However, in order to find \(n\), we need to know the estimated (sample) proportion \( \hat{p} \). Remember that \(\hat{q} = 1 – \hat{p}\). But, we do not know \(\hat{p}\) yet. Since we multiply \(\hat{p}\) and \(\hat{q}\) together, we make them both equal to 0.5 because \(\hat{p}\hat{q} = (0.5)(0.5) = 0.25\) results in the largest possible product. (Try other products: \((0.6)(0.4) = 0.24\); \((0.3)(0.7) = 0.21\); \((0.2)(0.8) = 0.16\) and so on). The largest possible product gives us the largest \(n\). This gives us a large enough sample so that we can be 90% confident that we are within three percentage points of the true population proportion. To calculate the sample size \(n\), use the formula and make the substitutions.

\[n = \dfrac{z^{2}\hat{p}\hat{q}}{EBP^{2}}\nonumber \]

gives

\[n = \dfrac{1.645^{2}(0.5)(0.5)}{0.03^{2}} = 751.7\nonumber \]

Round the answer to the next higher value. The sample size should be 752 cell phone customers aged 50+ in order to be 90% confident that the estimated (sample) proportion is within three percentage points of the true population proportion of all customers aged 50+ who use text messaging on their cell phones.
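The sample-size calculation can be sketched in a few lines of Python. The function name `sample_size` is hypothetical, and the default \(\hat{p} = 0.5\) reflects the conservative choice described above. One assumption to note: the code uses the unrounded critical value from `statistics.NormalDist` rather than 1.645, so the value before rounding up differs slightly from 751.7, although the final answer is the same.

```python
import math
from statistics import NormalDist

def sample_size(ebp, confidence_level, p_hat=0.5):
    """Smallest whole n whose error bound is at most ebp.

    p_hat defaults to 0.5, which maximizes p_hat * q_hat and therefore
    gives the largest (most conservative) sample size.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n = z**2 * p_hat * (1 - p_hat) / ebp**2
    return math.ceil(n)  # always round up to the next whole number

# Values from the example: 90% confidence, within three percentage points
print(sample_size(0.03, 0.90))
```

Rounding up rather than to the nearest integer matters: rounding down would leave the error bound slightly larger than the one requested.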

Exercise \(\PageIndex{5}\)

Suppose an internet marketing company wants to determine the current percentage of customers who click on ads on their smartphones. How many customers should the company survey in order to be 90% confident that the estimated proportion is within five percentage points of the true population proportion of customers who click on ads on their smartphones?

Answer: 271 customers should be surveyed.

Glossary

Binomial Distribution: a discrete random variable (RV) which arises from Bernoulli trials; there are a fixed number, \(n\), of independent trials. “Independent” means that the result of any trial (for example, trial 1) does not affect the results of the following trials, and all trials are conducted under the same conditions. Under these circumstances the binomial RV \(X\) is defined as the number of successes in \(n\) trials. The notation is: \(X \sim B(n, p)\). The mean is \(\mu = np\) and the standard deviation is \(\sigma = \sqrt{npq}\). The probability of exactly \(x\) successes in \(n\) trials is \(P(X = x) = \binom{n}{x}p^{x}q^{n-x}\).

Error Bound for a Population Proportion (\(EBP\)): the margin of error; depends on the confidence level, the sample size, and the estimated (from the sample) proportion of successes.

Contributors and Attributions

Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/30189442-699...b91b9de@18.114.
