8.6: Testing For Two Proportions
In the last section, we tested whether a single proportion matched a claimed value, for example, whether a coin was fair or whether a passing rate matched a historical benchmark. But many of the most interesting real-world questions aren't about one group in isolation. They're about comparing two groups:
- Is the graduation rate higher at one school than another?
- Do more women than men report voting in local elections?
- Does a new drug have a higher success rate than the current standard treatment?
- Are customers more likely to click an ad with an image than one without?
These questions all involve comparing two proportions from two independent groups. The logic is the same as everything we've done so far — we just need a new test statistic.
Setting Up the Hypotheses
We label the two population proportions \( p_1 \) and \( p_2 \). As always, the null hypothesis represents "no difference":
- \( H_0: p_1 = p_2 \) (the two proportions are equal)
- \( H_A: p_1 \neq p_2 \) (two-tailed — the proportions are different)
Or directionally:
- \( H_A: p_1 > p_2 \) (right-tailed)
- \( H_A: p_1 < p_2 \) (left-tailed)
The null hypothesis can also be written as \( H_0: p_1 - p_2 = 0 \), which makes it clear that we are testing whether the difference between the two proportions is zero.
The Test Statistic — and Why We Pool
Recall from Section 8.5 that the one-proportion z-test used the standard error \( \sqrt{\frac{p_0(1-p_0)}{n}} \), where \( p_0 \) was the claimed null value.
When comparing two proportions, we need the standard error of the difference \( \hat{p}_1 - \hat{p}_2 \). Under \( H_0 \), the two groups have the same true proportion, so our best estimate of that common proportion comes from combining both samples. This combined estimate is called the pooled proportion.
Pooled proportion:
\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \]
where \( x_1 \) and \( x_2 \) are the number of successes in each group and \( n_1 \), \( n_2 \) are the sample sizes.
Two-proportion z-test statistic:
\[ Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
When \( H_0 \) is true and the conditions listed below are met, this test statistic approximately follows the standard normal distribution (\( Z \)), so we use the normal distribution (not the t-distribution) to find p-values and critical values.
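As a minimal sketch, the pooled proportion and the test statistic above can be computed with Python's standard library. The function name `two_proportion_z_test` and its `alternative` parameter are illustrative choices, not part of this text; `statistics.NormalDist` supplies the standard normal CDF.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Return (z, p_value) for H0: p1 = p2 using the pooled standard error.

    Hypothetical helper for illustration. `alternative` is one of
    "two-sided", "less" (HA: p1 < p2), or "greater" (HA: p1 > p2).
    """
    p1_hat = x1 / n1
    p2_hat = x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # combined estimate under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se

    cdf = NormalDist().cdf  # standard normal CDF
    if alternative == "less":
        p_value = cdf(z)
    elif alternative == "greater":
        p_value = 1 - cdf(z)
    elif alternative == "two-sided":
        p_value = 2 * (1 - cdf(abs(z)))
    else:
        raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")
    return z, p_value
```

Calling `two_proportion_z_test(x1, n1, x2, n2, alternative="less")` returns the z statistic together with the left-tail p-value. The other two settings cover the right-tailed and two-tailed alternatives.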
Why pooled? Under \( H_0 \), we are assuming \( p_1 = p_2 \). If that's true, both samples are drawn from populations with the same proportion, so it makes sense to combine all successes and all observations to get the single best estimate of that shared proportion. Using separate \( \hat{p}_1 \) and \( \hat{p}_2 \) in the standard error would contradict the assumption we're making under \( H_0 \).
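A short derivation makes the role of pooling explicit. Assuming the two samples are independent, the variances of the two sample proportions add:
\[ \operatorname{Var}(\hat{p}_1 - \hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2} \]
Under \( H_0 \), \( p_1 = p_2 = p \) for some common value \( p \), so the expression simplifies to
\[ \operatorname{Var}(\hat{p}_1 - \hat{p}_2) = p(1-p)\left(\frac{1}{n_1} + \frac{1}{n_2}\right) \]
Since \( p \) is unknown, we substitute the pooled proportion \( \hat{p} \) and take the square root, which gives the standard error in the denominator of \( Z \).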
Conditions for Using This Test
Before running the test, check that these conditions are met:
- Independence: The two samples are independent of each other, and observations within each sample are independent.
- Randomness: Both samples were collected using a random process.
- Success-failure condition: Each group must have at least 10 expected successes and 10 expected failures, computed with the pooled proportion: \( n_1\hat{p} \geq 10 \), \( n_1(1-\hat{p}) \geq 10 \), \( n_2\hat{p} \geq 10 \), \( n_2(1-\hat{p}) \geq 10 \).
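The success-failure check in the list above can be sketched as a small helper. The name `success_failure_ok` and the `minimum` parameter are hypothetical, introduced only for illustration. Independence and randomness cannot be checked by arithmetic; they depend on how the data were collected.

```python
def success_failure_ok(x1, n1, x2, n2, minimum=10):
    """Check the success-failure condition using the pooled proportion.

    Returns a (bool, list) pair: whether all four expected counts reach
    `minimum`, and the expected counts themselves.
    """
    pooled = (x1 + x2) / (n1 + n2)
    expected = [
        n1 * pooled,        # expected successes, group 1
        n1 * (1 - pooled),  # expected failures, group 1
        n2 * pooled,        # expected successes, group 2
        n2 * (1 - pooled),  # expected failures, group 2
    ]
    return all(count >= minimum for count in expected), expected
```

Returning the expected counts along with the verdict makes it easy to see which count falls short when the condition fails.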
Example 1: "Does the new ad format get more clicks?"
A marketing team runs an A/B test on two versions of a digital advertisement. Version A (the current design, with text only) is shown to 200 users. Version B (a new design, with an image) is shown to 200 different users. The team records how many users click the ad.
- Version A (text only): 34 out of 200 users clicked. \( \hat{p}_1 = 34/200 = 0.17 \)
- Version B (image): 52 out of 200 users clicked. \( \hat{p}_2 = 52/200 = 0.26 \)
Is there statistically significant evidence that the image version gets a higher click rate?
Step 1: State the Hypotheses
- \( H_0: p_1 = p_2 \) (click rates are the same for both versions)
- \( H_A: p_1 < p_2 \) (the image version has a higher click rate)
This is a left-tailed test: we labeled the text-only version as group 1, so the alternative \( p_1 < p_2 \) corresponds to a negative difference \( p_1 - p_2 \).
Step 2: Calculate the Test Statistic
First, calculate the pooled proportion:
\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{34 + 52}{200 + 200} = \frac{86}{400} = 0.215 \]
Now calculate the standard error using the pooled proportion:
\[ SE = \sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} = \sqrt{0.215(0.785)\left(\frac{1}{200}+\frac{1}{200}\right)} \]
\[ = \sqrt{0.1688 \times 0.01} = \sqrt{0.001688} \approx 0.0411 \]
Now calculate the test statistic:
\[ Z = \frac{\hat{p}_1 - \hat{p}_2}{SE} = \frac{0.17 - 0.26}{0.0411} = \frac{-0.09}{0.0411} \approx -2.19 \]
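The arithmetic in Step 2 can be double-checked with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
from math import sqrt

# Example 1 data: clicks and number of users shown each ad version
x1, n1 = 34, 200  # Version A (text only)
x2, n2 = 52, 200  # Version B (image)

p1_hat = x1 / n1
p2_hat = x2 / n2
pooled = (x1 + x2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (p1_hat - p2_hat) / se

print(f"pooled = {pooled:.3f}, SE = {se:.4f}, z = {z:.2f}")
```

The printed values should agree with the hand calculation: 0.215, 0.0411, and -2.19.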
Step 3: Choose a Significance Level
Use \( \alpha = 0.05 \).
Step 4: Find the p-value
This is a left-tailed test, so we find the area to the left of \( z = -2.19 \) in the standard normal distribution:
\[ p = P(Z < -2.19) \approx 0.014 \]
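If a z-table is not at hand, the left-tail area can be approximated in software. The sketch below uses `statistics.NormalDist` from Python's standard library, taking the rounded statistic \( z = -2.19 \) from Step 2 as given.

```python
from statistics import NormalDist

z = -2.19                      # test statistic from Step 2
p_value = NormalDist().cdf(z)  # area to the left of z under the standard normal
print(f"p-value = {p_value:.3f}")
```

For a right-tailed test the corresponding area would be `1 - NormalDist().cdf(z)`.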
Step 5: Make a Decision and State a Conclusion
Since \( p = 0.014 < 0.05 \), we reject \( H_0 \).
Conclusion: There is statistically significant evidence that the image-based ad version generates a higher click rate than the text-only version.
Reflect: The difference in click rates was 9 percentage points (17% vs. 26%). Statistically significant — but is it practically significant? For a large advertising campaign, even a few percentage points can represent thousands of additional clicks and substantial revenue. Context always matters when interpreting results.
Example 2: "Do graduation rates differ between two programs?" (Try It)
A community college is evaluating two different academic support programs. In Program A, 78 out of 120 students graduated within two years. In Program B, 91 out of 130 students graduated within two years.
Is there evidence at the \( \alpha = 0.05 \) level that the graduation rates differ between the two programs?
- State \( H_0 \) and \( H_A \). What kind of test is this — left, right, or two-tailed?
- Calculate \( \hat{p}_1 \), \( \hat{p}_2 \), and the pooled proportion \( \hat{p} \).
- Calculate the standard error and the z test statistic.
- Find the p-value and compare it to \( \alpha = 0.05 \).
- Write a conclusion in the context of the problem.
Hint: Since the question asks whether rates differ (not which is higher), this is a two-tailed test. Remember to double the tail probability for the p-value.
Tip: Check the success-failure condition before you begin. With \( \hat{p} = 169/250 \approx 0.676 \), verify that each group has at least 10 expected successes and 10 expected failures.
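After working the problem by hand, you can check your answers with a sketch like the one below. It follows the same steps as Example 1 but doubles the tail area because the test is two-tailed. The variable names are illustrative, and the output is deliberately not shown so that you can compare it with your own work.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 78, 120  # Program A: graduates, students
x2, n2 = 91, 130  # Program B: graduates, students

pooled = (x1 + x2) / (n1 + n2)
se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (x1 / n1 - x2 / n2) / se

# Two-tailed p-value: double the area beyond |z| in one tail
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Expected successes and failures under the pooled proportion
expected = [n1 * pooled, n1 * (1 - pooled), n2 * pooled, n2 * (1 - pooled)]

print(f"pooled = {pooled:.3f}, SE = {se:.4f}, z = {z:.2f}, p = {p_value:.3f}")
print("success-failure condition met:", all(c >= 10 for c in expected))
```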
Comparing the One- and Two-Proportion Tests
It helps to see these two tests side by side so you can recognize which to use:
| | One-proportion z-test (Section 8.5) | Two-proportion z-test (this section) |
|---|---|---|
| Question type | Does one group's proportion equal a claimed value? | Do two groups have the same proportion? |
| Null hypothesis | \( H_0: p = p_0 \) | \( H_0: p_1 = p_2 \) |
| Test statistic | \( Z = \dfrac{\hat{p} - p_0}{\sqrt{\dfrac{p_0(1-p_0)}{n}}} \) | \( Z = \dfrac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}} \) |
| Standard error uses | The claimed null value \( p_0 \) | The pooled sample proportion \( \hat{p} \) |
| Distribution | Standard normal (\( Z \)) | Standard normal (\( Z \)) |
- In Example 1, we used a left-tailed test because we labeled the text-only version as group 1. What would happen to the test statistic and the conclusion if we had switched the labels — making the image version group 1 and the text version group 2?
- Why do we use the pooled proportion in the standard error rather than using \( \hat{p}_1 \) and \( \hat{p}_2 \) separately?
- Suppose two hospitals report post-surgery infection rates of 4% and 6%, based on samples of 50 patients each. A researcher runs a two-tailed two-proportion z-test and gets \( p \approx 0.65 \). What does this suggest? What would you recommend the researcher do next?
Looking Ahead
We have now worked through hypothesis tests for means and proportions — both single-sample and two-sample cases. Before moving on to new test types, the next section takes a step back to ask a deeper question: once we have a p-value, what does it actually mean — and what are the ways it can be misread or misused? Understanding how to interpret p-values carefully is just as important as knowing how to calculate them.

