# Difference of two proportions

We would like to make conclusions about the difference in two population proportions: \(p_1 - p_2\). We consider three examples. In the first, we compare the approval of the 2010 healthcare law under two different question phrasings. In the second application, a company weighs whether they should switch to a higher quality parts manufacturer. In the last example, we examine the cancer risk to dogs from the use of yard herbicides.

In our investigations, we first identify a reasonable point estimate of \(p_1 - p_2\) based on the sample. You may have already guessed its form: \(\hat {p}_1 - \hat {p}_2\). Next, in each example we verify that the point estimate follows the normal model by checking certain conditions. Finally, we compute the estimate's standard error and apply our inferential framework.

### Sample distribution of the difference of two proportions

We must check two conditions before applying the normal model to \(\hat {p}_1 - \hat {p}_2\). First, the sampling distribution for each sample proportion must be nearly normal, and secondly, the samples must be independent. Under these two conditions, the sampling distribution of \(\hat {p}_1 - \hat {p}_2\) may be well approximated using the normal model.

The difference \(\hat {p}_1 - \hat {p}_2\) tends to follow a normal model when

- each proportion separately follows a normal model, and
- the samples are independent.

The standard error of the difference in sample proportions is

\[SE_{\hat {p}_1 - \hat {p}_2} = \sqrt {SE^2_{\hat {p}_1} + SE^2_{\hat {p}_2}} = \sqrt {\frac {p_1(1 - p_1)}{n_1} + \frac {p_2(1 - p_2)}{n_2}} \tag {6.9}\]

where \(p_1\) and \(p_2\) represent the population proportions, and \(n_1\) and \(n_2\) represent the sample sizes.

For the difference in two means, the standard error formula took the following form:

\[SE_{\bar {x}_1 - \bar {x}_2} = \sqrt {SE^2_{\bar {x}_1} + SE^2_{\bar {x}_2}}\]

The standard error for the difference in two proportions takes a similar form. The reasons behind this similarity are rooted in the probability theory of Section 2.4, which is described for this context in Exercise 5.14.
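As a minimal sketch of how Equation (6.9) combines the two individual standard errors, the Python snippet below squares each one, adds them, and takes the square root. The function names and the input values are hypothetical and chosen only for illustration. In practice the population proportions are unknown, so sample proportions are plugged in.

```python
import math


def se_single(p, n):
    """Standard error of one sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def se_difference(p1, n1, p2, n2):
    """Standard error of p1_hat - p2_hat for independent samples (Equation 6.9)."""
    return math.sqrt(se_single(p1, n1) ** 2 + se_single(p2, n2) ** 2)


# Hypothetical inputs chosen only for illustration (not from the text).
se_a = se_single(0.5, 100)
se_b = se_single(0.5, 400)
se_diff = se_difference(0.5, 100, 0.5, 400)

# The squared standard errors add, so the combined standard error is
# larger than either individual one but smaller than their sum.
print(se_a, se_b, se_diff)
```

Because the squared standard errors add, the combined value always lies between the larger individual standard error and the sum of the two.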


| | Sample size (\(n_i\)) | Approve law (%) | Disapprove law (%) | Other (%) |
|---|---|---|---|---|
| "people who cannot afford it will receive financial help from the government" is given second | 771 | 47 | 49 | 3 |
| "people who do not buy it will pay a penalty" is given second | 732 | 34 | 63 | 3 |

Table 6.2: Results for a Pew Research Center poll where the ordering of two statements in a question regarding healthcare was randomized.

### Intervals and tests for \(p_1 - p_2\)

In the setting of confidence intervals, the sample proportions are used to verify the success-failure condition and also to compute the standard error, just as was the case with a single proportion.

**Example 6.10** The way a question is phrased can influence a person's response. For example, Pew Research Center conducted a survey with the following question:^{7}

As you may know, by 2014 nearly all Americans will be required to have health insurance. [People who do not buy insurance will pay a penalty] while [People who cannot afford it will receive financial help from the government]. Do you approve or disapprove of this policy?

For each randomly sampled respondent, the statements in brackets were randomized: either they were kept in the order given above, or the two statements were reversed. Table 6.2 shows the results of this experiment. Create and interpret a 90% confidence interval of the difference in approval.

First the conditions must be verified. Because each group is a simple random sample from less than 10% of the population, the observations are independent, both within the samples and between the samples. The success-failure condition also holds for each sample. Because all conditions are met, the normal model can be used for the point estimate of the difference in support, where \(p_1\) corresponds to the original ordering and \(p_2\) to the reversed ordering:

\[\hat {p}_1 - \hat {p}_2 = 0.47 - 0.34 = 0.13\]

The standard error may be computed from Equation (6.9) using the sample proportions:

\[SE \approx \sqrt {\frac {0.47(1 - 0.47)}{771} + \frac {0.34(1 - 0.34)}{732}} = 0.025\]

For a 90% confidence interval, we use \(z^* = 1.65\):

\[ \text {point estimate} \pm z^*SE \approx 0.13 \pm 1.65 \times 0.025 \rightarrow (0.09, 0.17)\]

We are 90% confident that the approval rating for the 2010 healthcare law changes by between 9 and 17 percentage points due to the ordering of the two statements in the survey question. The Pew Research Center reported that this modestly large difference suggests that the opinions of much of the public are still fluid on the health insurance mandate.
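The interval in Example 6.10 can be reproduced with a short Python sketch that uses only the standard library. The helper names are hypothetical. The threshold of 10 expected successes and failures in the success-failure check is the usual convention and is assumed here rather than stated in this section. The exact normal quantile (about 1.645) is used instead of the rounded 1.65, so intermediate values differ slightly from the text.

```python
import math
from statistics import NormalDist


def success_failure_ok(p_hat, n, threshold=10):
    """Check for at least `threshold` expected successes and failures (assumed convention)."""
    return n * p_hat >= threshold and n * (1 - p_hat) >= threshold


def diff_prop_ci(p1_hat, n1, p2_hat, n2, conf_level):
    """Normal-model confidence interval for p1 - p2 using Equation (6.9)."""
    point = p1_hat - p2_hat
    se = math.sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)
    z_star = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return point - z_star * se, point + z_star * se


# Values from Table 6.2: original ordering (group 1) and reversed ordering (group 2).
conditions_ok = success_failure_ok(0.47, 771) and success_failure_ok(0.34, 732)
lower, upper = diff_prop_ci(0.47, 771, 0.34, 732, 0.90)
print(conditions_ok, round(lower, 2), round(upper, 2))
```

Rounded to two decimal places, the endpoints agree with the interval (0.09, 0.17) reported above.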

^{7}www.people-press.org/2012/03/26/public-remains-split-on-health-care-bill-opposed-to-mandate/. Sample sizes for each polling group are approximate.

**Exercise 6.11** A remote control car company is considering a new manufacturer for wheel gears. The new manufacturer would be more expensive but their higher quality gears are more reliable, resulting in happier customers and fewer warranty claims. However, management must be convinced that the more expensive gears are worth the conversion before they approve the switch. If there is strong evidence of a more than 3% improvement in the percent of gears that pass inspection, management says they will switch suppliers; otherwise they will maintain the current supplier. Set up appropriate hypotheses for the test.^{8}

**Example 6.12** The quality control engineer from Exercise 6.11 collects a sample of gears, examining 1000 gears from each company, and finds that 899 gears pass inspection from the current supplier and 958 pass inspection from the prospective supplier. Using these data, evaluate the hypothesis setup of Exercise 6.11 using a significance level of 5%.

First, we check the conditions. The sample is not necessarily random, so to proceed we must assume the gears are all independent; for this sample we will suppose this assumption is reasonable, but the engineer would be more knowledgeable as to whether this assumption is appropriate. The success-failure condition also holds for each sample. Thus, the difference in sample proportions, 0.958 - 0.899 = 0.059, can be said to come from a nearly normal distribution.

The standard error can be found using Equation (6.9):

\[SE = \sqrt { \frac {0.958(1 - 0.958)}{1000} + \frac {0.899(1 - 0.899)}{1000}} = 0.0114\]

In this hypothesis test, the sample proportions were used. We will discuss this choice more in Section 6.2.3.

Next, we compute the test statistic and use it to find the p-value, which is depicted in Figure 6.3.

\[Z = \frac {\text {point estimate} - \text {null value}}{SE} = \frac {0.059 - 0.03}{0.0114} = 2.54\]

Using the normal model for this test statistic, we identify the right tail area as 0.006. Since this is a one-sided test, this single tail area is also the p-value, and we reject the null hypothesis because 0.006 is less than 0.05. That is, we have statistically significant evidence that the higher quality gears actually do pass inspection more than 3% more often than the currently used gears. Based on these results, management will approve the switch to the new supplier.
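The computations of Example 6.12 can be sketched in Python as follows. The variable names are hypothetical, and the code carries the unrounded standard error through the calculation, so the test statistic differs slightly from the hand-rounded 2.54 while leading to the same conclusion.

```python
import math
from statistics import NormalDist

# Inspection results from Example 6.12.
n_current, pass_current = 1000, 899
n_new, pass_new = 1000, 958
null_value = 0.03  # H0: p_highQ - p_standard = 0.03
alpha = 0.05

p_current = pass_current / n_current
p_new = pass_new / n_new
point_estimate = p_new - p_current

# Standard error from Equation (6.9), using the sample proportions.
se = math.sqrt(p_new * (1 - p_new) / n_new
               + p_current * (1 - p_current) / n_current)

# One-sided test: the p-value is the upper tail area beyond Z.
z = (point_estimate - null_value) / se
p_value = 1 - NormalDist().cdf(z)
reject_null = p_value < alpha
print(round(se, 4), round(z, 2), round(p_value, 3), reject_null)
```

The p-value rounds to 0.006 as in the text, so the null hypothesis is rejected at the 5% level.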

^{8}\(H_0\): The higher quality gears will pass inspection no more than 3% more frequently than the standard quality gears. \(p_{highQ} - p_{standard} = 0.03\). \(H_A\): The higher quality gears will pass inspection more than 3% more often than the standard quality gears. \(p_{highQ} - p_{standard} > 0.03\).

Figure 6.3: Distribution of the test statistic if the null hypothesis was true. The p-value is represented by the shaded area.

### Hypothesis testing when \(H_0: p_1 = p_2\)

Here we use a new example to examine a special estimate of standard error when \(H_0: p_1 = p_2\). We investigate whether there is an increased risk of cancer in dogs that are exposed to the herbicide 2,4-dichlorophenoxyacetic acid (2,4-D). A study in 1994 examined 491 dogs that had developed cancer and 945 dogs as a control group.^{9} Of these two groups, researchers identified which dogs had been exposed to 2,4-D in their owner's yard. The results are shown in Table 6.4.

| | cancer | no cancer |
|---|---|---|
| 2,4-D | 191 | 304 |
| no 2,4-D | 300 | 641 |

Table 6.4: Summary results for cancer in dogs and the use of 2,4-D by the dog's owner.

**Exercise 6.13** Is this study an experiment or an observational study?^{10}

**Exercise 6.14** Set up hypotheses to test whether 2,4-D and the occurrence of cancer in dogs are related. Use a one-sided test and compare across the cancer and no cancer groups.^{11}

^{9}Hayes HM, Tarone RE, Cantor KP, Jessen CR, McCurnin DM, and Richardson RC. 1991. Case-Control Study of Canine Malignant Lymphoma: Positive Association With Dog Owner's Use of 2,4-Dichlorophenoxyacetic Acid Herbicides. Journal of the National Cancer Institute 83(17):1226-1231.

^{10}The owners were not instructed to apply or not apply the herbicide, so this is an observational study. This question was especially tricky because one group was called the control group, which is a term usually seen in experiments.

^{11}Using the proportions within the cancer and no cancer groups may seem odd. We intuitively may desire to compare the fraction of dogs with cancer in the 2,4-D and no 2,4-D groups, since the herbicide is an explanatory variable. However, the cancer rates in each group do not necessarily reflect the cancer rates in reality due to the way the data were collected. For this reason, computing cancer rates may greatly alarm dog owners.

*\(H_0\): the proportion of dogs with exposure to 2,4-D is the same in "cancer" and "no cancer" dogs, \(p_c - p_n = 0\).*

*\(H_A\): dogs with cancer are more likely to have been exposed to 2,4-D than dogs without cancer, \(p_c - p_n > 0\).*

**Example 6.15** Are the conditions met to use the normal model and make inference on the results?

(1) It is unclear whether this is a random sample. However, if we believe the dogs in both the cancer and no cancer groups are representative of each respective population and that the dogs in the study do not interact in any way, then we may find it reasonable to assume independence between observations. (2) The success-failure condition holds for each sample.

Under the assumption of independence, we can use the normal model and make statements regarding the canine population based on the data.

In your hypotheses for Exercise 6.14, the null is that the proportion of dogs with exposure to 2,4-D is the same in each group. The point estimate of the difference in sample proportions is \(\hat {p}_c - \hat {p}_n = 0.067\). To identify the p-value for this test, we first check conditions (Example 6.15) and compute the standard error of the difference:

\[SE = \sqrt {\frac {p_c(1 - p_c)}{n_c} + \frac {p_n(1 - p_n)}{n_n}}\]

In a hypothesis test, the distribution of the test statistic is always examined as though the null hypothesis is true, i.e. in this case, \(p_c = p_n\). The standard error formula should reflect this equality in the null hypothesis. We will use p to represent the common rate of dogs that are exposed to 2,4-D in the two groups:

\[SE = \sqrt {\frac {p(1 - p)}{n_c} + \frac {p(1 - p)}{n_n}}\]

We don't know the exposure rate, p, but we can obtain a good estimate of it by pooling the results of both samples:

\[ \hat {p} = \frac {\text {# of "successes"}}{\text {# of cases}} = \frac {191 + 304}{191 + 300 + 304 + 641} = 0.345\]

This is called the **pooled estimate** of the sample proportion, and we use it to compute the standard error when the null hypothesis is that \(p_1 = p_2\) (e.g. \(p_c = p_n\) or \(p_c - p_n = 0\)). We also typically use it to verify the success-failure condition.

When the null hypothesis is \(p_1 = p_2\), it is useful to find the pooled estimate of the shared proportion:

\[ \hat {p} = \frac {\text {number of "successes"}}{\text {number of cases}} = \frac {\hat {p}_1n_1 + \hat {p}_2n_2}{n_1 + n_2}\]

Here \(\hat {p}_1n_1\) represents the number of successes in sample 1 since

\[\hat {p}_1 = \frac {\text {number of successes in sample 1}}{n_1}\]

Similarly, \(\hat {p}_2n_2\) represents the number of successes in sample 2.

When the null hypothesis suggests the proportions are equal, we use the pooled proportion estimate (\(\hat {p}\)) to verify the success-failure condition and also to estimate the standard error:

\[SE = \sqrt {\frac {\hat {p}(1 - \hat {p})}{n_1} + \frac {\hat {p}(1 - \hat {p})}{n_2}} \tag {6.16}\]
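The pooled quantities for the 2,4-D data in Table 6.4 can be computed with the Python sketch below. The variable names are hypothetical, and the threshold of 10 in the success-failure check is the usual convention, assumed here rather than stated in this section. The sketch stops at the standard error and leaves the hypothesis test itself to the reader.

```python
import math

# Counts from Table 6.4: dogs exposed to 2,4-D and group sizes.
exposed_cancer, n_cancer = 191, 491
exposed_no_cancer, n_no_cancer = 304, 945

# Point estimate of the difference in exposure rates.
p_hat_c = exposed_cancer / n_cancer
p_hat_n = exposed_no_cancer / n_no_cancer
point_estimate = p_hat_c - p_hat_n

# Pooled estimate of the shared exposure rate under H0: p_c = p_n.
p_pooled = (exposed_cancer + exposed_no_cancer) / (n_cancer + n_no_cancer)

# Success-failure check with the pooled estimate (threshold of 10 is assumed).
expected_counts = [
    n_cancer * p_pooled, n_cancer * (1 - p_pooled),
    n_no_cancer * p_pooled, n_no_cancer * (1 - p_pooled),
]
conditions_ok = min(expected_counts) >= 10

# Pooled standard error from Equation (6.16).
se_pooled = math.sqrt(p_pooled * (1 - p_pooled) / n_cancer
                      + p_pooled * (1 - p_pooled) / n_no_cancer)
print(round(point_estimate, 3), round(p_pooled, 3), round(se_pooled, 3), conditions_ok)
```

Using the same `p_pooled` in both terms of the standard error is what distinguishes Equation (6.16) from Equation (6.9), where each sample contributes its own proportion.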

**Exercise 6.17** Using Equation (6.16), \(\hat {p} = 0.345\), \(n_1 = 491\), and \(n_2 = 945\), verify the estimate for the standard error is \(SE = 0.026\). Next, complete the hypothesis test using a significance level of 0.05. Be certain to draw a picture, compute the p-value, and state your conclusion in both statistical language and plain language.^{12}

### Contributors

- David M Diez (Google/YouTube)
- Christopher D Barr (Harvard School of Public Health)
- Mine Çetinkaya-Rundel (Duke University)