7.10: Comparing Two Independent Population Proportions


When conducting a hypothesis test that compares two independent population proportions, the following characteristics should be present:

The two samples are random samples and are independent of each other.
The number of successes is at least five, and the number of failures is at least five, for each of the samples.
A growing body of literature states that each population should be at least 10, or perhaps even 20, times the size of its sample. This keeps each population from being over-sampled, which can bias the results.
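As a minimal sketch, the conditions above can be checked mechanically before running a test. The function name `check_conditions`, its parameters, and the default ratio of 10 are illustrative assumptions, not part of the text.

```python
def check_conditions(successes, n, population_size=None, ratio=10):
    """Return a list of problems found; an empty list means the checks pass.

    `ratio` is the assumed minimum population-to-sample ratio (10 here,
    though the text notes that some sources suggest 20).
    """
    problems = []
    failures = n - successes
    if successes < 5:
        problems.append(f"only {successes} successes (need at least 5)")
    if failures < 5:
        problems.append(f"only {failures} failures (need at least 5)")
    if population_size is not None and population_size < ratio * n:
        problems.append(
            f"population of {population_size} is less than {ratio} times the sample size {n}"
        )
    return problems


# Hypothetical samples: the first passes, the second fails two checks.
print(check_conditions(20, 200))
print(check_conditions(3, 200, population_size=1500))
```

The check is applied to each sample separately. Independence and randomness of the samples depend on the study design and cannot be verified from the counts alone.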

Comparing two proportions, like comparing two means, is common. If two estimated proportions are different, it may be due to a difference in the populations or it may be due to chance in the sampling. A hypothesis test can help determine if a difference in the estimated proportions reflects a difference in the two population proportions.

Like the case of differences in sample means, we construct a sampling distribution for differences in sample proportions: \(\left(p_{A}^{\prime}-p_{B}^{\prime}\right)\) where \(p_{A}^{\prime}=\frac{X_{A}}{n_{A}}\) and \(p_{B}^{\prime}=\frac{X_{B}}{n_{B}}\) are the sample proportions for the two sets of data in question. \(X_A\) and \(X_B\) are the number of successes in each sample group respectively, and \(n_A\) and \(n_B\) are the respective sample sizes from the two groups. Again we appeal to the Central Limit Theorem: the sampling distribution of the difference in sample proportions is approximately normal, as seen in Figure \(\PageIndex{5}\).

Figure \(\PageIndex{5}\)
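The shape of this sampling distribution can be illustrated with a small simulation. The sketch below is only an illustration: the population proportions (0.30 and 0.20), the sample sizes (150 each), the number of repetitions, and the function name `simulate_differences` are all assumed values chosen for the demonstration. It draws many pairs of samples and compares the spread of the simulated differences with the standard deviation \(\sqrt{\frac{p_A(1-p_A)}{n_A}+\frac{p_B(1-p_B)}{n_B}}\) that holds for two independent samples.

```python
import math
import random
import statistics


def simulate_differences(p_a, n_a, p_b, n_b, reps, seed=0):
    """Simulate `reps` differences in sample proportions, p'_A - p'_B."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    diffs = []
    for _ in range(reps):
        x_a = sum(rng.random() < p_a for _ in range(n_a))  # successes in sample A
        x_b = sum(rng.random() < p_b for _ in range(n_b))  # successes in sample B
        diffs.append(x_a / n_a - x_b / n_b)
    return diffs


# Hypothetical population proportions and sample sizes (assumptions).
p_a, n_a = 0.30, 150
p_b, n_b = 0.20, 150

diffs = simulate_differences(p_a, n_a, p_b, n_b, reps=2000)
theoretical_sd = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

print("mean of simulated differences:", round(statistics.mean(diffs), 4))
print("sd of simulated differences:  ", round(statistics.stdev(diffs), 4))
print("theoretical sd:               ", round(theoretical_sd, 4))
```

The simulated mean should land close to \(p_A-p_B\) and the simulated standard deviation close to the theoretical value. A histogram of `diffs` would show the bell shape that the figure depicts.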

Generally, the null hypothesis allows for the test of a difference of a particular value, \(\delta_{0}\), just as we did for the case of differences in means.

\[H_{0} : p_{A}-p_{B}=\delta_{0}\nonumber\]

\[H_{a} : p_{A}-p_{B} \neq \delta_{0}\nonumber\]

Most common, however, is the test that the two proportions are the same. That is,

\[H_{0} : p_{\mathrm{A}}=p_{B}\nonumber\]

\[H_{a} : p_{\mathrm{A}} \neq p_{B}\nonumber\]

To conduct the test, we use a pooled proportion, \(p_c\).

\[\textbf{The pooled proportion is calculated as follows:}\nonumber\]

\[p_{c}=\frac{x_{A}+x_{B}}{n_{A}+n_{B}}\nonumber\]

\[\textbf{The test statistic (z-score) is:}\nonumber\]

\[Z_{c}=\frac{\left(p_{A}^{\prime}-p_{B}^{\prime}\right)-\delta_{0}}{\sqrt{p_{c}\left(1-p_{c}\right)\left(\frac{1}{n_{A}}+\frac{1}{n_{B}}\right)}}\nonumber\]

where \(\delta_{0}\) is the hypothesized difference between the two proportions and \(p_c\) is the pooled proportion from the formula above.
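The two formulas above translate directly into a few lines of code. The following is a sketch rather than a reference implementation: the function name `two_proportion_z_test` and its argument names are assumptions, and the two-tailed p-value is obtained from the standard normal distribution through `math.erfc`, which the text does not prescribe.

```python
import math


def two_proportion_z_test(x_a, n_a, x_b, n_b, delta_0=0.0):
    """Return (z, two-tailed p-value) using the pooled-proportion formula above."""
    p_a = x_a / n_a                      # sample proportion, group A
    p_b = x_b / n_b                      # sample proportion, group B
    p_c = (x_a + x_b) / (n_a + n_b)      # pooled proportion
    std_err = math.sqrt(p_c * (1 - p_c) * (1 / n_a + 1 / n_b))
    z = ((p_a - p_b) - delta_0) / std_err
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 45 successes out of 150 versus 30 out of 150.
z, p_value = two_proportion_z_test(45, 150, 30, 150)
print(f"z = {z:.2f}, two-tailed p-value = {p_value:.4f}")
```

With these hypothetical counts the pooled proportion is 0.25, the standard error is 0.05, and the test statistic is 2.0.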

Example \(\PageIndex{6}\)

A bank has recently acquired a new branch and thus has customers in this new territory. The bank is interested in the default rate in the new territory and wishes to test the hypothesis that this default rate is different from that of its current customer base. It samples 200 files in area A, its current customers, and finds that 20 have defaulted. In area B, the new customers, another sample of 200 files shows that 12 have defaulted on their loans. At a 10% level of significance, can we say that the default rates are the same or different?

Answer

Solution 10.6

This is a test of proportions. We know this because the underlying random variable is binary: default or no default. Further, we know it is a test of differences in proportions because we have two sample groups, the current customer base and the newly acquired customer base. Let A and B be the subscripts for the two customer groups. Then \(p_A\) and \(p_B\) are the two population proportions we wish to test.

Random Variable:

\(P_{A}^{\prime}-P_{B}^{\prime}\) = difference in the proportions of customers who defaulted in the two groups.

\(H_{0} : p_{A}=p_{B}\)

\(H_{a} : p_{A} \neq p_{B}\)

The words "is different" tell you the test is two-tailed.

Distribution for the test: Since this is a test of two binomial population proportions, the distribution is normal:

\(p_{c}=\frac{x_{A}+x_{B}}{n_{A}+n_{B}}=\frac{20+12}{200+200}=0.08\) \(1-p_{c}=0.92\)

\(\left(p_{A}^{\prime}-p_{B}^{\prime}\right)=0.04\) follows an approximate normal distribution.

Estimated proportion for group A: \(p^{\prime}_{A}=\frac{x_{A}}{n_{A}}=\frac{20}{200}=0.1\)

Estimated proportion for group B: \(p^{\prime}_{B}=\frac{x_{B}}{n_{B}}=\frac{12}{200}=0.06\)

The estimated difference between the two groups is: \(p_{A}^{\prime}-p_{B}^{\prime}=0.1-0.06=0.04\).

Figure \(\PageIndex{6}\): Normal distribution curve of the difference in the proportions of customers who defaulted in groups A and B. The mean is equal to zero, and the values -0.04, 0, and 0.04 are labeled on the horizontal axis. Two vertical lines extend from -0.04 and 0.04 to the curve. The region to the left of -0.04 and the region to the right of 0.04 are each shaded to represent 1/2(p-value) = 0.0702.

\[Z_{c}=\frac{\left(p_{A}^{\prime}-p_{B}^{\prime}\right)-\delta_{0}}{\sqrt{p_{c}\left(1-p_{c}\right)\left(\frac{1}{n_{A}}+\frac{1}{n_{B}}\right)}}=\frac{0.04-0}{\sqrt{(0.08)(0.92)\left(\frac{1}{200}+\frac{1}{200}\right)}}=1.47\nonumber\]

The calculated test statistic is 1.47. At a 10% level of significance the critical values for a two-tailed test are \(\pm 1.645\), so the test statistic is not in the tail of the distribution. Equivalently, the p-value is \(2(0.0702)=0.1404\), which is greater than 0.10.

Make a decision: Since the calculated test statistic is not in the tail of the distribution, we cannot reject \(H_0\).

Conclusion: At a 10% level of significance, from the sample data, there is not sufficient evidence to conclude that there is a difference between the proportions of customers who defaulted in the two groups.
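The arithmetic of this example can be rechecked numerically. The sketch below uses only the counts and the 10% significance level stated in the problem. The variable names are illustrative, and computing the two-tailed p-value with `math.erfc` is an assumption about tooling, not the text's method. The denominator is evaluated as the square root of \(p_c(1-p_c)\left(\frac{1}{n_A}+\frac{1}{n_B}\right)\), as in the test statistic formula given earlier in the section.

```python
import math

# Data from the example: defaults and sample sizes for areas A and B.
x_a, n_a = 20, 200
x_b, n_b = 12, 200
alpha = 0.10  # significance level stated in the problem

p_a = x_a / n_a                      # 0.10
p_b = x_b / n_b                      # 0.06
p_c = (x_a + x_b) / (n_a + n_b)      # pooled proportion, 0.08
std_err = math.sqrt(p_c * (1 - p_c) * (1 / n_a + 1 / n_b))

z = (p_a - p_b - 0) / std_err        # delta_0 = 0 under H0: p_A = p_B
p_value = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p-value
reject_h0 = p_value < alpha

print(f"p_c = {p_c:.2f}, z = {z:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```

Run as written, this gives a test statistic of about 1.47 and a two-tailed p-value of about 0.14, so \(H_0\) is not rejected at the 10% level.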

Exercise \(\PageIndex{6}\)

Two types of valves are being tested to determine if there is a difference in pressure tolerances. Fifteen out of a random sample of 100 of Valve A cracked under 4,500 psi. Six out of a random sample of 100 of Valve B cracked under 4,500 psi. Test at a 5% level of significance.