Statistics LibreTexts

9.1: Two Proportions

There are times you want to test a claim about two population proportions or construct a confidence interval estimate of the difference between two population proportions. As with all other hypothesis tests and confidence intervals, the process is the same though the formulas and assumptions are different.

Hypothesis Test for Two Population Proportions (2-Prop Test)

  1. State the random variables and the parameters in words.
    \(x_1\) = number of successes from group 1
    \(x_2\) = number of successes from group 2
    \(p_1\) = proportion of successes in group 1
    \(p_2\) = proportion of successes in group 2
  2. State the null and alternative hypotheses and the level of significance
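    \(H_o: p_1 = p_2\) or \(H_o: p_1 - p_2 = 0\)
    \(H_A: p_1 < p_2\) or \(H_A: p_1 - p_2 < 0\)
    \(H_A: p_1 > p_2\) or \(H_A: p_1 - p_2 > 0\)
    \(H_A: p_1 \neq p_2\) or \(H_A: p_1 - p_2 \neq 0\)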
    Also, state your α level here.
  3. State and check the assumptions for a hypothesis test
    1. A simple random sample of size n1 is taken from population 1, and a simple random sample of size n2 is taken from population 2.
    2. The samples are independent.
    3. The assumptions for the binomial distribution are satisfied for both populations.
    4. To determine the sampling distribution of \(\hat{p}_1\), you need to show that \(n_1 p_1 \geq 5\) and \(n_1 q_1 \geq 5\), where \(q_1 = 1 - p_1\). If this requirement is true, then the sampling distribution of \(\hat{p}_1\) is well approximated by a normal curve. Likewise, to determine the sampling distribution of \(\hat{p}_2\), you need to show that \(n_2 p_2 \geq 5\) and \(n_2 q_2 \geq 5\), where \(q_2 = 1 - p_2\). If this requirement is true, then the sampling distribution of \(\hat{p}_2\) is well approximated by a normal curve. However, you do not know \(p_1\) and \(p_2\), so you need to use \(\hat{p}_1\) and \(\hat{p}_2\) instead. This is not perfect, but it is the best you can do. Since \(n_1 \hat{p}_1 = n_1 \cdot \frac{x_1}{n_1} = x_1\) (and similarly for the other calculations), you just need to make sure that \(x_1\), \(n_1 - x_1\), \(x_2\), and \(n_2 - x_2\) are all at least 5.
  4. Find the sample statistics, test statistic, and p-value
    Sample Proportion:
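    \(n_1\) = size of sample 1
    \(n_2\) = size of sample 2
    \(\hat{p}_1 = \frac{x_1}{n_1}\) (sample 1 proportion)
    \(\hat{p}_2 = \frac{x_2}{n_2}\) (sample 2 proportion)
    \(\hat{q}_1 = 1 - \hat{p}_1\) (complement of \(\hat{p}_1\))
    \(\hat{q}_2 = 1 - \hat{p}_2\) (complement of \(\hat{p}_2\))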
    Pooled Sample Proportion, \(\bar{p}\):
    \(\bar{p} = \frac{x_1 + x_2}{n_1 + n_2}\)
    \(\bar{q} = 1 - \bar{p}\)
    Test Statistic:
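    \(z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\frac{\bar{p}\bar{q}}{n_1} + \frac{\bar{p}\bar{q}}{n_2}}}\)
    Usually \(p_1 - p_2 = 0\), since \(H_o: p_1 = p_2\)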
    p-value: On TI-83/84: use normalcdf(lower limit, upper limit, 0, 1)

    Note

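    If \(H_A: p_1 < p_2\), then the lower limit is -1E99 and the upper limit is your test statistic. If \(H_A: p_1 > p_2\), then the lower limit is your test statistic and the upper limit is 1E99. If \(H_A: p_1 \neq p_2\), then find the tail area beyond your test statistic (the p-value for \(H_A: p_1 < p_2\) if the test statistic is negative, or for \(H_A: p_1 > p_2\) if it is positive) and multiply by 2.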

    On R: use pnorm(z, 0, 1)

    Note

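    If \(H_A: p_1 < p_2\), then use pnorm(z, 0, 1). If \(H_A: p_1 > p_2\), then use 1 - pnorm(z, 0, 1). If \(H_A: p_1 \neq p_2\), then find the tail area beyond your test statistic (pnorm(z, 0, 1) if z is negative, or 1 - pnorm(z, 0, 1) if z is positive) and multiply by 2.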

  5. Conclusion: This is where you write reject \(H_o\) or fail to reject \(H_o\). The rule is: if the p-value \(< \alpha\), then reject \(H_o\). If the p-value \(\geq \alpha\), then fail to reject \(H_o\).
  6. Interpretation This is where you interpret in real world terms the conclusion to the test. The conclusion for a hypothesis test is that you either have enough evidence to show HA is true, or you do not have enough evidence to show HA is true.
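
The six steps above can be sketched in code. The following is a minimal illustration in Python (standard library only) rather than on the TI-83/84 or in R; the function name `two_prop_z_test` and its arguments are hypothetical names chosen for this sketch. It assumes \(H_o: p_1 = p_2\) and applies no continuity correction, so it mirrors the hand formulas rather than the output of any particular software.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_z_test(x1, n1, x2, n2, alternative="two-sided"):
    """Pooled two-proportion z test of Ho: p1 = p2 (no continuity correction).

    Returns (z, p_value, normal_ok). The name and signature are illustrative.
    """
    # Step 3 check: successes and failures in each sample should be at least 5.
    normal_ok = all(count >= 5 for count in (x1, n1 - x1, x2, n2 - x2))

    # Step 4: sample proportions and pooled sample proportion.
    p1_hat, p2_hat = x1 / n1, x2 / n2
    p_bar = (x1 + x2) / (n1 + n2)
    q_bar = 1 - p_bar

    # Test statistic with p1 - p2 = 0 under Ho.
    z = (p1_hat - p2_hat) / sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)

    std_normal = NormalDist(0, 1)
    if alternative == "less":         # HA: p1 < p2
        p_value = std_normal.cdf(z)
    elif alternative == "greater":    # HA: p1 > p2
        p_value = std_normal.cdf(-z)  # same area as 1 - cdf(z), by symmetry
    elif alternative == "two-sided":  # HA: p1 != p2
        p_value = 2 * std_normal.cdf(-abs(z))
    else:
        raise ValueError("alternative must be 'less', 'greater', or 'two-sided'")
    return z, p_value, normal_ok


# Made-up counts, for illustration only: 40 of 100 versus 30 of 100.
z, p_value, normal_ok = two_prop_z_test(40, 100, 30, 100)
print(f"z = {z:.4f}, p-value = {p_value:.4f}, normal approximation ok: {normal_ok}")
```

Step 5 is then a comparison of the returned p-value with \(\alpha\). The two-sided branch doubles the tail area beyond \(|z|\), which gives the same result whichever sign the test statistic has.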

Confidence Interval for the Difference Between Two Population Proportions (2-Prop Interval)

The confidence interval for the difference in proportions has the same random variables and parameters and the same assumptions as the hypothesis test for two proportions. If you have already completed the hypothesis test, then you do not need to state them again. If you haven't completed the hypothesis test, then state the random variables and parameters, and state and check the assumptions, before completing the confidence interval step.

  1. Find the sample statistics and the confidence interval
    Sample Proportion:
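    \(n_1\) = size of sample 1
    \(n_2\) = size of sample 2
    \(\hat{p}_1 = \frac{x_1}{n_1}\) (sample 1 proportion)
    \(\hat{p}_2 = \frac{x_2}{n_2}\) (sample 2 proportion)
    \(\hat{q}_1 = 1 - \hat{p}_1\) (complement of \(\hat{p}_1\))
    \(\hat{q}_2 = 1 - \hat{p}_2\) (complement of \(\hat{p}_2\))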
    Confidence Interval:
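    The confidence interval estimate of the difference \(p_1 - p_2\) is
    \((\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E\)
    where the margin of error \(E\) is given by \(E = z_c \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}\)
    \(z_c\) = critical value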
  2. Statistical Interpretation: In general this looks like, “there is a C% chance that \((\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E\) contains the true difference in proportions.”
  3. Real World Interpretation: This is where you state how much more (or less) the first proportion is than the second proportion.
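
The interval formula above can be sketched as a small function. This is an illustrative Python version (standard library only), not the TI-83/84 2-PropZInt routine or R's prop.test; the name `two_prop_interval` is hypothetical. It uses the unpooled standard error from the formula for \(E\) and applies no continuity correction.

```python
from math import sqrt
from statistics import NormalDist


def two_prop_interval(x1, n1, x2, n2, conf_level=0.95):
    """Confidence interval for p1 - p2 using E = z_c * sqrt(p1q1/n1 + p2q2/n2).

    Returns (lower, upper, margin). The name and signature are illustrative.
    """
    p1_hat, p2_hat = x1 / n1, x2 / n2
    q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat

    # Critical value z_c: leaves (1 - C)/2 in each tail of the standard normal.
    z_c = NormalDist(0, 1).inv_cdf(1 - (1 - conf_level) / 2)

    margin = z_c * sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
    diff = p1_hat - p2_hat
    return diff - margin, diff + margin, margin


# Made-up counts, for illustration only: 40 of 100 versus 30 of 100.
lower, upper, margin = two_prop_interval(40, 100, 30, 100, conf_level=0.95)
print(f"{lower:.4f} < p1 - p2 < {upper:.4f}  (E = {margin:.4f})")
```

Note that the interval uses \(\hat{p}_1\) and \(\hat{p}_2\) separately, while the hypothesis test uses the pooled \(\bar{p}\), because only the test assumes \(p_1 = p_2\).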

The critical value is a value from the normal distribution. Since a confidence interval is found by adding and subtracting a margin of error from the difference in sample proportions, and the interval has a probability of containing the true difference, you can think of this as the statement \(P\left((\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E\right) = C\). So you can use the invNorm command on the TI-83/84 calculator to find the critical value. For a given confidence level the critical value is always the same, so it is easier to just look it up in Table A.1 in the Appendix.
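
As a rough equivalent of the invNorm lookup, the critical value can be computed from the inverse standard normal CDF. The sketch below uses Python's standard library; `critical_value` is a hypothetical helper name. Because the middle area is \(C\), each tail holds \((1 - C)/2\), so \(z_c\) is the point with area \((1 + C)/2\) to its left.

```python
from statistics import NormalDist


def critical_value(conf_level):
    """z_c such that the central area under the standard normal curve is conf_level."""
    return NormalDist(0, 1).inv_cdf((1 + conf_level) / 2)


# Common confidence levels, including those used in this section's exercises.
for c in (0.90, 0.95, 0.98, 0.99):
    print(f"C = {c:.2f}: z_c = {critical_value(c):.3f}")
```

For \(C = 0.95\) this gives the familiar value 1.96.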

Example 9.1.1 hypothesis test for two population proportions

Do husbands cheat on their wives more than wives cheat on their husbands ("Statistics brain," 2013)? Suppose you take a group of 1000 randomly selected husbands and find that 231 had cheated on their wives. Suppose in a group of 1200 randomly selected wives, 176 cheated on their husbands. Do the data show that the proportion of husbands who cheat on their wives is more than the proportion of wives who cheat on their husbands? Test at the 5% level.

  1. State the random variables and the parameters in words.
  2. State the null and alternative hypotheses and the level of significance.
  3. State and check the assumptions for a hypothesis test.
  4. Find the sample statistics, test statistic, and p-value.
  5. Conclusion
  6. Interpretation

Solution

1. \(x_1\) = number of husbands who cheat on their wives

\(x_2\) = number of wives who cheat on their husbands

\(p_1\) = proportion of husbands who cheat on their wives

\(p_2\) = proportion of wives who cheat on their husbands

2. \(H_o: p_1 = p_2\) or \(H_o: p_1 - p_2 = 0\)

\(H_A: p_1 > p_2\) or \(H_A: p_1 - p_2 > 0\)

\(\alpha = 0.05\)

3.

  1. A simple random sample of 1000 responses about cheating from husbands is taken. This was stated in the problem. A simple random sample of 1200 responses about cheating from wives is taken. This was stated in the problem.
  2. The samples are independent. This is true since the samples involved different genders.
  3. The properties of the binomial distribution are satisfied in both populations. This is true since there are only two responses, there are a fixed number of trials, the probability of a success is the same, and the trials are independent.
  4. The sampling distributions of \(\hat{p}_1\) and \(\hat{p}_2\) can be approximated with a normal distribution.
    \(x_1 = 231\), \(n_1 - x_1 = 1000 - 231 = 769\), \(x_2 = 176\), and \(n_2 - x_2 = 1200 - 176 = 1024\) are all greater than or equal to 5. So both sampling distributions of \(\hat{p}_1\) and \(\hat{p}_2\) can be approximated with a normal distribution.

4. Sample Proportion:

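\(n_1 = 1000\), \(n_2 = 1200\)

\(\hat{p}_1 = \frac{231}{1000} = 0.231\), \(\hat{p}_2 = \frac{176}{1200} \approx 0.1467\)

\(\hat{q}_1 = 1 - \frac{231}{1000} = \frac{769}{1000} = 0.769\), \(\hat{q}_2 = 1 - \frac{176}{1200} = \frac{1024}{1200} \approx 0.8533\)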

Pooled Sample Proportion, \(\bar{p}\):

\(\bar{p} = \frac{231 + 176}{1000 + 1200} = \frac{407}{2200} = 0.185\)

\(\bar{q} = 1 - \frac{407}{2200} = \frac{1793}{2200} = 0.815\)

Test Statistic:

\[z = \frac{(0.231 - 0.1467) - 0}{\sqrt{\frac{0.185 \cdot 0.815}{1000} + \frac{0.185 \cdot 0.815}{1200}}} = 5.0704\]

p-value:

On TI-83/84: normalcdf(5.0704, 1E99, 0, 1) \(= 1.988 \times 10^{-7}\)

On R: 1 - pnorm(5.0704, 0, 1) \(= 1.988 \times 10^{-7}\)
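
As a cross-check on the arithmetic in step 4, the same quantities can be computed without intermediate rounding. This is a sketch in Python (standard library only), not the calculator or R workflow used in this example; the variable names are chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 9.1.1.
x1, n1 = 231, 1000   # husbands who cheated, husbands sampled
x2, n2 = 176, 1200   # wives who cheated, wives sampled

p1_hat = x1 / n1
p2_hat = x2 / n2
p_bar = (x1 + x2) / (n1 + n2)   # pooled sample proportion
q_bar = 1 - p_bar

# Test statistic under Ho: p1 - p2 = 0.
z = (p1_hat - p2_hat) / sqrt(p_bar * q_bar / n1 + p_bar * q_bar / n2)

# Right-tailed p-value for HA: p1 > p2; cdf(-z) equals 1 - cdf(z) by symmetry.
p_value = NormalDist(0, 1).cdf(-z)

print(f"p_bar = {p_bar:.3f}, z = {z:.4f}, p-value = {p_value:.3e}")
```

Because \(\hat{p}_2\) is not rounded to 0.1467 here, the test statistic and p-value should differ from the values above only in later decimal places; the conclusion is unaffected.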

Figure 9.1.1: Setup for 2-PropZTest on TI-83/84 Calculator
Figure 9.1.2: Results for 2-PropZTest on TI-83/84 Calculator
Figure 9.1.3: Results for 2-PropZTest on TI-83/84 Calculator

On R: prop.test(c(x1, x2), c(n1, n2), alternative = "less") or prop.test(c(x1, x2), c(n1, n2), alternative = "greater"). For this example, prop.test(c(231, 176), c(1000, 1200), alternative = "greater")

    2-sample test for equality of proportions with continuity correction

    data: c(231, 176) out of c(1000, 1200)
    X-squared = 25.173, df = 1, p-value = 2.621e-07
    alternative hypothesis: greater
    95 percent confidence interval:
     0.05579805 1.00000000
    sample estimates:
       prop 1    prop 2
    0.2310000 0.1466667

Note

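Computer software may apply a continuity correction here, leading to a slightly different answer. R's prop.test applies one by default, which is why its p-value (2.621e-07) differs slightly from the hand calculation; passing correct = FALSE turns the correction off.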
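
To see where the software's numbers come from, the sketch below computes a Yates continuity-corrected chi-square statistic for the 2×2 table of counts and converts it to a one-sided p-value. It is written in Python (standard library only) as an illustration; it is not R's source code, and the variable names are hypothetical. The one-sided conversion assumes the sample difference points in the direction of \(H_A\), as it does in this example.

```python
from math import sqrt
from statistics import NormalDist

# 2x2 table for Example 9.1.1: rows are groups, columns are cheated / did not.
a, b = 231, 1000 - 231   # husbands
c, d = 176, 1200 - 176   # wives
n = a + b + c + d

# Yates continuity-corrected chi-square statistic (df = 1).
x_squared = n * (abs(a * d - b * c) - n / 2) ** 2 / (
    (a + b) * (c + d) * (a + c) * (b + d)
)

# With df = 1 the statistic is the square of a z value, so the one-sided
# p-value is the upper-tail area beyond sqrt(X-squared).
z_corrected = sqrt(x_squared)
p_one_sided = NormalDist(0, 1).cdf(-z_corrected)

print(f"X-squared = {x_squared:.3f}, one-sided p-value = {p_one_sided:.3e}")
```

The corrected z value is a little smaller than the uncorrected test statistic, which is why the software's p-value is a little larger than the hand-calculated one.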

5. Conclusion

Reject Ho, since the p-value is less than 5%.

6. Interpretation: There is enough evidence to show that the proportion of husbands having affairs is more than the proportion of wives having affairs.

Example 9.1.2 confidence interval for two population proportions

Do husbands cheat on their wives more than wives cheat on their husbands ("Statistics brain," 2013)? Suppose you take a group of 1000 randomly selected husbands and find that 231 had cheated on their wives. Suppose in a group of 1200 randomly selected wives, 176 cheated on their husbands. Estimate the difference in the proportion of husbands and wives who cheat on their spouses using a 95% confidence level.

  1. State the random variables and the parameters in words.
  2. State and check the assumptions for the confidence interval.
  3. Find the sample statistics and the confidence interval.
  4. Statistical Interpretation
  5. Real World Interpretation

Solution

1. These were stated in Example 9.1.1, but are reproduced here for reference.

\(x_1\) = number of husbands who cheat on their wives

\(x_2\) = number of wives who cheat on their husbands

\(p_1\) = proportion of husbands who cheat on their wives

\(p_2\) = proportion of wives who cheat on their husbands

2. The assumptions were stated and checked in Example 9.1.1.

3. Sample Proportion:

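\(n_1 = 1000\), \(n_2 = 1200\)

\(\hat{p}_1 = \frac{231}{1000} = 0.231\), \(\hat{p}_2 = \frac{176}{1200} \approx 0.1467\)

\(\hat{q}_1 = 1 - \frac{231}{1000} = \frac{769}{1000} = 0.769\), \(\hat{q}_2 = 1 - \frac{176}{1200} = \frac{1024}{1200} \approx 0.8533\)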

Confidence Interval:

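\(z_c = 1.96\)

\[E = 1.96 \sqrt{\frac{0.231 \cdot 0.769}{1000} + \frac{0.1467 \cdot 0.8533}{1200}} = 0.033\]

The confidence interval estimate of the difference \(p_1 - p_2\) is

\[(\hat{p}_1 - \hat{p}_2) - E < p_1 - p_2 < (\hat{p}_1 - \hat{p}_2) + E\]

\[(0.231 - 0.1467) - 0.033 < p_1 - p_2 < (0.231 - 0.1467) + 0.033\]

\[0.0513 < p_1 - p_2 < 0.1173\]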
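
The hand calculation above can be cross-checked without rounding \(\hat{p}_2\) or \(E\). The following Python sketch (standard library only, with illustrative variable names) applies the same margin-of-error formula with no continuity correction.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 9.1.2.
x1, n1 = 231, 1000   # husbands who cheated, husbands sampled
x2, n2 = 176, 1200   # wives who cheated, wives sampled

p1_hat, p2_hat = x1 / n1, x2 / n2
q1_hat, q2_hat = 1 - p1_hat, 1 - p2_hat

# Critical value for 95% confidence: area 0.975 to the left.
z_c = NormalDist(0, 1).inv_cdf(0.975)

margin = z_c * sqrt(p1_hat * q1_hat / n1 + p2_hat * q2_hat / n2)
diff = p1_hat - p2_hat
lower, upper = diff - margin, diff + margin

print(f"E = {margin:.4f}")
print(f"{lower:.4f} < p1 - p2 < {upper:.4f}")
```

The endpoints should agree with the hand calculation to about three decimal places; small differences come from rounding \(E\) to 0.033 in the worked solution.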

Figure 9.1.4: Setup for 2-PropZInt on TI-83/84 Calculator
Figure 9.1.5: Results for 2-PropZInt on TI-83/84 Calculator

On R: prop.test(c(x1,x2),c(n1,n2), conf.level =C), where C is in decimal form. For this example, prop.test(c(231,176), c(1000, 1200), conf.level=0.95)

    2-sample test for equality of proportions with continuity correction

    data: c(231, 176) out of c(1000, 1200)
    X-squared = 25.173, df = 1, p-value = 5.241e-07
    alternative hypothesis: two.sided
    95 percent confidence interval:
     0.05050705 0.11815962
    sample estimates:
       prop 1    prop 2
    0.2310000 0.1466667
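
The interval reported by R is slightly wider than the hand-calculated one because of the continuity correction. As a hedged illustration, the Python sketch below (standard library only, not R's source code) widens the uncorrected margin of error by \(0.5\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\), which is the usual form of that correction for a difference in proportions; the variable names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

# Data from Example 9.1.2.
x1, n1 = 231, 1000
x2, n2 = 176, 1200

p1_hat, p2_hat = x1 / n1, x2 / n2
z_c = NormalDist(0, 1).inv_cdf(0.975)   # 95% confidence

# Uncorrected margin of error from the formula in this section.
margin = z_c * sqrt(p1_hat * (1 - p1_hat) / n1 + p2_hat * (1 - p2_hat) / n2)

# Continuity correction: widen the interval by 0.5 * (1/n1 + 1/n2) on each side.
correction = 0.5 * (1 / n1 + 1 / n2)

diff = p1_hat - p2_hat
lower = diff - (margin + correction)
upper = diff + (margin + correction)

print(f"{lower:.8f} < p1 - p2 < {upper:.8f}")
```

With samples this large the correction is under one tenth of a percentage point, so the real-world interpretation is essentially the same with or without it.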

4. Statistical Interpretation: There is a 95% chance that \(0.0505 < p_1 - p_2 < 0.1182\) contains the true difference in proportions.

5. Real World Interpretation: The proportion of husbands who cheat is anywhere from 5.05 to 11.82 percentage points higher than the proportion of wives who cheat.

Homework

Exercise 9.1.1

In each problem show all steps of the hypothesis test or confidence interval. If some of the assumptions are not met, note that the results of the test or interval may not be correct and then continue the process of the hypothesis test or confidence interval.

  1. Many high school students take the AP tests in different subject areas. In 2007, of the 144,796 students who took the biology exam 84,199 of them were female. In that same year, of the 211,693 students who took the calculus AB exam 102,598 of them were female ("AP exam scores," 2013). Is there enough evidence to show that the proportion of female students taking the biology exam is higher than the proportion of female students taking the calculus AB exam? Test at the 5% level.
  2. Many high school students take the AP tests in different subject areas. In 2007, of the 144,796 students who took the biology exam 84,199 of them were female. In that same year, of the 211,693 students who took the calculus AB exam 102,598 of them were female ("AP exam scores," 2013). Estimate the difference in the proportion of female students taking the biology exam and female students taking the calculus AB exam using a 90% confidence level.
  3. Many high school students take the AP tests in different subject areas. In 2007, of the 211,693 students who took the calculus AB exam 102,598 of them were female and 109,095 of them were male ("AP exam scores," 2013). Is there enough evidence to show that the proportion of female students taking the calculus AB exam is different from the proportion of male students taking the calculus AB exam? Test at the 5% level.
  4. Many high school students take the AP tests in different subject areas. In 2007, of the 211,693 students who took the calculus AB exam 102,598 of them were female and 109,095 of them were male ("AP exam scores," 2013). Estimate using a 90% level the difference in proportion of female students taking the calculus AB exam versus male students taking the calculus AB exam.
  5. Are there more children diagnosed with Autism Spectrum Disorder (ASD) in states that have larger urban areas over states that are mostly rural? In the state of Pennsylvania, a fairly urban state, there are 245 eight year olds diagnosed with ASD out of 18,440 eight year olds evaluated. In the state of Utah, a fairly rural state, there are 45 eight year olds diagnosed with ASD out of 2,123 eight year olds evaluated ("Autism and developmental," 2008). Is there enough evidence to show that the proportion of children diagnosed with ASD in Pennsylvania is more than the proportion in Utah? Test at the 1% level.
  6. Are there more children diagnosed with Autism Spectrum Disorder (ASD) in states that have larger urban areas over states that are mostly rural? In the state of Pennsylvania, a fairly urban state, there are 245 eight year olds diagnosed with ASD out of 18,440 eight year olds evaluated. In the state of Utah, a fairly rural state, there are 45 eight year olds diagnosed with ASD out of 2,123 eight year olds evaluated ("Autism and developmental," 2008). Estimate the difference in proportion of children diagnosed with ASD between Pennsylvania and Utah. Use a 98% confidence level.
  7. A child dying from an accidental poisoning is a terrible incident. Is it more likely that a male child will get into poison than a female child? To find this out, data was collected that showed that out of 1830 children between the ages one and four who pass away from poisoning, 1031 were males and 799 were females (Flanagan, Rooney & Griffiths, 2005). Do the data show that there are more male children dying of poisoning than female children? Test at the 1% level.
  8. A child dying from an accidental poisoning is a terrible incident. Is it more likely that a male child will get into poison than a female child? To find this out, data was collected that showed that out of 1830 children between the ages one and four who pass away from poisoning, 1031 were males and 799 were females (Flanagan, Rooney & Griffiths, 2005). Compute a 99% confidence interval for the difference in proportions of poisoning deaths of male and female children ages one to four.
Answer

For all hypothesis tests, just the conclusion is given. For all confidence intervals, just the interval using technology (Software R) is given. See solution for the entire answer.

  1. Reject \(H_o\)
  2. \(0.0941 < p_1 - p_2 < 0.0996\)
  3. Reject \(H_o\)
  4. \(-0.0332 < p_1 - p_2 < -0.0282\)
  5. Fail to reject \(H_o\)
  6. \(-0.01547 < p_1 - p_2 < -0.0001\)
  7. Reject \(H_o\)
  8. \(0.0840 < p_1 - p_2 < 0.1696\)

This page titled 9.1: Two Proportions is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Kathryn Kozak via source content that was edited to the style and standards of the LibreTexts platform.
