# 9.3: Inferences for Two Population Means - Unknown Standard Deviations

Skills to Develop

- To learn how to construct a confidence interval for the difference in the means of two distinct populations using independent samples with unknown population standard deviations.
- To learn how to perform a test of hypotheses concerning the difference between the means of two distinct populations using independent samples with unknown population standard deviations.

Often, the population standard deviations are unknown. In this case, a t distribution is used to draw inferences, provided the samples are independent and drawn from two approximately normally distributed populations. We will assume that the variances for the populations are not necessarily equal.

## Confidence Intervals

When the two populations are normally distributed, the following formula for a confidence interval for \(\mu _1-\mu _2\) is valid.

\(100(1-\alpha )\%\) Confidence Interval for the Difference Between Two Population Means

\[(\bar{x_1}-\bar{x_2})\pm t_{\alpha /2}\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}} \label{eq1}\]

where the number of degrees of freedom is the smaller of \(n_1-1\) and \(n_2-1\), that is,

\[df=\min(n_1 -1,n_2-1).\]

The samples must be independent random samples, and when either sample size is less than 30, the populations must be normally distributed or approximately normally distributed.

Example \(\PageIndex{1}\)

A software company markets a new computer game with two experimental packaging designs. Design \(1\) is sent to \(11\) stores; their average sales the first month is \(52\) units with sample standard deviation \(12\) units. Design \(2\) is sent to \(6\) stores; their average sales the first month is \(46\) units with sample standard deviation \(10\) units. Construct a point estimate and a \(95\%\) confidence interval for the difference in average monthly sales between the two package designs. Assume the populations are approximately normal.

**Solution**:

The point estimate of \(\mu _1-\mu _2\) is

\[\bar{x_1}-\bar{x_2}=52-46=6\]

In words, we estimate that the average monthly sales for Design \(1\) is \(6\) units more per month than the average monthly sales for Design \(2\).

To apply the formula for the confidence interval (Equation \ref{eq1}), we must find \(t_{\alpha /2}\). The \(95\%\) confidence level means that \(\alpha =1-0.95=0.05\) so that \(t_{\alpha /2}=t_{0.025}\). From Figure 7.1.6, in the row with the heading \(df=\min(n_1-1, n_2-1)=\min(11-1, 6-1)=\min(10, 5)=5\) we read that \(t_{0.025}=2.571\). From the formula we compute

\[(\bar{x_1}-\bar{x_2})\pm t_{\alpha /2}\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}=6\pm (2.571)\sqrt{\dfrac{12^2}{11}+\dfrac{10^2}{6}}\approx 6\pm 14.0\]

We are \(95\%\) confident that the difference in the population means lies in the interval \([-8.0,20.0]\), in the sense that in repeated sampling \(95\%\) of all intervals constructed from the sample data in this manner will contain \(\mu _1-\mu _2\). Because the interval contains both positive and negative values the statement in the context of the problem is that we are \(95\%\) confident that the average monthly sales for Design \(1\) is between \(20\) units higher and \(8\) units lower than the average monthly sales for Design \(2\).
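The arithmetic in this example can be checked with a short script. The following is a minimal sketch, not part of the original text: the function name `two_mean_ci` is hypothetical, and the critical value \(t_{0.025}=2.571\) is passed in by hand from the table rather than computed.

```python
import math

def two_mean_ci(xbar1, s1, n1, xbar2, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the unpooled standard error.

    t_crit is the critical value t_{alpha/2}, looked up by the caller
    for df = min(n1 - 1, n2 - 1).
    """
    point = xbar1 - xbar2                      # point estimate of mu1 - mu2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)    # standard error of the difference
    margin = t_crit * se                       # margin of error
    return point - margin, point + margin

# Data from the packaging example; 2.571 is t_{0.025} for df = 5.
lower, upper = two_mean_ci(52, 12, 11, 46, 10, 6, t_crit=2.571)
print(f"95% CI: [{lower:.1f}, {upper:.1f}]")   # 95% CI: [-8.0, 20.0]
```

Keeping the critical value as an explicit argument mirrors the table lookup in the worked solution and avoids depending on a statistics library.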

## Hypothesis Testing

Testing hypotheses concerning the difference of two population means for unknown standard deviations is done precisely as it is done for large samples, using the following standardized test statistic. The same conditions on the populations that were required for constructing a confidence interval for the difference of the means must also be met when hypotheses are tested.

Standardized Test Statistic for Hypothesis Tests Concerning the Difference Between Two Population Means: when the standard deviations are unknown

\[T=\dfrac{(\bar{x_1}-\bar{x_2})-D_0}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}\]

The test statistic has Student’s t-distribution with \(\min(n_1 -1,n_2-1)\) degrees of freedom.

The samples must be independent random samples, and when either sample size is less than 30, the populations must be normal or approximately normal.

Example \(\PageIndex{2}\)

Refer to Example \(\PageIndex{1}\) concerning the mean sales per month for the same computer game but sold with two package designs. Test at the \(1\%\) level of significance whether the data provide sufficient evidence to conclude that the mean sales per month of the two designs are different. Use the critical value approach.

**Solution**:

**Step 1**. The relevant test is

\[H_0: \mu _1-\mu _2=0\]

vs.

\[H_a: \mu _1-\mu _2\neq 0\; \; @\; \; \alpha =0.01\]

**Step 2**. Since the samples are independent and at least one sample size is less than \(30\), the test statistic is

\[T=\dfrac{(\bar{x_1}-\bar{x_2})-D_0}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}\]

which has Student’s \(t\)-distribution with \(df=\min(n_1-1, n_2-1)=\min(11-1, 6-1)=\min(10, 5)=5\) degrees of freedom.

**Step 3**. Inserting the data and the value \(D_0=0\) into the formula for the test statistic gives

\[T=\dfrac{(\bar{x_1}-\bar{x_2})-D_0}{\sqrt{\dfrac{s_1^2}{n_1}+\dfrac{s_2^2}{n_2}}}=\dfrac{(52-46)-0}{\sqrt{\dfrac{12^2}{11}+\dfrac{10^2}{6}}}\approx 1.100\]

**Step 4**. Since the symbol in \(H_a\) is “\(\neq\)” this is a two-tailed test, so there are two critical values, \(\pm t_{\alpha /2}=\pm t_{0.005}\). From the row in Figure 7.1.6 with the heading \(df=5\) we read off \(t_{0.005}=4.032\). The rejection region is \((-\infty ,-4.032]\cup [4.032,\infty )\).

**Figure \(\PageIndex{1}\)**: *Rejection Region and Test Statistic for Example \(\PageIndex{2}\)*

**Step 5**. As shown in Figure \(\PageIndex{1}\) the test statistic does not fall in the rejection region. The decision is not to reject \(H_0\). In the context of the problem our conclusion is:

The data do not provide sufficient evidence, at the \(1\%\) level of significance, to conclude that the mean sales per month of the two designs are different.
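Steps 3 through 5 of the critical value approach can be reproduced in a few lines. This is an illustrative sketch only: the function name `two_sample_t_statistic` is hypothetical, and the critical value \(t_{0.005}=4.032\) is taken from the table as in Step 4.

```python
import math

def two_sample_t_statistic(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Standardized test statistic T with the unpooled standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return ((xbar1 - xbar2) - d0) / se

t_stat = two_sample_t_statistic(52, 12, 11, 46, 10, 6)
t_crit = 4.032                      # t_{0.005} for df = 5, read from the table
reject = abs(t_stat) >= t_crit      # two-tailed rejection region
print(f"T = {t_stat:.3f}, reject H0: {reject}")   # T = 1.100, reject H0: False
```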

Example \(\PageIndex{3}\)

Perform the test of Example \(\PageIndex{2}\) using the \(p\)-value approach.

**Solution**:

The first three steps are identical to those in Example \(\PageIndex{2}\).

**Step 4**. Because the test is two-tailed, the observed significance or \(p\)-value of the test is twice the area of the right tail of Student’s \(t\)-distribution, with \(5\) degrees of freedom, that is cut off by the test statistic \(T=1.100\). Using the table, we can only approximate this number. Looking in the row of Figure 7.1.6 headed \(df=5\), the number \(1.100\) is less than the number \(1.476\), corresponding to \(t_{0.100}\). The area cut off by \(t=1.476\) is \(0.100\). Since \(1.100\) is smaller than \(1.476\), the area it cuts off must be larger than \(0.100\). Thus the \(p\)-value (since the area must be doubled) is larger than \(0.200\).

**Step 5**. Since \(p>0.200>0.01\), we have \(p>\alpha\), so the decision is not to reject the null hypothesis:

The data do not provide sufficient evidence, at the \(1\%\) level of significance, to conclude that the mean sales per month of the two designs are different.

Notice that the conclusions from the critical value method and the \(p\)-value method are the same.
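The table only bounds the \(p\)-value from below, but the tail area can also be approximated numerically. The sketch below is an assumption-laden illustration, not the method used in the text: it integrates the density of Student's \(t\)-distribution with Simpson's rule, and the names `t_pdf` and `right_tail_area` are hypothetical. Statistical software would normally use a built-in distribution function instead.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def right_tail_area(t, df, steps=2000):
    """Area to the right of t >= 0, using Simpson's rule on [0, t].

    steps must be even. By symmetry the area to the right of 0 is 0.5,
    so the tail is 0.5 minus the area between 0 and t.
    """
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Two-tailed p-value for T = 1.100 with df = 5.
p_value = 2 * right_tail_area(1.100, 5)
print(f"p-value is about {p_value:.2f}")
```

The result is roughly \(0.32\), consistent with the bound \(p>0.200\) obtained from the table.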

## Additional Notes

The degrees of freedom used above, \(\min(n_1-1, n_2-1)\), is a conservative approximation. Technology uses a more precise formula that typically yields a larger, non-integer value, so if you try these problems on your TI-84 or in Excel, you will get a different number of degrees of freedom for the example problem.

The formula for degrees of freedom for populations with unequal variances is

\[d.f.=\dfrac{\left(s_1^2/n_1+s_2^2/n_2\right)^2}{\left(s_1^2/n_1\right)^2/\left(n_1-1\right)+\left(s_2^2/n_2\right)^2/\left(n_2-1\right)}.\]
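To see how far this formula departs from the rough estimate, it can be evaluated for the packaging data of Example \(\PageIndex{1}\). This is a minimal sketch; the function name `unequal_variance_df` is hypothetical.

```python
def unequal_variance_df(s1, n1, s2, n2):
    """Degrees of freedom for two samples with unequal variances."""
    v1 = s1**2 / n1   # estimated variance of the first sample mean
    v2 = s2**2 / n2   # estimated variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

df = unequal_variance_df(12, 11, 10, 6)
print(f"df = {df:.2f}")   # df = 12.18
```

For these data the formula gives about \(12.18\) degrees of freedom, compared with \(\min(10, 5)=5\) from the rough estimate.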

For hypothesis tests for normally or approximately normally distributed populations where variances are assumed to be equal, the following formula is used for the test statistic.

\[T=\dfrac{(\bar{x_1}-\bar{x_2})-D_0}{\sqrt{\dfrac{(n_1-1)s_{1}^{2}+(n_2-1)s_{2}^{2}}{n_1+n_2-2}\left ( \dfrac{1}{n_1}+\dfrac{1}{n_2}\right )}}\]

This test statistic has Student’s t-distribution with \(df=n_1+n_2-2\) degrees of freedom.
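As an illustration of the pooled formula, the sketch below applies it to the data of Example \(\PageIndex{1}\). This is for comparison only: the examples in this section do not assume equal variances, and the function name `pooled_t_statistic` is hypothetical.

```python
import math

def pooled_t_statistic(xbar1, s1, n1, xbar2, s2, n2, d0=0.0):
    """Test statistic and degrees of freedom under the equal-variance assumption."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df   # pooled sample variance
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return ((xbar1 - xbar2) - d0) / se, df

t_pooled, df_pooled = pooled_t_statistic(52, 12, 11, 46, 10, 6)
print(f"T = {t_pooled:.3f}, df = {df_pooled}")   # T = 1.040, df = 15
```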

## Key Takeaway

- In the context of estimating or testing hypotheses concerning two population means when the standard deviations are unknown, we use the t-distribution unless the samples are large. When both samples are large (both sample sizes at least 30), the normal distribution can be used as an approximation to the t-distribution, since the differences between them are small.
- A confidence interval for the difference in two population means is computed using a formula in the same fashion as was done for a single population mean.