Analysis of Factor Level Means and Contrasts
1 Analysis of Factor Level Means
Suppose we reject H_{0}: \mu_{1} = \cdots = \mu_{r}. We then want to investigate the nature of the differences among the factor level means by studying the following:
- One factor level mean: \mu_{i}
- Difference between two factor level means: D = \mu_{i} - \mu_{j}
- Contrast of factor level means: L = \sum_{i=1}^{r}c_{i}\mu_{i} where \sum_{i=1}^{r}c_{i} = 0
When more than one contrast is involved, we also need to consider procedures that account for multiple comparisons, including:
- Bonferroni's procedure
- Tukey's procedure
- Scheffe's procedure
1.1 Inference about one factor level mean
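- \bar{Y}_{i\cdot} is an unbiased estimator of \mu_{i}: E(\bar{Y}_{i\cdot}) = \mu_{i}
- Var(\bar{Y}_{i\cdot}) = \sigma^{2}/n_{i}
MSE = SSE/(n_{T} - r) is a point estimator of \sigma^{2}:
- an unbiased estimator: E(MSE) = \sigma^{2}
- SSE/\sigma^{2} \sim \chi^{2}(n_{T} - r), and SSE is independent of \{\bar{Y}_{i\cdot}: i = 1, \ldots, r\}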
Thus,
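- the estimated standard error of \bar{Y}_{i\cdot} is
s(\bar{Y}_{i\cdot}) = \sqrt{MSE/n_{i}}
- \frac{\bar{Y}_{i\cdot} - \mu_{i}}{\sqrt{MSE/n_{i}}} \sim t(n_{T} - r), i.e. a t-distribution with n_{T} - r degrees of freedom
- A 100(1 - \alpha)% two-sided confidence interval of \mu_{i} is given by
\bar{Y}_{i\cdot} \pm s(\bar{Y}_{i\cdot})t(1 - \frac{\alpha}{2}; n_{T} - r)
where t(1 - \frac{\alpha}{2}; n_{T} - r) denotes the 1 - \alpha/2 quantile of the t-distribution with n_{T} - r degrees of freedom.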
- Test H_{0}: \mu_{i} = c against H_{a}: \mu_{i} \neq c
- T-statistic:
T^{*} = \frac{\bar{Y}_{i\cdot} - c}{\sqrt{MSE/n_{i}}}
- Under H_{0}: T^{*} \sim t(n_{T} - r)
- At significance level \alpha, reject H_{0} if |T^{*}| > t(1 - \frac{\alpha}{2}; n_{T} - r)
- Confidence interval approach: if c does not belong to the 100(1 - \alpha)% (two-sided) confidence interval of \mu_{i}, then reject H_{0} at level \alpha
1.2 Example
- In the "package design" example, the estimate of \mu_{1} is \bar{Y}_{1\cdot} = 14.6
- MSE = 10.55 and n_{1} = 5
- Thus s(\bar{Y}_{1\cdot}) = \sqrt{10.55/5} = 1.45258
- The degrees of freedom of MSE is 19 - 4 = 15 (since n_{T} = 19 and r = 4)
- The 95% confidence interval of \mu_{1} is
14.6 \pm 1.45258 \times t(0.975; 15) = 14.6 \pm 1.45258 \times 2.131 = 14.6 \pm 3.09545 = (11.50455, 17.69545)
Here, from Table B.2, we get that t(0.975; 15) = 2.131 (or use the R command qt(0.975, 15)).
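The interval above, and the equivalent two-sided test from Section 1.1, can be sketched in a few lines of Python using only the numbers quoted in the example. The t quantile is hard-coded from the text rather than computed, and the helper reject_two_sided and the two null values c passed to it are hypothetical, chosen only for illustration.

```python
import math

# Values quoted in the package design example.
y_bar_1 = 14.6      # sample mean for design 1
mse = 10.55         # mean squared error
n_1 = 5             # number of observations for design 1
n_total, r = 19, 4  # total sample size and number of factor levels
t_quantile = 2.131  # t(0.975; 15) from Table B.2, i.e. qt(0.975, 15) in R

df = n_total - r                   # degrees of freedom of MSE
se = math.sqrt(mse / n_1)          # estimated standard error of the mean
half_width = t_quantile * se
ci = (y_bar_1 - half_width, y_bar_1 + half_width)

def reject_two_sided(c):
    """Test H0: mu_1 = c at level 0.05 by checking |T*| > t(0.975; df)."""
    t_star = (y_bar_1 - c) / se
    return abs(t_star) > t_quantile

print(f"df = {df}, s = {se:.5f}")
print(f"95% CI for mu_1: ({ci[0]:.5f}, {ci[1]:.5f})")
print("reject H0: mu_1 = 12?", reject_two_sided(12.0))  # c = 12 is hypothetical
print("reject H0: mu_1 = 20?", reject_two_sided(20.0))  # c = 20 is hypothetical
```

Because the sketch carries the standard error at full precision instead of rounding it to 1.45258, the printed endpoints may differ from the hand computation in the last digit. The test rejects exactly when c falls outside the interval, which is the confidence interval approach described earlier.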
1.3 Interpretation of confidence intervals
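- A wrong statement: P(11.51 \leq \mu_{1} \leq 17.69) = 0.95. Why? Because \mu_{1} is a fixed (non-random) quantity, the statement "11.51 \leq \mu_{1} \leq 17.69" is simply true or false as a fact; no probability is attached to it.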
- Interpretation of C.I.: if exactly the same study on package designs were repeated many times, and at each time a 95% C.I. for \mu_{1} were constructed as above, then about 95% of the time, the C.I. would contain the true value \mu_{1}.
- A correct statement: based on the observed data, we are 95% confident that \mu_{1} is between 11.51 and 17.69.
- The difference between a random variable and its realizations: \bar{Y}_{1\cdot} is a random variable; 14.6 is the realization of \bar{Y}_{1\cdot} in the current sample.
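The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch: the true means, the error standard deviation, and the group sizes (5, 5, 4, 5, chosen to give n_T = 19) are assumptions made up for the illustration, and the quantile t(0.975; 15) = 2.131 is taken from the text.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical "true" parameters, chosen only for illustration.
true_means = [15.0, 13.0, 20.0, 27.0]
sigma = 3.0
sizes = [5, 5, 4, 5]   # assumed group sizes with n_T = 19
t_quantile = 2.131     # t(0.975; 15), as quoted in the text

def one_study_covers():
    """Simulate one study; report whether the 95% CI for mu_1 contains mu_1."""
    groups = [[random.gauss(m, sigma) for _ in range(n)]
              for m, n in zip(true_means, sizes)]
    means = [sum(g) / len(g) for g in groups]
    # SSE pools the squared deviations from each group's own mean.
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    mse = sse / (sum(sizes) - len(sizes))
    half_width = t_quantile * math.sqrt(mse / sizes[0])
    return abs(means[0] - true_means[0]) <= half_width

n_rep = 5000
coverage = sum(one_study_covers() for _ in range(n_rep)) / n_rep
print(f"empirical coverage over {n_rep} simulated studies: {coverage:.3f}")
```

Each simulated study produces a different interval, while \mu_{1} stays fixed; the proportion of intervals that contain it should be close to 0.95.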
1.4 Difference between two factor level means
Let D = \mu_{i} - \mu_{j} for some i \neq j
- \hat{D} = \bar{Y}_{i\cdot} - \bar{Y}_{j\cdot} is an unbiased estimator of D
- Var(\hat{D}) = Var(\bar{Y}_{i\cdot}) + Var(\bar{Y}_{j\cdot}) = \sigma^{2}(\frac{1}{n_{i}} + \frac{1}{n_{j}}) (since \bar{Y}_{i\cdot} and \bar{Y}_{j\cdot} are independent)
- estimated standard error of \hat{D}: s(\hat{D}) = \sqrt{MSE(1/n_{i} + 1/n_{j})}
- for every \mu_{i} and \mu_{j}, the ratio \frac{\hat{D} - D}{s(\hat{D})} has a t(n_{T} - r) distribution
1.5 Inference on the difference between two factor level means
- 100(1 - \alpha)% (two-sided) confidence interval of D
\hat{D} \pm s(\hat{D})t(1 - \frac{\alpha}{2}; n_{T} - r)
- Test H_{0}: D = 0 against H_{a}: D \neq 0. At the significance level \alpha, check whether
\hat{D} - s(\hat{D})t(1 - \frac{\alpha}{2}; n_{T} - r) \leq 0 \leq \hat{D} + s(\hat{D})t(1 - \frac{\alpha}{2}; n_{T} - r)
If not, reject H_{0} at level \alpha and conclude that the two means are different.
1.6 Example
In a study of the effectiveness of different rust inhibitors, four brands (1, 2, 3, 4) were tested. Altogether, 40 experimental units were randomly assigned to the four brands, with 10 units assigned to each brand. The resistance to rust was evaluated in a coded form after exposing the experimental units to severe conditions.
- This is a balanced completely randomized design (CRD)
Summary statistics and ANOVA table
n_{1} = n_{2} = n_{3} = n_{4} =10 and \bar{Y}_{1\cdot} = 43.14, \bar{Y}_{2\cdot} = 89.44, \bar{Y}_{3\cdot} = 67.95, \bar{Y}_{4\cdot} = 40.47
Source of Variation | Sum of Squares (SS) | Degrees of Freedom (df) | Mean Square (MS) |
Between Treatments | SSTR = 15953.47 | r - 1 = 3 | MST = 5317.82 |
Within Treatments | SSE = 221.03 | n_{T} - r = 36 | MSE = 6.140 |
Total | SSTO = 16174.50 | n_{T} - 1 = 39 | |
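As a quick consistency check on the table, each mean square is the sum of squares divided by its degrees of freedom, and the two sources add up to the total. A minimal Python sketch using the tabulated values (the variable names are illustrative):

```python
# Sums of squares and degrees of freedom from the ANOVA table.
sstr, df_tr = 15953.47, 3   # between treatments
sse, df_e = 221.03, 36      # within treatments

mstr = sstr / df_tr         # mean square between treatments (MST in the table)
mse = sse / df_e            # mean square within treatments
ssto = sstr + sse           # total sum of squares
df_total = df_tr + df_e     # total degrees of freedom

print(f"MST = {mstr:.2f}, MSE = {mse:.3f}")
print(f"SSTO = {ssto:.2f} on {df_total} degrees of freedom")
```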
95% confidence interval for D = \mu_{1} - \mu_{2}
We compute
\hat{D} = \bar{Y}_{1\cdot} - \bar{Y}_{2\cdot} = 43.14 - 89.44 = -46.3
s(\hat{D}) = \sqrt{MSE(1/n_{1} + 1/n_{2})} = \sqrt{6.140(1/10 + 1/10)} = 1.108152
Also, since \alpha = 1 - 0.95 = 0.05, we have t(1 - \alpha/2; n_{T} - r) = t(0.975; 36) = 2.028094 (use the R command qt(0.975, 36); or use Table B.2 and approximate the value by averaging the 0.975 quantiles of the t-distributions with \nu = 30 and \nu = 40 degrees of freedom).
Therefore, the 95% confidence interval for D = \mu_{1} - \mu_{2} is
-46.3 \pm 1.108152 \times 2.028094 = -46.3 \pm 2.247436 = (-48.54744, -44.05256)
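The same computation can be sketched in Python for any pair of brands. The summary statistics and the quantile t(0.975; 36) = 2.028094 are taken from the example; the helper diff_ci is a hypothetical name introduced for this illustration.

```python
import math

# Summary statistics from the rust inhibitor example.
y_bar = {1: 43.14, 2: 89.44, 3: 67.95, 4: 40.47}  # brand means
n = {1: 10, 2: 10, 3: 10, 4: 10}                  # units per brand
mse = 6.140
t_quantile = 2.028094  # t(0.975; 36), i.e. qt(0.975, 36) in R

def diff_ci(i, j):
    """Return D-hat, s(D-hat) and the 95% CI for mu_i - mu_j."""
    d_hat = y_bar[i] - y_bar[j]
    se = math.sqrt(mse * (1 / n[i] + 1 / n[j]))
    half = t_quantile * se
    return d_hat, se, (d_hat - half, d_hat + half)

d_hat, se, (lo, hi) = diff_ci(1, 2)
reject = not (lo <= 0 <= hi)  # reject H0: D = 0 if 0 lies outside the CI

print(f"D-hat = {d_hat:.2f}, s(D-hat) = {se:.6f}")
print(f"95% CI for mu_1 - mu_2: ({lo:.5f}, {hi:.5f})")
print("reject H0: D = 0 at level 0.05?", reject)
```

Since the interval lies entirely below zero, the test of H_{0}: D = 0 at level 0.05 rejects, in line with the confidence interval approach of Section 1.5.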
2 Contrasts
A contrast is a linear combination of the factor level means: L = \sum_{i = 1}^{r}c_{i}\mu_{i} where c_{i}'s are prespecified constants with the constraint: \sum_{i=1}^{r}c_{i} = 0.
- Examples:
- Pairwise comparisons: \mu_{i} - \mu_{j}
- \frac{\mu_{1} + \mu_{2}}{2} - \mu_{3}
- Unbiased estimator:
\hat{L} = \sum_{i = 1}^{r}c_{i}\bar{Y}_{i\cdot}
- Estimated standard error:
s(\hat{L}) = \sqrt{MSE\sum_{i = 1}^{r}\frac{c^{2}_{i}}{n_{i}}}
since Var(\hat{L}) = \sum_{i = 1}^{r}\sigma^{2}c^{2}_{i}/n_{i}.
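The variance formula follows from the independence of the factor level sample means together with Var(\bar{Y}_{i\cdot}) = \sigma^{2}/n_{i}. A short worked derivation, written out as a sketch under the usual ANOVA model assumptions (independent normal errors with common variance \sigma^{2}):

```latex
\begin{align*}
E(\hat{L}) &= \sum_{i=1}^{r} c_{i} E(\bar{Y}_{i\cdot})
            = \sum_{i=1}^{r} c_{i}\mu_{i} = L, \\
Var(\hat{L}) &= \sum_{i=1}^{r} c_{i}^{2}\, Var(\bar{Y}_{i\cdot})
              = \sigma^{2} \sum_{i=1}^{r} \frac{c_{i}^{2}}{n_{i}}, \\
s^{2}(\hat{L}) &= MSE \sum_{i=1}^{r} \frac{c_{i}^{2}}{n_{i}},
\qquad
\frac{\hat{L} - L}{s(\hat{L})} \sim t(n_{T} - r).
\end{align*}
```

The last line replaces \sigma^{2} by its estimator MSE; the resulting t-distribution gives the confidence interval \hat{L} \pm s(\hat{L})t(1 - \frac{\alpha}{2}; n_{T} - r). Taking c_{i} = 1, c_{j} = -1 and all other coefficients zero recovers the formulas for D = \mu_{i} - \mu_{j} from Section 1.4.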
2.1 Example of a contrast for the package design problem
Suppose, designs one and two are 3-color designs, while designs three and four are 5-color designs. The goal is to compare 3-color designs to 5-color designs in terms of sales.
- Consider the contrast: L = \frac{\mu_{1} + \mu_{2}}{2} - \frac{\mu_{3} + \mu_{4}}{2}
- An unbiased point estimate of L is
\hat{L} = \frac{\bar{Y}_{1\cdot} + \bar{Y}_{2\cdot}}{2} - \frac{\bar{Y}_{3\cdot} + \bar{Y}_{4\cdot}}{2}
= \frac{14.6 + 13.4}{2} - \frac{19.5 + 27.2}{2} = -9.35
- c_{1} = c_{2} = 0.5, c_{3} = c_{4} = -0.5 (note that, they add up to zero), so
s(\hat{L}) = \sqrt{MSE\sum_{i =1}^{r}\frac{c^{2}_{i}}{n_{i}}}
= \sqrt{10.55 \times (\frac{(0.5)^{2}}{5} + \frac{(0.5)^{2}}{5} + \frac{(-0.5)^{2}}{4} + \frac{(-0.5)^{2}}{5})} (here n_{1} = n_{2} = n_{4} = 5 and n_{3} = 4, so that n_{T} = 19)
= \sqrt{10.55 \times 0.2125} \approx 1.5
- A 90% C.I. for L is
\hat{L} \pm t(0.95; 15) \times s(\hat{L})
= -9.35 \pm 1.753 \times 1.5 = [-11.98, -6.72]
- Since the 90% C.I. for L does not contain zero and lies entirely below zero, we are 90% confident that 5-color designs work better than 3-color designs.
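The contrast computation can be sketched in Python as follows. The means, MSE and the quantile t(0.95; 15) = 1.753 come from the example. The group sizes (5, 5, 4, 5) are an assumption: they agree with n_{1} = 5, n_{T} = 19 and the value 0.2125 for \sum c_{i}^{2}/n_{i}, but the example does not list them explicitly.

```python
import math

# Package design example: factor level means and contrast coefficients.
y_bar = [14.6, 13.4, 19.5, 27.2]
sizes = [5, 5, 4, 5]           # assumed group sizes (n_T = 19), see lead-in
coef = [0.5, 0.5, -0.5, -0.5]  # 3-color designs minus 5-color designs
mse = 10.55
t_quantile = 1.753             # t(0.95; 15), as quoted in the text

# A contrast requires coefficients that sum to zero.
assert abs(sum(coef)) < 1e-12

l_hat = sum(c * y for c, y in zip(coef, y_bar))
weight = sum(c ** 2 / n for c, n in zip(coef, sizes))  # sum of c_i^2 / n_i
se = math.sqrt(mse * weight)
half = t_quantile * se
ci_low, ci_high = l_hat - half, l_hat + half

print(f"L-hat = {l_hat:.2f}, sum c_i^2/n_i = {weight:.4f}, s(L-hat) = {se:.4f}")
print(f"90% CI for L: [{ci_low:.2f}, {ci_high:.2f}]")
```

The sketch keeps s(\hat{L}) at full precision rather than rounding it to 1.5, so the endpoints may differ from the hand computation in the second decimal place; the conclusion that the interval excludes zero is unaffected.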
Contributors
- Joy Wei, Debashis Paul