Analysis of Factor Level Means and Contrasts


1 Analysis of Factor Level Means
Summary statistics and ANOVA table
2 Contrasts
2.1 Example of a contrast for the package design problem
Contributors

1 Analysis of Factor Level Means

Suppose we reject \(H_{0} : \mu_{1} = \cdots = \mu_{r}\). We then want to investigate the nature of the differences among the factor level means by studying the following:

One factor level mean: \(\mu_{i}\)
Difference between two factor level means: \(D = \mu_{i} - \mu_{j}\)
Contrast of factor level means: \(L = \sum_{i = 1}^{r}c_{i}\mu_{i}\) where \(\sum_{i = 1}^{r}c_{i} = 0\)

When more than one contrast is involved, we also need to consider procedures that account for multiple comparisons, including:

Bonferroni's procedure
Tukey's procedure
Scheffe's procedure

1.1 Inference about one factor level mean

The \(i\)th factor level sample mean \(\bar{Y}_{i\cdot}\) is a point estimator of \(\mu_{i}\). Here are some properties of this estimator:

\(\bar{Y}_{i\cdot}\) is an unbiased estimator of \(\mu_{i}\): \(E(\bar{Y}_{i\cdot}) = \mu_{i}\)
\(Var(\bar{Y}_{i\cdot}) = \frac{\sigma^2}{n_{i}}\)

\(MSE = SSE/(n_{T} - r)\) is a point estimator of \(\sigma^2\):

an unbiased estimator: \(E(MSE) = \sigma^2\)
\(SSE/\sigma^{2} \sim \chi^2_{(n_{T} - r)}\), and SSE is independent of \(\{\bar{Y}_{i\cdot} : i = 1, ..., r\}\)

Thus,

the estimated standard error of \(\bar{Y}_{i\cdot}\) is

\[s(\bar{Y}_{i\cdot}) = \sqrt{\frac{MSE}{n_{i}}}\]

\(\frac{\bar{Y}_{i\cdot} - \mu_{i}}{\sqrt{\frac{MSE}{n_{i}}}} \sim t_{(n_{T} - r)}\), i.e. a \(t\)-distribution with \(n_{T} - r\) degrees of freedom
A 100(1-\(\alpha\))% two-sided confidence interval of \(\mu_{i}\) is given by
\[\bar{Y}_{i\cdot} \pm s(\bar{Y}_{i\cdot})t(1 - \frac{\alpha}{2}; n_{T} - r)\]
where \(t(1 - \frac{\alpha}{2}; n_{T} - r)\) denotes the \(1 - \alpha/2\) quantile of the t-distribution with \(n_{T} - r\) degrees of freedom.

Test \(H_{0} : \mu_{i} = c\) against \(H_{a} : \mu_{i} \neq c\)
- T-statistic:

\[T^{*} = \frac{\bar{Y}_{i\cdot} - c}{\sqrt{\frac{MSE}{n_{i}}}}\]

Under \(H_{0}\): \(T^{*} \sim t_{(n_{T} - r)}\)
At significance level \(\alpha\), reject \(H_{0}\) if \(|T^{*}| > t(1 - \frac{\alpha}{2}; n_{T} - r)\)
Confidence interval approach: if \(c\) does not belong to the 100(1-\(\alpha\))% (two-sided) confidence interval of \(\mu_{i}\), then reject \(H_{0}\) at level \(\alpha\)
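
The test based on \(T^{*}\) and the confidence interval approach always lead to the same decision. A short sketch of why, writing \(t\) for \(t(1 - \frac{\alpha}{2}; n_{T} - r)\) and \(s\) for \(s(\bar{Y}_{i\cdot})\) (shorthand introduced here only for this display):

\[|T^{*}| \leq t \iff -t \leq \frac{\bar{Y}_{i\cdot} - c}{s} \leq t \iff \bar{Y}_{i\cdot} - s\,t \leq c \leq \bar{Y}_{i\cdot} + s\,t\]

So \(H_{0}\) is not rejected exactly when \(c\) lies inside the 100(1-\(\alpha\))% confidence interval, and is rejected exactly when \(c\) lies outside it.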

1.2 Example

In the "package design" example, the estimate of \(\mu_{1}\) is \(\bar{Y}_{1\cdot} =\) 14.6
MSE = 10.55 and \(n_{1}\) = 5
Thus \(s(\bar{Y}_{1\cdot}) = \sqrt{10.55/5} = 1.45258\)
The degrees of freedom of MSE is 19 - 4 = 15 (since \(n_{T}\) = 19 and r = 4)
The 95% confidence interval of \(\mu_{1}\) is
14.6 \(\pm\) 1.45258 \(\times\) t(0.975; 15) = 14.6 \(\pm\) 1.45258 \(\times\) 2.131 = 14.6 \(\pm\) 3.09545 = (11.50455, 17.69545)
Here, from Table B.2, we get that t(0.975; 15) = 2.131 (or, use the R command: \(\textit{qt(0.975, 15)}\))
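
The arithmetic in this example can be reproduced with a few lines of Python. This is only a sketch: the critical value 2.131 is copied from the text rather than computed, and the variable names are chosen here for illustration.

```python
import math

# Values quoted in the package design example.
ybar_1 = 14.6        # sample mean for design 1
mse = 10.55          # mean squared error
n_1 = 5              # number of observations for design 1
n_total, r = 19, 4   # total sample size and number of factor levels

# Critical value t(0.975; 15), copied from the text (Table B.2), not computed here.
t_crit = 2.131

df = n_total - r                 # degrees of freedom of MSE
se = math.sqrt(mse / n_1)        # estimated standard error of the sample mean
half_width = t_crit * se
lower, upper = ybar_1 - half_width, ybar_1 + half_width

print(f"df = {df}, s = {se:.5f}")
print(f"95% CI for mu_1: ({lower:.5f}, {upper:.5f})")
```

The endpoints can differ from the hand calculation in the last printed digit, because the text rounds the standard error to 1.45258 before multiplying.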

1.3 Interpretation of confidence intervals

A wrong statement: P(11.50 \(\leq \mu_{1} \leq\) 17.70) = 0.95. Why? Because \(\mu_{1}\) is a fixed, non-random quantity, so "11.50 \(\leq \mu_{1} \leq\) 17.70" is either true or false as a fact.
Interpretation of C.I.: if exactly the same study on package designs were repeated many times, and each time a 95% C.I. for \(\mu_{1}\) were constructed as above, then about 95% of the time the C.I. would contain the true value \(\mu_{1}\).
A correct statement: based on the observed data, we are 95% confident that \(\mu_{1}\) is between 11.50 and 17.70.
The difference between a random variable and its realization: \(\bar{Y}_{1\cdot}\) is a random variable; 14.6 is the realization of \(\bar{Y}_{1\cdot}\) in the current sample.
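
The repeated-sampling interpretation can be illustrated with a small simulation. The sketch below is not part of the original example: the true means, the error standard deviation, and the split of the 19 observations into groups are hypothetical values chosen only for illustration, and the critical value 2.131 is the t(0.975; 15) quoted in the text.

```python
import math
import random

random.seed(0)  # fixed seed so the run is reproducible

# Hypothetical "true" values, assumed only for this simulation.
mu = [15.0, 13.0, 19.0, 27.0]   # true factor level means
sigma = 3.0                     # common error standard deviation
n = [5, 5, 4, 5]                # assumed group sizes, n_T = 19
t_crit = 2.131                  # t(0.975; 15), as quoted in the text


def one_study():
    """Simulate one study and return the 95% C.I. for mu_1."""
    samples = [[random.gauss(m, sigma) for _ in range(k)] for m, k in zip(mu, n)]
    means = [sum(s) / len(s) for s in samples]
    sse = sum((y - ybar) ** 2 for s, ybar in zip(samples, means) for y in s)
    mse = sse / (sum(n) - len(n))           # MSE = SSE / (n_T - r)
    half = t_crit * math.sqrt(mse / n[0])   # half-width of the interval
    return means[0] - half, means[0] + half


reps = 4000
hits = sum(lo <= mu[0] <= hi for lo, hi in (one_study() for _ in range(reps)))
coverage = hits / reps
print(f"fraction of intervals containing mu_1: {coverage:.3f}")
```

Each simulated study produces a different interval, while \(\mu_{1}\) stays fixed; the printed fraction should be close to 0.95.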

1.4 Difference between two factor level means

Let \(D = \mu_{i} - \mu_{j}\) for some \(i \neq j\)

\(\hat{D} = \bar{Y}_{i\cdot} - \bar{Y}_{j\cdot}\) is an unbiased estimator of D
\(Var(\hat{D}) = Var(\bar{Y}_{i\cdot}) + Var(\bar{Y}_{j\cdot}) = \sigma^{2}(\frac{1}{n_{i}} + \frac{1}{n_{j}})\) (since \(\bar{Y}_{i\cdot}\) and \(\bar{Y}_{j\cdot}\) are independent)
The estimated standard error of \(\hat{D}\) is \(s(\hat{D}) = \sqrt{MSE(1/n_{i} + 1/n_{j})}\)
For every \(\mu_{i}\) and \(\mu_{j}\), the ratio \(\frac{\hat{D} - D}{s(\hat{D})}\) has a \(t_{(n_{T} - r)}\) distribution

1.5 Inference on the difference between two factor level means

100(1 - \(\alpha\))% (two-sided) confidence interval of D

\[\hat{D} \pm s(\hat{D})t(1 - \frac{\alpha}{2}; n_{T} - r)\]

Test \(H_{0}: D = 0\) against \(H_{a}: D \neq 0\). At the significance level \(\alpha\), check whether
\[\hat{D} - s(\hat{D})t(1 - \frac{\alpha}{2}; n_{T} - r) \leq 0 \leq \hat{D} + s(\hat{D})t(1 - \frac{\alpha}{2}; n_{T} - r)\]
If not, reject \(H_{0}\) at level \(\alpha\) and conclude that the two means are different.

1.6 Example

In a study of the effectiveness of different rust inhibitors, four brands (1, 2, 3, 4) were tested. Altogether, 40 experimental units were randomly assigned to the four brands, with 10 units assigned to each brand. The resistance to rust was evaluated in a coded form after exposing the experimental units to severe conditions.

This is a balanced completely randomized design (CRD).

Summary statistics and ANOVA table

\(n_{1} = n_{2} = n_{3} = n_{4} =10\) and \(\bar{Y}_{1\cdot} = 43.14, \bar{Y}_{2\cdot} = 89.44, \bar{Y}_{3\cdot} = 67.95, \bar{Y}_{4\cdot} = 40.47\)

Source of Variation	Sum of Squares (SS)	Degrees of Freedom (df)	Mean Square (MS)
Between Treatments	SSTR = 15953.47	\(r - 1\) = 3	MSTR = 5317.82
Within Treatments	SSE = 221.03	\(n_{T} - r\) = 36	MSE = 6.140
Total	SSTO = 16174.50	\(n_{T} - 1\) = 39
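
As a quick consistency check, the mean squares and the total sum of squares in the table can be recomputed from the sums of squares and degrees of freedom. This is a minimal sketch; the variable names are chosen here for illustration.

```python
# Sums of squares and design sizes quoted in the rust inhibitor example.
ss_tr = 15953.47     # between-treatments sum of squares
ss_e = 221.03        # within-treatments (error) sum of squares
n_total, r = 40, 4   # total sample size and number of brands

df_tr = r - 1            # degrees of freedom between treatments
df_e = n_total - r       # degrees of freedom within treatments
df_to = n_total - 1      # total degrees of freedom

ms_tr = ss_tr / df_tr    # mean square between treatments
ms_e = ss_e / df_e       # mean squared error (MSE)
ss_to = ss_tr + ss_e     # total sum of squares

print(f"df: {df_tr}, {df_e}, {df_to}")
print(f"mean square between = {ms_tr:.2f}, MSE = {ms_e:.3f}, SSTO = {ss_to:.2f}")
```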

95% confidence interval for \(D = \mu_{1} - \mu_{2}\)

We compute

\[\hat{D} = \bar{Y}_{1\cdot} - \bar{Y}_{2\cdot} = 43.14 - 89.44 = -46.3\]

\[s(\hat{D}) = \sqrt{MSE(1/n_{1} + 1/n_{2})} = \sqrt{6.140(1/10 + 1/10)} = 1.108152\]

Also, since \(\alpha\) = 1 - 0.95 = 0.05, we have \(t(1-\alpha/2; n_{T} - r) = t(0.975; 36) = 2.028094\) (use the R command \(\textit{qt(0.975, 36)}\), or use Table B.2 and approximate the value by averaging the 0.975 quantiles of the t-distribution with degrees of freedom v = 30 and v = 40).
Therefore, the 95% confidence interval for \(D = \mu_{1} - \mu_{2}\) is

\[-46.3 \pm 1.108152 \times 2.028094 = -46.3 \pm 2.247436 = (-48.54744, -44.05256)\]
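
The same interval can be reproduced in Python, together with the confidence interval check for \(H_{0}: D = 0\) described in Section 1.5. This is a sketch only: the helper `diff_ci` is a name introduced here, and the critical value 2.028094 is copied from the text rather than computed.

```python
import math

# Summary statistics quoted in the rust inhibitor example.
ybar = {1: 43.14, 2: 89.44, 3: 67.95, 4: 40.47}   # factor level sample means
n = {1: 10, 2: 10, 3: 10, 4: 10}                  # observations per brand
mse = 6.140                                       # from the ANOVA table
t_crit = 2.028094                                 # t(0.975; 36), copied from the text


def diff_ci(i, j):
    """Return the estimate, standard error and 95% C.I. for mu_i - mu_j."""
    d_hat = ybar[i] - ybar[j]
    se = math.sqrt(mse * (1 / n[i] + 1 / n[j]))
    half = t_crit * se
    return d_hat, se, (d_hat - half, d_hat + half)


d_hat, se, (lower, upper) = diff_ci(1, 2)
reject_h0 = not (lower <= 0 <= upper)   # reject H0: D = 0 if 0 is outside the C.I.

print(f"D_hat = {d_hat:.2f}, s(D_hat) = {se:.6f}")
print(f"95% CI: ({lower:.5f}, {upper:.5f}), reject H0: {reject_h0}")
```

Since zero lies outside the interval, \(H_{0}: \mu_{1} = \mu_{2}\) is rejected at level 0.05.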

2 Contrasts

A contrast is a linear combination of the factor level means: \(L = \sum_{i = 1}^{r}c_{i}\mu_{i}\) where \(c_{i}\)'s are prespecified constants with the constraint: \(\sum_{i=1}^{r}c_{i} = 0\).

Examples:

- Pairwise comparisons: \(\mu_{i} - \mu_{j}\)

- \(\frac{\mu_{1} + \mu_{2}}{2} - \mu_{3}\)

Unbiased estimator:

\[\hat{L} = \sum_{i = 1}^{r}c_{i}\bar{Y}_{i\cdot}\]

Estimated standard error:

\[s(\hat{L}) = \sqrt{MSE\sum_{i = 1}^{r}\frac{c^{2}_{i}}{n_{i}}}\]
since \(Var(\hat{L}) = \sum_{i = 1}^{r}\sigma^{2}c^{2}_{i}/n_{i}\).
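
The variance formula follows from the independence of the factor level sample means. A brief sketch of the derivation, under the same model assumptions as in Section 1:

\[Var(\hat{L}) = Var\left(\sum_{i = 1}^{r}c_{i}\bar{Y}_{i\cdot}\right) = \sum_{i = 1}^{r}c^{2}_{i}Var(\bar{Y}_{i\cdot}) = \sigma^{2}\sum_{i = 1}^{r}\frac{c^{2}_{i}}{n_{i}}\]

Replacing \(\sigma^{2}\) by MSE gives \(s^{2}(\hat{L})\). As for a single mean or a pairwise difference,

\[\frac{\hat{L} - L}{s(\hat{L})} \sim t_{(n_{T} - r)}\]

so a 100(1-\(\alpha\))% two-sided confidence interval for \(L\) is \(\hat{L} \pm s(\hat{L})t(1 - \frac{\alpha}{2}; n_{T} - r)\). Taking \(c_{i} = 1\), \(c_{j} = -1\) and all other coefficients zero recovers the pairwise difference \(D = \mu_{i} - \mu_{j}\) and its standard error \(s(\hat{D})\).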

2.1 Example of a contrast for the package design problem

Suppose designs one and two are 3-color designs, while designs three and four are 5-color designs. The goal is to compare 3-color designs to 5-color designs in terms of sales.

Consider the contrast: \(L = \frac{\mu_{1} + \mu_{2}}{2} - \frac{\mu_{3} + \mu_{4}}{2}\)
An unbiased point estimate of L is

\[\hat{L} = \frac{\bar{Y}_{1\cdot} + \bar{Y}_{2\cdot}}{2} - \frac{\bar{Y}_{3\cdot} + \bar{Y}_{4\cdot}}{2}\]
\[= \frac{14.6 + 13.4}{2} - \frac{19.5 + 27.2}{2} = -9.35\]

\(c_{1} = c_{2} = 0.5, c_{3} = c_{4} = -0.5\) (note that they add up to zero), so

\[s(\hat{L}) = \sqrt{MSE\sum_{i =1}^{r}\frac{c^{2}_{i}}{n_{i}}}\]
\[= \sqrt{10.55 \times \left(\frac{(0.5)^{2}}{5} + \frac{(0.5)^{2}}{5} + \frac{(-0.5)^{2}}{4} +\frac{(-0.5)^{2}}{5}\right)}\]
\[= \sqrt{10.55 \times 0.2125} \approx 1.5\]

A 90% C.I. for L is

\[\hat{L} \pm t(0.95; 15) \times s(\hat{L})\]
\[= -9.35 \pm 1.5 \times 1.753 = [-11.98, -6.72]\]

Since the 90% C.I. for L does not contain zero and lies entirely below zero, we are 90% confident that 5-color designs work better than 3-color designs.
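
The contrast calculation can also be sketched in Python. Two things here are assumptions rather than statements from the text: the group sizes are taken as 5, 5, 4, 5 (the text gives only \(n_{1} = 5\) and \(n_{T} = 19\); which of the other designs has four observations does not affect the result, because all \(c^{2}_{i}\) are equal), and the critical value 1.753 is copied from the text rather than computed.

```python
import math

# Values quoted in the package design example.
ybar = [14.6, 13.4, 19.5, 27.2]   # factor level sample means
c = [0.5, 0.5, -0.5, -0.5]        # contrast coefficients
n = [5, 5, 4, 5]                  # ASSUMED group sizes with n_T = 19
mse = 10.55                       # mean squared error
t_crit = 1.753                    # t(0.95; 15), copied from the text

# A contrast requires the coefficients to sum to zero.
assert abs(sum(c)) < 1e-12

l_hat = sum(ci * yi for ci, yi in zip(c, ybar))       # point estimate of L
sum_c2_n = sum(ci ** 2 / ni for ci, ni in zip(c, n))  # sum of c_i^2 / n_i
se = math.sqrt(mse * sum_c2_n)                        # estimated standard error
lower, upper = l_hat - t_crit * se, l_hat + t_crit * se

print(f"L_hat = {l_hat:.2f}, sum c^2/n = {sum_c2_n:.4f}, s(L_hat) = {se:.4f}")
print(f"90% CI: [{lower:.2f}, {upper:.2f}]")
```

The endpoints may differ slightly from the hand calculation in the second decimal place, because the text rounds \(s(\hat{L})\) to 1.5 before multiplying.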

Contributors

Joy Wei, Debashis Paul