# Two-Factor ANOVA model with n = 1 (no replication)


### 1. Two-factor ANOVA model with n = 1 (no replication)

• For some studies, there is only one replicate per treatment, i.e., n = 1.
• The ANOVA model for two-factor studies needs to be modified, since
- the degrees of freedom associated with $$SSE$$ will be $$(n - 1)ab = 0$$;
- thus the error variance $$\sigma^2$$ cannot be estimated by $$SSE$$ anymore.
• Idea: make the model simpler by assuming the two factors do not interact with each other. Validity of this assumption needs to be checked.

#### 1.1 Two-factor model without interaction

With n = 1.

• Model equation:

$Y_{ij} = \mu_{..} + \alpha_i + \beta_j + \epsilon_{ij}, i = 1, ..., a, j = 1, ..., b.$

• Identifiability constraints:

$\sum_{i=1}^{a}\alpha_i = 0, \sum_{j=1}^{b}\beta_j = 0.$

• Distributional assumptions: $$\epsilon_{ij}$$ are i.i.d. $$N(0,\sigma^2)$$
##### Sum of squares

Interaction sum of squares now plays the role of error sum of squares.

$SSAB = n\sum_{i=1}^{a} \sum_{j=1}^{b}(\overline{Y}_{ij.} - \overline{Y}_{i..} - \overline{Y}_{.j.} + \overline{Y}_{...})^2 = \sum_{i=1}^{a} \sum_{j=1}^{b}(\overline{Y}_{ij} - \overline{Y}_{i.} - \overline{Y}_{.j} + \overline{Y}_{..})^2$

$$MSAB = \frac{SSAB}{(a-1)(b-1)}$$ since $$d.f.(SSAB) = (a-1)(b-1)$$.

• In the general two-factor ANOVA model (when n = 1),

$E(MSAB) = \sigma^2 + \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} (\alpha\beta)^2_{ij}}{(a-1)(b-1)}$

• Under the model without interaction: $$E(MSAB) = \sigma^2$$
• Thus $$MSAB$$ can be used to estimate $$\sigma^2$$.
##### ANOVA Table

ANOVA table for two-factor model without interaction and $$n=1$$

| Source of Variation | SS | df | MS |
|---|---|---|---|
| Factor A | $$SSA = b\sum_i(\overline{Y}_{i.} - \overline{Y}_{..})^2$$ | $$a - 1$$ | $$MSA$$ |
| Factor B | $$SSB = a\sum_j(\overline{Y}_{.j} - \overline{Y}_{..})^2$$ | $$b - 1$$ | $$MSB$$ |
| Error | $$SSAB = \sum_{i=1}^{a}\sum_{j=1}^{b}(\overline{Y}_{ij} - \overline{Y}_{i.} - \overline{Y}_{.j} + \overline{Y}_{..})^2$$ | $$(a - 1)(b - 1)$$ | $$MSAB$$ |
| Total | $$SSTO = \sum_{i=1}^{a}\sum_{j=1}^{b}(\overline{Y}_{ij} - \overline{Y}_{..})^2$$ | $$ab - 1$$ | |

Expected mean squares (under no interaction):

$E(MSA) = \sigma^2 + \frac{b\sum_{i=1}^{a}\alpha_i^2}{a - 1}, E(MSB) = \sigma^2 + \frac{a\sum_{j=1}^{b}\beta_j^2}{b - 1}, E(MSAB) = \sigma^2$
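The entries of the ANOVA table can be computed directly from the $$a \times b$$ array of observations. The following is a minimal Python sketch rather than a definitive implementation: the function name `additive_anova` and the nested-list layout (rows are levels of factor A, columns are levels of factor B) are illustrative assumptions, and the input is the set of six premiums from the insurance example in Section 1.2.

```python
def additive_anova(y):
    """ANOVA quantities for an a x b layout with one observation per cell."""
    a, b = len(y), len(y[0])
    grand = sum(map(sum, y)) / (a * b)                   # Ybar_..
    row = [sum(r) / b for r in y]                        # Ybar_i.
    col = [sum(r[j] for r in y) / a for j in range(b)]   # Ybar_.j

    ssa = b * sum((m - grand) ** 2 for m in row)
    ssb = a * sum((m - grand) ** 2 for m in col)
    # With n = 1 the interaction sum of squares serves as the error term
    ssab = sum((y[i][j] - row[i] - col[j] + grand) ** 2
               for i in range(a) for j in range(b))
    ssto = sum((v - grand) ** 2 for r in y for v in r)
    return {"SSA": ssa, "SSB": ssb, "SSAB": ssab, "SSTO": ssto,
            "MSA": ssa / (a - 1), "MSB": ssb / (b - 1),
            "MSAB": ssab / ((a - 1) * (b - 1))}


# Rows: small, medium, large; columns: east, west (data of Section 1.2)
premiums = [[140, 100], [210, 180], [220, 200]]
table = additive_anova(premiums)
print(table)
```

For these data the sketch should return $$SSA = 9300$$, $$SSB = 1350$$, $$SSAB = 100$$ and $$SSTO = 10750$$, so the decomposition $$SSTO = SSA + SSB + SSAB$$ holds.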

##### F tests (for main effects)

Test factor A main effects: $$H_o: \alpha_1 = ... = \alpha_a = 0$$ vs. $$H_a:$$ not all $$\alpha_i$$'s are equal to zero.

• $$F_A^* = \frac{MSA}{MSAB} \sim F_{a - 1, (a - 1)(b - 1)}$$ under $$H_o$$.
• Reject $$H_o$$ at level of significance $$\alpha$$ if observed $$F_A^* > F(1 - \alpha; a - 1, (a - 1)(b - 1))$$.

Test factor B main effects: $$H_o: \beta_1 = ... = \beta_b = 0$$ vs. $$H_a:$$ not all $$\beta_j$$'s are equal to zero.

• $$F_B^* = \frac{MSB}{MSAB} \sim F_{b - 1, (a - 1)(b - 1)}$$ under $$H_o$$.
• Reject $$H_o$$ at level of significance $$\alpha$$ if observed $$F_B^* > F(1 - \alpha; b - 1, (a - 1)(b - 1))$$.
##### Estimation of means

Estimation of factor level means $$\mu_{i.}$$'s and $$\mu_{.j}$$'s.

• Proceed as before, viz., use the unbiased estimator $$\overline{Y}_{i.}$$ for $$\mu_{i.}$$ and $$\overline{Y}_{.j}$$ for $$\mu_{.j}$$, but replace $$MSE$$ by $$MSAB$$ and use the degrees of freedom of $$MSAB$$, that is $$(a - 1)(b - 1)$$. Thus, estimated standard errors:

$s(\overline{Y}_{i.}) = \sqrt{\frac{MSAB}{b}}, s(\overline{Y}_{.j}) = \sqrt{\frac{MSAB}{a}}.$

Estimation of treatment means $$\mu_{ij}$$'s.

• $$\mu_{ij} = E(Y_{ij}) = \mu_{..} + \alpha_i + \beta_j = \mu_{i.} + \mu_{.j} - \mu_{..}$$
Thus, an unbiased estimator: $$\widehat{\mu}_{ij} = \overline{Y}_{i.} + \overline{Y}_{.j} - \overline{Y}_{..}$$
Estimated standard error:

$s(\widehat{\mu}_{ij}) = \sqrt{MSAB(\frac{1}{b} + \frac{1}{a} - \frac{1}{ab})} = \sqrt{MSAB(\frac{a + b - 1}{ab})}$
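The standard error of $$\widehat{\mu}_{ij}$$ can be verified with a short variance calculation. The following is a sketch that uses only the model assumption that the $$Y_{ij}$$ are independent with common variance $$\sigma^2$$. First,

$\mathrm{Var}(\overline{Y}_{i.}) = \frac{\sigma^2}{b}, \mathrm{Var}(\overline{Y}_{.j}) = \frac{\sigma^2}{a}, \mathrm{Var}(\overline{Y}_{..}) = \frac{\sigma^2}{ab}.$

The means $$\overline{Y}_{i.}$$ and $$\overline{Y}_{.j}$$ share only the observation $$Y_{ij}$$, while $$\overline{Y}_{..}$$ contains every observation, so

$\mathrm{Cov}(\overline{Y}_{i.}, \overline{Y}_{.j}) = \mathrm{Cov}(\overline{Y}_{i.}, \overline{Y}_{..}) = \mathrm{Cov}(\overline{Y}_{.j}, \overline{Y}_{..}) = \frac{\sigma^2}{ab}.$

Combining these,

$\mathrm{Var}(\overline{Y}_{i.} + \overline{Y}_{.j} - \overline{Y}_{..}) = \frac{\sigma^2}{b} + \frac{\sigma^2}{a} + \frac{\sigma^2}{ab} + \frac{2\sigma^2}{ab} - \frac{2\sigma^2}{ab} - \frac{2\sigma^2}{ab} = \sigma^2(\frac{1}{b} + \frac{1}{a} - \frac{1}{ab}).$

Replacing $$\sigma^2$$ by its estimate $$MSAB$$ and taking the square root gives $$s(\widehat{\mu}_{ij})$$.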

#### 1.2 Example: Insurance

An analyst studied the premium for auto insurance charged by an insurance company in six cities. The six cities were selected to represent different sizes (Factor A: small, medium, large) and different regions of the state (Factor B: east, west). There is only one city for each combination of size and region. The amounts of premiums charged for a specific type of coverage in a given risk category for each of the six cities are given in the following table.

Table 1: Numbers in parentheses are $$\widehat{\mu}_{ij} = \overline{Y}_{i.} + \overline{Y}_{.j} - \overline{Y}_{..}$$

| Factor A | Factor B: East | Factor B: West | Row mean |
|---|---|---|---|
| Small | 140 (135) | 100 (105) | $$\overline{Y}_{1.} = 120$$ |
| Medium | 210 (210) | 180 (180) | $$\overline{Y}_{2.} = 195$$ |
| Large | 220 (225) | 200 (195) | $$\overline{Y}_{3.} = 210$$ |
| Column mean | $$\overline{Y}_{.1} = 190$$ | $$\overline{Y}_{.2} = 160$$ | $$\overline{Y}_{..} = 175$$ |

An interaction plot of the observations $$Y_{ij}$$ (which are the treatment sample means when $$n = 1$$) shows no strong interactions.

##### Sum of squares:
• Here $$a = 3$$, $$b = 2$$, $$n = 1$$.
• $$SSA = 2[(120 - 175)^2 + (195 - 175)^2 + (210 - 175)^2] = 9300$$.
• $$SSB = 3[(190 - 175)^2 + (160 - 175)^2] = 1350$$.
• $$SSAB = (140 - 120 - 190 + 175)^2 + ... + (200 - 210 - 160 + 175)^2 = 100$$.
• $$SSTO = SSA + SSB + SSAB = 10750$$.
##### Hypothesis testing:
• Test $$H_o: \mu_{1.} = \mu_{2.} = \mu_{3.}$$ (equivalently, $$H_o: \alpha_1 = \alpha_2 = \alpha_3 = 0$$) at level 0.05.

| Source of Variation | SS | df | MS |
|---|---|---|---|
| Factor A | $$SSA = 9300$$ | $$a - 1 = 2$$ | $$MSA = 4650$$ |
| Factor B | $$SSB = 1350$$ | $$b - 1 = 1$$ | $$MSB = 1350$$ |
| Error | $$SSAB = 100$$ | $$(a - 1)(b - 1) = 2$$ | $$MSAB = 50$$ |
| Total | $$SSTO = 10750$$ | $$ab - 1 = 5$$ | |

$$F_A^* = \frac{MSA}{MSAB} = \frac{4650}{50} = 93$$ and $$F(0.95; 2, 2) = 19$$. Thus reject $$H_o$$ at level 0.05.
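The decision rule amounts to comparing a ratio of mean squares with a tabulated quantile. A minimal Python sketch follows, using the mean squares from the ANOVA table of this example. The critical value 19.0 for factor A is the one quoted in the text; the value 18.51 for $$F(0.95; 1, 2)$$ is taken from a standard F table and is an assumption not stated in the source, included only to show the analogous test for factor B.

```python
# Mean squares from the insurance ANOVA table
msa, msb, msab = 4650.0, 1350.0, 50.0

f_a = msa / msab   # F statistic for factor A (city size)
f_b = msb / msab   # F statistic for factor B (region)

crit_a = 19.0      # F(0.95; 2, 2), as quoted in the text
crit_b = 18.51     # F(0.95; 1, 2), standard table value (assumption, not in the source)

reject_a = f_a > crit_a
reject_b = f_b > crit_b
print(f"F_A* = {f_a:.1f}, reject H_o: {reject_a}")
print(f"F_B* = {f_b:.1f}, reject H_o: {reject_b}")
```

With only 2 error degrees of freedom the critical values are large, so an effect has to be very pronounced relative to $$MSAB$$ to be detected.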

• Estimation of $$\mu_{ij}$$: e.g.,
$$\widehat{\mu}_{11} = \overline{Y}_{1.} + \overline{Y}_{.1} - \overline{Y}_{..} = 120 + 190 - 175 = 135$$.
• Estimation of $$\mu_{i.}$$ and $$\mu_{.j}$$: e.g.,
$$\widehat{\mu}_{1.} = \overline{Y}_{1.} = 120$$.
$$s(\overline{Y}_{1.}) = \sqrt{\frac{MSAB}{b}} = \sqrt{\frac{50}{2}} = 5$$.
The 95% C.I. for $$\mu_{1.}$$ is:
$\overline{Y}_{1.} \pm t(0.975; 2) * s(\overline{Y}_{1.}) = 120 \pm 4.3*5 = (98.5, 141.5).$
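The fitted values in Table 1 and the confidence interval for $$\mu_{1.}$$ can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the variable names are illustrative, and the quantile $$t(0.975; 2) = 4.303$$ is hard-coded from a t table (the text rounds it to 4.3), so the interval endpoints differ from those above in the second decimal place.

```python
import math

# Rows: small, medium, large; columns: east, west
premiums = [[140, 100], [210, 180], [220, 200]]
a, b = len(premiums), len(premiums[0])

grand = sum(map(sum, premiums)) / (a * b)
row = [sum(r) / b for r in premiums]
col = [sum(r[j] for r in premiums) / a for j in range(b)]

# Fitted treatment means under the no-interaction model
fitted = [[row[i] + col[j] - grand for j in range(b)] for i in range(a)]

# MSAB from the residuals of the additive fit
msab = sum((premiums[i][j] - fitted[i][j]) ** 2
           for i in range(a) for j in range(b)) / ((a - 1) * (b - 1))

se_row = math.sqrt(msab / b)                      # s(Ybar_i.)
se_cell = math.sqrt(msab * (a + b - 1) / (a * b)) # s(mu-hat_ij)

t_crit = 4.303  # t(0.975; 2) from a t table; assumption, the text uses 4.3
ci_mu1 = (row[0] - t_crit * se_row, row[0] + t_crit * se_row)

print(fitted)
print(se_row, se_cell, ci_mu1)
```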

#### 1.3 Checking for the presence of interaction: Tukey's test for additivity

For a two-factor study with $$n = 1$$, decide whether or not the two factors are interacting.

• In the no-interaction model, we assume that all $$(\alpha\beta)_{ij} = 0$$.
• Idea: use a less severe restriction on the interaction effects, by assuming
$(\alpha\beta)_{ij} = D\alpha_i\beta_j, i = 1, ... , a, j = 1, ... , b,$
where $$D$$ is an unknown parameter.
• The model becomes:
$Y_{ij} = \mu_{..} + \alpha_i + \beta_j + D\alpha_i\beta_j + \epsilon_{ij}, i = 1, ... , a, j = 1, ..., b,$
under the constraints that
$\sum_{i = 1}^{a}\alpha_i = \sum_{j = 1}^{b}\beta_j = 0.$
##### Estimation of $$D$$
• Multiply both sides of the model equation by $$\alpha_i\beta_j$$:
$\alpha_i\beta_jY_{ij} = \mu_{..}\alpha_i\beta_j + \alpha_i^2\beta_j + \alpha_i\beta_j^2 + D\alpha_i^2\beta_j^2 + \epsilon_{ij}\alpha_i\beta_j$
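• Sum over all pairs (i, j). The first three terms on the right-hand side vanish because $$\sum_{i}\alpha_i = \sum_{j}\beta_j = 0$$: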
$\sum_{i=1}^{a}\sum_{j=1}^{b}\alpha_i\beta_jY_{ij}=D\sum_{i=1}^{a}\sum_{j=1}^{b}\alpha_i^2\beta_j^2 + \sum_{i=1}^{a}\sum_{j=1}^{b}\epsilon_{ij}\alpha_i\beta_j$
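• Since the last term has mean zero, dividing by $$\sum_{i}\sum_{j}\alpha_i^2\beta_j^2 = (\sum_{i}\alpha_i^2)(\sum_{j}\beta_j^2)$$ gives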
$\widetilde{D} := \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}\alpha_i\beta_jY_{ij}}{(\sum_{i=1}^{a}\alpha_i^2)(\sum_{j=1}^{b}\beta_j^2)} \approx D$
• We have the following estimates:
$\widehat{\alpha}_i = \overline{Y}_{i.} - \overline{Y}_{..}, \widehat{\beta}_j = \overline{Y}_{.j} - \overline{Y}_{..}$
• Thus, an estimator of $$D$$ (which is also the least squares and the maximum likelihood estimator) is given by
$\widehat{D} = \frac{\sum_{i=1}^{a}\sum_{j=1}^b ( \overline{Y}_{i.} - \overline{Y}_{..})( \overline{Y}_{.j} - \overline{Y}_{..})Y_{ij}}{(\sum_{i=1}^{a}( \overline{Y}_{i.} - \overline{Y}_{..})^2)(\sum_{j=1}^{b}( \overline{Y}_{.j} - \overline{Y}_{..})^2)}.$

##### ANOVA decomposition

$SSTO = SSA + SSB + SSAB^* + SSRem^*.$

• Interaction sum of squares
$SSAB^* = \sum_{i=1}^{a}\sum_{j=1}^{b}\widehat{D}^2\widehat{\alpha}_i^2\widehat{\beta}_j^2 = \frac{(\sum_{i=1}^{a}\sum_{j=1}^b ( \overline{Y}_{i.} - \overline{Y}_{..})( \overline{Y}_{.j} - \overline{Y}_{..})Y_{ij})^2}{(\sum_{i=1}^{a}( \overline{Y}_{i.} - \overline{Y}_{..})^2)(\sum_{j=1}^{b}( \overline{Y}_{.j} - \overline{Y}_{..})^2)}$
• Remainder sum of squares
$SSRem^* = SSTO - SSA - SSB - SSAB^*$
• Decomposition of degrees of freedom
$df(SSTO) = df(SSA) + df(SSB) + df(SSAB^*) + df(SSRem^*)$
$ab - 1 = (a - 1) + (b - 1) + 1 + (ab - a - b)$
• Tukey's one degree of freedom test for additivity: $$H_o: D = 0$$ (i.e., no interaction) vs. $$H_a: D \neq 0$$.
• $$F$$ ratio $$F_{Tukey}^{*} = \frac{SSAB^*/1}{SSRem^*/(ab - a - b)}\sim F_{1, ab - a - b}$$ under $$H_o$$.
• Decision rule: reject $$H_o: D = 0$$ at level of significance $$\alpha$$ if $$F_{Tukey}^{*} > F(1 - \alpha; 1, ab - a - b)$$.
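The quantities entering Tukey's test can be computed from the data table alone. The following Python sketch is one possible way to organize the calculation; the function name `tukey_additivity` and the returned dictionary keys are illustrative assumptions. It uses the identity $$SSTO - SSA - SSB = SSAB$$, so that $$SSRem^* = SSAB - SSAB^*$$, and it is applied to the insurance premiums of Section 1.2.

```python
def tukey_additivity(y):
    """Quantities for Tukey's one-degree-of-freedom test (a x b table, n = 1)."""
    a, b = len(y), len(y[0])
    df_rem = a * b - a - b
    if df_rem < 1:
        raise ValueError("need ab - a - b >= 1 to form the remainder mean square")

    grand = sum(map(sum, y)) / (a * b)
    alpha = [sum(r) / b - grand for r in y]                      # alpha-hat_i
    beta = [sum(r[j] for r in y) / a - grand for j in range(b)]  # beta-hat_j

    cross = sum(alpha[i] * beta[j] * y[i][j]
                for i in range(a) for j in range(b))
    denom = sum(x * x for x in alpha) * sum(x * x for x in beta)

    d_hat = cross / denom
    ssab_star = cross ** 2 / denom
    # Residual sum of squares of the additive fit, equal to SSTO - SSA - SSB
    ssab = sum((y[i][j] - grand - alpha[i] - beta[j]) ** 2
               for i in range(a) for j in range(b))
    ssrem_star = ssab - ssab_star
    f_star = ssab_star / (ssrem_star / df_rem)
    return {"D_hat": d_hat, "SSAB_star": ssab_star,
            "SSRem_star": ssrem_star, "F": f_star, "df": (1, df_rem)}


# Rows: small, medium, large; columns: east, west
result = tukey_additivity([[140, 100], [210, 180], [220, 200]])
print(result)
```

The sketch does not guard against degenerate tables in which all estimated main effects of a factor are zero, or in which the remainder sum of squares is exactly zero; both cases would lead to a division by zero.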

##### Example: Insurance

• $$\sum_{ij}(\overline{Y}_{i.} - \overline{Y}_{..})( \overline{Y}_{.j} - \overline{Y}_{..})Y_{ij} = -13500.$$
• $$\sum_{i=1}^{a}( \overline{Y}_{i.} - \overline{Y}_{..})^2 = 4650$$, and $$\sum_{j=1}^{b}( \overline{Y}_{.j} - \overline{Y}_{..})^2 = 450.$$
• $$SSAB^* = \frac{(-13500)^2}{4650 * 450} = 87.1.$$
• $$SSRem^* = 10750 - 9300 - 1350 - 87.1 = 12.9.$$
• $$ab - a - b = 3*2 - 3 - 2 = 1.$$
• $$F$$-ratio for Tukey's test:
$F_{Tukey}^{*} = \frac{SSAB^*/1}{SSRem^*/1} = \frac{87.1}{12.9} = 6.8.$
• When $$\alpha = 0.05, F(0.95; 1, 1) = 161.4 > 6.8.$$
• Thus, we cannot reject $$H_o: D = 0$$ at the 0.05 level, and we conclude that there is no significant interaction between the two factors.
• Indeed, the p-value is $$p = P(F_{1,1} > 6.8) = 0.23$$ which is not at all significant.
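For the special case of $$(1, 1)$$ degrees of freedom the p-value has a closed form, because an $$F_{1,1}$$ variable is the square of a $$t_1$$ (standard Cauchy) variable, whose distribution function is $$\frac{1}{2} + \frac{1}{\pi}\arctan(x)$$. The sketch below uses this fact; the function name `p_value_f11` is an illustrative assumption, and the formula does not apply to other degrees of freedom.

```python
import math


def p_value_f11(f):
    """P(F_{1,1} > f), using F_{1,1} = T^2 with T standard Cauchy."""
    return 1.0 - (2.0 / math.pi) * math.atan(math.sqrt(f))


# Unrounded F ratio for the insurance data: 87.0968 / 12.9032 = 6.75
p = p_value_f11(6.75)
print(round(p, 3))
```

Evaluating the same function at the tabulated critical value 161.4 gives approximately 0.05, which is a quick consistency check on the formula.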

### Contributors

• Scott Brunstein (UCD)
• Debashis Paul (UCD)