# 3.1: The Model


The effects model for one-way ANOVA is a linear, additive statistical model that relates the response to the treatment and can be expressed as $Y_{ij} = \mu + \tau_{i} + \epsilon_{ij}$

where $$\mu$$ is the grand mean, $$\tau_{i} \ (i = 1,2, \ldots,T)$$ are the deviations from the grand mean due to the treatment levels, and $$\epsilon_{ij}$$ are the error terms. The quantities $$\tau_{i} \ (i = 1,2, \ldots, T)$$, which sum to zero, are also referred to as the treatment level effects; the errors represent the amount "left over" after accounting for the grand mean and the effect of being in a particular treatment level.
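The effects model can be illustrated with a short simulation. The sketch below (the values of $$\mu$$, the $$\tau_{i}$$, and the error standard deviation are assumed for illustration, not taken from the greenhouse data) generates data from $$Y_{ij} = \mu + \tau_{i} + \epsilon_{ij}$$ and shows that the overall sample mean estimates the grand mean while the deviations of the level means from it estimate the treatment effects:

```python
import numpy as np

rng = np.random.default_rng(0)

mu = 20.0                          # grand mean (assumed value)
tau = np.array([-2.0, 0.5, 1.5])   # treatment effects; note they sum to zero
T = len(tau)                       # number of treatment levels
n_i = 6                            # observations per level (balanced design)

# Y_ij = mu + tau_i + eps_ij, with eps_ij ~ N(0, sigma^2)
eps = rng.normal(loc=0.0, scale=1.0, size=(T, n_i))
Y = mu + tau[:, None] + eps

# The overall sample mean estimates mu; the deviations of the
# treatment-level means from it estimate the tau_i.
grand_mean_hat = Y.mean()
tau_hat = Y.mean(axis=1) - grand_mean_hat

print(grand_mean_hat)
print(tau_hat, tau_hat.sum())  # estimated effects sum to zero in a balanced design
```

In a balanced design the estimated effects sum to zero by construction, mirroring the constraint $$\sum_{i} \tau_{i} = 0$$ on the model parameters.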

Here’s the analogy in terms of the greenhouse experiment. Think of someone who is not aware that different fertilizers have been used walking into the greenhouse to simply inquire about plant heights in general. The overall sample mean, an estimate of the grand mean, will be a suitable response to this inquiry. On the other hand, the overall mean would not be satisfactory to the experimenter of the study, who obviously suspects that there will be height differences among different fertilizer types. Instead, what is more acceptable to the experimenter are the plant height estimates after including the effect of the treatment $$\tau_{i}$$.

##### Note

The actual plant height can never be known exactly because an unknown measurement error is associated with any observation. The error associated with the $$j$$th observation in the $$i$$th treatment level, denoted $$\epsilon_{ij} \ (i = 1, 2, \ldots, T, \ j = 1, 2, \ldots, n_{i})$$, is a random component (noise) that reflects the unexplained variability among plants within treatment levels.

Under the null hypothesis where the treatment effect is zero, the reduced model can be written $$Y_{ij} = \mu + \epsilon_{ij}$$.

Under the alternative hypothesis, where the treatment effect is nonzero for at least one treatment level, the full model can be written $$Y_{ij} = \mu + \tau_{i} + \epsilon_{ij}$$.

If $$SSE(R)$$ denotes the error sums of squares associated with the reduced model and $$SSE(F)$$ denotes the error sums of squares associated with the full model, we can utilize the General Linear Test approach to test the null hypothesis by using the test statistic: $F = \frac{\left(\dfrac{SSE(R) - SSE(F)}{df_{R} - df_{F}} \right)}{\left(\dfrac{SSE(F)}{df_{F}}\right)}$

which under the null hypothesis has an $$F$$ distribution with the numerator and denominator degrees of freedom equal to $$df_{R}-df_{F}$$ and $$df_{F}$$ respectively, where $$df_{R}$$ is the degrees of freedom associated with $$SSE(R)$$ and $$df_{F}$$ is the degrees of freedom associated with $$SSE(F)$$. It is easy to see that $$df_{R}=N-1$$ and $$df_{F}=N-T$$, where $$N = \sum_{i=1}^{T} n_{i}$$. Also, $SSE(R) = \sum_{i} \sum_{j} \left(Y_{ij} - \bar{Y}_{..}\right)^{2} = SS_{Total} \quad \text{(see Section 2.2)}$

Therefore, since $$SSE(F) = SSE$$ and $$df_{R} - df_{F} = (N-1) - (N-T) = T-1$$, \begin{align} F &= \frac{\left(\dfrac{SS_{Total} - SSE}{T-1}\right)}{\left(\dfrac{SSE}{df_{Error}}\right)} \\[4pt] &= \frac{\left(\dfrac{SS_{Treatment}}{df_{Treatment}}\right)}{\left(\dfrac{SSE}{df_{Error}}\right)} \\[4pt] &= \frac{MS_{Trt}}{MSE} \end{align}
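The General Linear Test can be carried out numerically on simulated data. In the sketch below (the layout dimensions and the data-generating values are assumptions for illustration), the reduced-model fit is the grand mean and the full-model fit is the set of treatment-level means; the resulting $$F$$ statistic matches the ANOVA form $$MS_{Trt}/MSE$$:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated one-way layout: T treatment levels, n_i observations each
# (balanced here for simplicity; the formulas also apply with unequal n_i).
T, n_i = 4, 5
Y = rng.normal(loc=10.0, scale=2.0, size=(T, n_i)) + np.arange(T)[:, None]
N = Y.size

# Reduced model Y_ij = mu + eps_ij: the fitted value is the grand mean.
SSE_R = ((Y - Y.mean()) ** 2).sum()                        # = SS_Total

# Full model Y_ij = mu + tau_i + eps_ij: fitted values are level means.
SSE_F = ((Y - Y.mean(axis=1, keepdims=True)) ** 2).sum()   # = SSE

df_R, df_F = N - 1, N - T
F_glt = ((SSE_R - SSE_F) / (df_R - df_F)) / (SSE_F / df_F)

# Equivalent ANOVA form: MS_Trt / MSE
MS_trt = (SSE_R - SSE_F) / (T - 1)
MSE = SSE_F / df_F
print(F_glt, MS_trt / MSE)  # the two forms agree
```

Because $$df_{R} - df_{F} = T - 1$$, the two expressions for $$F$$ are algebraically identical, which the printed values confirm.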

Note that this is the same test statistic derived in Section 2.2 for testing the treatment significance. If the null hypothesis is true, then the treatment effect is not significant. If we reject the null hypothesis, then we conclude that the treatment effect is significant, which leads to the conclusion that at least one treatment level mean differs from the others.

This page titled 3.1: The Model is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Penn State's Department of Statistics.