# 16.5: The F test as a model comparison

- Page ID
- 4046

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)

( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\id}{\mathrm{id}}\)

\( \newcommand{\Span}{\mathrm{span}}\)

\( \newcommand{\kernel}{\mathrm{null}\,}\)

\( \newcommand{\range}{\mathrm{range}\,}\)

\( \newcommand{\RealPart}{\mathrm{Re}}\)

\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)

\( \newcommand{\Argument}{\mathrm{Arg}}\)

\( \newcommand{\norm}[1]{\| #1 \|}\)

\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)

\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)

\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)

\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)

\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vectorC}[1]{\textbf{#1}} \)

\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)

\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)

\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)

\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)

\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)

\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)At this point, I want to talk in a little more detail about what the F-tests in an ANOVA are actually doing. In the context of ANOVA, I’ve been referring to the F-test as a way of testing whether a particular term in the model (e.g., main effect of Factor A) is significant. This interpretation is perfectly valid, but it’s not necessarily the most useful way to think about the test. In fact, it’s actually a fairly limiting way of thinking about what the F-test does. Consider the clinical trial data we’ve been working with in this chapter. Suppose I want to see if there are * any* effects of any kind that involve

`therapy`

. I’m not fussy: I don’t care if it’s a main effect or an interaction effect.^{238}One thing I could do is look at the output for

`model.3`

earlier: in this model we did see a main effect of therapy (p=.013) but we did not see an interaction effect (p=.125). That’s kind of telling us what we want to know, but it’s not quite the same thing. What we really want is a single test that *checks the main effect of therapy and the interaction effect.*

*jointly*Given the way that I’ve been describing the ANOVA F-test up to this point, you’d be tempted to think that this isn’t possible. On the other hand, if you recall the chapter on regression (in Section 15.10), we were able to use F-tests to make comparisons between a wide variety of regression models. Perhaps something of that sort is possible with ANOVA? And of course, the answer here is yes. The thing that you really need to understand is that the F-test, as it is used in both ANOVA and regression, is really a comparison of * two* statistical models. One of these models is the full model (alternative hypothesis), and the other model is a simpler model that is missing one or more of the terms that the full model includes (null hypothesis). The null model cannot contain any terms that are not in the full model. In the example I gave above, the full model is

`model.3`

, and it contains a main effect for therapy, a main effect for drug, and the drug by therapy interaction term. The null model would be `model.1`

since it contains only the main effect of drug.# test comparing two models

Let’s frame this in a slightly more abstract way. We’ll say that our full model can be written as an R formula that contains several different terms, say `Y ~ A + B + C + D`

. Our null model only contains some subset of these terms, say `Y ~ A + B`

. Some of these terms might be main effect terms, others might be interaction terms. It really doesn’t matter. The only thing that matters here is that we want to treat some of these terms as the “starting point” (i.e. the terms in the null model, `A`

and `B`

), and we want to see if including the other terms (i.e., `C`

and `D`

) leads to a significant improvement in model performance, over and above what could be achieved by a model that includes only `A`

and `B`

. In essence, we have null and alternative hypotheses that look like this:

Hypothesis | Correct model? | R formula for correct model |
---|---|---|

Null | M0 | `Y ~ A + B` |

Alternative | M1 | `Y ~ A + B + C + D` |

Is there a way of making this comparison directly?

To answer this, let’s go back to fundamentals. As we saw in Chapter 14, the F-test is constructed from two kinds of quantity: sums of squares (SS) and degrees of freedom (df). These two things define a mean square value (MS = SS/df), and we obtain our F statistic by contrasting the MS value associated with “the thing we’re interested in” (the model) with the MS value associated with “everything else” (the residuals). What we want to do is figure out how to talk about the SS value that is associated with the * difference* between two models. It’s actually not all that hard to do.

Let’s start with the fundamental rule that we used throughout the chapter on regression:

That is, the total sums of squares (i.e., the overall variability of the outcome variable) can be decomposed into two parts: the variability associated with the model SS_{M}, and the residual or leftover variability, SS_{R}. However, it’s kind of useful to rearrange this equation slightly, and say that the SS value associated with a model is defined like this…

SS_{M}=SS_{T}−SS_{R}

Now, in our scenario, we have two models: the null model (M0) and the full model (M1):

SS_{M0}=SS_{T}−SS_{R0}

SS_{M1}=SS_{T}−SS_{R1}

Next, let’s think about what it is we * actually* care about here. What we’re interested in is the

*between the full model and the null model. So, if we want to preserve the idea that what we’re doing is an “analysis of the variance” (ANOVA) in the outcome variable, what we should do is define the SS associated with the difference to be equal to the difference in the SS:*

*difference*\(\begin{aligned} S S_{\Delta} &=\mathrm{SS}_{M 1}-\mathrm{SS}_{M 0} \\ &=\left(\mathrm{SS}_{T}-\mathrm{SS}_{R 1}\right)-\left(\mathrm{SS}_{T}-\mathrm{SS}_{R 0}\right) \\ &=\mathrm{SS}_{R 0}-\mathrm{SS}_{R 1} \end{aligned}\)

Now that we have our degrees of freedom, we can calculate mean squares and F values in the usual way. Specifically, we’re interested in the mean square for the difference between models, and the mean square for the residuals associated with the * full* model (M1), which are given by

\(\begin{aligned} M S_{\Delta} &=\dfrac{\mathrm{SS}_{\Delta}}{\mathrm{df}_{\Delta}} \\ \mathrm{MS}_{R 1} &=\dfrac{\mathrm{SS}_{R 1}}{\mathrm{df}_{R 1}} \end{aligned}\)

Finally, taking the ratio of these two gives us our F statistic:

\(\ F=\dfrac{MS_{\Delta}}{MS_{R1}}\)

### Running the test in R

At this point, it may help to go back to our concrete example. The null model here is `model.1`

, which stipulates that there is a main effect of drug, but no other effects exist. We expressed this via the model formula `mood.gain ~ drug`

. The alternative model here is `model.3`

, which stipulates that there is a main effect of drug, a main effect of therapy, * and* an interaction. If we express this in the “long” format, this model corresponds to the formula

`mood.gain ~ drug + therapy + drug:therapy`

, though we often express this using the `*`

shorthand. The key thing here is that if we compare `model.1`

to `model.3`

, we’re lumping the main effect of therapy and the interaction term together. Running this test in R is straightforward: we just input both models to the `anova()`

function, and it will run the exact F-test that I outlined above.**anova**( model.1, model.3 )

```
## Analysis of Variance Table
##
## Model 1: mood.gain ~ drug
## Model 2: mood.gain ~ drug * therapy
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 15 1.39167
## 2 12 0.65333 3 0.73833 4.5204 0.02424 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Let’s see if we can reproduce this F-test ourselves. Firstly, if you go back and look at the ANOVA tables that we printed out for `model.1`

and `model.3`

you can reassure yourself that the RSS values printed in this table really do correspond to the residual sum of squares associated with these two models. So let’s type them in as variables:

```
ss.res.null <- 1.392
ss.res.full <- 0.653
```

Now, following the procedure that I described above, we will say that the “between model” sum of squares, is the difference between these two residual sum of squares values. So, if we do the subtraction, we discover that the sum of squares associated with those terms that appear in the full model but not the null model is:

```
ss.diff <- ss.res.null - ss.res.full
ss.diff
```

`## [1] 0.739`

Right. Next, as always we need to convert these SS values into MS (mean square) values, which we do by dividing by the degrees of freedom. The degrees of freedom associated with the full-model residuals hasn’t changed from our original ANOVA for `model.3`

: it’s the total sample size N, minus the total number of groups G that are relevant to the model. We have 18 people in the trial and 6 possible groups (i.e., 2 therapies × 3 drugs), so the degrees of freedom here is 12. The degrees of freedom for the null model are calculated similarly. The only difference here is that there are only 3 relevant groups (i.e., 3 drugs), so the degrees of freedom here is 15. And, because the degrees of freedom associated with the difference is equal to the difference in the two degrees of freedom, we arrive at the conclusion that we have 15−12=3 degrees of freedom. Now that we know the degrees of freedom, we can calculate our MS values:

```
ms.res <- ss.res.full / 12
ms.diff <- ss.diff / 3
```

Okay, now that we have our two MS values, we can divide one by the other, and obtain an F-statistic …

```
F.stat <- ms.diff / ms.res
F.stat
```

`## [1] 4.526799`

… and, just as we had hoped, this turns out to be identical to the F-statistic that the `anova()`

function produced earlier.