
8.7: Overall F-test in multiple linear regression


    In the MLR summary, there is an \(F\)-test and p-value reported at the bottom of the output. For the model with Elevation and Maximum Temperature, the last row of the model summary is:

    ## F-statistic: 56.43 on 2 and 20 DF, p-value: 5.979e-09

    This test is called the overall F-test in MLR and is very similar to the \(F\)-test in a reference-coded One-Way ANOVA model. Its null hypothesis sets every coefficient except the \(y\)-intercept to 0 (so all the slope coefficients equal 0). We saw this reduced model in the One-Way material when we considered setting all the deviations from the baseline group to 0 under the null hypothesis. We can frame this as a comparison between a full and reduced model as follows:

    • Full Model: \(y_i = \beta_0 + \beta_1x_{1i} + \beta_2x_{2i}+\cdots + \beta_Kx_{Ki}+\varepsilon_i\)
    • Reduced Model: \(y_i = \beta_0 + 0x_{1i} + 0x_{2i}+\cdots + 0x_{Ki}+\varepsilon_i\)

    The reduced model estimates the same value for every observation, \(\widehat{y}_i = \bar{y} = b_0\), and corresponds to the null hypothesis of:

    \(\boldsymbol{H_0:}\) No explanatory variables should be included in the model: \(\beta_1 = \beta_2 = \cdots = \beta_K = 0\).

    The full model corresponds to the alternative:

    \(\boldsymbol{H_A:}\) At least one explanatory variable should be included in the model: Not all \(\beta_k\text{'s} = 0\) for \((k = 1,\ldots,K)\).

    Note that \(\beta_0\) is not set to 0 in the reduced model (under the null hypothesis) – it becomes the true mean of \(y\) for all values of the \(x\text{'s}\) since all the predictors are multiplied by coefficients of 0.
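    The full-versus-reduced comparison can be made concrete numerically. The following Python sketch (using numpy with simulated data; the data and variable names are illustrative assumptions, not the SNOTEL data set) fits both models by least squares and forms the overall F-statistic from the drop in residual sum of squares when moving from the reduced (intercept-only) model to the full model:

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 23, 2  # sample size and number of predictors, matching the example's df

# Simulated predictors and response (illustrative only, not the SNOTEL data)
X = rng.normal(size=(n, K))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)

# Full model: intercept plus both predictors
X_full = np.column_stack([np.ones(n), X])
beta, _, _, _ = np.linalg.lstsq(X_full, y, rcond=None)
sse_full = np.sum((y - X_full @ beta) ** 2)

# Reduced model: intercept only, so every fitted value is ybar and the
# residual sum of squares is the total sum of squares around the mean
sse_reduced = np.sum((y - y.mean()) ** 2)

# Overall F-statistic: change in SSE per slope tested, scaled by the
# full model's error variance estimate
F = ((sse_reduced - sse_full) / K) / (sse_full / (n - K - 1))
print(round(F, 2))
```

A large \(F\) indicates that the full model's residual sum of squares is much smaller than the intercept-only model's, relative to what chance alone would produce.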

    The test statistic to assess these hypotheses is \(F = \text{MS}_{\text{model}}/\text{MS}_E\), which is assumed to follow an \(F\)-distribution with \(K\) numerator df and \(n-K-1\) denominator df under the null hypothesis. The output provides us with \(F(2, 20) = 56.43\) and a p-value of \(5.979 \times 10^{-9}\), which is strong evidence against the null hypothesis that the true slopes for the two predictors are 0. We would conclude that at least one of the two slope coefficients (Max.Temp’s or Elevation’s) is different from 0 in the population of SNOTEL sites in Montana on this date. While this result is a good indicator that something useful exists somewhere in the model, the moment you see it, you will want to know more about each predictor variable. If neither predictor variable is important, we will discover that in the \(t\)-tests for each coefficient, so our general recommendation is to start there.
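    As a check on the reported output, the p-value can be reproduced directly from the F-statistic. With two numerator degrees of freedom the F survival function has a closed form, \(P(F_{2,d_2} > f) = (1 + 2f/d_2)^{-d_2/2}\), so a short Python sketch suffices (in practice you would use R's `pf()` or `scipy.stats.f.sf` instead):

```python
# P-value for the overall F-test with F = 56.43 on 2 and 20 df.
# With 2 numerator df, the F survival function reduces to:
#   P(F_{2, d2} > f) = (1 + 2 * f / d2) ** (-d2 / 2)
f_stat, d2 = 56.43, 20
p_value = (1 + 2 * f_stat / d2) ** (-d2 / 2)
print(p_value)  # approximately 5.98e-09, matching the reported 5.979e-09
```

The tiny p-value quantifies how incompatible an F-statistic of 56.43 is with the intercept-only (reduced) model.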

    The overall F-test, then, really assesses whether there is something useful in the model somewhere. That is certainly important, but it is not very informative about which predictors matter. There is one situation where this test is particularly interesting: when there is only one predictor variable in the model (SLR), it provides exactly the same p-value as the \(t\)-test for the slope. \(F\)-tests will be important when we are mixing categorical and quantitative predictor variables in our MLR models (Section 8.12), but the overall \(F\)-test is of very limited utility.
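    The SLR equivalence noted above can be verified directly: with a single predictor, the overall F-statistic is exactly the square of the slope's \(t\)-statistic, which is why the two tests share a p-value. A Python sketch with simulated data (all data and names are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 0.5 + 1.2 * x + rng.normal(size=n)

# SLR slope and intercept by least squares
xc = x - x.mean()
b1 = np.sum(xc * (y - y.mean())) / np.sum(xc ** 2)
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)

# t-statistic for the slope: estimate over its standard error
mse = np.sum(resid ** 2) / (n - 2)
se_b1 = np.sqrt(mse / np.sum(xc ** 2))
t = b1 / se_b1

# Overall F-statistic: MS_model / MS_E with 1 numerator df
ss_model = np.sum((y - y.mean()) ** 2) - np.sum(resid ** 2)
F = ss_model / mse

print(F, t ** 2)  # the two values agree
```

Algebraically, \(\text{SS}_{\text{model}} = b_1^2 \sum (x_i - \bar{x})^2\) in SLR, so \(F = b_1^2 \sum (x_i - \bar{x})^2 / \text{MS}_E = t^2\) exactly.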


    This page titled 8.7: Overall F-test in multiple linear regression is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.