
12.3: Interpreting the F-test and Post-Hoc Comparisons


    All the F-test will tell you is that something is happening; it will not tell you exactly what. If you have a significant F-test, you know there are differences among the groups, but you do not know which groups are responsible for the significance.

    At this point, we introduce post-hoc comparisons. Post hoc means "after the fact." We only run post-hoc tests after we have a significant overall statistical result, such as a significant F-test. Statistical tests such as the F-test are overall (omnibus) tests: they indicate that, somewhere among all of your groups or variables, there is a significant effect. Think of post-hoc tests as specific tests that determine where that significance lies. For an ANOVA, the post-hoc comparisons indicate where the effect is, or which groups differ. If the F-test is not significant, there is no need to do the post-hoc comparisons.

    The purpose of the post-hoc tests is to determine which pairs of groups differ significantly. You can only compare two groups' means at a time. When you compare two groups at a time, you use a post-hoc test, which, in this case, is a version of the t-test. Recall that a t-test examines the difference between two group means. Hence, we use t-tests here to compare the means across all the groups, one pair of groups at a time.
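    As a quick arithmetic aside (a minimal sketch; the group count of 4 is a hypothetical example, not from the text): comparing two groups at a time among k groups means running k(k − 1)/2 pairwise tests, so the number of comparisons grows quickly as groups are added.

```python
from math import comb  # standard-library binomial coefficient

# Number of pairwise comparisons among k groups: "k choose 2".
def num_pairwise_comparisons(k):
    return comb(k, 2)

# A hypothetical one-way ANOVA with 4 groups (A, B, C, D) needs
# 6 pairwise t-tests: A-B, A-C, A-D, B-C, B-D, C-D.
print(num_pairwise_comparisons(4))  # → 6
```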

    Why do we use post-hoc t-tests? Why not use a series of ordinary t-tests? We use post-hoc t-tests instead of regular, independent t-tests because the post-hoc tests adjust for the inflated chance of a Type I error, also known as the experiment-wise (or family-wise) error rate.
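    The inflation that post-hoc tests guard against can be made concrete. If each of m independent tests is run at significance level α, the chance of at least one Type I error across the whole family is 1 − (1 − α)^m. A minimal sketch (the specific α = .05 and the values of m are illustrative, not from the text):

```python
# Family-wise (experiment-wise) Type I error rate for m independent
# tests, each run at significance level alpha:
#   P(at least one false positive) = 1 - P(no false positives)
def family_wise_error_rate(alpha, m):
    return 1 - (1 - alpha) ** m

# One test at alpha = .05 keeps the error rate at 5%...
print(round(family_wise_error_rate(0.05, 1), 6))   # 0.05
# ...but 3 pairwise t-tests inflate it to about 14%:
print(round(family_wise_error_rate(0.05, 3), 6))   # 0.142625
# ...and 10 tests inflate it to about 40%:
print(round(family_wise_error_rate(0.05, 10), 6))  # 0.401263
```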

    Recall that a Type I error occurs when you find a significant effect even though, in reality, there is no effect at all. In what situations would you encounter a Type I error? One situation is what we call "fishing." In fishing, you continually cast your line into the lake, over and over again. If you keep fishing, eventually you will catch something. But catching something after repeated tries does not mean you can claim to be a fisherman; the catch, like a "significant" finding obtained this way, was just a random lucky effect.

    Where is the "fishing" in this situation? Conducting multiple t-tests is the same as casting your fishing line into the lake over and over again. Eventually, you will find a significant effect simply because you kept running t-test after t-test. Something will happen eventually, but that does not mean you found a real effect; you got lucky because you kept running test after test. It is like dating: if you keep asking someone out, eventually they will say yes (at least in theory; I have never found that to be the case, sigh).

    Instead of running multiple basic independent t-tests, statisticians devised a series of post-hoc t-tests that adjust the p-value for the experiment-wise error, that is, the increased likelihood of committing a Type I error due to conducting several t-tests. The common ones you will see associated with an F-test are Tukey, Šidák, and Bonferroni. These tests were also devised to handle anomalies across the groups, such as unequal sample sizes or different distributions. The mathematical differences among them are beyond the scope of this textbook. One practical approach is to select all three as your post-hoc tests: if all three give you the same answer, you can have confidence in your results. If one indicates significance and the other two do not, or two do and one does not, consult a statistician, because something is off.
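    To make the adjustment idea concrete: the Bonferroni correction simply divides α by the number of comparisons, while the Šidák correction picks the per-test level that makes the family-wise rate exactly α for independent comparisons. A minimal sketch of both (Tukey's method relies on the studentized range distribution and is omitted here; the α = .05 and m = 3 values are illustrative, not from the text):

```python
def bonferroni_alpha(alpha, m):
    # Each of the m comparisons is tested at alpha / m.
    return alpha / m

def sidak_alpha(alpha, m):
    # Chosen so that 1 - (1 - a)^m = alpha for m independent tests.
    return 1 - (1 - alpha) ** (1 / m)

m = 3  # e.g., all pairwise comparisons among 3 groups
print(round(bonferroni_alpha(0.05, m), 6))  # 0.016667
print(round(sidak_alpha(0.05, m), 6))       # 0.016952
# Sidak's per-test threshold is slightly larger (less conservative)
# than Bonferroni's, but for small m the two nearly agree.
```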

    Once you get a significant F-test, all you can say is that something is significant. You then interpret the post-hoc tests to determine which groups are significantly higher or lower than the others.

    Sometimes not all groups differ; sometimes only one group is causing the F-test to be significant. It is also possible to have a significant F-test without any significant post-hoc tests. There are no clear, consistent reasons for these scenarios; if they happen, consult a statistician.


    This page titled 12.3: Interpreting the F-test and Post-Hoc Comparisons is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Peter Ji.