
12.2: The Concept Behind Analysis of Variance


Analysis of Variance is based on the partitioning of variance. Partitioning means dividing: as we have learned, we always partition variance into two types, true variance and error variance. This section examines how that process works for an ANOVA.

    12.2.1: How Does it Look?

    Figure 1 depicts an ANOVA.

[Figure 1: Graph Depicting an ANOVA]

    12.2.2: Why Call it Analysis of Variance?

With these three groups, we could run three t-tests: one between the red and green groups, one between the green and blue groups, and one between the red and blue groups. But that is too many t-tests; each test carries its own chance of a Type I error, so running several inflates the overall risk of a false positive. It is simpler, and safer, to analyze the groups all at once and get one number that tells us whether there are differences among the three groups. That single overall statistical test is the ANOVA.

Recall that statistics is basically comparing something to something else: we ask whether the pattern of observations we see differs from what error, or random variation, alone would produce. When we compare in statistics, we use ratios. In ANOVA, we compare variances, specifically the between-groups variance with the within-groups variance.

• Within-groups variation: the variation of scores around each group’s mean (sometimes called error).
    • Between-groups variation: the variation among the group means.
    • You want: between-groups variation > within-groups variation.
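The partition the bullets describe can be sketched in Python (a minimal illustration with made-up scores, not from the text): the total variation splits exactly into a between-groups piece and a within-groups piece.

```python
# Three made-up groups of scores (hypothetical "red", "green", "blue" groups)
groups = [
    [4.0, 5.0, 6.0],  # red
    [5.0, 6.0, 7.0],  # green
    [6.0, 7.0, 8.0],  # blue
]

all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
group_means = [sum(g) / len(g) for g in groups]

# Between-groups: how far each group mean sits from the grand mean
ss_between = sum(len(g) * (m - grand_mean) ** 2
                 for g, m in zip(groups, group_means))

# Within-groups: how far each score sits from its own group's mean
ss_within = sum((x - m) ** 2
                for g, m in zip(groups, group_means) for x in g)

# The partition: total variation = between + within
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
print(ss_between, ss_within, ss_total)  # 6.0 6.0 12.0
```

Note that 6.0 + 6.0 = 12.0: no variation is lost or double-counted, which is why "partitioning" is the right word.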

To illustrate, the top panel of Figure 2 represents a situation where the group means are not different from each other. You do not need me to tell you the groups look similar: their means are close together, and their score distributions overlap to the point where the groups look like one big group. The bottom panel represents a situation where the group means are different from each other. You do not need me to tell you the groups do not look similar: their means are far apart, and their score distributions do not overlap, so they look like three separate groups.

[Figure 2: Difference in Between-Group and Within-Group Variation]

As you know, we cannot rely on narrative alone to tell others that we have three separate groups. We need a number to represent whether the three groups overlap so much that they look like one big group, or whether they are three separate groups. That number comes from the F-test. The “F” stands for Fisher, who devised the analysis of variance; the statistic was named in his honor by George Snedecor (statisticians do like naming formulas after one another). Strictly speaking, ANOVA is the analysis we conduct, and the F-test is the statistical test that produces the resulting value. So when someone states or writes that they ran an F-test, they conducted an ANOVA.

The ratio for an F-test is between-groups variance / within-groups variance. In Figure 2, the blue double arrows are the variance “within” each group: notice how each blue double arrow goes from one end of a group’s distribution to the other. We pool that variance together to see how much variance sits within the groups. The one long orange double arrow is the variance “between” the groups; in contrast to the blue double arrows, the orange double arrow spans the group means. We want to see whether the distance between the group means is greater than the spread of scores within the groups. Recall how ratios work: if the numerator is greater than the denominator, the ratio is greater than one; if the denominator is greater than the numerator, the ratio is less than one; and if the numerator and denominator are equal, the ratio is exactly one.

For the F-test, we want to see large values: the larger the F, the stronger the evidence of group differences. The ANOVA, or F-test, is significant when the F value is greater than its critical threshold and the p-value is less than your alpha level, usually .05. Unlike the z-test’s familiar 1.96, the critical value for F is not a single fixed number; it depends on two degrees of freedom (one based on the number of groups, one based on the total sample size) and is read from an F table or reported by software. The interpretation, though, is straightforward: an F near 1 means the between-groups variance is about the same size as the within-groups variance, while an F of 2, for example, means the between-groups variance is twice as large. For the following situations:

• F = 1.03, p = .38: the test is not significant.
    • F = 4.39, p < .01: the test is significant.
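The contrast between the two panels of Figure 2 can be sketched numerically (a minimal sketch with made-up scores; the data and the function name `f_ratio` are illustrative, not from the text): overlapping groups yield an F near or below 1, while well-separated groups yield a very large F.

```python
def f_ratio(groups):
    """F = (between-groups variance) / (within-groups variance)
    for a one-way layout with k groups and n total scores."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand = sum(x for g in groups for x in g) / n
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    # Divide each sum of squares by its degrees of freedom, then take the ratio
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Like the top of Figure 2: means close together, spreads overlapping
overlap = [[4.0, 6.0, 8.0], [5.0, 7.0, 9.0], [4.0, 7.0, 8.0]]
# Like the bottom of Figure 2: means far apart, spreads not overlapping
separate = [[1.0, 2.0, 3.0], [11.0, 12.0, 13.0], [21.0, 22.0, 23.0]]

print(f_ratio(overlap))   # small F, below 1: looks like one big group
print(f_ratio(separate))  # F = 300.0: clearly three separate groups
```

Whether a given F is significant still depends on the p-value from the F distribution for those degrees of freedom, but the size of the ratio already tells the visual story of Figure 2.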

    This page titled 12.2: The Concept Behind Analysis of Variance is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Peter Ji.