Statistics LibreTexts

3.5: SAS Output for ANOVA


    The first part of the ANOVA output from the Mixed procedure, shown below, gives useful details about the model.

    ANOVA of Greenhouse Data: The Mixed Procedure

    Model Information
      Data Set                     WORK.GREENHOUSE
      Dependent Variable           Height
      Covariance Structure         Diagonal
      Estimation Method            Type 3
      Residual Variance Method     Factor
      Fixed Effects SE Method      Model-Based
      Degrees of Freedom Method    Residual

    Class Level Information
      Class   Levels   Values
      fert         4   Control F1 F2 F3

    Dimensions
      Covariance Parameters     1
      Columns in X              5
      Columns in Z              0
      Subjects                  0
      Max Obs Per Subject      24

    The output below, titled ‘Type 3 Analysis of Variance’, is similar to the ANOVA table we are already familiar with. Note that it does not include the Total SS; however, the Total SS can be computed as the sum of all SS values in the table.

    Type 3 Analysis of Variance
      Source    DF  Sum of Squares  Mean Square  Expected Mean Square   Error Term    Error DF  F Value  Pr > F
      fert       3      251.440000    83.813333  Var(Residual)+Q(fert)  MS(Residual)        20    27.46  <.0001
      Residual  20       61.033333     3.051667  Var(Residual)

    Covariance Parameter Estimates
      Cov Parm   Estimate
      Residual     3.0517

    Fit Statistics
      -2 Res Log Likelihood       86.2
      AIC  (smaller is better)    88.2
      AICC (smaller is better)    88.5
      BIC  (smaller is better)    89.2

    Type 3 Tests of Fixed Effects
      Effect  Num DF  Den DF  F Value  Pr > F
      fert         3      20    27.46  <.0001

    The output above titled “Type 3 Tests of Fixed Effects” displays the \(F_{calculated}\) value and the p-value for the test of each effect specified in the model statement. Additional information can also be requested. For example, the method=type3 option includes the Expected Mean Squares for each source (as in the ‘Type 3 Analysis of Variance’ table above); these will prove useful in Chapter 6.
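    As a quick check of the tables above, the Mean Square and F Value columns follow directly from the Sum of Squares and DF columns, and the Total SS is simply the sum of the SS values. The sketch below is plain Python with the numbers copied by hand from the SAS output; it does not recompute Pr > F, which would need an F-distribution routine outside the standard library.

```python
# Recompute the Mean Square, F Value and Total SS columns from the
# Sum of Squares and DF values printed in the SAS ANOVA table.
ss_fert, df_fert = 251.440000, 3      # fert row
ss_resid, df_resid = 61.033333, 20    # Residual row

ms_fert = ss_fert / df_fert           # Mean Square for fert
ms_resid = ss_resid / df_resid        # Mean Square for Residual (the MSE)
f_value = ms_fert / ms_resid          # F Value = MS(fert) / MS(Residual)

# Total SS is not printed, but it equals the sum of the SS values.
ss_total = ss_fert + ss_resid
df_total = df_fert + df_resid

print(f"MS(fert)     = {ms_fert:.6f}")
print(f"MS(Residual) = {ms_resid:.6f}")
print(f"F Value      = {f_value:.2f}")
print(f"Total SS     = {ss_total:.6f} on {df_total} DF")
```

    Running it should reproduce the printed Mean Squares and F Value of 27.46, and gives a Total SS of about 312.4733 on 23 DF.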

    The Mixed Procedure also produces the following diagnostic plots:

    Figure 1: Diagnostic plots for residuals for height (residual vs. predicted mean, residual vs. quantile, and percent vs. residual).
    Figure 2: Box plots of the distribution of residuals for height for the fertilizer data.

    The following display results from the LSmeans statement in the PLM procedure, which was included in the program code.

    fert Least Squares Means
      fert     Estimate  Standard Error  DF  t Value  Pr > |t|  Alpha    Lower    Upper
      Control   21.0000          0.7132  20    29.45    <.0001   0.05  19.5124  22.4876
      F1        28.6000          0.7132  20    40.10    <.0001   0.05  27.1124  30.0876
      F2        25.8667          0.7132  20    36.27    <.0001   0.05  24.3790  27.3543
      F3        29.2000          0.7132  20    40.94    <.0001   0.05  27.7124  30.6876

    In the "Least Squares Means" table above, note that the \(t\) Value and \(Pr > |t|\) columns test the null hypothesis that each group mean equals 0. (These tests usually do not provide any useful information.) The Lower and Upper values are the 95% confidence limits for the group means. Note also that the least squares means are the same as the original arithmetic means generated by the Summary procedure in Section 3.3 because all 4 groups have the same sample size. With unequal sample sizes, or when a covariate is present, the least squares means can differ from the original sample means.
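    The Standard Error, t Value and 95% limits in this table can be reproduced from MS(Residual). This is a minimal sketch under two stated assumptions: each group has 6 observations (inferred from 24 observations split equally over 4 levels), and the t quantile for 20 DF is hard-coded as 2.085963 rather than computed. Because the means are entered rounded to four decimals, the last printed digit may differ slightly from SAS.

```python
import math

# Inputs copied from the SAS output; n_per_group = 6 is an assumption
# inferred from 24 observations split evenly over 4 fertilizer levels.
mse = 3.051667          # MS(Residual)
n_per_group = 6
t_crit = 2.085963       # t quantile (0.975, 20 DF), hard-coded

ls_means = {"Control": 21.0000, "F1": 28.6000, "F2": 25.8667, "F3": 29.2000}

se = math.sqrt(mse / n_per_group)   # standard error of a single group mean
results = {}
for level, mean in ls_means.items():
    t_value = mean / se                     # tests H0: group mean = 0
    lower = mean - t_crit * se              # 95% lower confidence limit
    upper = mean + t_crit * se              # 95% upper confidence limit
    results[level] = (t_value, lower, upper)
    print(f"{level:8s} SE={se:.4f}  t={t_value:6.2f}  CI=({lower:.4f}, {upper:.4f})")
```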

    Next, the plot=meanplot option in the LSmeans statement yields a mean plot and also a diffogram, shown below. The confidence intervals in the mean plot are commonly used to identify treatment levels (groups) that differ significantly. If two confidence intervals do not overlap, it is valid to conclude that the difference between the two associated means is statistically significant. If they do overlap, however, the difference may still be significant. Consequently, conclusions based on visual inspection of the mean plot may not match those reached from the "Differences of fert Least Squares Means" table, another part of the LSmeans output (here with the Tukey adjustment), which is displayed below.
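    The non-overlap rule can be applied mechanically. This sketch copies the individual 95% limits (Lower, Upper) from the Least Squares Means output; a pair whose intervals do not overlap is flagged as significant, while an overlapping pair is only inconclusive, not evidence that the means are equal.

```python
from itertools import combinations

# Individual 95% confidence limits (Lower, Upper) for each LS-mean,
# copied from the SAS output.
ci = {
    "Control": (19.5124, 22.4876),
    "F1": (27.1124, 30.0876),
    "F2": (24.3790, 27.3543),
    "F3": (27.7124, 30.6876),
}

def overlaps(a, b):
    """Return True if closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

non_overlapping = []
for g1, g2 in combinations(ci, 2):
    if overlaps(ci[g1], ci[g2]):
        print(f"{g1} vs {g2}: intervals overlap (inconclusive)")
    else:
        non_overlapping.append((g1, g2))
        print(f"{g1} vs {g2}: no overlap (significant)")
```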

    Notice that this table differs from the previous one because it displays the result of each pairwise comparison. For example, the first row compares the control with F1. These results are interpreted like any other confidence interval for the difference of two means: if the confidence interval does not contain zero, the difference between the two associated means is statistically significant. Because the Tukey adjustment was requested, the Adj Lower and Adj Upper columns give the intervals that account for the multiple comparisons.

    Differences of fert Least Squares Means
    Adjustment for Multiple Comparisons: Tukey

      fert     _fert  Estimate  Standard Error  DF  t Value  Pr > |t|   Adj P  Alpha     Lower    Upper  Adj Lower  Adj Upper
      Control  F1      -7.6000          1.0086  20    -7.54    <.0001  <.0001   0.05   -9.7038  -5.4962   -10.4229    -4.7771
      Control  F2      -4.8667          1.0086  20    -4.83    0.0001  0.0006   0.05   -6.9705  -2.7628    -7.6896    -2.0438
      Control  F3      -8.2000          1.0086  20    -8.13    <.0001  <.0001   0.05  -10.3038  -6.0962   -11.0229    -5.3771
      F1       F2       2.7333          1.0086  20     2.71    0.0135  0.0599   0.05    0.6295   4.8372   -0.08957     5.5562
      F1       F3      -0.6000          1.0086  20    -0.59    0.5586  0.9324   0.05   -2.7038   1.5038    -3.4229     2.2229
      F2       F3      -3.3333          1.0086  20    -3.30    0.0035  0.0171   0.05   -5.4372  -1.2295    -6.1562    -0.5104
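    Applying the zero-containment rule to every row of this table is mechanical. In this sketch the rows are copied by hand from the SAS output; comparing the unadjusted (Lower, Upper) limits with the Tukey-adjusted (Adj Lower, Adj Upper) limits shows that only the F1 vs. F2 conclusion changes once the multiple comparisons are accounted for.

```python
# Rows from the "Differences of fert Least Squares Means" table:
# (fert, _fert, Lower, Upper, Adj Lower, Adj Upper)
rows = [
    ("Control", "F1", -9.7038, -5.4962, -10.4229, -4.7771),
    ("Control", "F2", -6.9705, -2.7628, -7.6896, -2.0438),
    ("Control", "F3", -10.3038, -6.0962, -11.0229, -5.3771),
    ("F1", "F2", 0.6295, 4.8372, -0.08957, 5.5562),
    ("F1", "F3", -2.7038, 1.5038, -3.4229, 2.2229),
    ("F2", "F3", -5.4372, -1.2295, -6.1562, -0.5104),
]

def excludes_zero(lower, upper):
    """A difference is significant when its interval does not contain 0."""
    return lower > 0 or upper < 0

unadjusted = [(a, b) for a, b, lo, hi, _, _ in rows if excludes_zero(lo, hi)]
tukey = [(a, b) for a, b, _, _, alo, ahi in rows if excludes_zero(alo, ahi)]

print("Significant without adjustment:", unadjusted)
print("Significant with Tukey adjustment:", tukey)
```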

    A discrepancy between the mean plot and the "Differences of fert Least Squares Means" results can occur because the tests are carried out on the difference of two means, using the standard error of that difference, whereas the confidence intervals in the mean plot are computed for the individual means, using the standard error of each individual sample mean. Consistent results can be obtained from the diffogram, discussed below, or from the confidence intervals displayed in the "difference in mean plot" available in SAS 14 (not included here).
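    With equal group sizes, the standard error of a difference is \(\sqrt{2}\) times the standard error of a single mean (1.0086 versus 0.7132 in the tables above), and that gap is what lets the two approaches disagree. A short sketch, assuming 6 observations per group and a hard-coded t quantile for 20 DF, compares the difference needed for non-overlapping intervals with the difference needed for an unadjusted 95% test to reject:

```python
import math

mse = 3.051667       # MS(Residual) from the ANOVA table
n = 6                # per-group size (assumed: 24 observations / 4 levels)
t_crit = 2.085963    # t quantile (0.975, 20 DF), hard-coded

se_mean = math.sqrt(mse / n)       # SE of one LS-mean
se_diff = math.sqrt(2 * mse / n)   # SE of a difference of two LS-means

# Two individual 95% intervals fail to overlap only if |diff| > 2 * t * SE(mean).
overlap_threshold = 2 * t_crit * se_mean
# An unadjusted 95% test of the difference rejects if |diff| > t * SE(diff).
test_threshold = t_crit * se_diff

print(f"SE(mean)={se_mean:.4f}  SE(diff)={se_diff:.4f}  ratio={se_diff / se_mean:.4f}")
print(f"non-overlap needs |diff| > {overlap_threshold:.4f}")
print(f"unadjusted test needs |diff| > {test_threshold:.4f}")

# F1 - F2 = 2.7333 lies between the thresholds: significant by the unadjusted
# test (Pr > |t| = 0.0135), yet the individual intervals overlap.
diff_f1_f2 = 2.7333
print("F1 - F2 falls in the gap:", test_threshold < diff_f1_f2 < overlap_threshold)
```

    This comparison uses unadjusted intervals only; with the Tukey adjustment the F1 vs. F2 difference is not significant (Adj P = 0.0599).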

    The diffogram has two useful features: it identifies the significant mean pairs, and it also gives estimates of the individual means. The diagonal line in the diffogram serves as a reference line. Each group (factor level) is marked on both the horizontal and vertical axes and has vertical and horizontal reference lines whose intersection point falls on the diagonal reference line. The \(x\) and \(y\) coordinates of this intersection point are equal, and their common value is the sample mean of that group. For example, the sample mean for the Control group is about 21, which matches the estimate provided in the "Least Squares Means" table displayed above.

    Each slanted line represents a pair of means. Start with any group label on the horizontal axis, trace up along the associated vertical line until it meets a slanted line, and then follow the intersecting horizontal line across to identify the other group (factor level). For example, the lowermost solid (blue) line represents the Control and F2 pair. As stated at the bottom of the chart, solid (blue) lines indicate significant pairs, and broken (red) lines correspond to nonsignificant pairs. In addition, a line corresponding to a nonsignificant pair crosses the diagonal reference line, because the confidence interval for that difference contains zero.

    Figure 3: LS-Means plot of height for the fert data, with 95% confidence limits.
    Figure 4: Diffogram for height comparisons among fert levels.
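    Why a nonsignificant pair crosses the diagonal can be seen with a simplified model of the diffogram. The sketch assumes each pair is drawn as a slope \(-1\) segment whose points satisfy \(x + y = \bar{y}_i + \bar{y}_j\), with \(x - y\) running over the confidence interval for \(\bar{y}_i - \bar{y}_j\); the helper names are hypothetical and this is an illustration of the geometry, not SAS's plotting code. Since the diagonal is where \(x - y = 0\), the segment touches it exactly when the interval contains zero.

```python
# Hypothetical helpers illustrating diffogram geometry; not SAS code.
def diffogram_segment(mean_i, mean_j, lower, upper):
    """Endpoints of a slope -1 segment for the pair (i, j).

    Every point (x, y) on the segment satisfies x + y = mean_i + mean_j,
    and x - y runs over [lower, upper], the confidence interval for
    mean_i - mean_j. For a symmetric interval the midpoint is (mean_i, mean_j).
    """
    s = mean_i + mean_j
    start = ((s + lower) / 2, (s - lower) / 2)
    end = ((s + upper) / 2, (s - upper) / 2)
    return start, end

def crosses_diagonal(start, end):
    """True if the segment touches y = x, i.e. x - y changes sign along it."""
    return (start[0] - start[1]) * (end[0] - end[1]) <= 0

ls_means = {"Control": 21.0000, "F1": 28.6000, "F2": 25.8667, "F3": 29.2000}
# Tukey-adjusted limits (Adj Lower, Adj Upper) for two pairs from the SAS table
adj_ci = {("F1", "F2"): (-0.08957, 5.5562), ("F2", "F3"): (-6.1562, -0.5104)}

for (a, b), (lo, hi) in adj_ci.items():
    start, end = diffogram_segment(ls_means[a], ls_means[b], lo, hi)
    if crosses_diagonal(start, end):
        status = "crosses diagonal: not significant"
    else:
        status = "misses diagonal: significant"
    print(f"{a} vs {b}: {status}")
```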

    The non-overlapping confidence intervals in the mean plot above indicate that the average plant height under the control differs significantly from those under the other 3 fertilizer levels, and that fertilizer F2 yields an average plant height statistically different from that of F3. The diffogram delivers the same conclusions, so in this example the two methods do not contradict each other. The diffogram always leads to the same conclusions as the confidence intervals for the differences of least squares means shown in the "Differences of fert Least Squares Means" table, whereas conclusions based on the mean plot may differ.

    There are two contrasts of interest: one comparing the control and F3 with F1 and F2 (i.e., \(\mu_{control} - \mu_{F1} - \mu_{F2} + \mu_{F3}\)), and one comparing the control and F2 with F1 (i.e., \(\mu_{control} - 2 \mu_{F1} + \mu_{F2}\)). Since we are testing two contrasts, we should adjust for multiple comparisons; we use the Bonferroni adjustment. In SAS, the estimate statement in proc plm can be used to make these computations.

    In general, the estimate statement estimates linear combinations of model parameters and performs t-tests on them. Contrasts are linear combinations of the group means whose coefficients sum to zero, as they do for both contrasts above. We will discuss the model parameters in Chapter 4.

    Estimates
    Adjustment for Multiplicity: Bonferroni
      Label                                 Estimate  Standard Error  DF  t Value  Pr > |t|   Adj P
      Compare control + F3 with F1 and F2    -4.2667          1.4263  20    -2.99    0.0072  0.0144
      Compare control + F2 with F1          -10.3333          1.7469  20    -5.92    <.0001  <.0001
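    The estimates and standard errors in this table follow from the group means: a contrast estimate is \(\sum_i c_i \bar{y}_i\), and with equal group sizes \(n\) its standard error is \(\sqrt{MSE \sum_i c_i^2 / n}\). A minimal sketch, assuming \(n = 6\) per group and using the rounded means from the Least Squares Means output:

```python
import math

# Group means and MS(Residual) from the SAS output; n = 6 per group is assumed.
means = {"Control": 21.0000, "F1": 28.6000, "F2": 25.8667, "F3": 29.2000}
mse, n = 3.051667, 6

contrasts = {
    "Compare control + F3 with F1 and F2": {"Control": 1, "F1": -1, "F2": -1, "F3": 1},
    "Compare control + F2 with F1": {"Control": 1, "F1": -2, "F2": 1, "F3": 0},
}

estimates = {}
for label, coef in contrasts.items():
    # A contrast's coefficients must sum to zero.
    assert sum(coef.values()) == 0, "not a contrast"
    est = sum(c * means[g] for g, c in coef.items())           # sum of c_i * mean_i
    se = math.sqrt(mse * sum(c * c for c in coef.values()) / n)
    estimates[label] = (est, se, est / se)                      # (Estimate, SE, t Value)
    print(f"{label}: estimate={est:.4f}  SE={se:.4f}  t={est / se:.2f}")
```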

    SAS returns both unadjusted and adjusted \(p\)-values. Suppose we want to make the comparisons at the 1% level. If we ignore the multiple comparisons (i.e., use the unadjusted \(p\)-values), both comparisons are statistically significant, since 0.0072 and <.0001 are both below 0.01. However, if we use the adjusted \(p\)-values, we fail to reject the null hypothesis for the first contrast at the 1% level, since its adjusted \(p\)-value of 0.0144 exceeds 0.01.
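    The Bonferroni adjustment itself is simple: each unadjusted \(p\)-value is multiplied by the number of tests and capped at 1. A sketch of the 1% decisions described above; because SAS prints "<.0001" for the second contrast, 0.0001 is used here only as an upper bound for that \(p\)-value.

```python
# Unadjusted two-sided p-values from the Estimates table. SAS prints "<.0001"
# for the second contrast; 0.0001 is used here only as an upper bound.
p_unadjusted = {
    "control + F3 vs F1 and F2": 0.0072,
    "control + F2 vs F1": 0.0001,
}
alpha = 0.01                 # 1% significance level
m = len(p_unadjusted)        # number of contrasts tested together

adjusted = {}
decisions = {}
for label, p in p_unadjusted.items():
    p_adj = min(1.0, m * p)  # Bonferroni: multiply by the number of tests, cap at 1
    adjusted[label] = p_adj
    decisions[label] = (p <= alpha, p_adj <= alpha)   # (reject unadjusted?, reject adjusted?)
    print(f"{label}: p={p:.4f}  adj p={p_adj:.4f}  "
          f"reject unadjusted={p <= alpha}  reject adjusted={p_adj <= alpha}")
```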


    This page titled 3.5: SAS Output for ANOVA is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by Penn State's Department of Statistics.
