10.1: ANCOVA with Quantitative Factor Levels

An Extended Overview of ANCOVA

Designed experiments often contain treatment levels that have been set with increasing numerical values. For example, a chemical process may be hypothesized to vary by two factors: the reagent type (A or B) and temperature. The researchers therefore conduct an experiment that measures a response at 40, 50, 60, 70, and 80 degrees Fahrenheit for each of the reagent types.

You can find the data at QuantFactorData.csv.

If temperature is considered as a categorical factor, we can proceed as usual with a 2 × 5 factorial ANOVA to evaluate the Null Hypotheses: \[H_{0}: \ \mu_{A} = \mu_{B}\] \[H_{0}: \ \mu_{40} = \mu_{50} = \mu_{60} = \mu_{70} = \mu_{80}\] and \[H_{0}: \text{ no interaction}\]

Although the above hypotheses achieve the goal of comparing response means for the process carried out at different temperatures, no conclusion can be made about the trend of the response as the temperature is increased.
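To make this limitation concrete, the following R sketch builds the design matrix for the categorical treatment of temperature. It assumes a hypothetical balanced layout with one row per reagent-by-temperature cell (the data frame name `cells` is illustrative and is not the experiment's data file). Each temperature other than the reference level gets its own dummy column, so nothing in the model records that 50 lies between 40 and 60.

```r
# Hypothetical layout: one row per reagent x temperature cell (no response values needed)
cells <- expand.grid(temp = c(40, 50, 60, 70, 80), reagent = c("A", "B"))

# Temperature treated as a categorical factor: 2 x 5 factorial with interaction
X_factorial <- model.matrix(~ reagent * factor(temp), data = cells)

# 1 intercept + 1 reagent dummy + 4 temperature dummies + 4 interaction columns
n_columns <- ncol(X_factorial)
print(colnames(X_factorial))
```

The temperature dummies are unordered indicators, which is why this coding can compare means but cannot describe a trend.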

In general, the trend effects of a continuous predictor are modeled using a polynomial whose non-constant terms represent the different trends, such as linear, quadratic, and cubic effects. These non-constant terms are called trend terms. The statistical significance of the trend terms can be tested in an ANCOVA setting by adding columns representing the trend terms, and their interactions with the categorical factor, to the design matrix (X) of the General Linear Model (see Chapter 4 for the definition of a design matrix).

Note that the design matrix representing only the categorical factor contains the column of ones representing the reference factor level and other dummy variable columns representing the remaining factor levels.

Inclusion of the trend term columns will facilitate significance testing for the overall trend effects and the columns representing the interactions can be utilized to compare differences of each trend effect among the categorical factor levels.

Getting back to the chemical process example, if the quantitative property of measured temperature is used, we can carry out an ANCOVA by fitting a polynomial regression model to express the impact of temperature on the response. If a quadratic polynomial is desired, the appropriate ANCOVA design matrix can be obtained by adding two columns representing \(temp\) and \(temp^{2}\) along with the column of ones representing the reagent type A, the reference reagent category, and one dummy variable column representing the reagent type B.

The \(temp\) and \(temp^{2}\) terms allow us to investigate the linear and quadratic trends, respectively. Furthermore, including columns for the interactions between the reagent type and the two trend terms makes it possible to test whether each of these trends differs between the two reagent types. Note also that additional columns can be added to fit a polynomial of higher order.
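The design matrix described above can be sketched in R with `model.matrix()`. The layout below is a hypothetical one-row-per-cell grid (the name `cells` is an assumption for illustration), so it shows only the column structure, not the experiment's data.

```r
# Hypothetical layout: one row per reagent x temperature cell (no response values needed)
cells <- expand.grid(temp = c(40, 50, 60, 70, 80), reagent = c("A", "B"))

# Column of ones, reagent B dummy, linear and quadratic trend terms,
# and the interactions of each trend term with the reagent dummy
X_quadratic <- model.matrix(~ reagent * (temp + I(temp^2)), data = cells)

column_names <- colnames(X_quadratic)
print(column_names)
```

For rows belonging to reagent A, the reference category, the dummy and both interaction columns are zero, so those rows carry only the intercept and the two trend terms.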

Rule

To fit a polynomial of degree n, the response should be measured at no fewer than (n+1) distinct levels of the covariate. Preliminary graphics such as scatterplots are useful in deciding the degree of the polynomial to be fitted.
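A small R sketch of why the rule holds, using the five temperature levels from the example. The replication (three runs per level), the rescaling of the covariate, and the helper name `design_rank` are assumptions made for illustration. With five distinct levels, a degree-4 polynomial has a full-rank design matrix, but a degree-5 polynomial does not, so its coefficients cannot all be estimated.

```r
# Five temperature levels, three runs each (replication assumed for illustration)
temp <- rep(c(40, 50, 60, 70, 80), each = 3)
x <- (temp - 60) / 10   # centered and rescaled to -2, -1, 0, 1, 2

# Rank of the polynomial design matrix with columns x^0, x^1, ..., x^degree
design_rank <- function(x, degree) qr(outer(x, 0:degree, "^"))$rank

rank_degree4 <- design_rank(x, 4)  # 5 columns from 5 distinct levels
rank_degree5 <- design_rank(x, 5)  # 6 columns, still only 5 distinct levels

print(c(degree4 = rank_degree4, degree5 = rank_degree5))
```

Adding replicates does not help here: the rank is limited by the number of distinct covariate levels, not by the number of observations.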

Suggestion

To reduce structural multicollinearity, centering the covariate by subtracting the mean is recommended. For more details, see STAT 501 - Chapter 12: Multicollinearity.
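The effect of centering can be checked directly on the five temperature levels of the example. This is a minimal sketch; the variable names are illustrative.

```r
temp <- c(40, 50, 60, 70, 80)

# Uncentered: temp and temp^2 are almost perfectly correlated (about 0.995)
cor_raw <- cor(temp, temp^2)

# Centered at the mean (60): for these symmetric levels the linear and
# quadratic columns are uncorrelated
x <- temp - mean(temp)
cor_centered <- cor(x, x^2)

print(c(raw = cor_raw, centered = cor_centered))
```

The correlation vanishes exactly here because the levels are equally spaced around their mean; with asymmetric levels, centering would reduce the correlation without necessarily removing it.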

The necessary software code and/or commands along with outputs and conclusions are given below.

In SAS, this process would look like this:

/* centering the covariate and creating x^2 */
data centered_quant_factor;
set quant_factor;
x = temp-60;
x2 = x**2;
run;
proc mixed data=centered_quant_factor method=type3;
class reagent;
model product=reagent x x2 reagent*x reagent*x2;
title 'Centered';
run;

Notice that we specify reagent as a class variable, but \(x\) and \(x^2\) enter the model as continuous variables. The regression coefficients of \(x\) and \(x^2\) can be used to test the significance of the linear and quadratic trends for reagent type A, the reference category, and the interaction term coefficients can be used to test whether these trends differ by categorical factor level. For example, testing the null hypothesis \(H_{0}: \ \beta_{Reagent * x} = 0\), where \(\beta_{Reagent * x}\) is the regression coefficient of the \(Reagent * x\) term, is equivalent to testing that the linear effects are the same for reagent types A and B.
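To see why, it can help to write the model out. The notation here is an assumption for illustration: \(D_{B}\) denotes the dummy variable equal to 1 for reagent B and 0 for reagent A, and \(\epsilon\) is the random error.

\[y = \beta_{0} + \beta_{Reagent} D_{B} + \beta_{x} x + \beta_{x^2} x^{2} + \beta_{Reagent * x} D_{B} x + \beta_{Reagent * x^2} D_{B} x^{2} + \epsilon\]

Setting \(D_{B} = 0\) and \(D_{B} = 1\) gives the mean response curve for each reagent:

\[\text{Reagent A: } \ \beta_{0} + \beta_{x} x + \beta_{x^2} x^{2}\]

\[\text{Reagent B: } \ (\beta_{0} + \beta_{Reagent}) + (\beta_{x} + \beta_{Reagent * x}) x + (\beta_{x^2} + \beta_{Reagent * x^2}) x^{2}\]

Under this coding, the linear coefficients of the two curves differ by exactly \(\beta_{Reagent * x}\), and the quadratic coefficients differ by \(\beta_{Reagent * x^2}\).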

SAS output:

Type 3 Analysis of Variance
Source	DF	Sum of Squares	Mean Square	Expected Mean Square	Error Term	Error DF	F Value	Pr > F
reagent	1	3.066357	3.066357	Var(Residual) + Q(reagent)	MS(Residual)	24	2.97	0.0977
x	1	97.600495	97.600495	Var(Residual) + Q(x,x*reagent)	MS(Residual)	24	94.52	<.0001
x2	1	88.832986	88.832986	Var(Residual) + Q(x2,x2*reagent)	MS(Residual)	24	86.03	<.0001
x*reagent	1	0.341215	0.341215	Var(Residual) + Q(x*reagent)	MS(Residual)	24	0.33	0.5707
x2*reagent	1	0.067586	0.067586	Var(Residual) + Q(x2*reagent)	MS(Residual)	24	0.07	0.8003
Residual	24	24.782417	1.032601	Var(Residual)	.	.	.	.

The reagent effect was not significant, with \(p = 0.0977\).
Only the linear and quadratic effects were significant in describing the trend in the response, and the linear and quadratic effects were the same for each of the reagent types (no interactions).

Figure \(\PageIndex{1}\): Product vs. temperature (in degrees Fahrenheit) for reagents A and B, with polynomial regression curves for each reagent.

Using R

Steps:

Load the Quant Factor Data.
Obtain the ANOVA table after centering the covariate and creating \(x^2\).
Plot the data.

Steps in R

1. Load the Quant Factor data, center the covariate, create \(x^2\), and obtain the ANOVA table by using the following commands:

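# Set the working directory to the folder that contains the data file
setwd("~/path-to-folder/")
QuantFactor_data <- read.table("QuantFactorData.txt", header = TRUE)
# attach() makes temp, product, and reagent available by name (undone by detach() after plotting)
attach(QuantFactor_data)
# Center temperature at 60, the middle level, then square the centered value
temp_center <- temp - 60
temp_square_center <- temp_center^2
new_data <- cbind(QuantFactor_data, temp_center, temp_square_center)
# Quadratic ANCOVA model with reagent-by-trend interactions
ancova_model <- lm(product ~ reagent + temp_center + temp_square_center + reagent:temp_center + reagent:temp_square_center, data = new_data)
# ANOVA table for the fitted model
anova(ancova_model)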
#Analysis of Variance Table
#Response: product
#                           Df  Sum Sq  Mean Sq  F value     Pr(>F)
#reagent                     1   9.239    9.239   8.9476   0.006336 **
#temp_center                 1  97.600   97.600  94.5191  8.499e-10 ***
#temp_square_center          1  88.833   88.833  86.0284  2.093e-09 ***
#reagent:temp_center         1   0.341    0.341   0.3304  0.570749
#reagent:temp_square_center  1   0.068    0.068   0.0655  0.800257
#Residuals                  24  24.782    1.033
#---
#Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

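As in the SAS analysis, the linear and quadratic effects were significant in describing the trend in the response, and these effects were the same for each of the reagent types (no interactions). The reagent row differs from the SAS Type 3 table because anova() reports sequential (Type I) sums of squares, so reagent is tested here without adjusting for the other terms in the model.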

2. Plot the polynomial regression curve for reagent A and reagent B by using the following commands:

reagentA_regression <- lm(product ~ temp_center + temp_square_center,data=subset(new_data,reagent=="A"))
reagentB_regression <- lm(product ~ temp_center + temp_square_center,data=subset(new_data,reagent=="B"))
plot(temp,product,ylim=c(0,20),xlab="Temperature", ylab="Product",pch=23, col=ifelse(reagent=="A","blue","red"), lwd=2)
lines(fitted(reagentA_regression) ~ temp, data=subset(new_data,reagent=="A"), col = "blue", type="l")
lines(fitted(reagentB_regression) ~ temp, data=subset(new_data,reagent=="B"), col = "red", type="l")
text(locator(1),"reagent A",col="blue")
text(locator(1),"reagent B",col="red")
detach(QuantFactor_data)

Figure \(\PageIndex{2}\): Product vs. temperature for reagents A and B, with polynomial regression curves for each reagent, created in R.