Statistics LibreTexts

Analysis of Variance


1. A single factor study (continued)

A food company wanted to test four different package designs for a new breakfast cereal. Twenty stores with approximately the same sales conditions (such as sales volume, price, etc.) were selected as experimental units. Five stores were randomly assigned to each of the four package designs:

  • A balanced completely randomized design.
  • A single, 4-level, qualitative factor: package design.
  • A quantitative response variable: sales - number of packets of cereal sold during the period of study.
  • Goal: exploring relationship between package design and sales.

1.1 ANOVA for single factor study

A simple statistical model for the data is as follows:

\[ Y_{ij} = \mu_i + \epsilon_{ij}, \qquad j = 1, \ldots, n_i; \quad i = 1, \ldots, r; \]

where

  • \(r\) is the number of factor levels (treatments) and \(n_i\) is the number of experimental units corresponding to the i-th factor level;
  • \(Y_{ij}\) is the measurement for the j-th experimental unit corresponding to the i-th factor level;
  • \(\mu_i\) is the mean of all the measurements corresponding to the i-th factor level (unknown);
  • the \(\epsilon_{ij}\)'s are random errors (unobserved).

1.2 Model assumptions

The following assumptions are made about the above model:

  • The \(\epsilon_{ij}\) are independently and identically distributed as \(N(0, \sigma^2)\).
  • The \(\mu_i\)'s are unknown fixed parameters (so-called fixed effects), so that \(E(Y_{ij}) = \mu_i\) and \(\mathrm{Var}(Y_{ij}) = \sigma^2\). The above assumption is thus equivalent to assuming that the \(Y_{ij}\) are independently distributed as \(N(\mu_i, \sigma^2)\).
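As a minimal sketch of what these assumptions mean in practice, the following Python snippet draws one data set from the model. The factor level means, group sizes, and error standard deviation are hypothetical values chosen only for illustration, and the helper name `simulate_one_way` is not part of the original notes.

```python
import numpy as np

def simulate_one_way(mu, n, sigma, seed=0):
    """Draw Y_ij = mu_i + eps_ij with eps_ij ~ N(0, sigma^2), independently."""
    rng = np.random.default_rng(seed)
    # One array of n_i responses per factor level i.
    return [m + rng.normal(0.0, sigma, size=k) for m, k in zip(mu, n)]

# Hypothetical values chosen only for illustration (not from the study).
mu = [15.0, 13.0, 20.0, 27.0]   # factor level means mu_1, ..., mu_r
n = [5, 5, 4, 5]                # group sizes n_1, ..., n_r
sigma = 3.0                     # common error standard deviation

groups = simulate_one_way(mu, n, sigma, seed=42)
for i, y in enumerate(groups, start=1):
    print(f"level {i}: n_i = {len(y)}, sample mean = {y.mean():.2f}")
```

Every group shares the same error variance; only the mean changes from one factor level to the next.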

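1.3 Estimation of \(\mu_i\)

Define the sample mean for the i-th factor level:

\[ \bar{Y}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} Y_{ij} = \frac{1}{n_i} Y_{i\cdot}, \]

where \(Y_{i\cdot} = \sum_{j=1}^{n_i} Y_{ij}\) is the sum of responses for the i-th treatment group, for \(i = 1, \ldots, r\); and the overall sample mean:

\[ \bar{Y}_{\cdot\cdot} = \frac{1}{n_T} \sum_{i=1}^{r} \sum_{j=1}^{n_i} Y_{ij} = \frac{1}{n_T} \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot} = \frac{Y_{\cdot\cdot}}{n_T}, \]

where \(n_T = \sum_{i=1}^{r} n_i\) is the total number of observations and \(Y_{\cdot\cdot}\) is the sum of all responses. Then \(\bar{Y}_{i\cdot}\) is an estimate of \(\mu_i\) for each \(i = 1, \ldots, r\). Under the assumptions, \(\bar{Y}_{i\cdot}\) is an unbiased estimator of \(\mu_i\) since

\[ E(\bar{Y}_{i\cdot}) = \frac{1}{n_i} \sum_{j=1}^{n_i} E(Y_{ij}) = \frac{1}{n_i} \sum_{j=1}^{n_i} \mu_i = \mu_i. \]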

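Table 1: Data summary: packaging of breakfast cereals

| Packaging design | S1 | S2 | S3 | S4 | S5 | Total \(Y_{i\cdot}\) | Mean \(\bar{Y}_{i\cdot}\) | \(n_i\) |
|---|---|---|---|---|---|---|---|---|
| D1 | 11 | 17 | 16 | 14 | 15 | 73 | 14.6 | 5 |
| D2 | 12 | 10 | 15 | 19 | 11 | 67 | 13.4 | 5 |
| D3 | 23 | 20 | 18 | 17 | missing | 78 | 19.5 | 4 |
| D4 | 27 | 33 | 22 | 26 | 28 | 136 | 27.2 | 5 |
| Total | | | | | | \(Y_{\cdot\cdot} = 354\) | \(\bar{Y}_{\cdot\cdot} = 18.63\) | \(n_T = 19\) |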
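The summary columns of Table 1 can be reproduced with a few lines of Python. This is only an illustrative sketch: the dictionary layout and variable names are assumptions, while the sales figures are those listed in Table 1.

```python
# Sales data from Table 1; the fifth store for design D3 is missing.
sales = {
    "D1": [11, 17, 16, 14, 15],
    "D2": [12, 10, 15, 19, 11],
    "D3": [23, 20, 18, 17],
    "D4": [27, 33, 22, 26, 28],
}

totals = {d: sum(y) for d, y in sales.items()}       # Y_i.
sizes = {d: len(y) for d, y in sales.items()}        # n_i
means = {d: totals[d] / sizes[d] for d in sales}     # Ybar_i.

grand_total = sum(totals.values())                   # Y..
n_total = sum(sizes.values())                        # n_T
grand_mean = grand_total / n_total                   # Ybar..

for d in sales:
    print(d, totals[d], round(means[d], 2), sizes[d])
print("Y.. =", grand_total, " n_T =", n_total, " Ybar.. =", round(grand_mean, 2))
```

Because of the missing observation, the overall mean is the weighted average of the four group means with weights \(n_i / n_T\), not their simple average.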

1.4 Comparison of factor level means

We want to check for deviations from the null hypothesis \(H_0: \mu_1 = \cdots = \mu_r\); that is, the alternative hypothesis is \(H_a\): not all \(\mu_i\)'s are equal.

  • Idea 1: A baseline value for comparison is the overall mean:

\[ \mu_{\cdot} = \frac{\sum_{i=1}^{r} n_i \mu_i}{n_T}. \]

  • Idea 2: Calculate the squared deviations from the overall mean for each factor level:

\[ (\mu_1 - \mu_{\cdot})^2, \ldots, (\mu_r - \mu_{\cdot})^2. \]

Under \(H_0: \mu_1 = \cdots = \mu_r\), these deviations are all zero.

  • Idea 3: Use the weighted sum of the above squared deviations as an overall measure of the deviation from \(H_0: \mu_1 = \cdots = \mu_r\):

\[ \sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2. \]

The weight of the i-th treatment group is its sample size \(n_i\); that is, the more data a group has, the more it counts.

Estimators

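Estimate the population means by their sample counterparts: \(\bar{Y}_{1\cdot}\) estimates \(\mu_1\), ..., \(\bar{Y}_{r\cdot}\) estimates \(\mu_r\), and

\[ \bar{Y}_{\cdot\cdot} = \frac{1}{n_T} \sum_{i=1}^{r} n_i \bar{Y}_{i\cdot} \]

estimates \(\mu_{\cdot}\). Thus,

\[ \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \]

is a statistic that measures the deviation from \(H_0: \mu_1 = \cdots = \mu_r\). However, \(\sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2\) is not an unbiased estimator of \(\sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2\). In fact,

\[ E\left[ \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2 \right] = (r-1)\sigma^2 + \sum_{i=1}^{r} n_i (\mu_i - \mu_{\cdot})^2. \]

Nevertheless, we can compare the magnitude of \(\sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2\) to that of \(\sigma^2\) to decide whether the deviation is large or not.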
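The expectation formula above can be checked numerically. The following Monte Carlo sketch uses hypothetical parameter values (the means, group sizes, and \(\sigma\) are illustrative assumptions, not taken from the cereal study) and relies on the fact that, under the model, the group means are independent with \(\bar{Y}_{i\cdot} \sim N(\mu_i, \sigma^2 / n_i)\).

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical parameters, chosen only for illustration.
mu = np.array([10.0, 11.0, 12.0, 14.0])   # mu_1, ..., mu_r
n = np.array([5, 5, 4, 5])                # n_1, ..., n_r
sigma = 1.0
r, n_total = len(mu), n.sum()

# Weighted overall mean mu. and the theoretical expectation.
mu_dot = (n * mu).sum() / n_total
expected = (r - 1) * sigma**2 + (n * (mu - mu_dot) ** 2).sum()

# Under the model, Ybar_i. ~ N(mu_i, sigma^2 / n_i) independently, so the
# group means can be simulated directly instead of the raw responses.
reps = 20000
ybar = rng.normal(mu, sigma / np.sqrt(n), size=(reps, r))
ybar_dot = (ybar * n).sum(axis=1, keepdims=True) / n_total
stat = (n * (ybar - ybar_dot) ** 2).sum(axis=1)

print("theoretical expectation:", round(expected, 3))
print("Monte Carlo average:    ", round(stat.mean(), 3))
```

The two printed numbers should agree up to simulation error; the gap between the Monte Carlo average and the weighted sum of squared deviations of the true means is the bias term \((r-1)\sigma^2\).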

Decomposition of Total Sum of Squares

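Write

\[ Y_{ij} - \bar{Y}_{\cdot\cdot} = (Y_{ij} - \bar{Y}_{i\cdot}) + (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}) \]

  • \(Y_{ij} - \bar{Y}_{\cdot\cdot}\) : deviation of the response from the overall mean;
  • \(\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot}\) : deviation of the i-th factor level mean from the overall mean;
  • \(Y_{ij} - \bar{Y}_{i\cdot}\) : deviation of the response from the corresponding factor level mean (residual).

Squaring both sides and summing over i and j gives the ANOVA decomposition of the sum of squares. The cross-product term vanishes because \(\sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot}) = 0\) for each i:

\[ \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2 = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2 + \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2. \]

This can be expressed as

\[ SSTO = SSE + SSTR \]

where \(SSTO = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{\cdot\cdot})^2\) is the Total Sum of Squares, \(SSE = \sum_{i=1}^{r} \sum_{j=1}^{n_i} (Y_{ij} - \bar{Y}_{i\cdot})^2\) is the Error Sum of Squares, and \(SSTR = \sum_{i=1}^{r} n_i (\bar{Y}_{i\cdot} - \bar{Y}_{\cdot\cdot})^2\) is the Treatment Sum of Squares.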

Interpretation of the decomposition

  • SSTO: A measure of the overall variability among the responses.
  • SSTR: A measure of the variability among the factor level means. The more similar the factor level means are, the smaller is the SSTR.
  • SSE: A measure of the random variation of the responses around their corresponding factor level means. The smaller the error variance is, the smaller the SSE tends to be.
  • Overall variability is the sum of the variability due to difference in treatments and that due to random fluctuations.
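To make this interpretation concrete, here is a small sketch in Python. The data are hypothetical (not from the cereal study) and the function name `ss_decomposition` is an illustrative assumption. Shifting whole groups apart changes the factor level means but leaves the within-group spread untouched, so SSTR should grow while SSE stays the same.

```python
def ss_decomposition(groups):
    """Return (SSTO, SSE, SSTR) for a list of lists of responses."""
    all_y = [y for g in groups for y in g]
    grand_mean = sum(all_y) / len(all_y)
    means = [sum(g) / len(g) for g in groups]
    ssto = sum((y - grand_mean) ** 2 for y in all_y)
    sse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    sstr = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    return ssto, sse, sstr

# Hypothetical data, not from the cereal study.
base = [[4.0, 5.0, 6.0], [5.0, 6.0, 7.0], [6.0, 7.0, 8.0]]
# Push the group means further apart without changing within-group spread.
spread = [[y + shift for y in g] for g, shift in zip(base, [-3.0, 0.0, 3.0])]

for name, data in [("base", base), ("spread", spread)]:
    ssto, sse, sstr = ss_decomposition(data)
    print(f"{name}: SSTO={ssto:.1f}  SSE={sse:.1f}  SSTR={sstr:.1f}")
```

In both cases SSTO equals SSE plus SSTR; only the share attributed to the treatments changes.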

For the study on the effect of package design on sales volume

Refer to Table 1. Based on the information there:

\[ SSTO = (11 - 18.63)^2 + (17 - 18.63)^2 + \cdots + (28 - 18.63)^2 = 746.42 \]

\[ SSTR = 5(14.6 - 18.63)^2 + 5(13.4 - 18.63)^2 + 4(19.5 - 18.63)^2 + 5(27.2 - 18.63)^2 = 588.22 \]

\[ SSE = (11 - 14.6)^2 + \cdots + (15 - 14.6)^2 + \cdots + (27 - 27.2)^2 + \cdots + (28 - 27.2)^2 = 158.20. \]
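These three values can be verified directly from the data in Table 1. The sketch below uses NumPy and the unrounded overall mean (354/19), so intermediate values may differ slightly from hand calculations that use the rounded 18.63; the variable names are illustrative.

```python
import numpy as np

# Sales data from Table 1 (design D3 has one missing store).
groups = [
    np.array([11, 17, 16, 14, 15], dtype=float),  # D1
    np.array([12, 10, 15, 19, 11], dtype=float),  # D2
    np.array([23, 20, 18, 17], dtype=float),      # D3
    np.array([27, 33, 22, 26, 28], dtype=float),  # D4
]

all_y = np.concatenate(groups)
grand_mean = all_y.mean()

ssto = ((all_y - grand_mean) ** 2).sum()
sse = sum(((g - g.mean()) ** 2).sum() for g in groups)
sstr = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

print(f"SSTO = {ssto:.2f}, SSTR = {sstr:.2f}, SSE = {sse:.2f}")
```

Most of the total variability in sales is attributed to differences between package designs (SSTR) rather than to random fluctuation within designs (SSE).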

Contributors

  • Scott Brunstein (UCD)
  • Debashis Paul (UCD)

This page titled Analysis of Variance is shared under a not declared license and was authored, remixed, and/or curated by Debashis Paul.
