Multiple Comparison


    Multiple comparison refers to the situation where a family of statistical inferences is considered simultaneously.

    • Examples: construct a family of confidence intervals, or test multiple hypotheses.
    • However, "errors" are more likely to occur when one considers these inferences as a whole.

    Suppose one conducts 100 hypothesis tests, each at the 0.05 significance level. Then, even when all 100 null hypotheses are true, one would on average reject 5 of them purely by chance. If the tests are independent, the probability of at least one wrong rejection is \(1 - 0.95^{100} \approx 99.4\%\).
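
    The arithmetic above can be checked with a few lines of Python. This is only a sketch; the variable names are illustrative and the independence of the tests is assumed, as in the text.

```python
# Family of m independent tests, each at significance level alpha,
# with every null hypothesis true (assumption taken from the text).
alpha = 0.05
m = 100

# Expected number of false rejections: m * alpha.
expected_false_rejections = m * alpha

# Probability of at least one false rejection: 1 - (1 - alpha)^m.
p_at_least_one = 1 - (1 - alpha) ** m

print(f"expected false rejections: {expected_false_rejections:.1f}")
print(f"P(at least one false rejection): {p_at_least_one:.3f}")
```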

    Therefore, we also want to simultaneously control these errors.

    • C.I.: maintain a family-wise confidence coefficient;
    • Hypothesis testing: control the family-wise Type-I error rate.

    1.1 Family-wise confidence coefficient for pairwise differences

    For r factor levels, there are r(r - 1)/2 pairwise comparisons of the form \(D_{ij} = \mu_i - \mu_j\) (\(1 \leq i < j \leq r\)).

    For a given pair (i, j), denote the (1 - \(\alpha\))-C.I. for \(D_{ij}\) by \(C_{ij}\)(\(\alpha\)):

    $$ C_{ij}(\alpha)=\widehat{D}_{ij} \pm t(1-\frac{\alpha}{2};n_T-r) \times s(\widehat{D}_{ij}). $$

    • \(\widehat{D}_{ij}=\bar{Y}_{i\cdot}-\bar{Y}_{j\cdot}\) is the estimator
    • s(\(\widehat{D}_{ij}\)) is its estimated standard error
    • \(t(1-\frac{\alpha}{2};n_T-r)\) is the multiplier of the standard error that gives the desired confidence coefficient 1 - \(\alpha\)

    By definition

    $$ \mathbb{P}(D_{ij} \in C_{ij}(\alpha)) = 1-\alpha, $$

    in other words, the probability that \(D_{ij}\) falls outside this C.I. is \(\alpha\).

    Definition: Family-wise confidence coefficient

    Family-wise confidence coefficient of this family of C.I.s is defined as

    $$ \mathbb{P}(D_{ij} \in C_{ij}(\alpha), ~\hbox{for all \(1 \leq i<j \leq r\)}) $$

    i.e., the probability that these C.I.s simultaneously contain their corresponding parameter.

    It is obvious that

    $$ \begin{eqnarray*}
    && \mathbb{P}(D_{ij} \in C_{ij}(\alpha), ~ \hbox{for all $1 \leq i<j \leq r$})\\
    &\leq& \mathbb{P}(D_{i'j'} \in C_{i'j'}(\alpha))~=~1-\alpha,
    \end{eqnarray*} $$

    for any given pair (i', j').

    How can we construct C.I.s such that the family-wise confidence coefficient is at least 1 - \(\alpha\)?

    Idea: replace \(t(1-\frac{\alpha}{2};n_T-r)\) by another multiplier which gives a family-wise confidence coefficient (1 - \(\alpha\)).

    1.2 Bonferroni's Procedure

    Suppose we want to construct g C.I.s simultaneously and control the family-wise confidence coefficient at level 1 - \(\alpha\).

    • Bonferroni procedure: construct each C.I. at level 1 - \(\alpha\)/g.

    • If we want to construct C.I.s for g pairwise comparisons, then Bonferroni's C.I.s are of the form

    $$ C^B_{ij}(\alpha)=\widehat{D}_{ij} \pm B \times s(\widehat{D}_{ij}). $$

    where \(B=t(1-\frac{\alpha}{2g};n_T-r).\) These intervals satisfy

    $$ \mathbb{P}(D_{ij} \in C^B_{ij}(\alpha), \hbox{for all $g$ pairs}~(i,j)) \geq 1-\alpha. $$

    This follows from Bonferroni's inequality:

    Bonferroni's inequality

    $$ \mathbb{P}\Bigl(\bigcap_{k=1}^g A_k\Bigr) \geq 1- \sum_{k=1}^g \mathbb{P}(A_k^c) = 1-g\beta,$$

    provided that \(\mathbb{P}(A_k) = 1 - \beta\) for each k = 1, \(\cdots\), g (here \(A_k^c\) denotes the complement of \(A_k\));

    and from the fact that, for every pair (i, j),

    $$ \mathbb{P}(D_{ij} \in C^B_{ij}(\alpha))=1-\frac{\alpha}{g}, $$

    so that \(\beta = \alpha/g\) and \(1 - g\beta = 1 - \alpha\).
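
    As a numerical illustration of the Bonferroni bound, the sketch below looks at the special case of g independent intervals. Independence is an assumption made here only so that the joint coverage has a closed form; the bound itself does not need it. The function name is illustrative.

```python
def independent_joint_coverage(level_each, g):
    """Joint coverage of g independent intervals, each with coverage level_each."""
    return level_each ** g


alpha = 0.05
for g in (1, 4, 6, 20, 100):
    # Unadjusted: each interval at level 1 - alpha.
    unadjusted = independent_joint_coverage(1 - alpha, g)
    # Bonferroni: each interval at level 1 - alpha / g.
    bonferroni = independent_joint_coverage(1 - alpha / g, g)
    print(f"g={g:3d}  unadjusted={unadjusted:.4f}  Bonferroni={bonferroni:.4f}")
```

    In this special case the Bonferroni-adjusted joint coverage stays at or slightly above 1 - \(\alpha\), while the unadjusted joint coverage drops quickly as g grows.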

    1.3 Tukey's Procedure

    Tukey's procedure for families of pairwise comparisons: define C.I. for \(D_{ij}\) as

    $$ C^T_{ij}(\alpha)=\widehat{D}_{ij} \pm T \times s(\widehat{D}_{ij}), $$

    where the multiplier \(T := \frac{1}{\sqrt{2}}q(1-\alpha;r,n_T-r)\), \(q(r, n_T - r)\) is the studentized range distribution with parameters r and \(n_T - r\), and \(q(1-\alpha;r,n_T-r)\) is its \((1-\alpha)\) quantile (refer to Table B.9).

    For such C.I.s, the family-wise confidence coefficient is at least 1 - \(\alpha\), i.e.,

    $$ \mathbb{P}(D_{ij} \in C^T_{ij}(\alpha), ~\hbox{for all $1 \leq i<j \leq r$}) \geq 1-\alpha. $$

    Equality holds for balanced designs. For unbalanced designs, Tukey's procedure is conservative.

    Reasoning of Tukey's Procedure

    • Fact 1: Suppose that \(X_1, \cdots, X_r\) are i.i.d. N(\(\mu\), \(\sigma^2\)). Let W = max{\(X_i\)} - min{\(X_i\)} denote the range of the data. If \(s^2\) is an estimator of \(\sigma^2\) that is independent of the \(X_i\)'s and has \(\nu\) degrees of freedom (with \(\nu s^2/\sigma^2\) having a \(\chi_{(\nu)}^2\) distribution), then the quantity W/s is called the studentized range and we have

    $$ \frac{W}{s} \sim q(r,\nu). $$

    • Fact 2: When \(n_1 = \cdots = n_r = n\) (balanced design), \(\overline{Y}_{1\cdot}-\mu_1,\cdots, \overline{Y}_{r\cdot}-\mu_r\) are i.i.d. \(N(0, \frac{\sigma^2}{n})\); MSE is an estimator of \(\sigma^2\) with \(n_T - r\) degrees of freedom and is independent of the \(\overline{Y}_{i\cdot} - \mu_i\)'s. Therefore

    $$ \frac{\max\{\overline{Y}_{i\cdot} - \mu_i\} - \min\{\overline{Y}_{i\cdot} - \mu_i\}}{\sqrt{MSE/n}} \sim q(r,n_T - r). $$

    • Fact 3: For a given pair \(1 \leq i < j \leq r\),

    $$ |\widehat{D}_{ij} - D_{ij}| = |(\overline{Y}_{i\cdot} - \mu_i) - (\overline{Y}_{j\cdot} - \mu_j)|\\ \leq \max\{\overline{Y}_{i\cdot} - \mu_i\} - \min\{\overline{Y}_{i\cdot } - \mu_i\}. $$

    • Fact 4: \(s(\widehat{D}_{ij})=\sqrt{MSE(\frac{1}{n}+\frac{1}{n})}=\sqrt{2}\sqrt{\frac{MSE}{n}}\) and for each pair (i, j),

    $$ D_{ij} \in C^T_{ij}(\alpha) \Longleftrightarrow \frac{|\widehat{D}_{ij} - D_{ij}|}{s(\widehat{D}_{ij})} \leq T. $$

    Finally, the family-wise confidence coefficient for Tukey's C.I.s is

    $$ \begin{eqnarray*}
    &&\mathbb{P}(D_{ij} \in C^T_{ij}(\alpha), ~\hbox{for all $1 \leq i<j \leq r$})\\
    &=& \mathbb{P}\Bigl(\frac{|\widehat{D}_{ij}-D_{ij}|}{s(\widehat{D}_{ij})} \leq T, ~\hbox{for all $1 \leq i<j \leq r$}\Bigr)\\
    &=& \mathbb{P}\Bigl(\frac{\max\{\overline{Y}_{i\cdot} - \mu_i\} - \min\{\overline{Y}_{i\cdot} - \mu_i\}}{\sqrt{MSE/n}} \leq \sqrt{2}T \Bigr)\\
    &=& \mathbb{P}(q(r,n_T-r) \leq q(1-\alpha;r, n_T-r)) ~=~ 1-\alpha.
    \end{eqnarray*} $$
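
    The coverage claim can be checked by simulation. The sketch below is a Monte Carlo illustration under stated assumptions, not part of the original derivation: it draws the centred means and MSE directly from the distributions in Fact 2 (with r = 4, n = 10 and \(\sigma = 1\) chosen for illustration), and uses the multipliers T = 2.69 and t(0.975;36) = 2.03 quoted later on this page. The function name and defaults are hypothetical.

```python
import itertools
import math
import random


def familywise_coverage(multiplier, r=4, n=10, sigma=1.0, reps=20000, seed=0):
    """Estimate P(all pairwise C.I.s cover) in a balanced one-way layout."""
    rng = random.Random(seed)
    df = r * n - r  # error degrees of freedom, n_T - r
    hits = 0
    for _ in range(reps):
        # Centred means: Ybar_i - mu_i ~ N(0, sigma^2 / n), independent.
        dev = [rng.gauss(0.0, sigma / math.sqrt(n)) for _ in range(r)]
        # MSE ~ sigma^2 * chi2(df) / df; chi2(df) is Gamma(shape=df/2, scale=2).
        mse = sigma ** 2 * rng.gammavariate(df / 2, 2.0) / df
        se = math.sqrt(mse * (1 / n + 1 / n))  # s(D_hat_ij)
        # All pairs covered iff every |D_hat_ij - D_ij| <= multiplier * se.
        if all(abs(dev[i] - dev[j]) <= multiplier * se
               for i, j in itertools.combinations(range(r), 2)):
            hits += 1
    return hits / reps


coverage_tukey = familywise_coverage(2.69)  # Tukey multiplier T
coverage_t = familywise_coverage(2.03)      # per-comparison t multiplier
print(f"Tukey: {coverage_tukey:.3f}   unadjusted t: {coverage_t:.3f}")
```

    With these settings the estimate for Tukey's multiplier should land close to 0.95, while the unadjusted t multiplier gives a noticeably lower family-wise coverage.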

    2 Simultaneous inference for contrasts: Scheffé's procedure

    • There are many contrasts (indeed infinitely many). How to achieve a family-wise confidence coefficient (say 1 - \(\alpha\)) or a type-one error rate (say \(\alpha\)) for a large number of contrasts?
    • Consider the family of all possible contrasts:

    $$ \mathcal{L}=\Bigl\{L=\sum_{i=1}^r c_i\mu_i: \quad \sum_{i=1}^r c_i=0 \Bigr\}. $$

    • Scheffé's procedure: define the C.I. for a contrast L as

    $$ C^S_L(\alpha) :=\hat{L} \pm S \times s(\hat{L}), $$

    where \(S^2=(r-1)F(1-\alpha;r-1,n_T-r)\).

    2.1 Interpretation of Scheffé's procedure

    • The family-wise confidence coefficient of \(\{C^S_L(\alpha)\}\) is

    $$ \mathbb{P}(L \in C^S_{L}(\alpha), \ \hbox{for all $L \in \mathcal{L}$}) = 1-\alpha. \qquad (1) $$

    • Interpretation: If the study were repeated many times and these C.I.s were constructed each time, then in 100(1 - \(\alpha\))% of the repetitions all contrasts would fall into their respective C.I.s.
    • Simultaneous testing: reject \(H_{0L}\) : L = 0 if zero is not contained in the corresponding C.I. \(C^S_L(\alpha)\). Such a decision rule has family-wise significance level \(\alpha\), i.e.,

    $$ \mathbb{P}(\hbox{at least one of \(H_{0L}\) is rejected} | \hbox {all \(H_{0L}\) are true}) \leq \alpha. $$

    2.2 Application to package design example

    Suppose we want to maintain a family-wise confidence coefficient of 90% for all possible contrasts simultaneously. Then, for example, Scheffé's C.I. for

    $$ L=\frac{\mu_1+\mu_2}{2}-\frac{\mu_3+\mu_4}{2}, $$

    is constructed by:

    • \(S^2=(r-1)F(1-\alpha;r-1,n_T-r)=3 \times F(0.9;3,15)=7.47\), which means \(S=\sqrt{7.47}=2.73.\)
    • Scheffé's C.I. for L is

    $$ -9.35 \pm 1.50 \times 2.73=[-13.4, \ -5.3]. $$

    • Note that the Scheffé multiplier S = 2.73 is larger than the multiplier t(0.95;15) = 1.753 that would be used if L were the only contrast of interest.
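
    The numbers in this example can be reproduced as follows. This is a sketch that takes the point estimate, standard error and quantiles from the text rather than computing them from data; the value F(0.90;3,15) = 2.49 is the one implied by \(3 \times F = 7.47\), and the variable names are illustrative.

```python
import math

r = 4                  # number of factor levels
f_quantile = 2.49      # F(0.90; 3, 15), implied by 3 * F = 7.47 in the text
t_quantile = 1.753     # t(0.95; 15), quoted in the text
L_hat, se = -9.35, 1.50  # estimate of L and its standard error, from the text

# Scheffe multiplier: S^2 = (r - 1) * F(1 - alpha; r - 1, n_T - r).
S = math.sqrt((r - 1) * f_quantile)

scheffe_ci = (L_hat - S * se, L_hat + S * se)
# Interval that would apply if L were the only contrast of interest.
single_ci = (L_hat - t_quantile * se, L_hat + t_quantile * se)

print(f"S = {S:.2f}")
print(f"Scheffe C.I.: [{scheffe_ci[0]:.1f}, {scheffe_ci[1]:.1f}]")
print(f"single-contrast C.I.: [{single_ci[0]:.1f}, {single_ci[1]:.1f}]")
```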

    2.3 Justification of Scheffé procedure

    Consider an arbitrary sequence \(c_1, \cdots, c_r\) satisfying \(\sum_{i=1}^r c_i = 0.\) Then, with L = \(\sum_{i=1}^r c_i \mu_i\), and \(\widehat{L}\) = \(\sum_{i=1}^r c_i \overline{Y}_{i\cdot} \) we have

    $$ \begin{eqnarray*}
    \widehat L - L &=& \sum_{i=1}^r c_i (\overline{Y}_{i\cdot} - \mu_i) \\
    &=& \sum_{i=1}^r c_i [(\overline{Y}_{i\cdot} - \mu_i) - (\overline{Y}_{\cdot\cdot} - \mu_{\cdot})] \qquad (\mbox{since}~\sum_{i=1}^r c_i = 0)\\
    &=& \sum_{i=1}^r c_i (\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot}),
    \end{eqnarray*} $$

    where \(\mu_{\cdot} = \frac{1}{n_T}\sum_{i=1}^r n_i \mu_i\),

    $$ \overline{\varepsilon}_{i\cdot} = \frac{1}{n_i} \sum_{j=1}^{n_i} \varepsilon_{ij} \qquad\mbox{and} \qquad \overline{\varepsilon}_{\cdot\cdot} = \frac{1}{n_T} \sum_{i=1}^r \sum_{j=1}^{n_i} \varepsilon_{ij}, $$

    and \(\varepsilon_{ij} = Y_{ij} - \mu_i\) are i.i.d. \(N(0,\sigma^2)\).

    Cauchy-Schwarz inequality

    Let \(a_1, \cdots, a_r\) and \(b_1, \cdots, b_r\) be real numbers. Then

    $$ |\sum_{i=1}^r a_i b_i | \leq \sqrt{\sum_{i=1}^r a_i^2}\sqrt{\sum_{i=1}^r b_i^2}. $$

    Taking \(a_i = c_i/\sqrt{n_i}\) and \(b_i = \sqrt{n_i}(\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})\), and applying the Cauchy-Schwarz inequality, we obtain

    $$ \begin{eqnarray*}
    |\widehat L - L| &=& \Bigl|\sum_{i=1}^r \frac{c_i}{\sqrt{n_i}} \sqrt{n_i}(\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})\Bigr| \\
    &\leq& \sqrt{\sum_{i=1}^r \frac{c_i^2}{n_i}} \sqrt{\sum_{i=1}^r n_i (\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})^2}. \qquad (2)
    \end{eqnarray*} $$

    Since

    $$ s(\widehat L) = \sqrt{MSE} \sqrt{\sum_{i=1}^r \frac{c_i^2}{n_i}}, $$

    from equation (2), we have

    $$ \frac{|\widehat{L} - L|}{s(\widehat{L})} \leq \sqrt{\frac{\sum_{i=1}^r n_i (\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})^2}{MSE}}~. $$

    Observe that \(\sum_{i=1}^r n_i (\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})^2\) has the same distribution as SSTR under the hypothesis \(\mu_1 = \cdots = \mu_r.\) Thus,

    $$ \frac{\sum_{i=1}^r n_i (\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})^2}{\sigma^2} \sim \chi_{(r-1)}^2. $$

    Also, since \(\sum_{i=1}^r n_i (\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})^2\) is defined in terms of \(\overline{Y}_{1\cdot},\ldots,\overline{Y}_{r\cdot}\), it is independent of MSE. Moreover, \(SSE/\sigma^2 \sim \chi_{(n_T-r)}^2\). Therefore,

    $$ \frac{\sum_{i=1}^r n_i (\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})^2/ (r-1) }{MSE} = \frac{\chi_{(r-1)}^2/(r-1)}{\chi_{(n_T-r)}^2/(n_T - r)} \sim F_{(r-1,n_T-r)}. \qquad (3) $$

    Together with the bound on \(|\widehat{L} - L|/s(\widehat{L})\) above, this proves that if we choose \(S = \sqrt{(r-1) F(1-\alpha;r-1,n_T-r)}\), then the intervals \(C^{S}_{L}\) defined by \(\widehat{L} \pm S \times s(\widehat{L})\) satisfy (1).
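
    One step is left implicit in this argument: the Cauchy-Schwarz bound by itself only gives a family-wise confidence coefficient of at least 1 - \(\alpha\), whereas (1) states equality. A sketch of the missing step, using the illustrative notation \(c_i^{*}\) (not in the original) for a particular choice of coefficients: equality in the Cauchy-Schwarz inequality holds when \(a_i\) is proportional to \(b_i\), which corresponds to

    $$ c_i^{*} = n_i(\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot}), \qquad \sum_{i=1}^r c_i^{*} = \sum_{i=1}^r n_i\overline{\varepsilon}_{i\cdot} - n_T\,\overline{\varepsilon}_{\cdot\cdot} = 0. $$

    These coefficients sum to zero, so they define a contrast in \(\mathcal{L}\), and the bound is attained:

    $$ \sup_{L \in \mathcal{L}} \frac{|\widehat{L} - L|}{s(\widehat{L})} = \sqrt{\frac{\sum_{i=1}^r n_i (\overline{\varepsilon}_{i\cdot} - \overline{\varepsilon}_{\cdot\cdot})^2}{MSE}}. $$

    Consequently,

    $$ \mathbb{P}(L \in C^S_{L}(\alpha), \ \hbox{for all $L \in \mathcal{L}$}) = \mathbb{P}\Bigl(\sup_{L \in \mathcal{L}} \frac{|\widehat{L} - L|}{s(\widehat{L})} \leq S\Bigr) = \mathbb{P}\bigl(F_{(r-1,n_T-r)} \leq F(1-\alpha;r-1,n_T-r)\bigr) = 1-\alpha. $$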

    3 Comparison of different multiple comparison procedures

    In this section, we analyze the performance of Bonferroni's, Tukey's and Scheffé's procedures for finding confidence intervals for multiple parameters (pairwise differences of treatment means or more general contrasts).

    3.1 Rust inhibitors example revisited

    In a study of the effectiveness of different rust inhibitors, four brands (1,2,3,4) were tested. Altogether, 40 experimental units were randomly assigned to the four brands, with 10 units assigned to each brand (balanced design). The resistance to rust was evaluated in a coded form after exposing the experimental units to severe conditions.

    Summary statistics and ANOVA table: \(n_1 = n_2 = n_3 = n_4 = 10\) and \(\overline{Y}_{1\cdot} = 43.14\), \(\overline{Y}_{2\cdot} = 89.44\), \(\overline{Y}_{3\cdot} = 67.95\) and \(\overline{Y}_{4\cdot} = 40.47.\)

    Source of Variation    Sum of Squares (SS)    Degrees of Freedom (df)    Mean Square (MS)
    Between treatments     SSTR = 15953.47        r - 1 = 3                  MSTR = 5317.82
    Within treatments      SSE = 221.03           \(n_T - r = 36\)           MSE = 6.140
    Total                  SSTO = 16174.50        \(n_T - 1 = 39\)

    Example \(\PageIndex{1}\)

    All 6 pairwise comparisons \(D_{ij} = \mu_i - \mu_j\), \(1\leq i < j \leq 4\), are of interest.

    First we construct Tukey's multiple comparison confidence intervals for all pairwise comparisons with a family-wise confidence coefficient of 95%.

    • Using linear interpolation based on the quantiles given in Table B.9, q(0.95;4,36) \(\approx\) 3.814. A more accurate value of 3.809 may be obtained with the R command qtukey(0.95, 4, 36). Thus, \(T=\frac{1}{\sqrt{2}}q(1-\alpha;r,n_T-r)=\frac{1}{\sqrt{2}}q(0.95;4,36)=\frac{1}{\sqrt{2}} \times 3.809=2.69\).
    • Note that T = 2.69 > 2.03 = t(0.975;36).
    • Tukey's C.I. for \(\mu_1 - \mu_2\) is

    $$ -46.3 \pm 1.11 \times 2.69 =[-49.29,-43.31]. $$

    • All six confidence intervals are:

    $$ \begin{eqnarray*}
    && -49.3 \leq \mu_1-\mu_2 \leq -43.3, \qquad -27.8 \leq \mu_1-\mu_3 \leq -21.8,\\
    && -0.3 \leq \mu_1-\mu_4 \leq 5.7, \qquad 18.5 \leq \mu_2-\mu_3 \leq 24.5,\\
    && 46.0 \leq \mu_2-\mu_4 \leq 52.0, \qquad 24.5 \leq \mu_3-\mu_4 \leq 30.5.
    \end{eqnarray*} $$

    • Zero is contained in one of the C.I.s (corresponding to \(\mu_1 - \mu_4\)), but is not in the other five C.I.s. Therefore, at the family-wise significance level 0.05, we should not reject \(H_{0,14} : \mu_1 = \mu_4\), but should reject the other five null hypotheses of the form \(H_{0,ij} : \mu_i = \mu_j\).
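
    The six Tukey intervals can be recomputed from the summary statistics above. The following is a sketch that takes the multiplier T = 2.69 from the text instead of computing the studentized range quantile; the variable names are illustrative.

```python
import itertools
import math

# Summary statistics from the rust inhibitors example.
means = {1: 43.14, 2: 89.44, 3: 67.95, 4: 40.47}
n = 10       # observations per brand (balanced design)
mse = 6.140  # from the ANOVA table
T = 2.69     # Tukey multiplier quoted in the text

# Estimated standard error of a pairwise difference: sqrt(MSE * (1/n + 1/n)).
se = math.sqrt(mse * (1 / n + 1 / n))

intervals = {}
for i, j in itertools.combinations(sorted(means), 2):
    d_hat = means[i] - means[j]
    intervals[(i, j)] = (d_hat - T * se, d_hat + T * se)

# Pairs whose interval contains zero: the null hypothesis is not rejected.
not_rejected = [pair for pair, (lo, hi) in intervals.items() if lo <= 0 <= hi]

for (i, j), (lo, hi) in intervals.items():
    print(f"mu_{i} - mu_{j}: [{lo:.1f}, {hi:.1f}]")
print("not rejected:", not_rejected)
```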

    Let us compare Tukey's intervals with the simultaneous C.I.s formed by Bonferroni's and Scheffé's procedures.

    • Tukey's multiplier:

    $$ T=\frac{1}{\sqrt{2}}q(1-\alpha;r,n_T-r)=\frac{1}{\sqrt{2}}q(0.95;4,36)=2.69. $$

    • Bonferroni's multiplier: since g = 6,

    $$ B=t(1-\frac{\alpha}{2g};n_T-r)=t(1-\frac{0.05}{12};36)=2.79. $$

    • Scheffé's multiplier:

    $$ S = \sqrt{(r-1)F(1-\alpha;r-1,n_T-r)} = \sqrt{3 F(0.95;3,36)}= 2.93. $$

    • T < B < S, and so Tukey's procedure is the best (gives rise to the narrowest confidence intervals).
    • For example, the C.I.s for \(\mu_1 - \mu_2\) are

    $$ T: [-49.3,-43.3]; \qquad S: [-49.6,-43.0]; \qquad B: [-49.4,-43.2]. $$

    Example \(\PageIndex{2}\)

    Suppose that only 4 pairwise comparisons are of interest. Then

    • T and S won't change
    • Bonferroni's multiplier B decreases. Now g = 4, and so

    $$ B=t(1-\frac{\alpha}{2g};n_T-r)=t(1-\frac{0.05}{8};36)=2.63. $$

    • B < T < S, and so Bonferroni's procedure is the best.
    • For example, Bonferroni's C.I. for \(\mu_1 - \mu_2\) narrows to [-49.2, -43.4].
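
    The three multipliers in Examples 1 and 2 can also be computed in Python. This sketch assumes SciPy is installed (version 1.7 or later for `scipy.stats.studentized_range`); the function `multipliers` and its argument names are illustrative, not part of the original notes. Tukey's multiplier is included because both examples involve pairwise comparisons only.

```python
import math

from scipy.stats import f, studentized_range, t


def multipliers(alpha, r, n_total, g):
    """Tukey, Bonferroni and Scheffe multipliers for g pairwise comparisons."""
    df = n_total - r  # error degrees of freedom, n_T - r
    return {
        # T = q(1 - alpha; r, n_T - r) / sqrt(2)
        "Tukey": studentized_range.ppf(1 - alpha, r, df) / math.sqrt(2),
        # B = t(1 - alpha / (2g); n_T - r)
        "Bonferroni": t.ppf(1 - alpha / (2 * g), df),
        # S = sqrt((r - 1) * F(1 - alpha; r - 1, n_T - r))
        "Scheffe": math.sqrt((r - 1) * f.ppf(1 - alpha, r - 1, df)),
    }


for g in (6, 4):  # Example 1 (all pairs) and Example 2 (four pairs)
    m = multipliers(alpha=0.05, r=4, n_total=40, g=g)
    best = min(m, key=m.get)  # smallest multiplier gives the narrowest C.I.s
    print(g, {name: round(value, 2) for name, value in m.items()}, best)
```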

    4 Comparative merits of Tukey's, Bonferroni's and Scheffé's procedures

    • All three procedures give confidence intervals of the form

    $$ {\rm estimator} \pm {\rm multiplier} \times {\rm SE} $$

    where multiplier = T, B or S and SE stands for the estimated standard error of the estimator.

    • Tukey's procedure: applicable to families of pairwise comparisons.
    • Bonferroni's procedure: applicable to families of a finite number of pre-specified linear combinations (whether these are contrasts or not).
    • Scheffé's procedure: applicable to families of a finite or infinite number of contrasts.
    • If the family of interest consists of all pairwise comparisons, then Tukey's procedure is the best.
    • If the family consists of some (but not all) of the pairwise comparisons, Bonferroni's procedure may or may not be better than Tukey's, depending on the number of pairwise comparisons of interest.
    • If the family consists of a finite number of contrasts that is no larger than the number of factor level means, then Bonferroni's procedure is better than Scheffé's. Otherwise, Bonferroni's procedure may or may not be better than Scheffé's.
    • In practice, one can compute all applicable multipliers and use the smallest number to construct the C.I.s.


    • Yingwen Li (UCD), Debashis Paul (UCD)

    This page titled Multiple Comparison is shared under a not declared license and was authored, remixed, and/or curated by Debashis Paul.