Concepts Related to Hypothesis Tests


Review of concepts related to hypothesis tests

1.1 Type I and Type II errors

In hypothesis testing, there are two types of errors:

Type I error: rejecting the null hypothesis when it is true.

Type I error rate

\[P(\text{reject } H_0 \mid H_0 \text{ true})\]

When testing $H_0$ at a pre-specified level of significance $\alpha$, the Type I error rate is controlled to be no larger than $\alpha$.
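The claim that the Type I error rate stays near $\alpha$ can be checked by simulation. The following is a minimal sketch, not part of the original notes: it uses a one-way ANOVA F-test as the test, and the number of groups, group size, number of simulated data sets and seed are all illustrative assumptions.

```r
# Monte Carlo check of the Type I error rate of a one-way ANOVA F-test.
# All settings (r, n, alpha, n.sim, seed) are illustrative assumptions.
set.seed(1)
r <- 3; n <- 5; alpha <- 0.05; n.sim <- 2000
group <- factor(rep(1:r, each = n))

p.values <- replicate(n.sim, {
  y <- rnorm(r * n)                     # H0 true: every group has mean 0, sd 1
  anova(lm(y ~ group))[1, "Pr(>F)"]     # p-value of the F-test
})

# Proportion of simulated data sets in which H0 is (wrongly) rejected.
type1.rate <- mean(p.values < alpha)
type1.rate
```

With normal errors the F-test is exact, so the simulated rejection rate should fluctuate around $\alpha$ = 0.05, up to Monte Carlo error.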

Type II error: accepting (that is, failing to reject) the null hypothesis when it is false.

Type II error rate

\[P(\text{accept } H_0 \mid H_0 \text{ false})\]

Power: the probability of rejecting $H_0$ when it is false.

\[\text{Power} = P(\text{reject } H_0 \mid H_0 \text{ false}) = 1 - \text{Type II error rate}\]

The power of a testing procedure depends on

Significance level $\alpha$ (the maximum allowable Type I error rate): the larger $\alpha$ is, the higher the power.
Deviation from $H_0$ (the strength of the signal): the larger the deviation is, the higher the power.
Sample size: the larger the sample size is, the higher the power.
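These three effects can be seen numerically. The sketch below uses the F-test for a balanced single-factor design as a concrete case; the helper name `f.power`, the baseline values and the grids are assumptions made for illustration. Note that R's `pf` takes the noncentrality parameter on the scale $n \sum_i (\mu_i - \mu_{\cdot})^2 / \sigma^2$.

```r
# Power of the one-way ANOVA F-test in a balanced design with r groups of size n.
# signal = sum_i (mu_i - mu.)^2, sigma = error standard deviation.
# f.power and all numeric inputs below are hypothetical, for illustration only.
f.power <- function(alpha, r, n, signal, sigma = 1) {
  df1 <- r - 1
  df2 <- r * n - r
  ncp <- n * signal / sigma^2            # noncentrality parameter in R's convention
  1 - pf(qf(1 - alpha, df1, df2), df1, df2, ncp)
}

# Vary one factor at a time around a baseline (alpha = 0.05, r = 3, n = 5, signal = 2).
power.by.alpha  <- sapply(c(0.01, 0.05, 0.10), f.power, r = 3, n = 5, signal = 2)
power.by.signal <- sapply(c(1, 2, 4), function(s) f.power(0.05, 3, 5, s))
power.by.n      <- sapply(c(5, 10, 20), function(n) f.power(0.05, 3, n, 2))

power.by.alpha
power.by.signal
power.by.n
```

Each of the three vectors is increasing, matching the three statements in the list.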

Power of an F-test

2.1 Power calculation for F-test

Test $H_0: \mu_1 = \cdots = \mu_r$ under a single-factor ANOVA model at a given significance level $\alpha$:

Decision rule

$$\left\{\begin{array}{ccc}\text{reject } H_0 & \text{if} & F^{\ast}> F(1-\alpha;r-1,n_T-r)\\ \text{accept } H_0 & \text{if} & F^{\ast} \leq F(1-\alpha;r-1,n_T-r)\end{array}\right.$$

The Type I error rate is at most $\alpha$.
Power depends on the noncentrality parameter

$$ \phi=\frac{1}{\sigma}\sqrt{\frac{\sum_{i=1}^r n_i(\mu_i-\mu_{\cdot})^2}{r}}.$$

Note that $\phi$ depends on the sample size (through the $n_i$'s) and on the signal size (through the $(\mu_i - \mu_{\cdot})^2$'s).
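As an illustration, $\phi$ can be computed directly from the formula. This is a minimal sketch: the function name `phi.anova` and the numeric inputs are hypothetical, and it assumes that $\mu_{\cdot}$ is the sample-size-weighted mean of the $\mu_i$ (which equals the simple average in a balanced design).

```r
# Noncentrality parameter phi for a single-factor ANOVA model.
# mu: factor level means, n: group sample sizes, sigma: error standard deviation.
# Assumption: mu. is the sample-size-weighted mean of the mu_i.
phi.anova <- function(mu, n, sigma) {
  stopifnot(length(mu) == length(n), sigma > 0)
  r <- length(mu)
  mu.dot <- sum(n * mu) / sum(n)
  sqrt(sum(n * (mu - mu.dot)^2) / r) / sigma
}

# Hypothetical inputs: the same three means with 5 and then 20 observations per group.
phi.small <- phi.anova(mu = c(-1, 0, 1), n = c(5, 5, 5), sigma = 1)
phi.large <- phi.anova(mu = c(-1, 0, 1), n = c(20, 20, 20), sigma = 1)
c(phi.small, phi.large)
```

Because $\phi$ grows like the square root of the group size, quadrupling every $n_i$ doubles $\phi$ for the same set of means.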

2.2 Distribution of F-ratio under the alternative hypothesis

When the noncentrality parameter is $\phi$, the distribution of $F^{\ast}$ under the alternative hypothesis is

$$ F^{\ast} \sim F_{r-1,n_T-r}(\phi), $$

i.e., a noncentral F-distribution with noncentrality parameter $\phi$.

Power = $P\big(F^{\ast} > F(1-\alpha; r-1, n_T-r)\big)$, where $F^{\ast} \sim F_{r-1,n_T-r}(\phi)$.
Example: if $\alpha = 0.01$, $r = 4$, $n_T = 20$ and $\phi = 2$, then Power = 0.61. (Use Table B.11 of the textbook.)
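For reference, the noncentral F-distribution can be written as a ratio of independent chi-square variables. The sketch below uses notation not introduced above: SSTR and SSE denote the treatment and error sums of squares, and $\lambda$ denotes the noncentrality of the numerator chi-square. It also assumes that $\mu_{\cdot}$ is the sample-size-weighted mean of the $\mu_i$.

$$ F^{\ast} = \frac{\mathrm{SSTR}/(r-1)}{\mathrm{SSE}/(n_T-r)}, \qquad \frac{\mathrm{SSTR}}{\sigma^2} \sim \chi^2_{r-1}(\lambda), \qquad \frac{\mathrm{SSE}}{\sigma^2} \sim \chi^2_{n_T-r}, $$

$$ \lambda = \frac{1}{\sigma^2}\sum_{i=1}^r n_i(\mu_i-\mu_{\cdot})^2 = r\,\phi^2 . $$

Under $H_0$ all the $\mu_i$ are equal, so $\lambda = 0$ and $F^{\ast}$ has the usual central F-distribution.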

2.3 How to calculate power of the F test using R

The textbook defines the noncentrality parameter for a single factor ANOVA model as

$$ \phi = \frac{1}{\sigma} \sqrt{\frac{\sum_{i=1}^r n_i (\mu_i - \mu_{\cdot})^2}{r}} $$

where $r$ is the number of treatment groups (factor levels), the $\mu_i$'s are the factor level means, $n_i$ is the sample size (number of replicates) of the $i$-th treatment group, and $\sigma^2$ is the variance of the measurements.

For a balanced design, i.e., when $n_1 = \cdots = n_r = n$, the formula for $\phi$ reduces to

$$ \phi = \frac{1}{\sigma} \sqrt{(n/r) \sum_{i=1}^r (\mu_i - \mu_{\cdot})^2}~. $$

Table B.11 gives the power of the F test for given values of the numerator degrees of freedom $v_1 = r - 1$, the denominator degrees of freedom $v_2 = n_T - r$, the level of significance $\alpha$ and the noncentrality parameter $\phi$.

Example: For $r = 3$ and $n = 5$ (so that $v_1 = 2$ and $v_2 = 12$), $\alpha = 0.05$ and $\phi = 2$, the value of power from Table B.11 is 0.78.

However, if you want to use R to compute the power of the F-test, be aware that R defines the noncentrality parameter of the F distribution differently. Compared to the setting above, the noncentrality parameter to be used in the R function is $r \times \phi^2$ instead of $\phi$. Here is the R code for computing the power in the example described above, with $r = 3$, $n = 5$, $\alpha = 0.05$ and $\phi = 2$:

The critical value for the F-test when $\alpha = 0.05$, $v_1 = r - 1 = 2$ and $v_2 = n_T - r = 12$ is

F.crit = qf(0.95,2,12)

Then the power of the test when $\phi = 2$ is computed as

F.power = 1 - pf(F.crit, 2, 12, 3*2^2)

Note that the function qf is used to compute the quantile of the central F-distribution. Its second and third arguments are the numerator and denominator degrees of freedom of the F distribution.
The function pf is used to calculate the probability under the noncentral F-density curve to the left of a given value (in this case F.crit). Its second and third arguments are the numerator and denominator degrees of freedom of the F distribution, while the fourth argument is the noncentrality parameter $r \times \phi^2$ (we specify this explicitly in the above example).
The values of F.crit and F.power are 3.885294 and 0.7827158, respectively.
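One way to gain confidence in the $r \times \phi^2$ conversion is to compare the formula with a simulation. The sketch below is not from the textbook: the particular group means, the number of simulated data sets and the seed are assumptions, chosen so that $\phi = 2$ with $r = 3$, $n = 5$ and $\sigma = 1$.

```r
# Compare the theoretical power with a Monte Carlo estimate for r = 3, n = 5, phi = 2.
set.seed(1)
r <- 3; n <- 5; alpha <- 0.05; phi <- 2; sigma <- 1; n.sim <- 2000

# Hypothetical means (-a, 0, a), chosen so that
# (n / r) * sum((mu - mean(mu))^2) / sigma^2 equals phi^2.
a  <- sigma * phi * sqrt(r / (2 * n))
mu <- c(-a, 0, a)

group  <- factor(rep(1:r, each = n))
F.crit <- qf(1 - alpha, r - 1, r * n - r)

# Simulate data under the alternative and record the F statistic each time.
F.star <- replicate(n.sim, {
  y <- rnorm(r * n, mean = rep(mu, each = n), sd = sigma)
  anova(lm(y ~ group))[1, "F value"]
})

power.sim    <- mean(F.star > F.crit)                         # simulated power
power.theory <- 1 - pf(F.crit, r - 1, r * n - r, r * phi^2)   # noncentral F power
c(simulated = power.sim, theoretical = power.theory)
```

The two numbers should agree up to Monte Carlo error, which is roughly 0.01 for 2000 simulated data sets.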

Calculating sample size

Goal: find the smallest sample size needed to achieve

a pre-specified power $\gamma$;
with a pre-specified Type I error rate $\alpha$;
for at least a pre-specified signal level $s$.

The idea behind the sample size calculation is as follows:

On the one hand, we want the sample size to be large enough to detect practically important deviations from $H_0$ (with a signal size of at least $s$) with high probability (with a power of at least $\gamma$), while allowing only a pre-specified low Type I error rate (at most $\alpha$) when there is no signal.
On the other hand, the sample size should not be so large that the cost of the study becomes unnecessarily high.

Example 1: sample size calculation

For a single factor study with 4 levels and assuming a balanced design, i.e., $n_1 = n_2 = n_3 = n_4$ ($= n$, say), the goal is to test $H_0$: all the factor level means $\mu_i$ are the same.

Question: What should be the sample size for each treatment group under a balanced design, such that the F-test can achieve $\gamma$ = 0.85 power with at most $\alpha$ = 0.05 Type I error rate when the deviation from $H_0$ is at least $s=\sum_{i=1}^{r}(\mu_i-\mu_{\cdot})^2=40$?
One additional piece of information needed in order to answer this question is the residual variance $\sigma^2$.
Suppose from a pilot study, we know the residual variance is about $\sigma^2$ = 10.
Use a trial-and-error strategy to search Table B.11. That is, for a given n (starting with n = 2, since n = 1 leaves $n_T - r = 0$ denominator degrees of freedom),

(i) calculate $\phi = (1/\sigma) \sqrt{(n/r)\sum_{i=1}^r(\mu_i - \mu_{\cdot})^2} = (1/\sigma) \sqrt{(n/r) s}$;
(ii) fix the numerator degrees of freedom $v_1 = r - 1 = 3$;

(iii) check the power of the test when the denominator degrees of freedom are $v_2 = n_T - r$ (where $n_T = nr$), with the given $\phi$ and $\alpha$;

(iv) keep increasing n until the power of the test first equals or exceeds the given value of $\gamma$.
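Steps (i) to (iv) can also be carried out in R instead of searching Table B.11. The following is a minimal sketch under the settings of this example; the function name `min.n.signal` and the search cap `n.max` are hypothetical, and the power is computed from the noncentral F-distribution with R's noncentrality parameter $r \times \phi^2$.

```r
# Smallest per-group n (balanced design) whose F-test power reaches gamma.
# s = sum_i (mu_i - mu.)^2 is the signal size, sigma2 is the residual variance.
# min.n.signal and n.max are hypothetical names introduced for this sketch.
min.n.signal <- function(r, s, sigma2, alpha, gamma, n.max = 1000) {
  for (n in 2:n.max) {                   # n = 1 gives zero denominator df
    df1 <- r - 1                         # step (ii)
    df2 <- r * n - r
    phi <- sqrt((n / r) * s / sigma2)    # step (i)
    power <- 1 - pf(qf(1 - alpha, df1, df2), df1, df2, r * phi^2)  # step (iii)
    if (power >= gamma) {                # step (iv): stop at the first n that works
      return(c(n = n, phi = phi, power = power))
    }
  }
  stop("target power not reached for n <= n.max")
}

result <- min.n.signal(r = 4, s = 40, sigma2 = 10, alpha = 0.05, gamma = 0.85)
result
```

The returned vector reports the selected $n$ together with the corresponding $\phi$ and power, so the result can be cross-checked against the table entry for that $\phi$.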

3.2 An alternative approach to sample size calculation

Suppose that we want to determine the minimum sample size required to attain a certain power of the test for a specified discrepancy among the factor level means. In other words, we want the test to attain power $\gamma$ (= 1 - $\beta$, where $\beta$ is the probability of a Type II error) whenever the range of the treatment group means,

\[ \Delta = \max_{1\leq i \leq r} \mu_i - \min_{1\leq i \leq r}\mu_i ~, \]

is at least a specified value.

Suppose we have a balanced design, i.e., $n_1 = \cdots = n_r$ = n, say. We want to determine the minimum value of n such that the power of the F test for testing $H_0$ : $\mu_1 = \cdots = \mu_r$ is at least a prespecified value $\gamma = 1 - \beta$.
We need to also specify the level of significance $\alpha$ and the standard deviation of the measurements $\sigma$.
Table B.12 gives the minimum value of n needed to attain a given power 1 - $\beta$ for a given value of $\alpha$, for a given number of treatments r and a given "effect size" $\Delta/\sigma$.
Example : For r = 4, $\alpha$ = 0.05, in order that the F-test achieves the power 1 - $\beta$ = 0.9 when the effect size is $\Delta/\sigma$ = 1.5, we need n to be at least 14. That is, we need a balanced design with at least 14 experimental units in each treatment group.
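A similar search can be sketched in R. This sketch rests on an assumption that is not stated in the text above: for a given range $\Delta$, the hardest configuration to detect has two means at the extremes and all others at the midpoint, which gives $\sum_i (\mu_i - \mu_{\cdot})^2 = \Delta^2/2$ and hence an R noncentrality parameter of $n (\Delta/\sigma)^2 / 2$. The function name `min.n.range` and the cap `n.max` are hypothetical.

```r
# Smallest per-group n such that the F-test has the target power when the
# range of the group means is Delta, in a balanced design with r groups.
# effect = Delta / sigma.
# Assumption: least favorable configuration, sum_i (mu_i - mu.)^2 = Delta^2 / 2.
min.n.range <- function(r, effect, alpha, power.target, n.max = 1000) {
  for (n in 2:n.max) {
    df1 <- r - 1
    df2 <- r * n - r
    ncp <- n * effect^2 / 2              # noncentrality parameter in R's convention
    power <- 1 - pf(qf(1 - alpha, df1, df2), df1, df2, ncp)
    if (power >= power.target) return(n)
  }
  stop("target power not reached for n <= n.max")
}

n.range <- min.n.range(r = 4, effect = 1.5, alpha = 0.05, power.target = 0.9)
n.range
```

The value returned can be compared with the Table B.12 entry from the example above; any difference would indicate that the table is built on a different configuration of means than the one assumed here.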

Contributors

Yingwen Li (UCD)
Debashis Paul (UCD)
