Concepts Related to Hypothesis Tests


1 Review of concepts related to hypothesis tests
1.1 Type I and Type II errors
1.2 What determines the power?
2 Power of an F-test
3 Calculating sample size
3.1 An example of sample size calculation
3.2 An alternative approach to sample size calculation
Contributors

1 Review of concepts related to hypothesis tests

1.1 Type I and Type II errors

In hypothesis testing, there are two types of errors:

Type I error: rejecting the null hypothesis $H_0$ when it is true.
Type I error rate:

P(reject $H_0$ | $H_0$ true).

When testing $H_0$ at a pre-specified level of significance $\alpha$, the Type I error rate is controlled to be no larger than $\alpha$.
Type II error: accepting (i.e., failing to reject) the null hypothesis when it is false.
Type II error rate:

P(accept $H_0$ | $H_0$ false).

Power: the probability of rejecting $H_0$ when it is false.

Power = P(reject $H_0$ | $H_0$ false)

= 1 - Type II error rate.
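
The two error rates can be illustrated by simulation. The following R sketch is not part of the original notes. It assumes a hypothetical one-way layout with r = 3 groups, n = 5 observations per group, normal errors with standard deviation 1, and arbitrarily chosen group means under the alternative. It estimates the Type I error rate and the power of the level-0.05 F-test by Monte Carlo.

```r
set.seed(1)                        # arbitrary seed, only for reproducibility
r <- 3; n <- 5; alpha <- 0.05      # hypothetical design, chosen for illustration
g <- factor(rep(seq_len(r), each = n))
f_crit <- qf(1 - alpha, r - 1, r * n - r)

# Simulate one data set with group means mu (sigma = 1) and return its F-ratio
sim_f <- function(mu) {
  y <- rnorm(r * n, mean = rep(mu, each = n), sd = 1)
  anova(lm(y ~ g))[1, "F value"]
}

f_null <- replicate(1000, sim_f(c(0, 0, 0)))    # H0 true: all means equal
f_alt  <- replicate(1000, sim_f(c(-1, 0, 1)))   # H0 false: assumed alternative

type1_rate <- mean(f_null > f_crit)   # estimate of P(reject H0 | H0 true)
power_est  <- mean(f_alt  > f_crit)   # estimate of P(reject H0 | H0 false)
c(type1_rate = type1_rate, power = power_est)
```

The estimated Type I error rate should be close to $\alpha$, and 1 minus the estimated power is an estimate of the Type II error rate for this particular alternative.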

1.2 What determines the power?

The power of a testing procedure depends on

Significance level $\alpha$, the maximum allowable Type I error rate: the larger $\alpha$ is, the higher the power.
Deviation from $H_0$, i.e., the strength of the signal: the larger the deviation, the higher the power.
Sample size: the larger the sample size, the higher the power.
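
As a rough numerical illustration of these three effects, the sketch below computes the power of the F-test for a balanced single-factor design and then changes one input at a time. The function name `power_f` and all numerical inputs are hypothetical, and the noncentrality parameter is passed in R's convention, which is discussed in Section 2.3.

```r
# Power of the one-way ANOVA F-test for a balanced design.
# s      = sum of squared deviations of the group means from their average
# sigma2 = error variance
# Both are hypothetical inputs used only for illustration.
power_f <- function(n, r, s, sigma2, alpha) {
  df1 <- r - 1
  df2 <- n * r - r
  ncp <- n * s / sigma2                 # noncentrality parameter in R's convention
  f_crit <- qf(1 - alpha, df1, df2)
  pf(f_crit, df1, df2, ncp = ncp, lower.tail = FALSE)
}

base     <- power_f(n = 5,  r = 3, s = 2, sigma2 = 1, alpha = 0.05)  # baseline
more_a   <- power_f(n = 5,  r = 3, s = 2, sigma2 = 1, alpha = 0.10)  # larger alpha
more_sig <- power_f(n = 5,  r = 3, s = 4, sigma2 = 1, alpha = 0.05)  # larger deviation
more_n   <- power_f(n = 10, r = 3, s = 2, sigma2 = 1, alpha = 0.05)  # larger sample
round(c(base = base, alpha = more_a, signal = more_sig, n = more_n), 3)
```

Each of the three modified settings should give a higher power than the baseline.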

2 Power of an F-test

2.1 Power calculation for F-test

Consider testing $H_0: \mu_1 = \cdots = \mu_r$ under a single-factor ANOVA model at a given significance level $\alpha$.

Decision rule, where $F^{\ast}$ denotes the F-ratio:

$$\left\{\begin{array}{ccc}\text{reject } H_0 & \text{if} & F^{\ast}> F(1-\alpha;r-1,n_T-r)\\ \text{accept } H_0 & \text{if} & F^{\ast} \leq F(1-\alpha;r-1,n_T-r)\end{array}\right.$$

The Type I error rate is at most $\alpha$.
Power depends on the noncentrality parameter

$$ \phi=\frac{1}{\sigma}\sqrt{\frac{\sum_{i=1}^r n_i(\mu_i-\mu_{\cdot})^2}{r}}.$$

Note that $\phi$ depends on the sample size (through the $n_i$'s) and on the signal size (through the $(\mu_i - \mu_{\cdot})^2$'s).
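
The formula for $\phi$ can be evaluated directly. The sketch below is a minimal illustration, not code from the notes: the function name `phi_anova` and the example means are hypothetical, and $\mu_{\cdot}$ is assumed to be the sample-size-weighted average of the $\mu_i$, which the text above does not state explicitly.

```r
# Noncentrality parameter phi for a single-factor ANOVA model.
# n: group sample sizes, mu: group means, sigma: error standard deviation.
phi_anova <- function(n, mu, sigma) {
  stopifnot(length(n) == length(mu), sigma > 0)
  r <- length(mu)
  mu_dot <- sum(n * mu) / sum(n)        # assumption: weighted mean of the mu_i
  sqrt(sum(n * (mu - mu_dot)^2) / r) / sigma
}

# Hypothetical means for 4 groups with sum((mu - mu_dot)^2) = 40
mu <- c(-sqrt(20), 0, 0, sqrt(20))
phi_anova(n = rep(4, 4),     mu = mu, sigma = sqrt(10))   # balanced, n = 4
phi_anova(n = rep(9, 4),     mu = mu, sigma = sqrt(10))   # balanced, n = 9
phi_anova(n = c(2, 9, 9, 2), mu = mu, sigma = sqrt(10))   # unbalanced
```

For the two balanced calls the formula gives $\phi = 2$ and $\phi = 3$: with the means held fixed, a larger sample size produces a larger $\phi$.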

2.2 Distribution of F-ratio under the alternative hypothesis

When the noncentrality parameter is $\phi$, the F-ratio satisfies

$$ F^{\ast} \sim F_{r-1,n_T-r}(\phi), $$

i.e., it follows a noncentral F-distribution with $r-1$ and $n_T-r$ degrees of freedom and noncentrality parameter $\phi$.

Power = $P\big(F_{r-1,n_T-r}(\phi) > F(1-\alpha; r-1, n_T-r)\big)$.
Example: if $\alpha$ = 0.01, r = 4, $n_T$ = 20 and $\phi$ = 2, then Power = 0.61. (Use Table B.11 of the textbook.)

2.3 How to calculate power of the F test using R

The textbook defines the noncentrality parameter for a single factor ANOVA model as

$$ \phi = \frac{1}{\sigma} \sqrt{\frac{\sum_{i=1}^r n_i (\mu_i - \mu_{\cdot})^2}{r}} $$

where r is the number of treatment groups (factor levels), the $\mu_i$'s are the factor level means, $n_i$ is the sample size (number of replicates) of the i-th treatment group, and $\sigma^2$ is the variance of the measurements.

For a balanced design, i.e., when $n_1$ = $\cdots$ = $n_r$ = n, the formula for $\phi$ reduces to

$$ \phi = \frac{1}{\sigma} \sqrt{(n/r) \sum_{i=1}^r (\mu_i - \mu_{\cdot})^2}~. $$

Table B.11 gives the power of the F test for given values of the numerator degrees of freedom $v_1$ = r - 1, the denominator degrees of freedom $v_2$ = $n_T - r$, the level of significance $\alpha$ and the noncentrality parameter $\phi$.

Example: For r = 3, n = 5, (so that $v_1$ = 2 and $v_2$ = 12), $\alpha$ = 0.05 and $\phi$ = 2, the value of power from Table B.11 is 0.78.

However, if you want to use R to compute the power of the F-test, be aware that R defines the noncentrality parameter of the F distribution differently. Compared to the above setting, the noncentrality parameter to be supplied to the R function is $r \times \phi^2$ instead of $\phi$. Here is the R code for computing the power in the example described above, with r = 3, n = 5, $\alpha$ = 0.05 and $\phi$ = 2:

The critical value for the F-test when $\alpha$ = 0.05, $v_1$ = r - 1 = 2 and $v_2$ = $n_T$ - r = 12 is

F.crit = qf(0.95, 2, 12)

Then the power of the test, when $\phi$ = 2, is computed as

F.power = 1 - pf(F.crit, 2, 12, 3*2^2)

Note that the function qf is used to compute the quantile of the central F distribution. Its second and third arguments are the numerator and denominator degrees of freedom of the F distribution.
The function pf is used to calculate the probability under the noncentral F density curve to the left of a given value (in this case F.crit). Its second and third arguments are the numerator and denominator degrees of freedom of the F distribution, while the fourth argument is the noncentrality parameter $r \times \phi^2$ (we specify this explicitly in the above example).
The values of F.crit and F.power are 3.885294 and 0.7827158, respectively.
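
The two steps above can be wrapped into a small helper. This is a sketch under the conventions stated in this section: the function name `anova_power` and its argument names are hypothetical, and the only conversion it performs is from the textbook's $\phi$ to R's noncentrality parameter $r \times \phi^2$.

```r
# Power of the F-test from the textbook's phi.
# r: number of treatment groups, n_total: total sample size n_T.
anova_power <- function(phi, r, n_total, alpha) {
  df1 <- r - 1
  df2 <- n_total - r
  f_crit <- qf(1 - alpha, df1, df2)     # critical value from the central F
  ncp <- r * phi^2                      # convert phi to R's noncentrality parameter
  1 - pf(f_crit, df1, df2, ncp)
}

p1 <- anova_power(phi = 2, r = 3, n_total = 15, alpha = 0.05)  # example of this section
p2 <- anova_power(phi = 2, r = 4, n_total = 20, alpha = 0.01)  # example of Section 2.2
round(c(p1, p2), 2)
```

The first call reproduces F.power from above. The second can be compared with the value read from Table B.11 in Section 2.2.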

3 Calculating sample size

Goal: find the smallest sample size needed to achieve

a pre-specified power $\gamma$;
with a pre-specified Type I error rate $\alpha$;
for at least a pre-specified signal level $s$.

The idea behind the sample size calculation is as follows:

On one hand, we want the sample size to be large enough to detect practically important deviations from $H_0$ (with a signal size of at least $s$) with high probability (with a power of at least $\gamma$), while allowing only a pre-specified low Type I error rate (at most $\alpha$) when there is no signal.
On the other hand, the sample size should not be so large that the cost of the study becomes unnecessarily high.

3.1 An example of sample size calculation

For a single factor study with 4 levels and a balanced design, i.e., $n_1 = n_2 = n_3 = n_4$ (= n, say), the goal is to test $H_0$: all the factor level means $\mu_i$ are the same.
Question: What should be the sample size for each treatment group under a balanced design, such that the F-test achieves power $\gamma$ = 0.85 with a Type I error rate of at most $\alpha$ = 0.05 when the deviation from $H_0$ is at least $s=\sum_{i=1}^{r}(\mu_i-\mu_{\cdot})^2=40$?
One additional piece of information needed in order to answer this question is the residual variance $\sigma^2$.
Suppose from a pilot study, we know the residual variance is about $\sigma^2$ = 10.
Use a trial-and-error strategy to search Table B.11. This means, for a given n (starting with n = 2, the smallest value for which $v_2 = n_T - r$ is positive),

(i) calculate $\phi = (1/\sigma) \sqrt{(n/r)\sum_{i=1}^r(\mu_i - \mu_{\cdot})^2} = (1/\sigma) \sqrt{(n/r) s}$, which here equals $\sqrt{n}$ since s = 40, r = 4 and $\sigma^2$ = 10;
(ii) fix the numerator degrees of freedom $v_1$ = r - 1 = 3;

(iii) look up the power of the test for denominator degrees of freedom $v_2 = n_T - r$ (where $n_T$ = nr), with the given $\phi$ and $\alpha$;

(iv) keep increasing n until the power of the test first equals or exceeds the given value of $\gamma$.
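
The same search can be carried out in R instead of Table B.11. The sketch below follows steps (i) to (iv) with the values of this example and uses the conversion from $\phi$ to R's noncentrality parameter $r \times \phi^2$ described in Section 2.3. The helper name `power_at` is hypothetical, and the search starts at n = 2 because n = 1 leaves no degrees of freedom for error.

```r
r <- 4; s <- 40; sigma2 <- 10           # number of groups, signal size, variance
alpha <- 0.05; gamma <- 0.85            # Type I error rate and target power

# Power of the F-test for a balanced design with n units per group
power_at <- function(n) {
  df1 <- r - 1                          # step (ii)
  df2 <- n * r - r                      # step (iii): n_T - r with n_T = n * r
  phi <- sqrt((n / r) * s / sigma2)     # step (i)
  1 - pf(qf(1 - alpha, df1, df2), df1, df2, ncp = r * phi^2)
}

n <- 2                                  # n = 1 would give 0 error degrees of freedom
while (power_at(n) < gamma) n <- n + 1  # step (iv)
c(n = n, power = power_at(n))
```

Because R evaluates the power exactly rather than at tabulated values of $\phi$ and $v_2$, the result may differ slightly from a search based on the table.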

3.2 An alternative approach to sample size calculation

Suppose that we want to determine the minimum sample size required to attain a certain power of the test for a specified value of the maximum discrepancy among the factor level means. In other words, we want the test to attain power $\gamma$ (= 1 - $\beta$, where $\beta$ is the probability of a Type II error) when the range of the treatment group means equals the minimum range $\Delta$ that is important to detect, where

\[ \Delta = \max_{1\leq i \leq r} \mu_i - \min_{1\leq i \leq r}\mu_i ~. \]

Suppose we have a balanced design, i.e., $n_1 = \cdots = n_r$ = n, say. We want to determine the minimum value of n such that the power of the F test for testing $H_0$ : $\mu_1 = \cdots = \mu_r$ is at least a prespecified value $\gamma = 1 - \beta$.
We need to also specify the level of significance $\alpha$ and the standard deviation of the measurements $\sigma$.
Table B.12 gives the minimum value of n needed to attain a given power 1 - $\beta$ for a given value of $\alpha$, for a given number of treatments r and a given "effect size" $\Delta/\sigma$.
Example : For r = 4, $\alpha$ = 0.05, in order that the F-test achieves the power 1 - $\beta$ = 0.9 when the effect size is $\Delta/\sigma$ = 1.5, we need n to be at least 14. That is, we need a balanced design with at least 14 experimental units in each treatment group.
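
A calculation of this kind can also be sketched in R. The code below is an illustration under an explicit assumption that the text does not state: for a given range $\Delta$, it uses the least favorable configuration of means, with one group at $-\Delta/2$, one at $+\Delta/2$ and the remaining groups halfway between. The helper name `power_range` is hypothetical, and the means are expressed in units of $\sigma$.

```r
r <- 4; alpha <- 0.05; target <- 0.90   # values from the example
effect <- 1.5                           # Delta / sigma

# Assumed least favorable means for range Delta, in units of sigma:
# two groups at -Delta/2 and +Delta/2, all other groups at 0.
mu <- c(-effect / 2, effect / 2, rep(0, r - 2))

# Power of the F-test for a balanced design with n units per group
power_range <- function(n) {
  df1 <- r - 1
  df2 <- n * r - r
  ncp <- n * sum((mu - mean(mu))^2)     # R's noncentrality parameter, sigma = 1
  1 - pf(qf(1 - alpha, df1, df2), df1, df2, ncp)
}

n <- 2
while (power_range(n) < target) n <- n + 1
n
```

The resulting n can be compared with the entry of Table B.12. Agreement would suggest, but not confirm, that the table is based on the same configuration of means.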

Contributors

Yingwen Li (UCD)
Debashis Paul (UCD)