Concepts Related to Hypothesis Tests

Review of concepts related to hypothesis tests

1.1 Type I and Type II errors

In hypothesis testing, there are two types of errors:

Type I error: rejecting the null hypothesis when it is true.

Type I error rate

\[ P(\text{reject } H_0 \mid H_0 \text{ true}) \]

When testing $H_0$ at a pre-specified level of significance $\alpha$, the Type I error rate is controlled to be no larger than $\alpha$.

Type II error: accepting the null hypothesis when it is false.

Type II error rate

\[ P(\text{accept } H_0 \mid H_0 \text{ false}) \]

Power: the probability of rejecting $H_0$ when it is false,

\[ \text{Power} = P(\text{reject } H_0 \mid H_0 \text{ false}) = 1 - \text{Type II error rate}. \]

The power of a testing procedure depends on:

Significance level $\alpha$, the maximum allowable Type I error rate: the larger $\alpha$ is, the higher the power.
Deviation from $H_0$, i.e., the strength of the signal: the larger the deviation, the higher the power.
Sample size: the larger the sample size, the higher the power.
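As a concrete check of these three claims, the sketch below computes the power of the single factor ANOVA F-test (discussed in the next section) for a balanced design, changing one ingredient at a time. The helper `power.anova` and its argument names are hypothetical; `s` stands for $\sum_{i}(\mu_i-\mu_{\cdot})^2$, and the noncentrality value passed to R's `pf` is $r\phi^2 = ns/\sigma^2$. The baseline setting ($r = 3$, $n = 5$, $\phi = 2$) is the example worked out in Section 2.3.

```r
# Hypothetical helper: power of the single factor ANOVA F-test for a balanced
# design with r groups of size n, signal s = sum((mu_i - mu.)^2), error SD sigma.
power.anova <- function(alpha, r, n, s, sigma) {
  v1 <- r - 1
  v2 <- r * n - r
  ncp <- n * s / sigma^2                 # equals r * phi^2 in the textbook's notation
  F.crit <- qf(1 - alpha, v1, v2)        # critical value under H0
  1 - pf(F.crit, v1, v2, ncp = ncp)      # P(F* > F.crit) under the alternative
}

base     <- power.anova(alpha = 0.05, r = 3, n = 5, s = 2.4, sigma = 1)
p.alpha  <- power.anova(alpha = 0.10, r = 3, n = 5, s = 2.4, sigma = 1)  # larger alpha
p.signal <- power.anova(alpha = 0.05, r = 3, n = 5, s = 4.8, sigma = 1)  # larger deviation
p.n      <- power.anova(alpha = 0.05, r = 3, n = 8, s = 2.4, sigma = 1)  # larger sample size
round(c(base = base, alpha = p.alpha, signal = p.signal, n = p.n), 3)
```

Each of the last three values exceeds the baseline, matching the three statements above.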

Power of an F-test

2.1 Power calculation for F-test

To test $H_0: \mu_1 = \cdots = \mu_r$ under a single factor ANOVA model at a given significance level $\alpha$:

Decision rule

$$\left\{\begin{array}{lcl}\text{reject } H_0 & \text{if} & F^{\ast} > F(1-\alpha;r-1,n_T-r)\\ \text{accept } H_0 & \text{if} & F^{\ast} \leq F(1-\alpha;r-1,n_T-r)\end{array}\right.$$

The Type I error rate is at most $\alpha$.
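The claim that the Type I error rate is at most $\alpha$ can be checked by simulation: generate data with all $\mu_i$ equal (so $H_0$ is true), apply the decision rule, and record how often $H_0$ is rejected. The sketch below assumes normal errors with equal variances, $r = 4$ groups of $n = 5$, and the arbitrary values $\mu_i = 10$ and $\sigma = 2$; none of these settings come from the text.

```r
set.seed(106)                                  # arbitrary seed for reproducibility
r <- 4; n <- 5; alpha <- 0.05
group <- factor(rep(1:r, each = n))
F.crit <- qf(1 - alpha, r - 1, r * n - r)      # F(1 - alpha; r - 1, n_T - r)

reject <- replicate(5000, {
  y <- rnorm(r * n, mean = 10, sd = 2)         # H0 true: every mu_i = 10
  F.star <- anova(lm(y ~ group))[["F value"]][1]
  F.star > F.crit                              # decision rule: reject H0?
})
type1.rate <- mean(reject)                     # empirical Type I error rate
type1.rate
```

With normal errors the F-test is exact, so the empirical rate should differ from $\alpha = 0.05$ only by Monte Carlo noise.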
Power depends on the noncentrality parameter

$$ \phi=\frac{1}{\sigma}\sqrt{\frac{\sum_{i=1}^r n_i(\mu_i-\mu_{\cdot})^2}{r}}.$$

Note that $\phi$ depends on the sample size (through the $n_i$'s) and on the signal size (through the $(\mu_i - \mu_{\cdot})^2$'s).
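A direct transcription of the formula for $\phi$ makes these two dependencies easy to see. The function name `phi.textbook` is hypothetical, and taking $\mu_{\cdot}$ as the $n_i$-weighted mean of the $\mu_i$'s is an assumption (for a balanced design it is simply their average). With $r = 3$, $n = 5$ and $\sigma = 1$, the means below give $\phi = 2$.

```r
# Hypothetical helper: textbook noncentrality parameter phi.
# Assumption: mu. is the n_i-weighted mean of the factor level means.
phi.textbook <- function(n.i, mu.i, sigma) {
  r <- length(mu.i)
  mu.dot <- sum(n.i * mu.i) / sum(n.i)
  sqrt(sum(n.i * (mu.i - mu.dot)^2) / r) / sigma
}

mu <- c(-sqrt(1.2), 0, sqrt(1.2))                          # sum((mu_i - mu.)^2) = 2.4
phi.base   <- phi.textbook(c(5, 5, 5), mu, sigma = 1)       # phi = 2
phi.double <- phi.textbook(c(10, 10, 10), mu, sigma = 1)    # doubling n multiplies phi by sqrt(2)
phi.signal <- phi.textbook(c(5, 5, 5), 2 * mu, sigma = 1)   # doubling the deviations doubles phi
c(phi.base, phi.double, phi.signal)
```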

2.2 Distribution of F-ratio under the alternative hypothesis

Under an alternative hypothesis with noncentrality parameter $\phi$, the distribution of $F^{\ast}$ is

$$ F^{\ast} \sim F_{r-1,n_T-r}(\phi), $$

i.e., a noncentral F-distribution with noncentrality parameter $\phi$.
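The noncentral F result can be checked by Monte Carlo: simulate $F^{\ast}$ repeatedly from a model in which $H_0$ is false and compare the empirical rejection rate with the noncentral F tail probability computed by `pf`. The sketch assumes normal errors and uses $r = 3$, $n = 5$, $\sigma = 1$ and means $(-\sqrt{1.2}, 0, \sqrt{1.2})$, which give $\phi = 2$; R's noncentrality argument is then $r\phi^2 = 12$ (the conversion is discussed in Section 2.3). $F^{\ast}$ is computed by hand from MSTR and MSE to keep the loop fast.

```r
set.seed(2024)                                   # arbitrary seed
r <- 3; n <- 5; sigma <- 1
mu <- c(-sqrt(1.2), 0, sqrt(1.2))                # alternative with phi = 2
group <- rep(1:r, each = n)
v1 <- r - 1; v2 <- r * n - r

F.star <- replicate(10000, {
  y <- rnorm(r * n, mean = mu[group], sd = sigma)
  ybar.i <- tapply(y, group, mean)               # treatment sample means
  MSTR <- sum(n * (ybar.i - mean(y))^2) / v1     # treatment mean square
  MSE  <- sum((y - ybar.i[group])^2) / v2        # error mean square
  MSTR / MSE
})

F.crit <- qf(0.95, v1, v2)
sim.power    <- mean(F.star > F.crit)                    # empirical power
theory.power <- 1 - pf(F.crit, v1, v2, ncp = r * 2^2)    # noncentral F tail
c(simulated = sim.power, theoretical = theory.power)
```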

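The power is therefore

$$ \text{Power} = P\left( F_{r-1,n_T-r}(\phi) > F(1-\alpha; r-1, n_T-r) \right), $$

where $F_{r-1,n_T-r}(\phi)$ denotes a random variable with this noncentral F-distribution and $F(1-\alpha; r-1, n_T-r)$ is the critical value from the decision rule.

Example: if $\alpha = 0.01$, $r = 4$, $n_T = 20$ and $\phi = 2$, then the power is 0.61 (from Table B.11 of the textbook).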
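Instead of reading Table B.11, the same power can be computed with R's noncentral F functions. Note that `pf` expects the noncentrality value $r\phi^2$ rather than $\phi$ (see Section 2.3). The result can be compared with the tabulated value 0.61.

```r
alpha <- 0.01; r <- 4; nT <- 20; phi <- 2
v1 <- r - 1; v2 <- nT - r                        # 3 and 16 degrees of freedom
F.crit <- qf(1 - alpha, v1, v2)                  # F(0.99; 3, 16)
power  <- 1 - pf(F.crit, v1, v2, ncp = r * phi^2)
power
```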

2.3 How to calculate power of the F test using R

The textbook defines the noncentrality parameter for a single factor ANOVA model as

$$ \phi = \frac{1}{\sigma} \sqrt{\frac{\sum_{i=1}^r n_i (\mu_i - \mu_{\cdot})^2}{r}} $$

where $r$ is the number of treatment groups (factor levels), the $\mu_i$'s are the factor level means, $\mu_{\cdot} = \sum_{i=1}^r n_i \mu_i / n_T$ is their weighted average, $n_i$ is the sample size (number of replicates) of the $i$-th treatment group, and $\sigma^2$ is the variance of the measurements.

For a balanced design, i.e., when $n_1$ = $\cdots$ = $n_r$ = n, the formula for $\phi$ reduces to

$$ \phi = \frac{1}{\sigma} \sqrt{(n/r) \sum_{i=1}^r (\mu_i - \mu_{\cdot})^2}~. $$

Table B.11 gives the power of the F test for given values of the numerator degrees of freedom $v_1 = r - 1$, the denominator degrees of freedom $v_2 = n_T - r$, the level of significance $\alpha$ and the noncentrality parameter $\phi$.

Example: for $r = 3$ and $n = 5$ (so that $v_1 = 2$ and $v_2 = 12$), $\alpha = 0.05$ and $\phi = 2$, Table B.11 gives a power of 0.78.

However, if you want to use R to compute the power of the F-test, be aware that R defines the noncentrality parameter of the F distribution differently: the value to pass to R's functions is $r\phi^2$ rather than $\phi$. Here is the R code for computing the power in the example above ($r = 3$, $n = 5$, $\alpha = 0.05$ and $\phi = 2$):

The critical value of the F-test when $\alpha = 0.05$, $v_1 = r - 1 = 2$ and $v_2 = n_T - r = 12$ is

F.crit = qf(0.95,2,12)

The power of the test when $\phi = 2$ is then computed as

F.power = 1 - pf(F.crit, 2, 12, 3*2^2)

Note that the function qf is used to compute the quantile of the central F-distribution. Its second and third arguments are the numerator and denominator degrees of freedom of the F distribution.
The function pf computes the probability under the noncentral F-density curve to the left of a given value (in this case F.crit), so 1 - pf(...) is the probability to the right of F.crit. Its second and third arguments are the numerator and denominator degrees of freedom of the F distribution, and its fourth argument is the noncentrality parameter $r\phi^2$, which equals $3 \times 2^2 = 12$ in the above example.
The values of F.crit and F.power are 3.885294 and 0.7827158, respectively.
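The conversion between the two conventions follows from squaring $\phi$. R's `ncp` argument is the usual noncentrality parameter $\lambda$ of the noncentral F distribution, which for the single factor ANOVA model is $\sum n_i(\mu_i-\mu_{\cdot})^2/\sigma^2$:

$$ \lambda = \frac{\sum_{i=1}^r n_i(\mu_i-\mu_{\cdot})^2}{\sigma^2} = r \cdot \frac{1}{\sigma^2}\,\frac{\sum_{i=1}^r n_i(\mu_i-\mu_{\cdot})^2}{r} = r\phi^2, \qquad \text{so } r = 3,\ \phi = 2 \text{ gives } \lambda = 3 \times 2^2 = 12. $$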

Calculating sample size

Goal: find the smallest sample size needed to achieve

a pre-specified power $\gamma$;
with a pre-specified Type I error rate $\alpha$;
for a signal level of at least a pre-specified value $s$.

The idea behind the sample size calculation is as follows:

On the one hand, we want the sample size to be large enough to detect practically important deviations from $H_0$ (with signal size at least $s$) with high probability (power at least $\gamma$), while allowing only a pre-specified low Type I error rate (at most $\alpha$) when there is no signal.
On the other hand, the sample size should not be so large that the study becomes unnecessarily costly.

Example 1: sample size calculation

For a single factor study with 4 levels and a balanced design, i.e., $n_1 = n_2 = n_3 = n_4$ (= n, say), the goal is to test $H_0$: all the factor level means $\mu_i$ are equal.

Question: what sample size is needed in each treatment group, under a balanced design, for the F-test to achieve power $\gamma = 0.85$ with a Type I error rate of at most $\alpha = 0.05$ when the deviation from $H_0$ is at least $s=\sum_{i=1}^{r}(\mu_i-\mu_{\cdot})^2=40$?
One additional piece of information needed in order to answer this question is the residual variance $\sigma^2$.
Suppose a pilot study indicates that the residual variance is about $\sigma^2 = 10$.
Use a trial-and-error strategy to search Table B.11. That is, for a given n (starting with n = 2, since n = 1 leaves $n_T - r = 0$ degrees of freedom for error),

(i) calculate $\phi = (1/\sigma) \sqrt{(n/r)\sum_{i=1}^r(\mu_i - \mu_{\cdot})^2} = (1/\sigma) \sqrt{(n/r) s}$;
(ii) fix the numerator degrees of freedom $v_1 = r - 1 = 3$;

(iii) look up the power of the test with denominator degrees of freedom $v_2 = n_T - r$ (where $n_T = nr$) for the given $\phi$ and $\alpha$;

(iv) keep increasing n until the power of the test first reaches (equals or just exceeds) the given value of $\gamma$.
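The trial-and-error search can be automated in R, replacing the table lookup in step (iii) with `qf` and `pf`. With the values of Example 1 ($r = 4$, $s = 40$, $\sigma^2 = 10$), step (i) gives $\phi = \sqrt{(n/4) \cdot 40/10} = \sqrt{n}$, so the noncentrality value passed to `pf` is $r\phi^2 = 4n$. The helper name `power.at` is hypothetical.

```r
# Example 1 settings (from the text): r = 4, alpha = 0.05, gamma = 0.85,
# s = sum((mu_i - mu.)^2) = 40, sigma^2 = 10 (pilot-study estimate)
r <- 4; alpha <- 0.05; gamma <- 0.85; s <- 40; sigma2 <- 10

# Hypothetical helper: power of the F-test with n units per group
power.at <- function(n) {
  phi <- sqrt((n / r) * s / sigma2)         # step (i): phi = (1/sigma) sqrt((n/r) s)
  v1 <- r - 1                               # step (ii)
  v2 <- n * r - r                           # step (iii): n_T - r
  F.crit <- qf(1 - alpha, v1, v2)
  1 - pf(F.crit, v1, v2, ncp = r * phi^2)   # R's ncp = r * phi^2
}

n <- 2                                      # n = 1 leaves no error degrees of freedom
while (power.at(n) < gamma) n <- n + 1      # step (iv)
n.required <- n
power.required <- power.at(n.required)
c(n = n.required, power = power.required)
```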

3.2 An alternative approach to sample size calculation

Suppose that we want to determine the minimum sample size required to attain a certain power of the test when the maximum discrepancy among the factor level means is at least a specified value. In other words, we want the test to attain power $\gamma$ (= 1 - $\beta$, where $\beta$ is the probability of a Type II error) whenever the range of the treatment group means,

\[ \Delta = \max_{1\leq i \leq r} \mu_i - \min_{1\leq i \leq r}\mu_i ~, \]

is at least a specified value.

Suppose we have a balanced design, i.e., $n_1 = \cdots = n_r$ = n, say. We want to determine the minimum value of n such that the power of the F test for testing $H_0$ : $\mu_1 = \cdots = \mu_r$ is at least a prespecified value $\gamma = 1 - \beta$.
We need to also specify the level of significance $\alpha$ and the standard deviation of the measurements $\sigma$.
Table B.12 gives the minimum value of n needed to attain a given power 1 - $\beta$ for a given value of $\alpha$, for a given number of treatments r and a given "effect size" $\Delta/\sigma$.
Example: for r = 4 and $\alpha$ = 0.05, in order for the F-test to achieve power 1 - $\beta$ = 0.9 when the effect size is $\Delta/\sigma$ = 1.5, Table B.12 shows that n must be at least 14. That is, we need a balanced design with at least 14 experimental units in each treatment group.
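A Table B.12 style answer can be approximated with the same noncentral F computation once $\Delta$ is converted into a value of $\sum_i(\mu_i-\mu_{\cdot})^2$. For a given range $\Delta$, the smallest possible value of $\sum_{i}(\mu_i-\mu_{\cdot})^2$ is $\Delta^2/2$, attained with one mean at each extreme and the rest at the midpoint, so designing for that least favorable configuration guarantees the power for every configuration whose range is at least $\Delta$. Whether Table B.12 is constructed exactly this way is an assumption here, so the output should be checked against the tabulated value of 14; the helper name `power.range` is hypothetical.

```r
# Example values from the text: r = 4, alpha = 0.05, power 0.9, Delta/sigma = 1.5
r <- 4; alpha <- 0.05; target <- 0.90; effect <- 1.5

# Hypothetical helper: power when sum((mu_i - mu.)^2) = Delta^2 / 2 (least favorable)
power.range <- function(n) {
  v1 <- r - 1
  v2 <- r * n - r
  ncp <- n * effect^2 / 2                   # n * (Delta^2 / 2) / sigma^2 = r * phi^2
  F.crit <- qf(1 - alpha, v1, v2)
  1 - pf(F.crit, v1, v2, ncp = ncp)
}

n <- 2                                      # smallest n with positive error df
while (power.range(n) < target) n <- n + 1
n.min <- n
c(n = n.min, power = power.range(n.min))
```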

Contributors

Yingwen Li (UCD)
Debashis Paul (UCD)