8.1: Why Hypothesis Testing? Type I and Type II Errors
Before we dive into numbers or formulas, we need to zoom out and ask an essential question: why do we use hypothesis testing at all?
In hypothesis testing, we start with a belief or assumption about the world and then use data to test whether that belief is reasonable or needs to be reconsidered.
Example Scenarios: What Are We Testing?
Let’s look at four common scenarios where hypothesis testing appears in real life. Each one sets up a basic question we're trying to answer and helps illustrate what a hypothesis really is.
- Is this new drug more effective than the standard treatment?
This question is not about a single patient; it’s about a larger population. We’re comparing two treatments, looking for evidence that one is better than the other, and checking whether any observed difference is too large to be plausibly explained by chance alone.
- Does an after-school program improve math scores?
We hypothesize that students who attend the program will perform better than those who do not. But even if we see higher scores, we have to ask: Is this improvement due to the program, or is it just random variation?
- Is this machine producing parts that meet design standards?
The manufacturer assumes the process is working correctly. If defective parts are suspected, a hypothesis test can check whether the process has shifted away from acceptable levels.
- Do more than 50% of local voters support the new policy?
A political scientist sets up a hypothesis involving a population proportion: they want to know whether support exceeds a threshold (here, 50%). The test helps judge whether the support measured in a sample is strong enough evidence that support in the entire population exceeds that threshold.
In all of these examples, we are doing some form of comparison: new vs. old, program vs. no program, actual vs. expected. And in each case, we are trying to use evidence (our sample) to make a judgment about the entire population. A hypothesis gives us something to test.
Definition: Null Hypothesis
The null hypothesis, written as \( H_0 \), is a formal statement that there is no effect, no difference, or no change in the population. It represents a starting assumption that we test using data.
In a hypothesis test, we assume the null hypothesis is true and then use sample data to evaluate whether there is strong enough evidence to reject it.
Examples of Null Hypotheses:
- The new drug has the same recovery rate as the current one.
- The average score this year is still 75 (same as last year).
- There is no difference between Group A and Group B.
- The true population proportion is 50%.
Key idea: The null hypothesis is the claim we test against. Based on the strength of the evidence, we either reject it or fail to reject it.
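To make "assume \( H_0 \) is true, then weigh the evidence" concrete, here is a minimal Python sketch for the voter scenario. It assumes a hypothetical poll in which 60 of 100 sampled voters support the policy (these numbers are invented for illustration) and asks: if the true proportion really were 50%, how likely is a sample at least this favorable? The helper `upper_tail_prob` is a hypothetical name; it computes an exact binomial tail using only the standard library's `math.comb`.

```python
from math import comb


def upper_tail_prob(k: int, n: int, p0: float) -> float:
    """P(X >= k) when X ~ Binomial(n, p0), i.e. computed assuming H0 is true."""
    return sum(comb(n, i) * p0**i * (1 - p0) ** (n - i) for i in range(k, n + 1))


# Hypothetical poll (illustrative numbers only): 60 of 100 sampled voters support the policy.
n_sampled = 100
n_support = 60
p_null = 0.5  # H0: the true population proportion is 50%

prob = upper_tail_prob(n_support, n_sampled, p_null)
print(f"If H0 were true, P(at least {n_support} of {n_sampled} support) = {prob:.4f}")
```

A small probability means the sample would be surprising if \( H_0 \) were true, which counts as evidence against it. How small is "small enough" is a choice the analyst makes in advance, not something the code decides.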
Every hypothesis test starts with a question and ends with a decision. That leads to an important follow-up: what if the decision is wrong?
What If We’re Wrong?
When we run a hypothesis test, we examine a claim using sample data. Based on this, we’ll make a decision: Do we reject the claim or not? But no matter how carefully we run the test, we have to face this key idea:
Our conclusion might not match what's actually true in the population.
We can think of the universe as having a “back of the book” answer. It's hidden from us. We only get to see a sample, and then we try to infer what the correct answer is. Sometimes, we’ll get it right. But sometimes we’ll make mistakes.
There Are Two Kinds of Mistakes
When we work through a hypothesis test, two outcomes can happen:
- We make a correct decision: we reject a null hypothesis that is false, or we fail to reject one that is true.
- Or, we make one of two classic errors:
Type I Error
This happens when the null hypothesis is actually true but our test leads us to wrongly reject it.
“We thought there was an effect, but there wasn’t.”
Type II Error
This happens when the null hypothesis is actually false but the test fails to detect the difference, so we wrongly fail to reject it.
“There really was a difference, but we didn’t find it.”
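Both error types can be seen directly in a simulation, because there we get to choose the "back of the book" answer ourselves. The Python sketch below assumes a made-up decision rule for the voter scenario (reject \( H_0: p = 0.5 \) if at least 60 of 100 sampled voters support the policy) and an assumed true support of 55% for the world where \( H_0 \) is false. The function names `rejects_null`, `simulate_poll`, and `error_rates` are hypothetical. It counts how often the rule rejects a true null (Type I) and how often it fails to reject a false one (Type II).

```python
import random


def rejects_null(support_count: int, threshold: int = 60) -> bool:
    """Hypothetical decision rule: reject H0 (p = 0.5) if at least `threshold` sampled voters support."""
    return support_count >= threshold


def simulate_poll(rng: random.Random, true_p: float, n: int = 100) -> int:
    """Sample n voters, each supporting with probability true_p; return the number of supporters."""
    return sum(rng.random() < true_p for _ in range(n))


def error_rates(trials: int = 10_000, seed: int = 0) -> tuple[float, float]:
    """Estimate (Type I rate, Type II rate) for the rule above by repeated simulation."""
    rng = random.Random(seed)
    # World 1: H0 is true (p = 0.5). Every rejection here is a Type I error.
    type1 = sum(rejects_null(simulate_poll(rng, 0.5)) for _ in range(trials)) / trials
    # World 2: H0 is false (assumed p = 0.55). Every failure to reject is a Type II error.
    type2 = sum(not rejects_null(simulate_poll(rng, 0.55)) for _ in range(trials)) / trials
    return type1, type2


type1, type2 = error_rates()
print(f"Estimated Type I error rate:  {type1:.3f}")
print(f"Estimated Type II error rate: {type2:.3f}")
```

Under these assumptions the Type I rate comes out small (around 3%) while the Type II rate is large (around 80%), because a true support of 55% rarely produces 60 or more supporters in a sample of 100. Lowering the threshold would catch more real differences but would also raise the Type I rate; balancing the two is part of deciding how much risk of error to tolerate.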
Error Decision Table
This table shows all four possible situations:
| Reality (Truth) | We Reject Null Hypothesis | We Fail to Reject Null Hypothesis |
|---|---|---|
| Null is True | ❌ Type I Error | ✅ Correct Decision |
| Null is False | ✅ Correct Decision | ❌ Type II Error |
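The table is a lookup on two facts: whether the null hypothesis is true and whether we rejected it. A minimal Python sketch of that lookup follows; `classify_decision` is a hypothetical helper, not part of any library. In a real study we never know the first argument, which is why a single test cannot tell us whether it made an error.

```python
def classify_decision(null_is_true: bool, rejected_null: bool) -> str:
    """Return which cell of the error decision table a test outcome falls into."""
    if null_is_true and rejected_null:
        return "Type I error"
    if not null_is_true and not rejected_null:
        return "Type II error"
    return "Correct decision"


# Print all four cells of the table.
for null_is_true in (True, False):
    for rejected_null in (True, False):
        reality = "Null is true" if null_is_true else "Null is false"
        action = "reject" if rejected_null else "fail to reject"
        print(f"{reality}, {action}: {classify_decision(null_is_true, rejected_null)}")
```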
Understanding error helps us stay humble. Statistics will never “prove” anything with absolute certainty, but it can help us make informed decisions and assess how much risk of error we’re willing to tolerate.
Think of a time when someone made a wrong assumption based on partial information. What kind of error would that be? How might they have avoided it?


