Statistics LibreTexts

8.1: Why hypothesis testing and Type I and Type II Errors

  • Page ID
    58924

    Before we dive into numbers or formulas, we need to zoom out and ask an essential question:

    What are we trying to figure out?
    In hypothesis testing, we start with a belief or assumption about the world and then use data to test whether that belief is reasonable or needs to be reconsidered.

    Example Scenarios: What Are We Testing?

    Let’s look at four common scenarios where hypothesis testing appears in real life. Each one sets up a basic question we're trying to answer and helps illustrate what a hypothesis really is.

    1. Is this new drug more effective than the standard treatment?

      This question is not about a single patient; it’s about a larger population. We’re comparing two treatments, looking for evidence that one is better than the other, and checking whether any observed difference is too large to be explained by chance alone.

    2. Does an after-school program improve math scores?

      We hypothesize that students who attend the program will perform better than those who do not. But even if we see higher scores, we have to ask: Is this improvement due to the program, or is it just random variation?

    3. Is this machine producing parts that meet design standards?

      The manufacturer assumes the process is working correctly. If defective parts are suspected, a hypothesis test can check whether the process has shifted away from acceptable levels.

    4. Do more than 50% of local voters support the new policy?

      A political scientist sets up a hypothesis about a population proportion: they want to know whether support exceeds the 50% threshold. The test helps judge whether the support measured in a sample gives enough evidence that support in the entire population is above 50%.

    In all of these examples, we are doing some form of comparison: new vs. old, program vs. no program, actual vs. expected. And in each case, we are trying to use evidence (our sample) to make a judgment about the entire population. A hypothesis gives us something to test.
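    To see why "just random variation" is a real concern in the after-school program question, here is a minimal Python sketch. It assumes, purely for illustration, that math scores follow a normal distribution with mean 70 and standard deviation 10, that each group has 30 students, and that neither group attends the program. The function name `simulate_group_means` and all of the numbers are hypothetical.

```python
import random
import statistics


def simulate_group_means(n_students=30, mean=70.0, sd=10.0, seed=1):
    """Draw two groups from the SAME hypothetical score distribution.

    Neither group gets the program, so any gap between their averages
    comes purely from random sampling variation.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    group_a = [rng.gauss(mean, sd) for _ in range(n_students)]
    group_b = [rng.gauss(mean, sd) for _ in range(n_students)]
    return statistics.mean(group_a), statistics.mean(group_b)


if __name__ == "__main__":
    mean_a, mean_b = simulate_group_means()
    print(f"Group A average: {mean_a:.1f}")
    print(f"Group B average: {mean_b:.1f}")
    print(f"Difference with no real effect: {mean_a - mean_b:+.1f}")
```

    Because both groups come from the same distribution, their averages still differ, and changing `seed` produces a different gap each time. A hypothesis test asks whether an observed difference is larger than this kind of chance variation would typically produce.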

    Definition: Null Hypothesis

    The null hypothesis, written as \( H_0 \), is a formal statement that there is no effect, no difference, or no change in the population. It represents a starting assumption that we test using data.

    In a hypothesis test, we assume the null hypothesis is true and then use sample data to evaluate whether there is strong enough evidence to reject it.
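    As a minimal sketch of "assume the null hypothesis is true, then check the data," consider the voter question with \( H_0 \): the true population proportion is 50%. The poll result used here (60 supporters out of 100 sampled voters) is hypothetical, and the code assumes voters are sampled independently, so the number of supporters follows a binomial distribution. The helper `prob_at_least` is an illustrative name, not a library function.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): the chance of k or more successes."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# Hypothetical poll: 60 of 100 sampled voters support the policy.
supporters, sample_size = 60, 100

# Step 1: assume H0 is true (support in the population is exactly 50%).
# Step 2: ask how often a random sample would show this many supporters anyway.
chance_under_h0 = prob_at_least(supporters, sample_size, 0.5)
print(f"P(at least {supporters} of {sample_size} supporters | H0) = {chance_under_h0:.4f}")
```

    If this probability is small, the observed poll would be unusual in a world where \( H_0 \) holds, and that counts as evidence against \( H_0 \). This quantity is usually called a p-value.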

    Examples of Null Hypotheses:

    • The new drug has the same recovery rate as the current one.
    • The average score this year is still 75 (same as last year).
    • There is no difference between Group A and Group B.
    • The true population proportion is 50%.

    Key idea: The null hypothesis is the claim we test against. Based on the strength of the evidence, we either reject it or fail to reject it.

    Figure: Hypothesis testing, showing the two possible decisions, "Reject H₀" and "Fail to reject H₀".

    Key idea: Every hypothesis test starts with a question. But that leads to an important follow-up: What if we’re wrong?


    What If We’re Wrong?

    When we run a hypothesis test, we examine a claim using sample data. Based on this, we’ll make a decision: Do we reject the claim or not? But no matter how carefully we run the test, we have to face this key idea:

    There’s always a chance we’re wrong.
    Our conclusion might not match what's actually true in the population.

    We can think of the universe as having a “back of the book” answer. It's hidden from us. We only get to see a sample, and then we try to infer what the correct answer is. Sometimes, we’ll get it right. But sometimes we’ll make mistakes.

    There Are Two Kinds of Mistakes

    When we work through a hypothesis test, two outcomes can happen:

    • We correctly reject a false claim or fail to reject a true claim.
    • Or, we make one of two classic errors:

    Type I Error

    This happens when the null hypothesis is actually true but our test leads us to wrongly reject it.

    “We thought there was an effect, but there wasn’t.”

    Type II Error

    This happens when the null hypothesis is actually false, but the test fails to detect the difference, so we wrongly fail to reject it.

    “There really was a difference, but we didn’t find it.”

    Error Decision Table

    This table shows all four possible situations:

    Table: Reference for Type I and Type II errors

    Reality (Truth)   | We Reject H₀         | We Fail to Reject H₀
    ------------------|----------------------|----------------------
    H₀ is true        | ❌ Type I Error      | ✅ Correct Decision
    H₀ is false       | ✅ Correct Decision  | ❌ Type II Error
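    The four cells of the table can be written as a small lookup. In this sketch, `classify_decision` is a hypothetical helper. Note that in a real study we never observe `null_is_true`; it is exactly the hidden "back of the book" answer.

```python
def classify_decision(null_is_true: bool, reject_null: bool) -> str:
    """Return the cell of the error decision table for one test outcome."""
    if null_is_true and reject_null:
        return "Type I error"        # rejected a null hypothesis that was true
    if not null_is_true and not reject_null:
        return "Type II error"       # failed to reject a null hypothesis that was false
    return "Correct decision"        # the two remaining cells of the table


if __name__ == "__main__":
    for null_is_true in (True, False):
        for reject_null in (True, False):
            outcome = classify_decision(null_is_true, reject_null)
            print(f"null true: {null_is_true}, reject: {reject_null} -> {outcome}")
```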

    Understanding error helps us stay humble. Statistics will never “prove” anything with absolute certainty, but it can help us make informed decisions and assess how much risk of error we’re willing to tolerate.
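    To make "how much risk of error we're willing to tolerate" concrete, the sketch below computes exact error probabilities for a hypothetical decision rule in the voter example: sample 100 voters and reject \( H_0 \) (support is 50%) if at least `cutoff` of them support the policy. The Type II error rate depends on the true level of support, so the code assumes, for illustration only, that true support is 60%. The function names, the sample size, and the cutoffs are illustrative assumptions, and voters are assumed to be sampled independently (a binomial model).

```python
from math import comb


def binom_pmf(j, n, p):
    """P(X = j) for X ~ Binomial(n, p)."""
    return comb(n, j) * p**j * (1 - p) ** (n - j)


def error_rates(cutoff, n=100, p_null=0.5, p_true=0.6):
    """Exact error probabilities for the rule 'reject H0 if supporters >= cutoff'.

    Type I:  H0 is true (p = p_null), but the count reaches the cutoff.
    Type II: H0 is false (p = p_true), but the count stays below the cutoff.
    """
    type_1 = sum(binom_pmf(j, n, p_null) for j in range(cutoff, n + 1))
    type_2 = sum(binom_pmf(j, n, p_true) for j in range(0, cutoff))
    return type_1, type_2


if __name__ == "__main__":
    for cutoff in (55, 59, 63):
        t1, t2 = error_rates(cutoff)
        print(f"reject if >= {cutoff}: P(Type I) = {t1:.3f}, P(Type II) = {t2:.3f}")
```

    Running it shows the trade-off: raising the cutoff makes Type I errors rarer but Type II errors more common, so choosing a decision rule means deciding which kind of mistake is more costly.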

    Reflection:
    Think of a time when someone made a wrong assumption based on partial information. What kind of error would that be? How might they have avoided it?

    This page titled 8.1: Why hypothesis testing and Type I and Type II Errors is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Mathematics Department.
