
8: Probability, Hypothesis Testing, Type I and Type II Errors

    Learning Objectives

    By the end of this chapter, you will be able to:

    • Understand how to test if a result is significant
    • Distinguish between Type I and Type II errors

    Key Terms

    • Probability
    • Hypothesis

    Recap: Statistical inference (AKA all about that p value)

    So far, we have been in the descriptive statistics camp. We have established the normal distribution, central tendency, variability, standard error, and the central limit theorem. These concepts are the building blocks that lead us to the next camp: inferential statistics.

    The hallmark of inferential statistics is inferring: choosing, deciding, and making the best guess possible about whether we have a significant or a non-significant effect.

    A significant effect is one that is unlikely to have happened by chance alone. Chance alone is synonymous with randomness. In statistics, we hate randomness: it means nothing systematic is happening, anything that does happen is out of our control, and anything can happen.

    In research, we want to find something that explains what we see. In psychology, we want to say that what we do makes a difference in our clients' lives. Clients do not get better at random; clients who see psychologists do much better than those who try to get better on their own. In social justice, we want to say we are doing something to create change for the better. We do not want to rely on time to heal social problems; we need to do something to make the positive changes we want to see.

    That something, that interesting finding, that nonrandom finding, that significant finding, is usually in the form of a pattern. Patterns are usually in these two forms.

    • Finding differences among groups
      • Group A is higher / lower than Group B (your t-tests and F-tests).
      • If there is no pattern, Group A is similar to Group B.
    • Finding associations among variables
      • As Variable X increases, Variable Y increases / decreases (your correlations and regressions).
      • If there is no pattern, we say there is no relationship between Variables X and Y.

    These are the patterns found in conventional, ubiquitous parametric statistics. These statistics are your t-tests, F-tests, correlations, and regressions. Parametric statistics means the statistical tests are based on population parameters, which rest on the central limit theorem and include the normal distribution and the standard error of the mean. Yes, there are other patterns based on non-parametric statistics, such as chi-square, and still others based on multivariate statistics. To keep it simple, we look for two broad patterns using conventional parametric statistics.
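    The two broad patterns can be sketched in a few lines of code. This is a minimal illustration, assuming SciPy is available; the data are simulated values made up for this sketch, not examples from the chapter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Pattern 1: differences among groups (a t-test).
# Hypothetical scores: Group A drawn around 52, Group B around 48.
group_a = rng.normal(loc=52, scale=10, size=40)
group_b = rng.normal(loc=48, scale=10, size=40)
t_stat, p_diff = stats.ttest_ind(group_a, group_b)

# Pattern 2: associations among variables (a correlation).
# Hypothetical variables: Y built to tend upward as X increases.
x = rng.normal(size=40)
y = 0.5 * x + rng.normal(scale=0.8, size=40)
r, p_assoc = stats.pearsonr(x, y)

print(f"Group difference: t = {t_stat:.2f}, p = {p_diff:.3f}")
print(f"Association:      r = {r:.2f}, p = {p_assoc:.3f}")
```

    In both patterns, the test statistic (t or r) measures the pattern's strength, and the p value measures how likely a pattern at least that strong would be under chance alone, which is the topic of the sections below.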

    • 8.1: The Role of Probability
      This page explains the concept of significance in statistics, emphasizing that statistical results should be compared to a control group rather than random chance. It highlights the importance of how comparisons are designed for research validity and warns of potential biases that can arise, particularly in political discussions like climate change. While results may appear statistically significant, it's essential to interpret them cautiously, recognizing the role of random chance in outcomes.
    • 8.2: Statistical Decision - Is a Result Significant or Not Significant?
      This page discusses the significance of research results, highlighting that significant results are rare and meaningful, while non-significant results are common and indicate randomness. Significant outcomes are desirable for demonstrating treatment effects, but there are situations where non-significant results are preferred, such as showing equivalency between groups or no relationship between variables. The context of the results is vital for proper interpretation.
    • 8.3: Hypotheses for Statistical Inference
      This page discusses structuring decision-making in hypothesis testing, highlighting the roles of the null hypothesis (H0) and alternative hypothesis (HA). It emphasizes the goal of rejecting H0 to signify significant findings and suggests that framing hypotheses as clear predictions enhances understanding. For clarity in reporting, it recommends stating predictions succinctly and confirming them in the results section.
    • 8.4: How to Determine if a Result is Significant
      This page explains the alpha level (α or p), which is the probability threshold for statistical significance in research, usually set at 0.05. This indicates a 5% chance of results being due to random chance, leading to the rejection of the null hypothesis if results are in the distribution's low-probability area. Common results are near the mean, while significant findings are rare, occurring in the distribution's tails. The 5% standard balances the risks of Type I and Type II errors.
    • 8.5: Interpreting a Statistical Test Result
      This page discusses statistical inference, emphasizing the importance of test statistics and p values. Significant findings are indicated by larger test statistics (above 1.96 for large-sample t-tests) and p values below .05. There is an inverse relationship between test statistics and p values; as one increases, the other decreases. It highlights the need to prioritize p values for accurate interpretation, as test statistics can be influenced by factors such as sample size.
    • 8.6: Testing a Hypothesis
      This page provides guidelines for interpreting t-test results when comparing treatment and control groups. It advises setting a significance level (α) at 0.05, calculating the t-test and p-values, and making decisions on the null hypothesis based on these metrics. It also emphasizes writing results in APA style, including how to report both significant and non-significant findings, while cautioning against terms like "highly significant" and specifying formatting for p-values.
    • 8.7: How Do We Get the P Value for the Statistical Test?
      This page discusses how p values are generated in statistical tests like t-tests, emphasizing the reliance on t-distributions due to unknown population parameters. It notes that researchers establish a significance level (e.g., p < .05) and that the t-value adjusts according to sample size, being 1.96 for large samples (n >= 30). The text highlights the complexity of p value calculations, which are mostly handled by statistical software, rendering manual calculations unnecessary for researchers.
    • 8.8: Confidence Intervals
      This page highlights the necessity of reporting confidence intervals (CIs) with p values in statistical analysis for better reliability assessment of results. CIs provide ranges for population parameters and their exclusion of zero indicates significance, paralleling p value interpretations. The American Psychological Association advocates for this practice alongside effect sizes to deliver a thorough understanding of statistical findings, emphasizing precision and uncertainty in estimates.
    • 8.9: Type I and Type II Errors
      This page explores significance testing in statistics, which assesses whether observations result from chance or reflect a true relationship. It underscores the challenges of estimating population parameters from samples and the potential for error. The author uses an analogy of choosing a life partner to illustrate decision-making under uncertainty and emphasizes the need to avoid Type I (false positives) and Type II (false negatives) errors in statistical analysis.
    • 8.10: The Definition of Type I and Type II Errors
      This page analyzes Type I and Type II errors in statistics, emphasizing the limitations of relying solely on p-values, particularly the traditional .05 threshold. It discusses how these errors impact research validity, the importance of replication for confidence in results, and the potential manipulation of p-values leading to misleading conclusions.
    • 8.11: Discussion Questions
      This page outlines the steps to determine the significance of a result, which include defining a hypothesis, conducting an experiment, calculating a p-value, and comparing it to a significance level. It highlights the importance of understanding significance to assess reliability and explains Type I and II errors. The page cautions against overemphasizing p-values, as they can be affected by sample size and may not represent practical significance or real-world relevance.
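    The ideas in sections 8.4, 8.9, and 8.10 can be made concrete with a small simulation: if the null hypothesis is true (both groups come from the same population), a test with α = .05 will still come out "significant" about 5% of the time, and each of those cases is a Type I error. This is a sketch of my own, assuming SciPy; it is not an exercise from the chapter.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 0.05
n_experiments = 2000

false_positives = 0
for _ in range(n_experiments):
    # The null hypothesis is true by construction: both samples
    # come from the same population (mean 50, SD 10).
    a = rng.normal(loc=50, scale=10, size=30)
    b = rng.normal(loc=50, scale=10, size=30)
    _, p = stats.ttest_ind(a, b)
    if p < alpha:
        # "Significant" by chance alone: a Type I error.
        false_positives += 1

rate = false_positives / n_experiments
print(f"Observed Type I error rate: {rate:.3f} (alpha = {alpha})")
```

    The observed rate lands near .05, which is exactly what α promises: it is not the chance that a given significant result is wrong, but the long-run rate at which true null hypotheses get rejected.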


    This page titled 8: Probability, Hypothesis Testing, Type I and Type II Errors is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Peter Ji.