
8.1: Hypothesis Testing with t-Tests

    Linda R. Cote, Rupa G. Gordon, Chrislyn E. Randell, Judy Schmitt, and Helena Marvin
    University of Missouri System


    In Chapter 7, we made a big leap from basic descriptive statistics into full hypothesis testing and inferential statistics. For the rest of the unit, we will be learning new tests, each of which is just a small adjustment on the test before it. In this chapter, we will learn about the first of three t tests, and we will learn a new method of testing the null hypothesis: confidence intervals.

    The t-statistic

    In Chapter 7, we were introduced to hypothesis testing using the z statistic for sample means that we learned in Unit 1. This was a useful way to link the material and ease us into the new way of looking at data, but it isn’t a very common test because it relies on knowing the population standard deviation, \(\sigma \), which we rarely do. Instead, we will estimate that parameter \(\sigma \) using the sample statistic s in the same way that we estimate \(\mu \) using M (\(\mu \) will still appear in our formulas because we suspect something about its value, and that is what we are testing). Our new statistic is called t, and for testing one population mean using a single sample (called a one-sample t-test), it takes the form:

    \[t = \frac{M-\mu}{s_M} = \frac{M-\mu}{s/\sqrt{N}} \nonumber \]

    Notice that t looks almost identical to z; this is because they test the exact same thing: the value of a sample mean compared to what we expect of the population. The only difference is that the standard error is now denoted \(s_M \) to indicate that we use the sample statistic for standard deviation, s, instead of the population parameter \(\sigma \). The process of using and interpreting the standard error and the full test statistic remains exactly the same.
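
    To make the formula concrete, here is a minimal Python sketch (using NumPy; the sample values, the null value of 45, and the function name one_sample_t are made-up illustrations, not from the text):

        import numpy as np

        def one_sample_t(data, mu):
            """One-sample t statistic for testing H0: population mean equals mu."""
            data = np.asarray(data, dtype=float)
            N = data.size
            M = data.mean()               # sample mean
            s = data.std(ddof=1)          # sample standard deviation (denominator N - 1)
            s_M = s / np.sqrt(N)          # estimated standard error of the mean
            return (M - mu) / s_M

        print(one_sample_t([52, 48, 55, 50], mu=45))   # positive t when M exceeds mu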

    In Chapter 3, we learned that the formulas for sample standard deviation and population standard deviation differ by one key factor: the denominator for the parameter is N, but the denominator for the statistic is N − 1, also known as degrees of freedom, df. Because we are using a new measure of spread, we can no longer use the standard normal distribution and the z table to find our critical values. For t-tests, we will use the t distribution and t table (available in section 16.2) to find these values.

    The t distribution, like the standard normal distribution, is symmetric and bell-shaped with a mean of 0. However, because the standard error is now estimated from the sample standard deviation, which is based on degrees of freedom, there is a different t distribution for every value of df. Luckily, they all work exactly the same way, so in practice this difference is minor.

    Figure \(\PageIndex{1}\) shows four curves: a normal distribution curve labeled z, and three t distribution curves for 2, 10, and 30 degrees of freedom. Two things should stand out: First, for lower degrees of freedom (e.g., 2), the tails of the distribution are much fatter, meaning a larger proportion of the area under the curve falls in the tail. This means that we will have to go farther out into the tail to cut off the portion corresponding to 5% or \(\alpha = .05 \), which will in turn lead to higher critical values. Second, as the degrees of freedom increase, we get closer and closer to the z curve. Even the distribution with df = 30, corresponding to a sample size of just 31 people, is nearly indistinguishable from z. In fact, a t distribution with infinite degrees of freedom (theoretically, of course) is exactly the standard normal distribution. Because of this, the bottom row of the t table also gives the critical values for z tests at the same significance levels. Even though these curves are very close, it is still important to use the correct table and critical values, because small differences can add up quickly.
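
    You can see this convergence numerically. The following sketch (using SciPy, our own choice of tool rather than anything named in the text) prints the one-tailed .05 critical values for the three t curves and for z:

        from scipy import stats

        for df in (2, 10, 30):
            # one-tailed critical value for alpha = .05 at this df
            print(df, round(stats.t.ppf(0.95, df), 3))   # 2.920, 1.812, 1.697
        print("z", round(stats.norm.ppf(0.95), 3))       # 1.645, the limiting case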

    A graph shows four overlapping bell curves centered at the same mean: a standard normal (z) curve and t distribution curves for df = 2, 10, and 30, with fatter tails at lower degrees of freedom.
    Figure \(\PageIndex{1}\): Distributions comparing effects of degrees of freedom. (“Distributions Comparing Effects of Degrees of Freedom” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)

    The t distribution table lists critical values for one- and two-tailed tests at several levels of significance, arranged into columns. The rows of the t table list degrees of freedom up to df = 100 in order to use the appropriate distribution curve. It does not, however, list all possible degrees of freedom in this range, because that would take too many rows. Above df = 40, the rows jump in increments of 10. If a problem requires you to find critical values and the exact degrees of freedom is not listed, you always round down to the next smallest df that is listed. For example, if you have 48 people in your sample, the degrees of freedom are N − 1 = 48 − 1 = 47; however, 47 doesn’t appear on our table, so we round down and use the critical values for df = 40, even though 50 is closer. We do this because using a larger df than we actually have would give critical values that are too lax and inflate the Type I error rate (false positives; see Chapter 7).
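
    A short Python sketch of this round-down rule (the list of tabled rows follows the pattern described above, and the helper name table_df is our own illustration):

        import bisect

        # df values listed in the table: 1 through 40, then 50, 60, ..., 100
        table_rows = list(range(1, 41)) + [50, 60, 70, 80, 90, 100]

        def table_df(df):
            """Return the largest tabled df that does not exceed the actual df."""
            i = bisect.bisect_right(table_rows, df)
            return table_rows[i - 1]

        print(table_df(47))   # 40, even though 50 is numerically closer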

    Video: z-Statistics vs. t-Statistics

    z-Statistics vs. t-Statistics on YouTube.

    Hypothesis Testing with t

    Hypothesis testing with the t statistic works exactly the same way as z tests did, following the four-step process of (1) stating the hypotheses, (2) finding the critical values, (3) computing the test statistic and effect size, and (4) making the decision.

    Example: Oil Change Speed

    We will work through an example: Let’s say that you move to a new city and find an auto shop to change your oil. Your old mechanic did the job in about 30 minutes (although you never paid close enough attention to know how much that varied), and you suspect that your new shop takes much longer. After four oil changes, you think you have enough evidence to demonstrate this.

    Step 1: State the Hypotheses

    Our hypotheses for one-sample t-tests are identical to those we used for z-tests. We still state the null and alternative hypotheses mathematically in terms of the population parameter and written out in readable English. For our example:

    \[
    \begin{aligned}
    H_0 &: \text{There is no difference in the average time to change a car's oil} \\
    H_0 &: \mu = 30 \\[2.5ex]
    H_A &: \text{This shop takes longer to change oil than your old mechanic} \\
    H_A &: \mu > 30 \\[2.5ex]
    \end{aligned}
    \nonumber
    \]

    Step 2: Find the Critical Values

    As noted above, our critical values still delineate the area in the tails under the curve corresponding to our chosen level of significance. Because we have no reason to change significance levels, we will use \(\alpha = .05 \), and because we suspect a direction of effect, we have a one-tailed test. To find our critical values for t, we need to add one more piece of information: the degrees of freedom. For this example:

    \[\Large df = N - 1 = 4 - 1 = 3 \nonumber \]

    Going to our t table, a portion of which is found in Table 8.1, we locate the column corresponding to our one-tailed significance level of .05 and find where it intersects with the row for 3 degrees of freedom. As we can see in Table 8.1, our critical value is t* = 2.353. (The complete t table can be found in section 16.2.)
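
    If you want to verify this value by computer rather than by table lookup, a one-line check with SciPy (again our own tool choice, not part of the text) is:

        from scipy import stats

        print(round(stats.t.ppf(1 - 0.05, df=3), 3))   # 2.353: one-tailed, alpha = .05, df = 3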

    The \(t\) table lists critical values for one- and two-tailed tests at several levels of significance, arranged into columns. Each column contains two levels of significance, separated by a slash (/). These two values refer to the level of significance for a one-tailed test (i.e., the proportion in one tail) / the level of significance for a two-tailed test (i.e., the proportion in two tails combined).

    Table \(\PageIndex{1}\): Snippet of the t distribution table (t table).
    df .25 / .50 .20 / .40 .15 / .30 .10 / .20 .05 / .10 .025 / .05 .01 / .02 .005 / .01 .0005 / .001
    1 1.000 1.376 1.963 3.078 6.314 12.706 31.821 63.657 636.578
    2 0.816 1.061 1.386 1.886 2.920 4.303 6.965 9.925 31.600
    3 0.765 0.978 1.250 1.638 2.353 3.182 4.541 5.841 12.924
    4 0.741 0.941 1.190 1.533 2.132 2.776 3.747 4.604 8.610
    5 0.727 0.920 1.156 1.476 2.015 2.571 3.365 4.032 6.869

    We can then shade this region on our t distribution to visualize our rejection region (Figure \(\PageIndex{2}\)).

    A bell curve with a shaded red area under the tail to the right of t = 2.353, representing the probability in that region.
    Figure \(\PageIndex{2}\): Rejection region. (“Rejection Region t = 2.353” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)

    Step 3: Calculate the Test Statistic and Effect Size

    The four wait times you experienced for your oil changes at the new shop were 46 minutes, 58 minutes, 40 minutes, and 71 minutes. We will use these to calculate M and s by first filling in the sum of squares in Table \(\PageIndex{2}\).

    Table \(\PageIndex{2}\): Sum of squares
    X X − M (X − M)²
    46 -7.75 60.06
    58 4.25 18.06
    40 -13.75 189.06
    71 17.25 297.56
    \(\sum X = 215\) \(\sum (X-M) = 0\) \(\sum (X-M)^2 = 564.74\)

    After filling in the first column to get \(\sum X = 215\), we find that the mean is M = 53.75 (215 divided by the sample size of 4), which allows us to fill in the rest of the table to get our sum of squares SS = 564.74, which we then plug in to the formula for standard deviation from Chapter 3:

    \[\Large s = \sqrt{\frac{\sum(X-M)^{2}}{N-1}} = \sqrt{\frac{SS}{df}} = \sqrt{\frac{564.74}{3}}= \sqrt{188.24\overline{6}} = 13.72 \nonumber \]

    Next, we take this value and plug it into the formula for standard error:

    \[\Large s_M = \frac{s}{\sqrt{N}} = \frac{13.72}{2} = 6.86 \nonumber \]

    And, finally, we put the standard error, sample mean, and null hypothesis value into the formula for our test statistic t:

    \[\Large t = \frac{M-\mu}{s_M} = \frac{53.75-30}{6.86} = \frac{23.75}{6.86} = 3.46 \nonumber \]

    This may seem like a lot of steps, but it is really just taking our raw data to calculate one value at a time and carrying that value forward into the next equation: data → sample size/degrees of freedom → mean → sum of squares → standard deviation → standard error → test statistic. At each step, we simply match the symbols of what we just calculated to where they appear in the next formula to make sure we are plugging everything in correctly.
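
    The same chain, written as a short Python sketch (NumPy is our own tool choice; the numbers are the four wait times from the example):

        import numpy as np

        X = np.array([46, 58, 40, 71], dtype=float)   # wait times in minutes
        N = X.size                                    # sample size: 4
        df = N - 1                                    # degrees of freedom: 3
        M = X.mean()                                  # mean: 53.75
        SS = ((X - M) ** 2).sum()                     # 564.75 exactly; 564.74 if the squared deviations are rounded first
        s = np.sqrt(SS / df)                          # standard deviation: 13.72
        s_M = s / np.sqrt(N)                          # standard error: 6.86
        t = (M - 30) / s_M                            # test statistic: 3.46
        print(round(t, 2))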

    Next, we need to calculate an effect size, which is still Cohen’s d, but now we use s in place of \(\sigma \):

    \[\Large d = \frac{M-\mu}{s} = \frac{53.75-30.00}{13.72} = 1.73 \nonumber \]

    This is a large effect. It should also be noted that for some things, like the minutes in our current example, we can also interpret the magnitude of the difference we observed (23 minutes and 45 seconds) as an indicator of importance, since time is a familiar metric.
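
    Continuing the sketch above, the effect size takes one more line:

        d = (M - 30) / s      # Cohen's d, using s in place of sigma: 1.73
        print(round(d, 2))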

    Step 4: Make the Decision

    Now that we have our critical value and test statistic, we can make our decision using the same criteria we used for a z-test. Our obtained t statistic was t = 3.46 and our critical value was t* = 2.353: t > t*, so we reject the null hypothesis and conclude:

    Based on our four oil changes, the new shop takes longer on average (M = 53.75, SD = 13.72) to change oil than our old mechanic, and the effect size was large, t(3) = 3.46, p < .05, d = 1.73.

    Notice that we also include the degrees of freedom in parentheses next to t. Figure \(\PageIndex{3}\) shows the output from JASP.

    Screenshot of JASP One Sample T-Test output (Student’s t test) for the wait time data described in this example.
    Figure \(\PageIndex{3}\): Output from JASP for the one-sample t test described in this example. (“JASP 1-sample t test” by Rupa G. Gordon/Judy Schmitt is licensed under CC BY-NC-SA 4.0.)
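
    If you do not have JASP handy, the same test can be run in a few lines of Python (SciPy 1.6 or later is assumed for the alternative argument; this is our own tool choice, not the text’s):

        from scipy import stats

        times = [46, 58, 40, 71]
        result = stats.ttest_1samp(times, popmean=30, alternative='greater')
        print(result.statistic, result.pvalue)   # t ≈ 3.46, one-tailed p ≈ .020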


    This page titled 8.1: Hypothesis Testing with t-Tests is shared under a not declared license and was authored, remixed, and/or curated by Linda R. Cote, Rupa G. Gordon, Chrislyn E. Randell, Judy Schmitt, and Helena Marvin via source content that was edited to the style and standards of the LibreTexts platform.