
10.1: The One-Sample z-test


In this section, we'll discuss one of the most useless tests in all of statistics: the z-test. Seriously – this test is almost never used in real life. Its only real purpose is that, when teaching statistics, it's a very convenient stepping stone along the way towards the t-test, which is probably the most (over)used tool in all of statistics.

    The Inference Problem that the Test Addresses

To introduce the idea behind the z-test, let's use a simple example. A friend of mine, Dr. Zeppo, grades his introductory statistics class on a curve. Let's suppose that the average grade in his class is 67.5, and the standard deviation is 9.5. Of his many hundreds of students, it turns out that 20 of them also take psychology classes. Out of curiosity, I find myself wondering: do the psychology students tend to get the same grades as everyone else (i.e., a mean of 67.5), or do they tend to score higher or lower? He emails me the zeppo.sav file, which contains the grades of those students, and I calculate the mean:

[SPSS output: the mean grade of the 20 psychology students is 72.3.]

    Hm. It might be that the psychology students are scoring a bit higher than normal: that sample mean of \(\bar{X}\) = 72.3 is a fair bit higher than the hypothesized population mean of μ=67.5, but on the other hand, a sample size of N=20 isn’t all that big. Maybe it’s pure chance.

To answer the question, it helps to be able to write down what it is that we think we know. Firstly, we know that the sample mean is \(\bar{X} = 72.3\). If we're willing to assume that the psychology students have the same standard deviation as the rest of the class, then we can say that the population standard deviation is σ=9.5. We'll also assume that since Dr. Zeppo is grading to a curve, the psychology student grades are normally distributed.

    Next, it helps to be clear about what we want to learn from the data. In this case, the research hypothesis relates to the population mean μ for the psychology student grades, which is unknown. Specifically, I want to know if μ=67.5 or not. Given that this is what I know, can we devise a hypothesis test to solve our problem? The data, along with the hypothesized distribution from which they are thought to arise, are shown in Figure 10.1. Not entirely obvious what the right answer is, is it? For this, we are going to need some statistics.

Figure 10.1: The theoretical distribution (solid line) from which the psychology student grades (grey bars) are supposed to have been generated.

    Constructing the Hypothesis Test

    The first step in constructing a hypothesis test is to be clear about what the null and alternative hypotheses are. This isn’t too hard to do. Our null hypothesis, H0, is that the true population mean μ for psychology student grades is 67.5%; and our alternative hypothesis is that the population mean isn’t 67.5%. If we write this in mathematical notation, these hypotheses become,

\( H_0: \mu = 67.5 \)

\( H_1: \mu \neq 67.5 \)

    though to be honest this notation doesn’t add much to our understanding of the problem, it’s just a compact way of writing down what we’re trying to learn from the data. The null hypothesis H0 and the alternative hypothesis H1 for our test are both illustrated in Figure 10.2. In addition to providing us with these hypotheses, the scenario outlined above provides us with a fair amount of background knowledge that might be useful. Specifically, there are two special pieces of information that we can add:

    • The psychology grades are normally distributed.
    • The true standard deviation of these scores σ is known to be 9.5.

    For the moment, we’ll act as if these are absolutely trustworthy facts. In real life, this kind of absolutely trustworthy background knowledge doesn’t exist, and so if we want to rely on these facts we’ll just have to make the assumption that these things are true. However, since these assumptions may or may not be warranted, we might need to check them. For now, though, we’ll keep things simple.

Figure 10.2: Graphical illustration of the null and alternative hypotheses assumed by the one-sample z-test (the two-sided version, that is). The null and alternative hypotheses both assume that the population distribution is normal, and additionally assume that the population standard deviation is known (fixed at some value σ0). The null hypothesis (left) is that the population mean μ is equal to some specified value μ0. The alternative hypothesis is that the population mean differs from this value, μ≠μ0.

The next step is to figure out what would be a good choice for a diagnostic test statistic; something that would help us discriminate between H0 and H1. Given that the hypotheses all refer to the population mean μ, you'd feel pretty confident that the sample mean \(\bar{X}\) would be a useful place to start. What we could do is look at the difference between the sample mean \(\bar{X}\) and the value that the null hypothesis predicts for the population mean. In our example, that would mean we calculate \(\bar{X} - 67.5\). More generally, if we let μ0 refer to the value that the null hypothesis claims is our population mean, then we'd want to calculate

    \(\bar{X}-\mu_{0}\)

    If this quantity equals or is very close to 0, things are looking good for the null hypothesis. If this quantity is a long way away from 0, then it’s looking less likely that the null hypothesis is worth retaining. But how far away from zero should it be for us to reject H0?
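For our data, that quantity is \(72.3 - 67.5 = 4.8\) grade points; the question is whether 4.8 counts as "a long way" from zero.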

To figure that out, we need to be a bit more sneaky, and we'll need to rely on those two pieces of background knowledge that we wrote down previously, namely that the raw data are normally distributed, and that we know the value of the population standard deviation σ. If the null hypothesis is actually true, and the true mean is μ0, then these facts together mean that we know the complete population distribution of the data: a normal distribution with mean μ0 and standard deviation σ. Adopting the notation introduced earlier, a statistician might write this as:

\( X \sim \mbox{Normal}(\mu_0, \sigma^2) \)

Okay, if that’s true, then what can we say about the distribution of \(\bar{X}\)? Well, as we discussed earlier, the sampling distribution of the mean \(\bar{X}\) is also normal, and has mean μ. But the standard deviation of this sampling distribution, \(\operatorname{SE}(\bar{X})\), which is called the standard error of the mean, is

    \(\operatorname{SE}(\bar{X})=\frac{\sigma}{\sqrt{N}}\)
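To make this concrete, plugging in the values we've assumed for Dr. Zeppo's class (σ=9.5, N=20) gives

\( \operatorname{SE}(\bar{X}) = \frac{9.5}{\sqrt{20}} \approx 2.12 \)

so, if the null hypothesis is true, sample means should typically land within a couple of grade points of 67.5.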

    In other words, if the null hypothesis is true then the sampling distribution of the mean can be written as follows:

\( \bar{X} \sim \mbox{Normal}(\mu_0, \operatorname{SE}(\bar{X})) \)

Now comes the trick. What we can do is convert the sample mean \(\bar{X}\) into a standard score, as discussed earlier. This is conventionally written as z, but for now I’m going to refer to it as \(z_{\bar{X}}\). (The reason for using this expanded notation is to help you remember that we’re calculating a standardized version of a sample mean, not a standardized version of a single observation, which is what a z-score usually refers to.) When we do so, the z-score for our sample mean is

    \(\ z_{\bar{X}} = {{\bar{X}-\mu_{0}} \over SE(\bar{X})}\)

    or, equivalently

    \(\ z_{\bar{X}} = {{\bar{X}-\mu_{0}} \over \sigma/ \sqrt{N} }\)
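For our data, this works out to

\( z_{\bar{X}} = \frac{72.3 - 67.5}{9.5/\sqrt{20}} = \frac{4.8}{2.1243} \approx 2.26 \)

which, as we're about to see, is exactly the value that SPSS reports below.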

    This z-score is our test statistic. The nice thing about using this as our test statistic is that like all z-scores, it has a standard normal distribution:

\( z_{\bar{X}} \sim \mbox{Normal}(0,1) \)

(again, see the earlier discussion of standard scores if you’ve forgotten why this is true). In other words, regardless of what scale the original data are on, the z-statistic itself always has the same interpretation: it’s equal to the number of standard errors that separate the observed sample mean \(\bar{X}\) from the population mean μ0 predicted by the null hypothesis. Better yet, regardless of what the population parameters for the raw scores actually are, the 5% critical regions for the z-test are always the same, as illustrated in Figures 10.3 and 10.4. And what this meant, way back in the days when people did all their statistics by hand, is that someone could publish a table like this:

desired α level    two-sided test    one-sided test
.1                 1.644854          1.281552
.05                1.959964          1.644854
.01                2.575829          2.326348
.001               3.290527          3.090232

which in turn meant that researchers could calculate their z-statistic by hand, and then look up the critical value in a textbook. That was an incredibly handy thing to be able to do back then, but it’s kind of unnecessary these days, since it’s trivially easy to do it with software like R.
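In fact, you can regenerate that entire table yourself. Here's a minimal sketch using SPSS's inverse-normal function IDF.NORMAL (the variable names alpha, two_sided, and one_sided are just illustrative):

data list list / alpha.
begin data
.1
.05
.01
.001
end data.
* The two-sided critical value puts alpha/2 in each tail,
  whereas the one-sided value puts all of alpha in one tail.
compute two_sided = IDF.NORMAL(1 - alpha/2, 0, 1).
compute one_sided = IDF.NORMAL(1 - alpha, 0, 1).
formats alpha (f5.3) two_sided one_sided (f8.6).
list alpha two_sided one_sided.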

Figure 10.3: Rejection regions for the two-sided z-test
Figure 10.4: Rejection regions for the one-sided z-test

    A Worked Example Using SPSS

Now, as I mentioned earlier, the z-test is almost never used in practice. It’s so rarely used in real life that SPSS’s drop-down menus don’t even include a procedure for it. However, the test is so incredibly simple that it’s really easy to do one manually. Let’s go back to the data from Dr. Zeppo’s class. Having loaded the zeppo.sav data, the first thing I need to do is calculate the sample mean:

[SPSS output: descriptive statistics for the grades variable, showing a sample mean of 72.3.]

    Then, using syntax (yes, I know we haven't talked about that), we can create a set of variables corresponding to the sample size, known population standard deviation (σ=9.5), and the value of the population mean that the null hypothesis specifies (μ0=67.5):

    data list list / n sample_mean population_mean population_sd.
    begin data
    20 72.3 67.5 9.5
    end data.
    

     

    The next bit of syntax (courtesy of www.how2stats.net) will compute the necessary stuff for a z-test and barf out the z-score, the p-value, and Cohen's d:

    Compute mean_difference = sample_mean - population_mean.
    Compute square_root_n =SQRT(n).
    Compute standard_difference = population_sd/square_root_n.
    Compute z_statistic = mean_difference/standard_difference.
    Compute chi_square = z_statistic*z_statistic.
    Compute p_value = SIG.CHISQ(chi_square, 1).
    Compute cohens_d = mean_difference/population_sd.
    EXECUTE.
    Formats z_statistic p_value cohens_d (f8.5).
    LIST z_statistic p_value cohens_d. 
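In case the chi-square step looks mysterious: under the null hypothesis \(z_{\bar{X}}\) follows a standard normal distribution, so its square follows a chi-square distribution with one degree of freedom, which means SIG.CHISQ(z², 1) returns exactly the two-sided p-value. If you'd rather compute the p-value directly from the normal distribution, this alternative bit of syntax (a sketch; p_value_direct is just an illustrative name) should give the same answer:

* Two-sided p-value computed directly from the standard normal CDF;
  this should match the SIG.CHISQ result above.
compute p_value_direct = 2*(1 - CDF.NORMAL(ABS(z_statistic), 0, 1)).
formats p_value_direct (f8.5).
execute.
list p_value_direct.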
    

The results of all these mathematical shenanigans are:

z_statistic  p_value cohens_d

   2.25961    .02385   .50526

    At this point, we would traditionally look up the value 2.26 in our table of critical values. Our original hypothesis was two-sided (we didn’t really have any theory about whether psych students would be better or worse at statistics than other students) so our hypothesis test is two-sided (or two-tailed) also. Looking at the little table that I showed earlier, we can see that 2.26 is bigger than the critical value of 1.96 required to be significant at α=.05, but smaller than the value of 2.58 that would be required to be significant at a level of α=.01. Therefore, we can conclude that we have a significant effect, which we might write up by saying something like this:

With a mean grade of 72.3 in the sample of psychology students, and assuming a true population standard deviation of 9.5, we can conclude that the psychology students have significantly different statistics scores to the class average (z=2.26, N=20, p<.05).

If you would like to report the exact p-value, this bit of syntax has you covered: the output includes the exact p-value of .02385. And for good measure, we also got Cohen's d (the effect size) of .50526.
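In case you're wondering where that d comes from, it's just the mean difference expressed in units of the (assumed known) population standard deviation:

\( d = \frac{\bar{X} - \mu_0}{\sigma} = \frac{72.3 - 67.5}{9.5} \approx .51 \)

which, by the usual benchmarks, is a medium-sized effect.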

    Assumptions of the z-test

    As stated before, all statistical tests make assumptions. Some tests make reasonable assumptions, while other tests do not. The test we’ve just discussed – the one-sample z-test – makes three basic assumptions. These are:

    • Normality. As usually described, the z-test assumes that the true population distribution is normal. This is often pretty reasonable, and not only that, it’s an assumption that we can check if we feel worried about it (see Section 10.9).
    • Independence. The second assumption of the test is that the observations in your data set are not correlated with each other, or related to each other in some funny way. This isn’t as easy to check statistically: it relies a bit on good experimental design. An obvious (and stupid) example of something that violates this assumption is a data set where you “copy” the same observation over and over again in your data file: so you end up with a massive “sample size”, consisting of only one genuine observation. More realistically, you have to ask yourself if it’s really plausible to imagine that each observation is a completely random sample from the population that you’re interested in. In practice, this assumption is never met; but we try our best to design studies that minimize the problems of correlated data.
    • Known standard deviation. The third assumption of the z-test is that the true standard deviation of the population is known to the researcher. This is just stupid. In no real-world data analysis problem do you know the standard deviation σ of some population, but are completely ignorant about the mean μ. In other words, this assumption is always wrong.

    In view of the stupidity of assuming that σ is known, let’s see if we can live without it. This takes us out of the dreary domain of the z-test, and into the magical kingdom of the t-test, with unicorns and fairies and leprechauns, and um…


    This page titled 10.1: The One-Sample z-test is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Danielle Navarro.