
6.2: Two Important Axioms


    We just learned that the sampling distribution is theoretical: we never actually see it. If that is true, then how can we know it works? How can we use something we don’t see? The answer lies in two very important mathematical facts: the central limit theorem and the law of large numbers. We will not go into the math behind how these statements were derived, but knowing what they are and what they mean is important to understanding why inferential statistics work and how we can draw conclusions about a population based on information gained from a single sample.

    Central Limit Theorem

    The central limit theorem states:

For samples of a given size n, drawn from a population with mean \(\mu\) and variance \(\sigma^2\), the sampling distribution of sample means will have a mean \(\mu_M=\mu\) and variance \(\sigma_M^2=\sigma^2/n\). This distribution will approach normality as n increases.

From this, we are able to find the standard deviation of the sampling distribution, called the standard error. Just like any other standard deviation, the standard error is simply the square root of the variance of the distribution:

\[ \sigma_M = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}} \nonumber \]

The last sentence of the central limit theorem states that the sampling distribution becomes more normal as the size of the samples used to create it increases: bigger samples yield a more normal sampling distribution, so we are better able to use the techniques we developed for normal distributions and probabilities. So, how large is large enough? In general, a sampling distribution will be normal if either of two characteristics is true: (1) the population from which the samples are drawn is normally distributed, or (2) the sample size is equal to or greater than 30. This second criterion is very important because it enables us to use methods developed for normal distributions even if the true population distribution is skewed.

Video: Central limit theorem (available on YouTube).
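To see the theorem in action rather than take it on faith, here is a minimal simulation sketch (an illustration added here, assuming Python with NumPy installed). It draws many samples of each size from a deliberately skewed population and checks the theorem's two claims: the mean of the sample means stays at \(\mu\), and their spread shrinks like \(\sigma/\sqrt{n}\).

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# A skewed population: exponential with mean 50 (and standard deviation 50).
for n in (2, 30, 100):
    # Draw 10,000 samples of size n and record each sample's mean.
    means = rng.exponential(scale=50, size=(10_000, n)).mean(axis=1)
    print(f"n = {n:3d}: mean of sample means = {means.mean():6.2f}, "
          f"std of sample means = {means.std():5.2f}, "
          f"sigma / sqrt(n) = {50 / np.sqrt(n):5.2f}")
```

Even though the population is skewed, a histogram of `means` looks increasingly normal as n grows, which is exactly the claim in the theorem's last sentence.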

    Law of Large Numbers

    The law of large numbers simply states that as our sample size increases, the probability that our sample mean is an accurate representation of the true population mean also increases. It is the formal mathematical way to state that larger samples are more accurate.

    The law of large numbers is related to the central limit theorem, specifically the formulas for variance and standard error. Notice that the sample size appears in the denominators of those formulas. A larger denominator in any fraction means that the overall value of the fraction gets smaller (i.e., 1/2 = 0.50, 1/3 = 0.33, 1/4 = 0.25, and so on). Thus, larger sample sizes will create smaller standard errors. We already know that standard error is the spread of the sampling distribution and that a smaller spread creates a narrower distribution. Therefore, larger sample sizes create narrower sampling distributions, which increases the probability that a sample mean will be close to the center and decreases the probability that it will be in the tails. This is illustrated in Figures \(\PageIndex{1}\) and \(\PageIndex{2}\).

Line graph showing four sampling distribution curves centered at 50, labeled N = 10, 30, 50, and 100; the curves become taller and narrower as N increases.
Figure \(\PageIndex{1}\): Sampling distributions from the same population with \(\mu=50\) and \(\sigma=10\) but different sample sizes (N = 10, N = 30, N = 50, N = 100). (“Sampling Distributions with Different Sample Sizes” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)
Line graph showing a curve that decreases rapidly and then flattens as sample size increases, with sample size (n) on the x-axis and standard error (\(\sigma_M\)) on the y-axis.
Figure \(\PageIndex{2}\): Relationship between sample size and standard error for a constant \(\sigma=10\). (“Relationship between Sample Size and Standard Error” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)
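The numbers behind these two figures are easy to reproduce. A minimal sketch (an added illustration, assuming Python with NumPy) that computes the standard error for each of the four sample sizes in Figure \(\PageIndex{1}\):

```python
import numpy as np

sigma = 10  # population standard deviation, as in the figures

for n in (10, 30, 50, 100):
    se = sigma / np.sqrt(n)  # standard error of the mean
    print(f"n = {n:3d}: standard error = {se:.2f}")

# Prints 3.16, 1.83, 1.41, and 1.00: larger samples give
# narrower sampling distributions, as the four curves show.
```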

    Using Standard Error for Probability

In this chapter, we saw that we can use z-scores to split up a normal distribution and calculate the proportion of the area under the curve in one of the new regions, giving us the probability of randomly selecting a z-score in that range. We can follow the exact same process for sample means, converting them into z-scores and calculating probabilities. The only difference is the denominator: instead of dividing the deviation of a raw score by the standard deviation, we divide the deviation of the sample mean by the standard error.

    \[ z = \frac{M-\mu}{\sigma_M} = \frac{M-\mu}{\frac{\sigma}{\sqrt{n}}} \nonumber \]

    Let’s say we are drawing samples from a population with a mean of 50 and a standard deviation of 10 (the same values used in Figure \(\PageIndex{1}\)). What is the probability that we get a random sample of size 10 with a mean greater than or equal to 55? That is, for n = 10, what is the probability that M ≥ 55? First, we need to convert this sample mean score into a z-score:

    \[ z = \frac{55-50}{\frac{10}{\sqrt{10}}} = \frac{5}{3.16} = 1.58 \nonumber \]

    Now we need to shade the area under the normal curve corresponding to scores greater than z = 1.58, as in Figure \(\PageIndex{3}\).

A normal distribution curve with the area under the curve to the right of z = 1.58 shaded, representing the right-tail probability.
Figure \(\PageIndex{3}\): Area under the curve greater than z = 1.58. (“Area under the Curve Greater than z = 1.58” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)

    Now we go to our z table and find that the area to the left of z = 1.58 is .9429. Finally, because we need the area to the right (per our shaded diagram), we simply subtract this from 1 to get 1.00 − .9429 = .0571. So, the probability of randomly drawing a sample of 10 people from a population with a mean of 50 and standard deviation of 10 whose sample mean is 55 or more is p = .0571, or 5.71%. Notice that we are talking about means that are 55 or more. That is because, strictly speaking, it’s impossible to calculate the probability of a score taking on exactly 1 value since the “shaded region” would just be a line with no area to calculate.
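If you would rather check the table work numerically, the same calculation takes a couple of lines. A sketch assuming Python with SciPy installed:

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, n = 50, 10, 10
se = sigma / sqrt(n)      # standard error: 10 / sqrt(10), about 3.16

z = (55 - mu) / se        # about 1.58
p = norm.sf(z)            # right-tail area, i.e., 1 - norm.cdf(z)

print(f"z = {z:.2f}, p = {p:.4f}")  # z = 1.58, p is about .057
```

The unrounded answer comes out a shade below the table value of .0571 because the table forces us to round z to two decimal places.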

    Now let’s do the same thing, but assume that instead of only having a sample of 10 people, we took a sample of 50 people. First, we find z:

    \[ z = \frac{55-50}{\frac{10}{\sqrt{50}}} = \frac{5}{1.41} = 3.55 \nonumber \]

    Then we shade the appropriate region of the normal distribution, as shown in Figure \(\PageIndex{4}\).

A standard normal distribution curve centered at zero, with a vertical line marking z = 3.55 far out in the right tail.
Figure \(\PageIndex{4}\): Area under the curve greater than z = 3.55. (“Area under the Curve Greater than z = 3.55” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)

Notice that no region of Figure \(\PageIndex{4}\) appears to be shaded. That is because the area under the curve that far out into the tail is so small that it can’t even be seen (the red line has been added to show exactly where the region starts). Thus, we already know that the probability must be smaller for n = 50 than for n = 10 because the size of the area (the proportion) is much smaller.

    We run into a similar issue when we try to find z = 3.55 on our Standard Normal Distribution Table. The table only goes up to 3.09 because everything beyond that is almost 0 and changes so little that it’s not worth printing values. The closest we can get is subtracting the largest value, .9990, from 1 to get .001. We know that, technically, the actual probability is smaller than this (since 3.55 is farther into the tail than 3.09), so we say that the probability is p < .001, or less than 0.1%.

    This example shows what an impact sample size can have. From the same population, looking for exactly the same thing, changing only the sample size took us from roughly a 5% chance (or about 1/20 odds) to a less than 0.1% chance (or less than 1 in 1000). As the sample size n increased, the standard error decreased, which in turn caused the value of z to increase, which finally caused the p value (a term for probability we will use a lot in Unit 2) to decrease. You can think of this relationship like gears: turning the first gear (sample size) clockwise causes the next gear (standard error) to turn counterclockwise, which causes the third gear (z) to turn clockwise, which finally causes the last gear (probability) to turn counterclockwise. All of these pieces fit together, and the relationships will always be the same:

    \[ \displaystyle n \uparrow \qquad \sigma_M \downarrow \qquad z \uparrow \qquad p \downarrow \nonumber \]
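We can watch all four gears turn at once by varying only the sample size. A short sketch, again assuming Python with SciPy:

```python
from math import sqrt
from scipy.stats import norm

mu, sigma, M = 50, 10, 55

for n in (10, 30, 50, 100):
    se = sigma / sqrt(n)   # n up, standard error down
    z = (M - mu) / se      # ... so z up
    p = norm.sf(z)         # ... so p down
    print(f"n = {n:3d}: se = {se:.2f}, z = {z:.2f}, p = {p:.6f}")
```

Every row confirms the gear diagram: each increase in n shrinks \(\sigma_M\), inflates z, and drives p toward zero.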

Let’s look at this one more way. For the same population, with a mean of 50 and a standard deviation of 10, what proportion of sample means fall between 47 and 53 for samples of size 10 and for samples of size 50?

We’ll start again with n = 10. Converting 47 and 53 into z-scores, we get z = −0.95 and z = 0.95, respectively. From our z table, we find that the proportion between these two scores is .6578 (the intermediate steps are left for you to practice converting M to z and z to proportions). So, 65.78% of sample means from samples of size 10 will fall between 47 and 53. For n = 50, the z-scores for 47 and 53 are ±2.13, which gives us a proportion of .9668, almost 97%! The shaded regions for each of these sampling distributions are displayed in Figure \(\PageIndex{5}\). The sampling distributions are shown on the original scale, rather than as z-scores, so you can see how much of each distribution’s body falls within the range, which is marked off with thin dotted lines.

A graph shows two normal distribution curves centered at 50: a wider one for samples of size 10 and a taller, narrower one for samples of size 50, each shaded between 47 and 53.
Figure \(\PageIndex{5}\): Areas between 47 and 53 for sampling distributions of n = 10 and n = 50. (“Areas between 47 and 53” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)
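The same library call verifies both proportions at once. A sketch assuming Python with SciPy:

```python
from math import sqrt
from scipy.stats import norm

mu, sigma = 50, 10

for n in (10, 50):
    se = sigma / sqrt(n)
    # Proportion of sample means falling between 47 and 53.
    prop = norm.cdf(53, loc=mu, scale=se) - norm.cdf(47, loc=mu, scale=se)
    print(f"n = {n:2d}: proportion between 47 and 53 = {prop:.4f}")
```

The unrounded answers (about .6572 and .9661) differ slightly from the table values .6578 and .9668 because the table rounds z to ±0.95 and ±2.13.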

    Sampling Distribution, Probability, and Inference

    We’ve seen how we can use the standard error to determine probability based on our normal curve. We can think of the standard error as how much we would naturally expect our statistic—be it a mean or some other statistic—to vary. In our formula for z based on a sample mean, the numerator \((M-\mu)\) is what we call an observed effect. That is, it is what we observe in our sample mean versus what we expected based on the population from which that sample mean was calculated.

    Because the sample mean will naturally move around due to sampling error, our observed effect will also change naturally. In the context of our formula for z, then, our standard error is how much we would naturally expect the observed effect to change. Changing by a little is completely normal, but changing by a lot might indicate something is going on. This is the basis of inferential statistics and the logic behind hypothesis testing, the subject of Unit 2.



    This page titled 6.2: Two Important Axioms is shared under a not declared license and was authored, remixed, and/or curated by Chanler Hilley, Kennesaw State University via source content that was edited to the style and standards of the LibreTexts platform.