# 7.2: Sample Variance


In Section 6.2, we introduced the sample mean \(\bar{X}\) as a tool for understanding the mean of a population. In this section, we formalize this idea and extend it to define the *sample variance*, a tool for understanding the *variance* of a population.

## Estimating \(\mu\) and \(\sigma^2\)

Up to now, \(\mu\) denoted the mean or expected value of a random variable. In other words, it represented a parameter of a probability distribution. In statistics, the focus is more generally on a population of objects, where the objects could be actual individuals and we are interested in a certain characteristic of those individuals, e.g., height or IQ. Oftentimes, we can model the distribution of values in a population using a certain probability distribution, and so it makes sense that the mean of a population is also denoted by \(\mu\). However, we may not always have a specific probability distribution in mind when considering the values of a population. Thus, we generalize the interpretation of \(\mu\) as the mean of a population in the following definition. Note that the definition also provides a more general interpretation of \(\sigma^2\) as the variance of a population.

### Definition \(\PageIndex{1}\)

Suppose that a population has \(N\) elements, denoted \(x_1, x_2, \ldots, x_N\). Then the **population mean** \(\mu\) is given by

$$\mu = \frac{1}{N}\sum^N_{i=1} x_i,\label{mu}$$

and the **population variance** \(\sigma^2\) is given by

$$\sigma^2 = \frac{1}{N}\sum^N_{i=1} (x_i-\mu)^2.\label{sigma}$$
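As a concrete illustration of Definition 7.2.1, the following minimal sketch computes \(\mu\) and \(\sigma^2\) for a small population. The population values and the function names `population_mean` and `population_variance` are hypothetical, chosen only for illustration; exact fractions are used so that no rounding enters the arithmetic.

```python
from fractions import Fraction

def population_mean(values):
    """Population mean: sum of all N values divided by N (integer inputs assumed)."""
    return Fraction(sum(values), len(values))

def population_variance(values):
    """Population variance: average squared deviation from the population mean."""
    mu = population_mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)

# Hypothetical population of N = 5 values (illustrative only).
population = [2, 4, 6, 8, 10]

mu = population_mean(population)          # 30 / 5 = 6
sigma2 = population_variance(population)  # (16 + 4 + 0 + 4 + 16) / 5 = 8
print(mu, sigma2)
```

Note that both formulas divide by the population size \(N\), since every element of the population is included in the sum.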

As we saw in Section 6.2, we can collect a random sample from a population and use the sample mean to estimate the population mean. More formally, let \(X_1, \ldots, X_n\) be a collection of independent random variables representing a random sample of observations drawn from a population of interest. Then the sample mean, given by

$$\bar{X} = \frac{1}{n}\sum^n_{i=1}X_i,\label{xbar}$$

can be used to estimate the value of the population mean \(\mu\).

Note the use of lowercase letters "\(x_i\)" in Definition 7.2.1 for the elements of the population, in contrast to the uppercase letters "\(X_i\)" used to denote the elements of the random sample. Because the values in a population are fixed, though unknown in practice, it would not be appropriate to represent them with capital letters, which by convention are reserved for random variables.

We argue that the sample mean \(\bar{X}\) is the "obvious" estimate of the population mean \(\mu\) because the population elements in Equation \ref{mu} are simply replaced by the corresponding sample elements in Equation \ref{xbar}. In addition to being the natural choice for estimating \(\mu\), \(\bar{X}\) has another desirable property, which has to do with the following result, stated in Corollary 6.2.1 for normally distributed populations.

### Theorem \(\PageIndex{1}\)

For a random sample of size \(n\) from a population with mean \(\mu\) and variance \(\sigma^2\), it follows that

\begin{align*}
\text{E}[\bar{X}] &= \mu, \\
\text{Var}(\bar{X}) &= \frac{\sigma^2}{n}.
\end{align*}

**Proof.**
Let \(X_1, \ldots, X_n\) denote the elements of the random sample. Then \(X_1, \ldots, X_n\) are independent random variables each having the same distribution as the population. In other words, we know that \(\text{E}[X_i] = \mu\) and \(\text{Var}(X_i) = \sigma^2\), for \(i=1, \ldots, n\). Given this, and using the linearity of expected value and the independence of the sample elements, we have the following:

\begin{align*}
\text{E}[\bar{X}] &= \text{E}\left[\frac{1}{n}\sum^n_{i=1}X_i\right] = \frac{1}{n}\sum^n_{i=1} \text{E}[X_i] = \frac{1}{n} \sum^n_{i=1} \mu = \frac{1}{n}(n\mu) = \mu, \\
\text{Var}(\bar{X}) &= \text{Var}\left(\frac{1}{n}\sum^n_{i=1} X_i \right) = \frac{1}{n^2}\sum^n_{i=1} \text{Var}(X_i) = \frac{1}{n^2}\sum^n_{i=1} \sigma^2 = \frac{1}{n^2}(n\sigma^2) = \frac{\sigma^2}{n}.
\end{align*}
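The two formulas in Theorem 7.2.1 can be checked exactly on a toy example. The sketch below assumes a hypothetical three-element population and samples of size \(n = 2\) drawn *with replacement*, so that the draws are independent and each has the population's distribution. It enumerates every possible sample, each equally likely, and computes the mean and variance of \(\bar{X}\) over that list.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population and sample size (illustrative only).
population = [1, 2, 6]
n = 2

N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

# Every ordered sample of size n drawn with replacement; all are equally likely.
samples = list(product(population, repeat=n))
xbars = [Fraction(sum(s), n) for s in samples]

# Exact expected value and variance of the sample mean over all samples.
e_xbar = sum(xbars) / len(xbars)
var_xbar = sum((xb - e_xbar) ** 2 for xb in xbars) / len(xbars)

print(e_xbar == mu)               # E[X-bar] equals mu
print(var_xbar == sigma2 / n)     # Var(X-bar) equals sigma^2 / n
```

Because the enumeration uses exact fractions, the two comparisons are equalities rather than approximations.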

Theorem 7.2.1 provides formulas for the expected value and variance of the sample mean, and we see that they both depend on the mean and variance of the population. The fact that the expected value of the sample mean is exactly equal to the population mean indicates that the sample mean is an **unbiased** estimator of the population mean. This is because, on average, we expect the value of \(\bar{X}\) to equal the value of \(\mu\), which is precisely the value it is being used to estimate. This is a very desirable property for an estimator to have, as it lends more confidence to using its values to understand the unknown population characteristic. We will keep the goal of using unbiased estimators as we now consider estimating the population variance.

Before we tackle the problem of estimating population variance, we again point out that the variance of the sample mean depends on the population variance. Thus, if we are interested in using the variance of \(\bar{X}\) to quantify its accuracy in estimating the population mean, we need to know the value of \(\sigma^2\), which is unlikely. (We talked about this at the end of Section 6.2 in the context of computing error probabilities. See Example 6.2.5.) So we have a specific need for an estimate of \(\sigma^2\), not just for understanding the distribution of the population better.

Given Equation \ref{sigma} in Definition 7.2.1, an "obvious" estimate of \(\sigma^2\) is given by simply replacing the population elements by the corresponding sample elements, as we did for estimating \(\mu\). This gives the following formula for \(\hat{\sigma}^2\) (note the "hat" ^), which is our first attempt at estimating \(\sigma^2\):

$$\hat{\sigma}^2 = \frac{1}{n}\sum^n_{i=1}(X_i - \bar{X})^2.\notag$$

The problem with this "obvious" estimate is that it is not unbiased. The following theorem (stated without proof) gives the expected value of \(\hat{\sigma}^2\).

### Theorem \(\PageIndex{2}\)

For a random sample of size \(n\) from a population with mean \(\mu\) and variance \(\sigma^2\), it follows that

$$\text{E}\left[\hat{\sigma}^2\right] = \sigma^2\left(\frac{n-1}{n}\right).\notag$$
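Although the theorem is stated without proof, the origin of the factor \((n-1)/n\) can be traced with a short calculation. The following is a sketch of one standard argument, using only Theorem 7.2.1 and the fact that \(\sum^n_{i=1}(X_i - \mu) = n(\bar{X} - \mu)\). First, expand the sum of squares around \(\mu\):

\begin{align*}
\sum^n_{i=1}(X_i - \bar{X})^2 &= \sum^n_{i=1}\left[(X_i - \mu) - (\bar{X} - \mu)\right]^2 \\
&= \sum^n_{i=1}(X_i - \mu)^2 - 2(\bar{X} - \mu)\sum^n_{i=1}(X_i - \mu) + n(\bar{X} - \mu)^2 \\
&= \sum^n_{i=1}(X_i - \mu)^2 - n(\bar{X} - \mu)^2.
\end{align*}

Taking expected values, with \(\text{E}[(X_i - \mu)^2] = \sigma^2\) and \(\text{E}[(\bar{X} - \mu)^2] = \text{Var}(\bar{X}) = \sigma^2/n\), gives

\begin{align*}
\text{E}\left[\sum^n_{i=1}(X_i - \bar{X})^2\right] &= n\sigma^2 - n\left(\frac{\sigma^2}{n}\right) = (n-1)\sigma^2,
\end{align*}

and dividing by \(n\) yields \(\text{E}\left[\hat{\sigma}^2\right] = \sigma^2\left(\frac{n-1}{n}\right)\).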

As we can see in Theorem 7.2.2, the expected value of \(\hat{\sigma}^2\) does not equal \(\sigma^2\), so it is not an unbiased estimator. Furthermore, note that \(\hat{\sigma}^2\) actually *underestimates* the value of \(\sigma^2\), on average, since its expected value is multiplied by a factor less than 1: \((n-1)/n < 1\). This is not good. Putting this again in the context of using the variance of the sample mean to quantify its accuracy in estimating the population mean, if we use \(\hat{\sigma}^2\) to estimate \(\sigma^2\), we would consistently report *higher* accuracy than what is actually being obtained, since smaller variance means less spread or greater confidence in our estimate of \(\mu\). This would make our analysis unreliable and misleading.
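The bias described in Theorem 7.2.2 can be seen exactly on a toy example. The sketch below again assumes a hypothetical three-element population and samples of size \(n = 2\) drawn with replacement, and averages \(\hat{\sigma}^2\) over every possible sample. The function name `sigma_hat2` is introduced here only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population and sample size (illustrative only).
population = [1, 2, 6]
n = 2

N = len(population)
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N

def sigma_hat2(sample):
    """The 'obvious' variance estimate: divide the sum of squares by the sample size."""
    m = len(sample)
    xbar = Fraction(sum(sample), m)
    return sum((x - xbar) ** 2 for x in sample) / m

# Average the estimate over all equally likely samples drawn with replacement.
samples = list(product(population, repeat=n))
expected_sigma_hat2 = sum(sigma_hat2(s) for s in samples) / len(samples)

print(sigma2)                # 14/3
print(expected_sigma_hat2)   # 7/3, which is sigma^2 * (n - 1) / n
```

With \(n = 2\) the shrinking factor is \(1/2\), so the estimate is on average only half of the true population variance; the factor approaches 1 as \(n\) grows.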

We can find an *unbiased* estimate of \(\sigma^2\) by modifying our first attempt, \(\hat{\sigma}^2\). The modification is simply to multiply by the reciprocal of the factor on \(\sigma^2\) in the expected value of \(\hat{\sigma}^2\). In doing this, we note that the expected value of the modified estimator equals \(\sigma^2\), which follows from the linearity of expected value:

$$\text{E}\left[\left(\frac{n}{n-1}\right)\hat{\sigma}^2\right] = \left(\frac{n}{n-1}\right)\text{E}\left[\hat{\sigma}^2\right] = \left(\frac{n}{n-1}\right)\sigma^2\left(\frac{n-1}{n}\right) = \sigma^2\notag$$

We can simplify the modification of \(\hat{\sigma}^2\) algebraically as follows:

$$\left(\frac{n}{n-1}\right)\hat{\sigma}^2 = \left(\frac{n}{n-1}\right)\frac{1}{n}\sum^n_{i=1}(X_i-\bar{X})^2 = \frac{1}{n-1}\sum^n_{i=1}(X_i-\bar{X})^2\notag$$

This leads to the following definition of the *sample variance*, denoted \(S^2\), our unbiased estimator of the population variance:

$$\boxed{S^2 = \frac{1}{n-1}\sum^n_{i=1} (X_i - \bar{X})^2}\notag$$
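To make the difference between the two divisors concrete, here is a minimal sketch that computes both \(\hat{\sigma}^2\) and \(S^2\) for one observed sample and compares them with Python's standard `statistics` module, where `pvariance` divides by the number of values and `variance` divides by one less. The data values and the function names `biased_variance` and `sample_variance` are hypothetical.

```python
import statistics

def biased_variance(sample):
    """First attempt: sum of squared deviations divided by n."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / n

def sample_variance(sample):
    """Sample variance S^2: sum of squared deviations divided by n - 1."""
    n = len(sample)
    xbar = sum(sample) / n
    return sum((x - xbar) ** 2 for x in sample) / (n - 1)

# Hypothetical observed sample of size n = 8 (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(biased_variance(data))      # 32 / 8 = 4.0
print(sample_variance(data))      # 32 / 7, about 4.571
print(statistics.pvariance(data)) # same divisor as biased_variance
print(statistics.variance(data))  # same divisor as sample_variance
```

For any sample with \(n \geq 2\), \(S^2\) is larger than \(\hat{\sigma}^2\) by exactly the factor \(n/(n-1)\), which is \(8/7\) here.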

The next theorem provides a sampling distribution for the sample variance *in the case that the population is normally distributed*.

### Theorem \(\PageIndex{3}\)

Let \(X_1, \ldots, X_n\) be independent \(N(\mu, \sigma^2)\) random variables. Then, it follows that

$$\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}.\notag$$
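A simulation can make this result more tangible. A chi-squared random variable with \(k\) degrees of freedom has mean \(k\) and variance \(2k\), so the simulated values of \((n-1)S^2/\sigma^2\) should have mean close to \(n-1\) and variance close to \(2(n-1)\). The sketch below is only a rough check under assumed values of \(n\), \(\mu\), and \(\sigma\); the seed, the number of repetitions, and the function name `scaled_sample_variance` are arbitrary choices for illustration.

```python
import random

def scaled_sample_variance(n, mu, sigma, rng):
    """Draw n independent N(mu, sigma^2) values and return (n - 1) * S^2 / sigma^2."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (n - 1) * s2 / sigma ** 2

rng = random.Random(0)          # fixed seed so the run is reproducible
n, mu, sigma = 5, 10.0, 3.0     # assumed sample size and population parameters
reps = 20000

values = [scaled_sample_variance(n, mu, sigma, rng) for _ in range(reps)]

# Mean and variance of the simulated values.
sim_mean = sum(values) / reps
sim_var = sum((v - sim_mean) ** 2 for v in values) / (reps - 1)

print(sim_mean, n - 1)          # simulated mean vs. chi-squared mean
print(sim_var, 2 * (n - 1))     # simulated variance vs. chi-squared variance
```

Matching the first two moments does not prove that the distribution is chi-squared, but it is consistent with the theorem.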

Theorem 7.2.3 gives the sampling distribution of the sample variance when sampling from a normally distributed population: a scaled version of \(S^2\) is chi-squared with \((n-1)\) degrees of freedom. Note that the result relies on the normality assumption. Without knowing that the population is normally distributed, we cannot rely on this chi-squared distribution, not even approximately: unlike the CLT for the sample mean, this result does not extend to arbitrary populations. Further note that the result does not describe the distribution of \(S^2\) alone; rather, we multiply \(S^2\) by one less than the sample size and divide by the population variance. This may not seem very useful, given that the quantity involves both the estimator \(S^2\) and the parameter \(\sigma^2\) it is estimating. But you will see in the study of statistics how it can be used to quantify the error in using \(S^2\) to estimate \(\sigma^2\). For now, we use Theorem 7.2.3 to help us understand the accuracy of our estimate given by the sample mean. Before we state the result, we need two additional properties regarding the probabilistic qualities of \(\bar{X}\) and \(S^2\) as random variables. We state these properties without proof.

### Theorem \(\PageIndex{4}\)

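Let \(X_1, \ldots, X_n\) be independent \(N(\mu, \sigma^2)\) random variables. Then the following hold:

- \(\bar{X}\) is independent of the collection of random variables given by \(X_1 - \bar{X}, X_2 - \bar{X}, \ldots, X_n - \bar{X}\).
- \(\bar{X}\) and \(S^2\) are independent.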

Note that the second property in Theorem 7.2.4 follows immediately from the first one given our definition of \(S^2\). Using these properties, we can prove the following result.

### Theorem \(\PageIndex{5}\)

Let \(X_1, \ldots, X_n\) be independent \(N(\mu, \sigma^2)\) random variables. Then, it follows that

$$\frac{\bar{X} - \mu}{\sqrt{S^2/n}} \sim t_{n-1}\label{t}$$

**Proof.**
We rewrite the quotient by dividing top and bottom by the quantity \(\sqrt{\sigma^2/n}\):

$$\frac{\bar{X} - \mu}{\sqrt{S^2/n}} = \frac{\displaystyle{\left(\frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}\right)}}{\displaystyle{\sqrt{\frac{S^2/n}{\sigma^2/n}}}} = \frac{\displaystyle{\left(\frac{\bar{X} - \mu}{\sqrt{\sigma^2/n}}\right)}}{\displaystyle{\sqrt{\frac{S^2}{\sigma^2}}}}\label{quotient}$$

Note that the quantity in the numerator is the standardization of a normally distributed random variable, thus it has the standard normal distribution. For the denominator, we can further modify the expression under the square root by multiplying top and bottom by the quantity \((n-1)\):

$$\sqrt{\frac{(n-1)S^2}{(n-1)\sigma^2}} = \sqrt{\left(\frac{(n-1)S^2}{\sigma^2}\right)\frac{1}{n-1}}\notag$$

We know from Theorem 7.2.3 that \((n-1)S^2/\sigma^2 \sim \chi^2_{n-1}\), and so the denominator in Equation \ref{quotient} is the square root of a chi-squared distributed random variable divided by its degrees of freedom. Also note from Theorem 7.2.4 that the numerator and denominator in Equation \ref{quotient} are independent random variables, since they are functions of \(\bar{X}\) and \(S^2\), respectively. Thus, we have shown that the quantity we started with in Equation \ref{t} is equal to a random variable with a standard normal distribution divided by the square root of an independent random variable with a chi-squared distribution divided by its degrees of freedom. This is precisely the definition of the \(t\) distribution given in Definition 7.1.3.

Notice what the result of Theorem 7.2.5 says: when sampling from a normally distributed population, if we take the sample mean and subtract its expected value \(\mu\) and divide by its standard deviation *where the population variance \(\sigma^2\) is estimated by the sample variance \(S^2\)*, then the resulting random variable has a \(t\) distribution with \((n-1)\) degrees of freedom. The distribution is no longer the standard normal distribution because we have now estimated the population variance, which has the effect of increasing the overall variability in the quantity given in Equation \ref{t}. To account for that increased variability, we need a distribution with *thicker tails*, which is precisely what the \(t\) distribution provides. Notice also that the degrees of freedom of the \(t\) distribution that models the quantity in Equation \ref{t} is one less than the sample size because we lose a degree of freedom by using the sample variance to estimate the population variance. This result provides the foundation for many statistical inference techniques.
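The thicker tails can be observed in a simulation. The sketch below draws many normal samples of size \(n = 5\), computes the quantity in Theorem 7.2.5 for each, and records how often its absolute value exceeds 1.96, a cutoff that a standard normal random variable exceeds about 5% of the time. The parameter values, the seed, and the function name `t_statistic` are assumptions made only for illustration.

```python
import math
import random

def t_statistic(n, mu, sigma, rng):
    """Draw n independent N(mu, sigma^2) values and return (X-bar - mu) / sqrt(S^2 / n)."""
    sample = [rng.gauss(mu, sigma) for _ in range(n)]
    xbar = sum(sample) / n
    s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
    return (xbar - mu) / math.sqrt(s2 / n)

rng = random.Random(1)          # fixed seed so the run is reproducible
n, mu, sigma = 5, 10.0, 3.0     # assumed sample size and population parameters
reps = 20000

stats = [t_statistic(n, mu, sigma, rng) for _ in range(reps)]

# Fraction of simulated statistics beyond the standard normal cutoff 1.96.
tail_fraction = sum(abs(t) > 1.96 for t in stats) / reps
print(tail_fraction)            # noticeably larger than the normal value of about 0.05
```

With only 4 degrees of freedom, the fraction beyond 1.96 is more than twice what the standard normal distribution would give, which is why treating the quantity as standard normal would overstate the accuracy of \(\bar{X}\) for small samples.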