
12.3: Estimation and Evidence



    Once the null and alternative hypotheses have been set, the scientific method proceeds to test the theory. Testing the theory in social justice studies usually consists of observing data from the population of interest based on a specified sampling plan. Once the data are observed, we proceed to the analysis step in the scientific method. That is, the observed data must be analyzed in some way that will let the researcher decide whether the data support the null or alternative hypotheses. Because of the structure of the hypotheses, what the researcher is really looking for is how much evidence is contained in the data that the null hypothesis is false. What this means is that the researcher needs to find a way of looking at the observed data that shows how much evidence exists against the null hypothesis.  

Let us consider the example of the test grades once again. The null hypothesis for this situation is that the mean grade for the entire class is no greater than 80, and the alternative hypothesis is that the mean grade is greater than 80. Suppose the professor happens to sample the exams numbered 2, 6, 12, 14, and 22, whose scores are 76, 59, 74, 75, and 81, respectively. In this sample there does not seem to be much evidence against the null hypothesis that the class mean is less than or equal to 80. All but one of the exam scores are less than 80. In fact, if we compute the mean of the sampled scores, it comes out to be

    \[(76+59+74+75+81)\div 5=365\div 5=73. \nonumber \]

If the mean score in the class were greater than 80, we would expect that our sample mean would also be greater than 80. In this case the sample mean is much less, so this is not evidence against the null hypothesis. So, what would be evidence against the null hypothesis? It makes logical sense that any time the mean of the sample is greater than 80, this would be evidence against the null hypothesis. For example, suppose the professor happened to sample the exams numbered 2, 16, 17, 19, and 25, whose scores are 76, 79, 88, 82, and 97. In this sample there does seem to be some evidence against the null hypothesis. Several of the exam scores are greater than 80, and if we compute the mean of the sampled scores, it is

    \[ (76+79+88+82+97)\div 5=422\div 5=84.4. \nonumber \]

    From this example we begin to see how evidence against the null hypothesis can be assessed. In this exposition one should notice that we are taking the mean of the sample to be an indication of the mean of the population. With the lack of any other information, we have assumed that the mean of the sample is in some way the best guess as to what the mean of the population is. That is, the mean of the sample is being used to estimate the unknown population mean.  
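The arithmetic above is easy to check by hand, but the estimator can also be sketched in a few lines of Python (the scores are those quoted in the text; the function name `sample_mean` is ours, chosen for illustration):

```python
# Sample means for the two samples of exam scores quoted in the text.

def sample_mean(scores):
    """Mean of a sample, used as an estimator of the population mean."""
    return sum(scores) / len(scores)

first_sample = [76, 59, 74, 75, 81]    # exams 2, 6, 12, 14, 22
second_sample = [76, 79, 88, 82, 97]   # exams 2, 16, 17, 19, 25

print(sample_mean(first_sample))   # 73.0: consistent with H0: mu <= 80
print(sample_mean(second_sample))  # 84.4: some evidence against H0
```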

    Definition: Estimator

    A statistical estimator is a summary computed on a sample that is taken to be the best indication of what a corresponding unknown parameter value may be.

    Note that an estimator is likely not equal to the parameter value in the population. In the example considering the five sampled exams, we considered several samples in the previous section and none of the sample mean values were exactly equal to the true population mean. This is true in most applications, and the difference between the estimator and the true parameter value for the population is known as estimation error. 

    Definition: Estimation Error

    The difference between the value of an estimator computed on a sample and the parameter value for the corresponding population is called estimation error.

For the five sampled exams, we first considered the sample consisting of exams numbered 2, 6, 12, 14, and 22, whose scores are 76, 59, 74, 75, and 81. The mean of the sampled scores comes out to be 73. The actual mean value for all the exams is 78, which is the value of the population parameter in this case. The estimation error is then \(|73-78|=5\), so the estimate is 5 points away from the true value.
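As a quick sketch, this estimation error can be computed directly (the population mean of 78 is known here only because every exam in the class was graded):

```python
# Estimation error for the first sample: |estimate - true parameter|.
sample = [76, 59, 74, 75, 81]
estimate = sum(sample) / len(sample)  # sample mean, 73.0
population_mean = 78                  # mean of every exam, given in the text
estimation_error = abs(estimate - population_mean)
print(estimation_error)  # 5.0
```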

    In practice the true population parameter is not known, and hence researchers are never completely sure how much error they are making when estimating a population parameter. However, for many estimators, specialized methods have been developed so that the researcher can get an idea as to how large the error might be. This measure is called the standard error of the estimator.  

    Definition: Standard Error

    The standard error of an estimator is an estimate of how large the estimation error of an estimator may be.  

This estimate can be computed from the observed data and is interpreted in much the same way as a standard deviation. The standard error can be taken as an estimate of the typical size of the error, and in an analog to the Empirical Rule discussed earlier, one would expect the distance between the true population parameter value and the sample estimate to be less than two or three times the standard error. It should be noted that there are some estimators for which the standard error is interpreted differently, because the variation is not always centered on the true value of the population parameter. In those cases the standard error is usually not quoted by itself.
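The text does not give a formula for the standard error, but for the sample mean it is conventionally estimated as the sample standard deviation divided by the square root of the sample size, \(s/\sqrt{n}\). A minimal Python sketch under that assumption, using the second sample of scores from the text:

```python
import math

# Standard error of the sample mean, estimated as s / sqrt(n), where s is
# the sample standard deviation. Scores are the second sample from the text.
sample = [76, 79, 88, 82, 97]
n = len(sample)
mean = sum(sample) / n                                         # 84.4
s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))  # sample std dev
standard_error = s / math.sqrt(n)
print(round(standard_error, 2))  # 3.72
```

So a typical sample of five exams from this class would be expected to have a mean within roughly two or three times 3.72 points of the true class mean.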

To measure the amount of evidence against the null hypothesis, a researcher will compare the estimator of a population parameter to the form of the null hypothesis. If the value of the estimator is consistent with the form of the null hypothesis, there is very little evidence against the truth of the null hypothesis. This corresponds to the first case with the exam data, where the mean of the sample was computed to be 73, which is less than 80 and therefore consistent with the form of the null hypothesis \(H_0:\mu\leq 80\). On the other hand, if the value of the estimator is not consistent with the form of the null hypothesis, then there is evidence against the null hypothesis. This corresponds to the second case with the exam data, where the mean of the sample was computed to be 84.4, which is greater than 80 and therefore not consistent with the form of the null hypothesis \(H_0:\mu\leq 80\).

    The question now is to determine how much evidence exists against the null hypothesis. To understand how evidence is measured, consider again the sample of exams. If the professor had happened to sample the exams numbered 17, 20, 21, 23, and 25 whose scores are 88, 91, 96, 97, and 97, the mean of the sampled scores becomes  

    \[ (88+91+96+97+97)\div 5=469\div 5=93.8. \nonumber \]

This sample mean of 93.8 provides more evidence against the null hypothesis than the previous sample mean of 84.4. The reason is that while both sample means are consistent with the form of the alternative hypothesis and not the null hypothesis, the mean of 93.8 is farther away from the important crossover point of 80, and therefore we should be surer that the null hypothesis should be rejected.

    Think of it this way. Because our estimate is based on a sample, we know it is likely to have some error. Knowing this, it could happen that even though the sample mean is larger than 80, which is consistent with the alternative hypothesis, it could be that the error is large enough so the actual population mean is less than 80, which would be consistent with the null hypothesis. In the second case, the sample mean is farther away from the crossover point of 80, and hence in that case the error would have to be larger for this to take place. When the sample value is farther away from what the null hypothesis specifies, more evidence exists that the null hypothesis is false.  

    A statistical hypothesis test measures the amount of evidence by computing the distance between the estimator and the closest parameter value possible when the null hypothesis is true. This distance function is called a test statistic. In many cases the test statistic will also consider the standard error of the estimator.  

    Definition: Test Statistic

A test statistic is a function of the data that measures the evidence against the null hypothesis by computing the distance between an estimator and the closest possible parameter value specified by the null hypothesis.

     For the example of the five exams sampled from the class papers, the test statistic could be written as \(\text{sample mean}−80\), which gives the raw distance between the mean of the sample and the closest value in the null hypothesis, which is 80. In most statistics textbooks the test statistic would be written as

    \[\frac{\text{sample mean}-80}{\text{standard error}}. \nonumber \]

This second form of the test statistic is preferred because it tells the researcher how many standard errors the mean observed in the sample is from the closest parameter value specified by the null hypothesis. For example, if the test statistic is equal to 2, then the mean observed in the sample is two standard errors greater than 80, a distance that the Empirical Rule analogy suggests would be unusual if the population mean were actually 80 or less.
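Putting the pieces together, this standardized distance can be sketched as a small Python function (the name `t_statistic` is ours; this is the usual one-sample t-type statistic, with the null boundary of 80 as the default):

```python
import math

def t_statistic(scores, null_value=80):
    """Distance between the sample mean and the closest null parameter
    value, measured in standard errors of the sample mean."""
    n = len(scores)
    mean = sum(scores) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    return (mean - null_value) / (s / math.sqrt(n))

# The two samples from the text whose means exceed 80:
print(round(t_statistic([76, 79, 88, 82, 97]), 2))   # 1.18
print(round(t_statistic([88, 91, 96, 97, 97]), 2))   # 7.55
```

The sample with mean 93.8 sits many more standard errors above 80 than the sample with mean 84.4, matching the intuition above that it carries more evidence against the null hypothesis.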


This page titled 12.3: Estimation and Evidence is shared under a CC BY 4.0 license.
