Statistics LibreTexts

2.4: Quantifying Variability Relative to the Mean


    In the previous lesson, we examined variability with respect to the median. In this lesson, we will develop a measure of variability that depends on the mean. Recall that the mean is the average of all values in a given set, so every data value is used in its computation. The sample mean is denoted by \(\bar{x}\) and is found by adding all the values in the sample and dividing by the sample size. Its formula is \(\bar{x}=\dfrac{\Sigma x_i}{n}\). When referring to a population mean, we use the symbol \(\mu\) (“mu”). It is computed in the same way: we add all values in the population and divide by the population size.

    We have seen a few measures of variability so far. The range is the difference between the maximum and minimum values in a set of data. The range is not resistant to outliers because its calculation involves only the two most extreme values in the set of data. The interquartile range (IQR) also uses only two values, but it is resistant because it is the difference between the third and first quartiles, which are not affected by the most extreme values. Quartiles are found using the resistant measure of center, the median. Therefore, the IQR depends on the median. We will now develop a measure of variability that depends on the mean and uses all values in a given set of data.


    The table and dotplots below show the 2022 salaries (in millions of dollars) of a sample of players from the Kansas City Royals and Los Angeles Dodgers.



    2022 Salaries (in millions of dollars)

    Kansas City Royals:    4.83   0.7    4.75   0.71   1.3    0.71
    Los Angeles Dodgers:   6      0.73   2.75   1.35   0.72   21



    [Dotplot: Kansas City Royals salaries]

    [Dotplot: Los Angeles Dodgers salaries]

    Images are created with the graphing calculator, used with permission from Desmos Studio PBC.

     

    1. Let’s start by computing the mean salary for each sample. How do they compare?
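    If you want to check your hand computation, here is a minimal Python sketch of the formula \(\bar{x}=\dfrac{\Sigma x_i}{n}\) applied to both samples from the table. The list names `royals` and `dodgers` and the helper `sample_mean` are illustrative choices, not part of the lesson.

```python
# 2022 salaries (millions of dollars) from the table above
royals = [4.83, 0.7, 4.75, 0.71, 1.3, 0.71]
dodgers = [6, 0.73, 2.75, 1.35, 0.72, 21]


def sample_mean(values):
    """Sample mean: x-bar = (sum of all x_i) / n."""
    return sum(values) / len(values)


royals_mean = sample_mean(royals)
dodgers_mean = sample_mean(dodgers)
print(f"Kansas City Royals mean:  {royals_mean:.3f} million dollars")
print(f"Los Angeles Dodgers mean: {dodgers_mean:.3f} million dollars")
```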







       

    A measure of variability should tell us how spread out the data are. Since we are using the mean as our center, it would be useful to know the distance between each value and the mean. These distances are called deviations. Data values that are above the mean have positive deviations, and data values that are below the mean have negative deviations. Formulaically, we say

    \[\text { deviation }=(\text { data value }- \text { mean })=x_i-\bar{x}\nonumber\]
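    The deviation formula can be sketched in Python as follows, assuming the salary lists from the table; the helper name `deviations` is hypothetical. Each deviation is computed as \(x_i-\bar{x}\), so values above the mean print with a plus sign and values below the mean print with a minus sign.

```python
royals = [4.83, 0.7, 4.75, 0.71, 1.3, 0.71]
dodgers = [6, 0.73, 2.75, 1.35, 0.72, 21]


def deviations(values):
    """Return x_i - x_bar for every value: positive above the mean, negative below."""
    x_bar = sum(values) / len(values)
    return [x - x_bar for x in values]


for name, data in (("Kansas City Royals", royals), ("Los Angeles Dodgers", dodgers)):
    print(name)
    for x, dev in zip(data, deviations(data)):
        print(f"  value {x:>5}  deviation {dev:+.3f}")
```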

    1. Calculate the deviations for each of the samples and enter them in the tables below.


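      Kansas City Royals

      Value    Deviation
      4.83     ________
      0.7      ________
      4.75     ________
      0.71     ________
      1.3      ________
      0.71     ________

      Los Angeles Dodgers

      Value    Deviation
      6        ________
      0.73     ________
      2.75     ________
      1.35     ________
      0.72     ________
      21       ________
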
    2. Based on the deviations, which sample is more spread out?







       

    Data values with large deviations (that are farther away from the mean) contribute more to the variability in the data set. Values with small deviations do not contribute as much to the overall variability of a data set. To measure the total amount of variability, we need to combine the deviations into a single number.

    1. One way we might do this is by finding the average deviation from the mean. Let’s try it! Add all the deviations for the Kansas City Royals. Interpret the result.
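    A quick numerical check of this idea, sketched in Python for the Royals sample. Algebraically, \(\Sigma(x_i-\bar{x})=\Sigma x_i-n\bar{x}=0\) because \(n\bar{x}=\Sigma x_i\); on a computer the total may come out as a tiny rounding error instead of exactly 0. The variable names are illustrative.

```python
royals = [4.83, 0.7, 4.75, 0.71, 1.3, 0.71]

x_bar = sum(royals) / len(royals)
total_deviation = sum(x - x_bar for x in royals)

# Positive and negative deviations cancel out. Floating-point rounding can leave a
# tiny nonzero remainder, so the results are rounded for display.
print(f"sum of deviations: {total_deviation:.6f}")
print(f"average deviation: {total_deviation / len(royals):.6f}")
```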




       

    Standard Deviation

    The standard deviation is a measure of variability that describes the typical deviation from the mean for all values in a set of data. To compute the standard deviation of a sample, we complete the following steps:


    1. Complete the table with the squared deviation for each value in the sample. Then find the total of the squared deviations.

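      Kansas City Royals

      Value    Deviation    (Deviation)²
      4.83     ________     ________
      0.7      ________     ________
      4.75     ________     ________
      0.71     ________     ________
      1.3      ________     ________
      0.71     ________     ________

      Total squared deviations: ________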

    2. The sum of squared deviations is one way to represent the variability in a distribution, but it is not a commonly used measure. A more commonly used measure is the sample variance, which is found by dividing the sum of squared deviations by one less than the sample size. In other words, the sample variance is roughly an average of the squared deviations (we divide by \(n-1\) rather than \(n\)). Formulaically, we say

      \[s^2=\dfrac{\text { sum of squared deviations }}{\text { sample size minus } 1}=\dfrac{\Sigma\left(x_i-\bar{x}\right)^2}{n-1}\nonumber\]

      Compute the sample variance for the Kansas City Royals.


       

    3. The sample standard deviation, denoted by \(s\), is the square root of the sample variance. Calculate the sample standard deviation for the Kansas City Royals. Include units in your answer.





       
    4. Can the standard deviation be negative? Explain.






       
    5. Can the standard deviation be zero? Explain.
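    The three hand steps above (squared deviations and their total, the sample variance, then its square root) can be sketched in Python for the Kansas City Royals. The helper `sd_steps` is a hypothetical name. Python's standard-library functions `statistics.variance` and `statistics.stdev` also use the \(n-1\) divisor, so the sketch cross-checks against them. Note the units: the variance is in squared millions of dollars, while \(s\) is back in millions of dollars.

```python
import math
import statistics

royals = [4.83, 0.7, 4.75, 0.71, 1.3, 0.71]


def sd_steps(values):
    """Follow the hand steps: squared deviations, their total, s^2, then s."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to divide by n - 1")
    x_bar = sum(values) / n
    squared_devs = [(x - x_bar) ** 2 for x in values]  # (deviation)^2 column
    total = sum(squared_devs)                          # total squared deviations
    variance = total / (n - 1)                         # s^2: divide by n - 1
    return total, variance, math.sqrt(variance)        # s = square root of s^2


total, s2, s = sd_steps(royals)
print(f"total squared deviations: {total:.4f}")
print(f"sample variance s^2:      {s2:.4f} (millions of dollars)^2")
print(f"sample std deviation s:   {s:.4f} million dollars")

# Cross-check with the standard library, which also uses the n - 1 divisor
assert math.isclose(s2, statistics.variance(royals))
assert math.isclose(s, statistics.stdev(royals))
```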






       

    The process of calculating the standard deviation by hand has many steps and is time consuming, even for small sets of data. Therefore, we will usually use technology to compute the mean and standard deviation for us.
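    Desmos is the tool this lesson uses; as an alternative sketch, Python's standard `statistics` module performs the same two computations. `statistics.mean` corresponds to `mean(D)`, and `statistics.stdev` computes the sample standard deviation (dividing by \(n-1\)), which is what the lesson describes `stdev(D)` as doing. The variable name `D` simply mirrors the Desmos list.

```python
import statistics

# The same list that is typed into Desmos as D
D = [6, 0.73, 2.75, 1.35, 0.72, 21]

x_bar = statistics.mean(D)  # plays the role of mean(D)
s = statistics.stdev(D)     # plays the role of stdev(D): sample SD, divides by n - 1

print(f"x-bar = {x_bar:.3f} million dollars")
print(f"s     = {s:.3f} million dollars")
```

    If you ever need the population standard deviation (dividing by \(n\)), the same module provides `statistics.pstdev`.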

    1. Follow the instructions below to compute the sample standard deviation for the Los Angeles Dodgers with the Desmos graphing calculator.
      1. Go to https://www.desmos.com/calculator.
      2. Copy the values from the data set into the first line. Type \(D=[6,0.73,2.75,1.35,0.72,21]\). Hit enter on your keyboard to go to the next line.
      3. Type mean(D) to compute the sample mean. Hit enter on your keyboard to go to the next line. \(\bar{x}=\) __________
      4. Type stdev(D) to compute the sample standard deviation. \(s=\)___________
         
    2. Compare the sample standard deviations you found for the Kansas City Royals and the Los Angeles Dodgers. Which standard deviation is higher? Which distribution is more spread out? Do these values adequately represent the spread/variability in each sample of salaries?







       
    3. In the sample of salaries from the Los Angeles Dodgers, which value impacts the standard deviation the most?




       
    4. Let’s say that 21 was incorrectly recorded as 12. Use Desmos to find the standard deviation of the set with the error in it: \(D=[6,0.73,2.75,1.35,0.72,12]\). How does this error affect the sample standard deviation?

      Old standard deviation:

      New standard deviation (from set with error):




       

    Outliers and skewing have a large effect on the standard deviation. Therefore, we say that the standard deviation is not a resistant measure of variability.
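    To see the contrast with the median-based measures, here is a small Python sketch that summarizes the Dodgers sample with and without the recording error (21 entered as 12). The helpers `iqr_by_halves` and `summarize` are hypothetical, and the quartile rule used (median of the lower and upper halves) is an assumption: software packages use several different quartile conventions, some of which would give a slightly different IQR.

```python
import statistics


def iqr_by_halves(values):
    """IQR = Q3 - Q1, where Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data (middle value excluded when n is odd).
    This quartile convention is an assumption; software packages differ."""
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[len(data) - half:]
    return statistics.median(upper) - statistics.median(lower)


def summarize(values):
    """Mean and sample SD (not resistant) next to median and IQR (resistant)."""
    return {
        "mean": statistics.mean(values),
        "s": statistics.stdev(values),
        "median": statistics.median(values),
        "IQR": iqr_by_halves(values),
    }


correct = [6, 0.73, 2.75, 1.35, 0.72, 21]
with_error = [6, 0.73, 2.75, 1.35, 0.72, 12]  # 21 recorded as 12

for label, data in (("correct", correct), ("with error", with_error)):
    summary = summarize(data)
    print(f"{label:>10}: " + "  ".join(f"{k}={v:.3f}" for k, v in summary.items()))
```

    Changing only the largest value moves the mean and standard deviation, while the median and this IQR stay the same.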

    Summary

    We have examined in detail two measures of center (mean and median) and three measures of variability or spread (range, interquartile range, and standard deviation). We calculated the IQR using the median. We calculated the standard deviation using the mean. In deciding which measures of center and spread to use, we need to remember two things:

    • Mean and standard deviation go together. Median and IQR go together.
    • Both the mean and standard deviation are influenced by outliers and skew.

    When the data are skewed or contain outliers, we usually use the median and IQR to summarize the data. When the data are reasonably symmetric, we use the mean and standard deviation. However, summary values alone are never enough; we should always look at a graph as well, such as a dotplot, histogram, or boxplot.

    2.4: Quantifying Variability Relative to the Mean is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by LibreTexts.
