
9.4: Variation


    The second most important characteristic of a set of data is how far from the location the data points are typically found. This characteristic is called variation. To demonstrate what differences in variation look like, consider the data in Tables 9.4 and 9.5, which contain simulated waiting times for emergency services of 25 randomly sampled cases at a suburban and an urban hospital, respectively. A comparison of the locations of the data in these tables shows that the mean waiting time is about 100 minutes at both hospitals. Similarly, the median waiting time is about 101 minutes at the suburban hospital and 103 minutes at the urban hospital. Hence, both sets of data have very similar locations. However, a close look at the data listed in the two tables reveals some fundamental differences.

    Table 9.4 Twenty-five simulated waiting times (in minutes) in the emergency room of a suburban hospital.

    91    116   98    104   81
    103   114   105   99    105
    91    84    81    79    105
    103   103   125   98    104
    94    101   101   101   110

    Table 9.5 Twenty-five simulated waiting times (in minutes) in the emergency room of an urban hospital.

    134   122   125   114   102
    39    72    97    110   103
    119   109   74    124   124
    109   86    68    102   75
    104   93    126   86    80

    In Table 9.4 the lowest data point is 79 and the highest is 125, while in Table 9.5 the lowest data point is 39 and the highest is 134. But the difference is not only in these extreme data points. Looking at the data in Table 9.5, there seems to be a wider “variety” in the values. The difference between the two tables is that the data in Table 9.5 have more variation than the data in Table 9.4.
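
    These comparisons can be verified directly. Below is a minimal sketch in Python (our choice of language; the text itself uses no code), where the lists suburban and urban reproduce Tables 9.4 and 9.5 and the summaries come from the standard library's statistics module.

        import statistics

        # Waiting times (in minutes) from Tables 9.4 and 9.5.
        suburban = [91, 116, 98, 104, 81, 103, 114, 105, 99, 105, 91, 84, 81,
                    79, 105, 103, 103, 125, 98, 104, 94, 101, 101, 101, 110]
        urban = [134, 122, 125, 114, 102, 39, 72, 97, 110, 103, 119, 109, 74,
                 124, 124, 109, 86, 68, 102, 75, 104, 93, 126, 86, 80]

        for name, data in [("suburban", suburban), ("urban", urban)]:
            # Means are about 99.8 and 99.9; medians are 101 and 103.
            print(name,
                  "mean:", round(statistics.mean(data), 1),
                  "median:", statistics.median(data),
                  "min:", min(data), "max:", max(data))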

    Definition: Variation

    A measure of variation of a set of data is a measure that summarizes the set of data with a single value that represents typically how far data points can be found from the location of the data.

    We will discuss two methods for measuring the variation in a set of data. The first one is simply computed by taking the largest datum in the dataset and subtracting from it the smallest datum. This measure is called the range.

    Definition: Range

    To compute the range of a set of data, subtract the smallest value in the set of data from the largest value in the set of data.

    Example \(\PageIndex{1}\)

    In a previous example we were given five incomes (in thousands of dollars): 23, 54, 44, 77, and 36. Find the range.

    Solution

    To find the range of the incomes we first note that the largest value is 77 and the smallest value is 23. The range is then computed as \(77−23=54\).

    Example \(\PageIndex{2}\)

    In a follow-up example, the largest value is replaced by 97 so that the data are 23, 54, 44, 97, and 36. Find the range.

    Solution

    To find the range for this data we first note that the largest value is now 97, and the smallest value is still 23. The range is then computed as \(97−23=74\).

    Example \(\PageIndex{3}\)

    As another example, the largest value is replaced by 135 so that the data are 23, 54, 44, 135, and 36. Find the range.

    Solution

    To find the range for this data we first note the largest value is 135 and the smallest value in the data is still 23. The range is then computed as \(135−23=112\).
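
    The three range calculations can be collected into a short Python sketch (the helper name data_range is ours, purely for illustration):

        def data_range(values):
            """Range: largest value minus smallest value."""
            return max(values) - min(values)

        print(data_range([23, 54, 44, 77, 36]))   # 77 - 23 = 54
        print(data_range([23, 54, 44, 97, 36]))   # 97 - 23 = 74
        print(data_range([23, 54, 44, 135, 36]))  # 135 - 23 = 112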

    In each of the calculations in the previous examples the range increases as the largest value increases. This is no surprise, as the range is computed from this value. Let us consider once again the hypothetical study of first-year salaries for those graduating with a bachelor's degree at a particular university, with the data given in Tables 9.1 and 9.2. For the data in Table 9.1, the largest value is 66 while the smallest value is 33, and therefore the range is computed as \(66−33=33\). For the data in Table 9.2, the largest value is 82 while the smallest value is 42, and the range is computed as \(82−42=40\). Therefore we can conclude that there is slightly more variation in the data in Table 9.2 than in Table 9.1. For the simulated waiting room data in Table 9.5, the largest value is 134 while the smallest value is 39, so the range is \(134−39=95\); for the data in Table 9.4, the largest value is 125 while the smallest value is 79, so the range is \(125−79=46\). These calculations confirm the general trend observed earlier: the waiting times for the urban hospital emergency room have much more variation than those for the suburban hospital.

    In the previous calculations, the range appears to do a decent job of telling us about variation in the data, but it is not used very often as a measure of variation in research studies. There are many reasons for this that have to do with statistical theory, but from an intuitive viewpoint it is easy to see that the range only uses information from two data points: the maximum and the minimum. In the definition of variation, we get a sense that variation is about how far all the values in the data are typically found from the location. We will consider in some detail a measure of variation that is used in most statistical applications: the standard deviation.

    To develop the idea of the standard deviation, we will consider the simple set of data from a previous example consisting of the values 23, 54, 44, 77, and 36. Variation describes how far the data points are from a measure of location, and to develop the standard deviation we will use the mean, which is equal to 46.8, as the measure of location. It is easy to compute how far away each data value is from the mean. These distances, as we will compute them here, are called the absolute deviations from the mean. Referring to Figure \(\PageIndex{1}\) will aid in this development; we will work through the values in increasing order, as they appear on the number line. As before, the data are indicated on the number line using the boxes and the mean is indicated by the tip of the triangle below the number line. We begin with the smallest value, which is equal to 23. The difference between the mean and this value tells us how far the value is from the mean: the absolute deviation is \(46.8−23.0=23.8\). This is indicated by the distance between the dashed green vertical line and the dashed blue vertical line in Figure \(\PageIndex{1}\). The second value is equal to 36. The difference between the mean and this value is \(46.8−36.0=10.8\). This is indicated by the distance between the dashed orange vertical line and the dashed blue vertical line in Figure \(\PageIndex{1}\). Similarly, the absolute deviation for the next value, 44, is \(46.8−44.0=2.8\), as indicated in grey in Figure \(\PageIndex{1}\).

    Figure \(\PageIndex{1}\): Computing the absolute deviations for the data from the example. The vertical dashed line indicates the location of the data as measured by the mean. The absolute deviations are indicated by the distances between each point and the mean (Public domain figure created by Alan M. Polansky).

    The calculation of the absolute deviations changes slightly when we get to the next value, which is 54. Because this value is larger than the mean, and distances are never negative, we subtract the mean from 54 to get the absolute deviation. That is, the absolute deviation is \(54.0−46.8=7.2\), as indicated in purple in Figure \(\PageIndex{1}\). The last absolute deviation is \(77.0−46.8=30.2\), indicated in light blue in Figure \(\PageIndex{1}\). The absolute deviations are listed in Table 9.6.

    Table 9.6 The calculations used in computing the standard deviation of the data from the example. The data values are given in the first column, the mean is listed in the second column for convenience, the absolute deviations are given in the third column, and the squared absolute deviations are given in the fourth column.

    Value   Mean   Absolute Deviation   Squared Absolute Deviation
    23      46.8         23.8                   566.44
    36      46.8         10.8                   116.64
    44      46.8          2.8                     7.84
    54      46.8          7.2                    51.84
    77      46.8         30.2                   912.04
    Sum                                        1654.80

    Now that we have measured the distance between each point and the mean, what do we do with them? We would like to summarize the variation in the data with a single number, but at this point we have as many absolute deviations as data values. One very tempting possibility is to compute the mean of the absolute deviations to get a summary of variation, called the mean absolute deviation. This measure seems to have a good intuitive foundation, but you will find that it is almost never used in practice. The reason for this is complicated, but it turns out that the mean absolute deviation has statistical properties that make it very inconvenient to work with (Casella and Berger 2024). So, what do we do? The standard deviation is based on computing something like the mean squared distance between each value and the mean. For example, the first absolute deviation is equal to 23.8, and if we multiply that value by itself, that is, we square it, we get \(23.8\times 23.8=23.8^2=566.44\).

    Continuing, the next absolute deviation is equal to 10.8, and squaring it gives \(10.8\times 10.8=10.8^2=116.64\). The remaining squared absolute deviations are computed in a similar manner and are listed in the fourth column of Table 9.6.

    Now that we have computed all the squared absolute deviations we can add them together; the sum is listed in Table 9.6 as 1654.80. We then take the sum of the squared absolute deviations and divide it by one less than the number of data points, to get \(1654.80\div 4=413.7\). Once again, it might be a little confusing as to why we would divide by one less than the number of data points. It has been shown mathematically that dividing by the sample size understates the amount of variation in the data by a very specific amount, and that dividing by one less than the sample size provides a good measure of the variation in the data (Casella and Berger 2024). This value is called the variance of the data. However, we are not quite done, as there is still a problem with this measure: it is in squared units instead of the units we started with. For the data we are considering here, the variance is measured in squared thousands of dollars, whereas the original data are in thousands of dollars. This means that we cannot directly compare the variation of the data to its mean value. To take care of this, we take the square root of the variance, \(\sqrt{413.7}=20.34\), to get the standard deviation.
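
    The entire computation in Table 9.6 can be traced step by step in a short Python sketch (the variable names are ours):

        data = [23, 36, 44, 54, 77]
        mean = sum(data) / len(data)               # 46.8

        abs_devs = [abs(x - mean) for x in data]   # 23.8, 10.8, 2.8, 7.2, 30.2
        sq_devs = [d ** 2 for d in abs_devs]       # 566.44, 116.64, 7.84, 51.84, 912.04

        variance = sum(sq_devs) / (len(data) - 1)  # 1654.80 / 4 = 413.7
        std_dev = variance ** 0.5                  # about 20.34

        print(round(variance, 1), round(std_dev, 2))

    The standard library performs the same divide-by-one-less computation: statistics.variance(data) and statistics.stdev(data) return these same values.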

    Technically the standard deviation is the square root of the unbiased mean squared deviation, but this complicated name does not help us much in interpreting what is going on with the data. What is important is getting an approximate intuitive feeling for what the standard deviation is telling us. The easiest interpretation is based on what is commonly known as the empirical rule, or the three-sigma rule, which states that for many sets of data the standard deviation can be used to indicate how much of the data lies within a certain distance of the mean.

    Theorem \(\PageIndex{1}\) [The Empirical Rule]

    For most data:

    • Approximately 68% of the data is within one standard deviation of the mean.
    • Approximately 95% of the data is within two standard deviations of the mean.
    • Approximately 99% of the data is within three standard deviations of the mean.

    The first thing to note about this result is that the conclusions are approximate and do not hold exactly. If the data have a strange structure, or if there are not many data points, then the percentages stated in the result will not be exact, and in some cases will not be accurate at all. Regardless, the empirical rule provides a handy method for intuitively interpreting the standard deviation even if the result is not always exact.

    We first need to understand what the result means starting with the first statement that approximately 68% of the data is within one standard deviation of the mean. Data points that are within one standard deviation of the mean include data points as small as the standard deviation subtracted from the mean, or

    \[\text{mean}−\text{standard deviation},\]

    or as large as the standard deviation added to the mean, that is,

    \[\text{mean}+\text{standard deviation}.\]

    What this means is that the distance between any data point in this range and the mean is less than the standard deviation of the data. The empirical rule then states that about 68% of the data points will have a distance to the mean less than the standard deviation.

    Example \(\PageIndex{4}\)

    The mean and the standard deviation of the data presented in Table 9.4 are 99.8 and 11.1, respectively. Use the Empirical Rule to find ranges that contain 68%, 95%, and 99% of the data.

    Solution

    In Table 9.7 we have listed each data value, along with the distance between the data value and the mean. Values that are within one standard deviation of the mean will have a value as small as

    \[\text{mean}-\text{standard deviation}=99.8-11.1=88.7,\]

    and as large as

    \[\text{mean}+\text{standard deviation}=99.8+11.1=110.9.\]

    Therefore, any datum with a value between 88.7 and 110.9 is within one standard deviation of the mean. For this data, the eighteen values 91, 98, 104, 103, 105, 99, 105, 91, 105, 103, 103, 98, 104, 94, 101, 101, 101, and 110 are within one standard deviation of the mean. This corresponds to about 72% of the data, a little more than is predicted by the empirical rule.

    Table 9.7 The data from Table 9.4 along with the distance between each value and the mean (99.8), and whether the datum is within one, two, or three standard deviations (11.1) of the mean.

    Datum   Distance   Standard Deviations
    91         8.8             1
    116       16.2             2
    98         1.8             1
    104        4.2             1
    81        18.8             2
    103        3.2             1
    114       14.2             2
    105        5.2             1
    99         0.8             1
    105        5.2             1
    91         8.8             1
    84        15.8             2
    81        18.8             2
    79        20.8             2
    105        5.2             1
    103        3.2             1
    103        3.2             1
    125       25.2             3
    98         1.8             1
    104        4.2             1
    94         5.8             1
    101        1.2             1
    101        1.2             1
    101        1.2             1
    110       10.2             1

    Now we can consider the meaning of the remaining statements of the Empirical Rule. The second statement is that approximately 95% of the data is within two standard deviations of the mean. This means that the datum could be as small as two times the standard deviation subtracted from the mean, that is, the datum could be as small as

    \[\text{mean}-2\times\text{standard deviation}=99.8-2\times 11.1=77.6,\]

    or as large as two times the standard deviation added to the mean, that is, the datum could be as large as

    \[\text{mean}+2\times\text{standard deviation}=99.8+2\times 11.1=122.0.\]

    Therefore, any datum with a value between 77.6 and 122.0 is within two standard deviations of the mean. For this data there are 24 values within two standard deviations of the mean. This corresponds to about 96% of the data, only a little more than is predicted by the empirical rule.

    Similarly, the third statement that approximately 99% of the data is within three standard deviations of the mean indicates that the datum could be as small as three times the standard deviation subtracted from the mean, that is, the datum could be as small as

    \[\text{mean}-3\times\text{standard deviation}=99.8-3\times 11.1=66.5,\]

    or as large as three times the standard deviation added to the mean, that is, the datum could be as large as

    \[\text{mean}+3\times\text{standard deviation}=99.8+3\times 11.1=133.1.\]

    Therefore, any datum with a value between 66.5 and 133.1 is within three standard deviations of the mean. All the data is within three standard deviations of the mean, again very close to what is predicted by the empirical rule.
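
    The counts in this example are easy to check with a short Python sketch (our own illustrative code, using the rounded mean and standard deviation quoted in the text):

        # Waiting times from Table 9.4, with mean 99.8 and standard deviation 11.1.
        mean, sd = 99.8, 11.1
        waits = [91, 116, 98, 104, 81, 103, 114, 105, 99, 105, 91, 84, 81,
                 79, 105, 103, 103, 125, 98, 104, 94, 101, 101, 101, 110]

        for k in (1, 2, 3):
            low, high = mean - k * sd, mean + k * sd
            inside = sum(low <= x <= high for x in waits)
            print(f"within {k} sd ({low:.1f} to {high:.1f}): "
                  f"{inside} of {len(waits)} ({100 * inside / len(waits):.0f}%)")

    Running this prints 18 of 25 (72%), 24 of 25 (96%), and 25 of 25 (100%), matching the counts found above.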

    What all this means is that when you observe the mean and the standard deviation for a set of data, you have information about how much of the data is clumped within certain ranges of the mean. This interpretation can be taken as an informal rule that gives you an idea of what to expect in the data. For example, suppose that you have taken an exam and the instructor reports that the mean grade on the exam was 72 with a standard deviation equal to 5. You look at your score and find that you have scored a 97. This is a good score, but how does it compare with your classmates? From the empirical rule you know that roughly two thirds of the class (68%) scored within one standard deviation of the mean. That is, roughly two thirds of the class scored between \(72.0−5.0=67.0\) and \(72.0+5.0=77.0\). This tells you that you scored better than anyone in that range, which is two thirds of the class. That is pretty good! But the empirical rule tells you even more: roughly 95% of the class scored within two standard deviations of the mean, that is, between \(72.0−10.0=62.0\) and \(72.0+10.0=82.0\). This tells you that you scored better than anyone in that range, which is roughly 95% of the class. That is very good! And the empirical rule is not done yet: roughly 99% of the class scored within three standard deviations of the mean, that is, between \(72.0−15.0=57.0\) and \(72.0+15.0=87.0\). This tells you that you scored better than anyone in that range, which is roughly 99% of the class. That is excellent!
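
    The intervals in this score example come from the same arithmetic, as this tiny Python sketch shows (names are ours):

        mean, sd = 72.0, 5.0
        for k, pct in [(1, 68), (2, 95), (3, 99)]:
            low, high = mean - k * sd, mean + k * sd
            print(f"roughly {pct}% of scores fall between {low} and {high}")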

    We will conclude our discussion by assessing the location and variation of the data considered in this section. Recall that Tables 9.1 and 9.2 contain the first-year salaries for graduates of color and white graduates from a university, respectively. Table 9.8 shows the mean, median, range, and standard deviation for the data in these tables. From the information in the table, we can observe that the typical graduate of color has a salary about 10 thousand dollars less than that of the typical white graduate. This is consistently indicated by both the mean and the median. Both measures of variation indicate that the white salary data have slightly more variation.

    Table 9.8 Summary measures of location and variation for the first-year salaries of graduates from a university.

    Alumni Group      Mean   Median   Range   Standard Deviation
    People of Color   49.5    49.0     33.0          7.2
    White             60.0    60.0     40.0          7.7

    For the simulated salary data given in Table 9.3 we had previously computed the mean and the median to be 50.1 and 30.0 (in thousands of dollars), respectively. The range and standard deviation of the data are 900 and 95.7, respectively. For this data we can assess how unusual the top salaries for this company really are. According to the empirical rule, virtually all the data (99%) should be within three standard deviations of the mean. Hence, 99% of the data should be between \(50.1-3\times 95.7=-237.0\) and \(50.1+3\times 95.7=337.2\).

    We can already see a problem with the lower number: it is negative, and we know that salaries cannot be negative. So we will cut this value off at zero, and we then conclude that essentially all the salaries should be between 0 and 337. We can see from Table 9.3 that there is one salary above this value: 919. The salary of 919 is nearly three times the upper limit set by the empirical rule. This is an indication of just how unusual this value is; it is well outside what would usually be expected. Such a conclusion may not be too surprising, as we had commented before that this very large value was dragging the mean toward it, which is why we suggested using the median instead in this case.

    Finally, let us consider the waiting room data contained in Tables 9.4 and 9.5. The summary location and variation measures are presented in Table 9.9. We have already seen that both the means and the medians indicate that the typical waiting time for both data sets is around 100 minutes. The ranges and the standard deviations, however, indicate that there is considerably more variation in the waiting times for the urban hospital. So, for example, a waiting time of 125 minutes at the urban hospital might not be too unusual, since that waiting time is only a little more than one standard deviation above the mean, whereas the same waiting time at the suburban hospital would be unusual, as 125 is more than two standard deviations above the mean.

    Table 9.9 Summary measures of location and variation for the waiting times in the emergency rooms of a suburban and an urban hospital.

    Setting    Mean   Median   Range   Standard Deviation
    Suburban   99.8   101.0     46.0         11.1
    Urban      99.9   103.0     95.0         22.9
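
    Table 9.9 itself can be reproduced from the raw data with a short Python sketch (the lists repeat those from the earlier sketch so that this runs on its own; statistics.stdev divides by one less than the sample size, matching the definition used in this section):

        import statistics

        suburban = [91, 116, 98, 104, 81, 103, 114, 105, 99, 105, 91, 84, 81,
                    79, 105, 103, 103, 125, 98, 104, 94, 101, 101, 101, 110]
        urban = [134, 122, 125, 114, 102, 39, 72, 97, 110, 103, 119, 109, 74,
                 124, 124, 109, 86, 68, 102, 75, 104, 93, 126, 86, 80]

        for name, data in [("Suburban", suburban), ("Urban", urban)]:
            print(name,
                  round(statistics.mean(data), 1),   # mean
                  statistics.median(data),           # median
                  max(data) - min(data),             # range
                  round(statistics.stdev(data), 1))  # standard deviation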


    This page titled 9.4: Variation is shared under a CC BY 4.0 license.
