
3.2: Measures of the Center of the Data


    Introduction

    For Descriptive Statistics, in addition to graphical summaries of our data, there are also calculated summaries that we can use to identify meaningful information about our population of study. 

    Reminder: Population v. Sample

    A population is the collection of all of the persons, things, or objects under study. The calculations that result from using a data set from the population are called parameters.

    A sample is a portion, or subset, of the population that we collect data from for the study. The calculations that result from using a data set from the sample are called statistics.

    Measures of center, or central tendency, give the researcher a sense of what data value or values the research subjects favor. There are four types of measures of center: Mean, Median, Mode, and Midrange.  

    Mean and Median

    The two most widely used measures of the "center" of the data are the mean and the median.

    Note

    The words “mean” and “average” are often used interchangeably. The substitution of one word for the other is common practice. The technical term is “arithmetic mean,” and “average” technically refers to a center location. However, in practice among non-statisticians, “average” is commonly accepted for “arithmetic mean.”

    Definition: Mean

    The mean is the sum of the values, divided by the total number of values. 

    The letter used to represent the sample mean is an \(x\) with a bar over it (pronounced “\(x\) bar”): \(\overline{x}\). The letter used to represent the sample size is \(n\).

    \[\bar{x} = \dfrac{x_1+x_2+...+x_n}{n} \]

    The Greek letter \(\mu\) (pronounced "mew") represents the population mean. The letter used to represent the population size is \(N\).

    \[\mu = \dfrac{x_1+x_2+...+x_N}{N} \]

    One of the requirements for the sample mean to be a good estimate of the population mean is for the sample taken to be truly random.

    Round your final answer to one more decimal place than the data values. The units of the mean are the same as the units of the data values.

    When some values in the data set are repeated, the mean can be calculated by multiplying each distinct value by its frequency, adding the products, and then dividing the sum by the total number of data values.

    To see that both ways of calculating the mean are the same, consider the sample:

    1; 1; 1; 2; 2; 3; 4; 4; 4; 4; 4

    \[\bar{x} = \dfrac{1+1+1+2+2+3+4+4+4+4+4}{11} = 2.7\]

    \[\bar{x} = \dfrac{3(1) + 2(2) + 1(3) + 5(4)}{11} = 2.7\]

    In the second calculation, the frequencies are 3, 2, 1, and 5. We will do more with grouped frequencies later. 
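The two calculations above can be checked with a few lines of Python. This is only a minimal sketch: the variable names (`sample`, `frequencies`, and so on) are illustrative choices, and `Counter` is used here simply as a convenient way to count how often each value occurs.

```python
from collections import Counter

sample = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4]

# Way 1: add every value and divide by the number of values.
mean_direct = sum(sample) / len(sample)

# Way 2: multiply each distinct value by its frequency, add the products,
# then divide by the number of values.
frequencies = Counter(sample)  # value -> how many times it occurs
mean_from_freq = sum(value * count for value, count in frequencies.items()) / len(sample)

print(round(mean_direct, 1), round(mean_from_freq, 1))  # 2.7 2.7
```

Both approaches add up the same total (30) and divide by the same count (11), which is why they agree.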

    The Law of Large Numbers and the Mean

    The Law of Large Numbers says that if you take samples of larger and larger size from any population, then the mean \(\bar{x}\) of the sample is very likely to get closer and closer to \(\mu\). This is discussed in more detail later in the text.
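One way to get a feel for the Law of Large Numbers is a small simulation. The sketch below is an illustration under stated assumptions, not part of the original text: the "population" is a hypothetical list of 100,000 simulated die rolls, the seed value is arbitrary, and the sample sizes are chosen only for demonstration.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an arbitrary choice)

# Hypothetical population: 100,000 simulated rolls of a fair six-sided die.
population = [random.randint(1, 6) for _ in range(100_000)]
mu = sum(population) / len(population)

# Draw simple random samples of increasing size and record how far
# each sample mean (x-bar) is from the population mean (mu).
errors = {}
for n in (10, 100, 1_000, 10_000):
    sample = random.sample(population, n)
    x_bar = sum(sample) / n
    errors[n] = abs(x_bar - mu)
    print(f"n = {n:>6}   x-bar = {x_bar:.3f}   |x-bar - mu| = {errors[n]:.3f}")
```

Because sampling is random, the distance from \(\mu\) does not have to shrink at every single step; the law only says that larger samples are very likely to land closer.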

    Sampling Distributions and Statistic of a Sampling Distribution

    You can think of a sampling distribution as a relative frequency distribution with a great many samples. (See Sampling and Data for a review of relative frequency). Suppose thirty randomly selected students were asked the number of movies they watched the previous week. The results are in the relative frequency table shown below.

    # of movies    Relative Frequency
    0    \(\dfrac{5}{30}\)
    1    \(\dfrac{15}{30}\)
    2    \(\dfrac{6}{30}\)
    3    \(\dfrac{3}{30}\)
    4    \(\dfrac{1}{30}\)

    If you let the number of samples get very large (say, 300 million or more), the relative frequency table becomes a relative frequency distribution.
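As a hedged illustration of how a relative frequency table connects to the mean, the sketch below weights each number of movies by its relative frequency. This extends the frequency method shown earlier in this section; the dictionary name `relative_frequency` is an illustrative choice, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction

# Relative frequencies from the movie table (number of movies -> share of students).
relative_frequency = {
    0: Fraction(5, 30),
    1: Fraction(15, 30),
    2: Fraction(6, 30),
    3: Fraction(3, 30),
    4: Fraction(1, 30),
}

# The relative frequencies of a complete table add up to 1.
total = sum(relative_frequency.values())

# Weighting each value by its relative frequency gives the mean directly.
mean_movies = sum(x * rf for x, rf in relative_frequency.items())

print(total)        # 1
print(mean_movies)  # 4/3
```

A mean of 4/3 is about 1.3 movies per student, the same result as adding all thirty responses and dividing by 30.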

    What this means is that although two formulas for the mean are given above, they are calculated the same way. The only real difference is the notation: \(\bar{x}\) and \(n\) for a sample, \(\mu\) and \(N\) for a population. To make sure information is accurately presented and not misrepresented, it is important to use the correct notation when presenting your findings. However, not all of the statistics and parameters we will review in this chapter share this relationship. The population mean and the sample mean are calculated the same way because the sample mean is an unbiased estimator of the population mean, provided the sample is collected properly (for example, at random).

    In general, the sample mean \(\bar{x}\) is an example of a statistic which estimates the parameter for the population mean \(\mu\).

    Definition: Median

    The median is the midpoint of the data set. 

    You can quickly find the location of the median by using the expression

    \[\dfrac{n+1}{2}\]

    The letter \(n\) is the sample size (the total number of data values in the sample).

    • If \(n\) is an odd number, the median is the middle value of the ordered data (ordered smallest to largest).
    • If \(n\) is an even number, the median is equal to the two middle values added together and divided by two after the data has been ordered.

    Your final answer should be given exactly as calculated, no rounding unless asked to do so. The units of the median are the same as the units of the data values.

    Reminder: Midpoint

    We calculated the midpoint of a class for frequency distributions. 

    \[Midpoint = \dfrac{Lower Limit + Upper Limit}{2}\]

    This is the same calculation described for the case when \(n\) is an even number.

    For example, if the total number of data values is 97, then

    \[\dfrac{n+1}{2} = \dfrac{97+1}{2} = 49.\]

    The median is the 49th value in the ordered data. If the total number of data values is 100, then

    \[\dfrac{n+1}{2} = \dfrac{100+1}{2} = 50.5.\]

    The median occurs midway between the 50th and 51st values. The location of the median and the value of the median are not the same. The upper case letter \(M\) is often used to represent the median. The next example illustrates the location of the median and the value of the median.

    Suppose we have a data set of 50 weights and we want to get a sense of the average, or expected, weight. We could calculate the mean weight of the 50 people by adding the 50 weights together and dividing by 50. Or we could find the median weight of the 50 people by ordering the data and finding the number that splits the data into two equal parts. The median is generally a better measure of the center when there are extreme values or outliers, because it is not affected by the precise numerical values of the outliers. The mean, however, is the most commonly used measure of the center.
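The location rule \(\dfrac{n+1}{2}\) and the odd/even cases described above can be sketched in Python. The function names `median_location` and `median` are hypothetical helpers written for this illustration, and the sketch assumes a non-empty list of numbers.

```python
def median_location(n):
    """Position of the median in the ordered data, counting from 1."""
    return (n + 1) / 2


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)  # the data must be ordered first
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value.
        return ordered[middle]
    # Even n: add the two middle values and divide by two.
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median_location(97))   # 49.0
print(median_location(100))  # 50.5
print(median([5, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```

Note that `median_location` returns a position in the ordered list, while `median` returns a data value; the two are not the same thing.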

    Technology

    To find the mean and the median, you can use technology to assist you. Make sure you use the appropriate technology for your class.

    Spreadsheets (Microsoft Excel/Google Sheets):

    1. Enter each datum into its own cell. Usually we use one column for the data.
    2. To find the mean, in the cell below the data, type: =average(
    3. Select your data with your mouse. Make sure you have all your data selected, and no more. This should auto-populate the formula with the cell locations.
      For example: [screenshot of a spreadsheet with numbers entered and the average formula]
    4. Hit enter. The spreadsheet should replace your formula with the mean of the data set.
    5. To find the median, repeat the process, but use the formula: =median(

    TI-83 or TI-84 Graphing Calculator:

    1. Clear list L1. Press STAT 4:ClrList. Enter 2nd 1 for list L1. Press ENTER.
    2. Enter data into the list editor. Press STAT 1:EDIT.
    3. Put the data values into list L1.
    4. Press STAT and arrow to CALC. Press 1:1-VarStats. Press 2nd 1 for L1 and then ENTER.
    5. Press the down and up arrow keys to scroll.
    Example \(\PageIndex{1}\)

    AIDS data indicating the number of months a patient with AIDS lives after taking a new antibody drug are as follows (smallest to largest):

    3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47

    Calculate the mean and the median.

    Answer

    The calculation for the mean is:

    \[\bar{x} = \dfrac{3+4+(8)(2)+10+11+12+13+14+(15)(2)+(16)(2)+...+35+37+40+(44)(2)+47}{40} = 23.6\]

    To find the median, \(M\), first use the formula for the location. The location is:

    \[\dfrac{n+1}{2} = \dfrac{40+1}{2} = 20.5\] 

    Starting at the smallest value, the median is located between the 20th and 21st values (the two 24s):

    3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47

    \[MD = \dfrac{24+24}{2} = 24\] 

    \(\bar{x}\) = 23.6 months, MD = 24 months
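For readers who use software rather than a calculator, the results of this example can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

months = [3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22, 22, 24,
          24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35, 37, 40, 44, 44, 47]

n = len(months)
mean_months = statistics.mean(months)
median_months = statistics.median(months)  # averages the 20th and 21st values

print(n)                      # 40
print(round(mean_months, 1))  # 23.6
print(median_months)          # 24.0
```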

    Exercise \(\PageIndex{1}\)

    The following data show the number of months patients typically wait on a transplant list before getting surgery. The data are ordered from smallest to largest. Calculate the mean and median.

    3; 4; 5; 7; 7; 7; 7; 8; 8; 9; 9; 10; 10; 10; 10; 10; 11; 12; 12; 13; 14; 14; 15; 15; 17; 17; 18; 19; 19; 19; 21; 21; 22; 22; 23; 24; 24; 24; 24

    Answer

    Mean: \[3 + 4 + 5 + 7 + 7 + 7 + 7 + 8 + 8 + 9 + 9 + 10 + 10 + 10 + 10 + 10 + 11 + 12 + 12 + 13 + 14 + 14 + 15 + 15 + 17 + 17 + 18 + 19 + 19 + 19 + 21 + 21 + 22 + 22 + 23 + 24 + 24 + 24 + 24 = 544\]

    \[\dfrac{544}{39} \approx 13.95\] months

    Median: Starting at the smallest value, the median is the 20th term, which is 13 months.

    Example \(\PageIndex{2}\)

    Suppose that in a small town of 50 people, one person earns $5,000,000 per year and the other 49 each earn $30,000. Which is the better measure of the "center": the mean or the median?

    Solution

    \[\bar{x} = \dfrac{5,000,000+49(30,000)}{50} = \$129,400\]

    \[MD = \$30,000\]

    (There are 49 people who earn $30,000 and one person who earns $5,000,000.)

    The median is a better measure of the "center" than the mean because 49 of the values are 30,000 and one is 5,000,000. The 5,000,000 is an outlier. The 30,000 gives us a better sense of the middle of the data.
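The effect of the single outlier can be seen numerically. The sketch below rebuilds the town's 50 incomes and compares the two measures; the variable names are illustrative.

```python
import statistics

# One resident earns $5,000,000; the other 49 each earn $30,000.
incomes = [5_000_000] + [30_000] * 49

mean_income = statistics.mean(incomes)
median_income = statistics.median(incomes)

print(f"mean   = ${mean_income:,.0f}")    # mean   = $129,400
print(f"median = ${median_income:,.0f}")  # median = $30,000

# Nobody in the town earns anything close to the mean.
print(sum(1 for income in incomes if income > mean_income))  # 1
```

Only one of the 50 residents earns more than the mean, which is why the median describes the typical resident better here.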

    Exercise \(\PageIndex{2}\)

    In a sample of 60 households, one house is worth $2,500,000. Half of the rest are worth $280,000, and all the others are worth $315,000. Which is the better measure of the “center”: the mean or the median?

    Answer

    The median is the better measure of the “center” than the mean because 59 of the values are either $280,000 or $315,000 and only one is $2,500,000. The $2,500,000 is an outlier. Either $280,000 or $315,000 gives us a better sense of the middle of the data.

     

     Mode

    Definition: Mode

    The mode of a set of data is the value that occurs the most frequently.

    • A data set that has only one value that occurs with the greatest frequency is said to be unimodal.
    • When two data values occur with the same greatest frequency, each one is a mode. The data is called bimodal.
    • When more than two data values occur with the same greatest frequency, each is a mode and the data set is said to be multimodal.
    • When no data value is repeated more than the other data we say that there is no mode.

    Mode is not typically used for numerical data. It is more useful with qualitative data that has a nominal or ordinal level of measurement. The units of the mode are the same as the units of the data values.

    Example \(\PageIndex{3}\)

    Statistics exam scores for 20 students are as follows:

    50; 53; 59; 59; 63; 63; 72; 72; 72; 72; 72; 76; 78; 81; 83; 84; 84; 84; 90; 93

    Find the mode.

    Answer

    The most frequent score is 72, which occurs five times. Mode = 72.

    Exercise \(\PageIndex{3}\)

    The number of books checked out from the library by 25 students are as follows:

    0; 0; 0; 1; 2; 3; 3; 4; 4; 5; 5; 7; 7; 7; 7; 8; 8; 8; 9; 10; 10; 11; 11; 12; 12

    Find the mode.

    Answer

    The most frequent number of books is 7, which occurs four times. Mode = 7.

    Example \(\PageIndex{4}\)

    Five real estate exam scores are 430, 430, 480, 480, 495. The data set is bimodal because the scores 430 and 480 each occur twice.

    When is the mode the best measure of the "center"? Consider a weight loss program that advertises a mean weight loss of six pounds the first week of the program. The mode might indicate that most people lose two pounds the first week, making the program less appealing.

    Statistical software will easily calculate the mean, the median, and the mode. Some graphing calculators can also make these calculations. In the real world, people make these calculations using software.

    Exercise \(\PageIndex{4}\)

    Five credit scores are 680, 680, 700, 720, 720. The data set is bimodal because the scores 680 and 720 each occur twice. Consider the annual earnings of workers at a factory. The mode is $25,000 and occurs 150 times out of 301. The median is $50,000 and the mean is $47,500. What would be the best measure of the “center”?

    Answer

    Because $25,000 occurs nearly half the time, the mode would be the best measure of the center; the median and mean don’t represent what most people make at the factory.

    Note

    The mode can be calculated for qualitative data as well as for quantitative data. For example, if the data set is: red, red, red, green, green, yellow, purple, black, blue, the mode is red.
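The mode examples in this section can be reproduced with `statistics.multimode` from Python's standard library (available in Python 3.8 and later), which returns a list of all values tied for the greatest frequency. This is a sketch; the variable names are illustrative.

```python
import statistics

exam_scores = [50, 53, 59, 59, 63, 63, 72, 72, 72, 72, 72,
               76, 78, 81, 83, 84, 84, 84, 90, 93]
real_estate_scores = [430, 430, 480, 480, 495]
colors = ["red", "red", "red", "green", "green", "yellow", "purple", "black", "blue"]

print(statistics.multimode(exam_scores))         # [72]
print(statistics.multimode(real_estate_scores))  # [430, 480]
print(statistics.multimode(colors))              # ['red']
```

One difference in convention is worth noting: when no value is repeated, `multimode` returns every value in the data, whereas this section describes that situation as having no mode.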

     

    Midrange

    Definition: Midrange

    The midrange of a data set is the value midway between the maximum and minimum values in the original data set. In other words, it is the midpoint of the data. 

    \[\text{midrange}=\frac{\text{minimum data value} + \text{maximum data value}}{2}\]

    The final answer for midrange should be exactly as calculated, no rounding unless asked to do so. The units of the midrange are the same as the units of the data values.

    Unlike the median, which just focuses on the numbers in the data set and selects the middle data value, the midrange selects the midpoint of the number line between the minimum data value and the maximum data value.

    Example \(\PageIndex{5}\)

    Recall the AIDS data from Example \(\PageIndex{1}\), which indicated the number of months a patient with AIDS lives after taking a new antibody drug (smallest to largest):

    3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47

    To find the median, \(M\), we first use the formula for the location of the median within the data set itself.

    The location is:

    \[\dfrac{n+1}{2} = \dfrac{40+1}{2} = 20.5\]

    Starting at the smallest value, the median is located between the 20th and 21st values (the two 24s):

    3; 4; 8; 8; 10; 11; 12; 13; 14; 15; 15; 16; 16; 17; 17; 18; 21; 22; 22; 24; 24; 25; 26; 26; 27; 27; 29; 29; 31; 32; 33; 33; 34; 34; 35; 37; 40; 44; 44; 47

    \[MD = \dfrac{24+24}{2} = 24\]

    \(\bar{x}\) = 23.6, MD = 24

    Now, find the midrange of the data set.

    Answer

    The calculation for the midrange is:

    \[\text{midrange} = \dfrac{3 + 47}{2} = 25\]

     

    The midrange is often not selected as a good measure of center because it is extremely sensitive to outliers.
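The sensitivity of the midrange to outliers can be sketched in Python. The `midrange` function is a hypothetical helper written for this illustration, and the extra value of 200 months is an invented data point (it is not in the original data) added only to show the effect.

```python
import statistics


def midrange(values):
    """Value midway between the minimum and the maximum data values."""
    return (min(values) + max(values)) / 2


months = [3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22, 22, 24,
          24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35, 37, 40, 44, 44, 47]

print(midrange(months))  # 25.0

# Hypothetical extra patient who survives 200 months (not in the original data).
with_outlier = months + [200]
print(midrange(with_outlier))           # 101.5
print(statistics.median(with_outlier))  # 24
```

A single extreme value moves the midrange from 25 to 101.5, while the median stays at 24.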

     

    Calculating the Mean of Grouped Frequency Tables

    When only grouped data is available, you do not know the individual data values (we only know intervals and interval frequencies); therefore, you cannot compute an exact mean for the data set. What we must do is estimate the actual mean by calculating the mean of a frequency table. A frequency table is a data representation in which grouped data is displayed along with the corresponding frequencies. To calculate the mean from a grouped frequency table we can apply the basic definition of mean:

    \[mean = \dfrac{\text{data sum}}{\text{number of data values}}.\]

    We simply need to modify the definition to fit within the restrictions of a frequency table.

    Since we do not know the individual data values we can instead find the midpoint of each interval. The midpoint is

    \[\dfrac{\text{lower boundary+upper boundary}}{2}.\]

    We can now modify the mean definition to be

    \[\text{Mean of Frequency Table} = \dfrac{\sum{fm}}{\sum{f}}\]

    where \(f\) is the frequency of the interval and \(m \) is the midpoint of the interval.

    Example \(\PageIndex{6}\)

    A frequency table displaying Professor Blount’s last statistics test is shown. Find the best estimate of the class mean.

    Grade Interval Number of Students
    50–56.5 1
    56.5–62.5 0
    62.5–68.5 4
    68.5–74.5 4
    74.5–80.5 2
    80.5–86.5 3
    86.5–92.5 4
    92.5–98.5 1

    Solution

    • Find the midpoints for all intervals
    Grade Interval Midpoint
    50–56.5 53.25
    56.5–62.5 59.5
    62.5–68.5 65.5
    68.5–74.5 71.5
    74.5–80.5 77.5
    80.5–86.5 83.5
    86.5–92.5 89.5
    92.5–98.5 95.5
    • Calculate the sum of the product of each interval frequency and midpoint. \(\sum{fm} = 53.25(1) + 59.5(0) + 65.5(4) + 71.5(4) + 77.5(2) + 83.5(3) + 89.5(4) + 95.5(1) = 1460.25\)
    • \(\mu = \dfrac{\sum{fm}}{\sum{f}} = \dfrac{1460.25}{19} = 76.86\)

    Notice: Since a population was used for this calculation, the variable used to represent the mean is \(\mu\) and not \(\bar{x}\).
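The three steps of this example (midpoints, \(\sum{fm}\), and division by \(\sum{f}\)) can be sketched in Python. The tuple layout `(lower, upper, frequency)` and the variable names are illustrative choices, not part of the original example.

```python
# (lower boundary, upper boundary, number of students) for each grade interval
intervals = [
    (50, 56.5, 1),
    (56.5, 62.5, 0),
    (62.5, 68.5, 4),
    (68.5, 74.5, 4),
    (74.5, 80.5, 2),
    (80.5, 86.5, 3),
    (86.5, 92.5, 4),
    (92.5, 98.5, 1),
]

# Step 1: midpoint of each interval.
midpoints = [(lower + upper) / 2 for lower, upper, _ in intervals]

# Step 2: sum of frequency times midpoint, and sum of frequencies.
sum_fm = sum(f * m for (_, _, f), m in zip(intervals, midpoints))
sum_f = sum(f for _, _, f in intervals)

# Step 3: estimated mean of the grouped data.
mean_estimate = sum_fm / sum_f

print(sum_fm, sum_f, round(mean_estimate, 2))  # 1460.25 19 76.86
```

The result is an estimate, because every student in an interval is treated as if they scored exactly the interval's midpoint.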

    Exercise \(\PageIndex{5}\)

    Maris conducted a study on the effect that playing video games has on memory recall. As part of her study, she compiled the following sample data:

    Hours Teenagers Spend on Video Games Number of Teenagers
    0–3.5 3
    3.5–7.5 7
    7.5–11.5 12
    11.5–15.5 7
    15.5–19.5 9

    What is the best estimate for the mean number of hours spent playing video games?

    Answer

    Find the midpoint of each interval, multiply by the corresponding number of teenagers, add the results, and then divide by the total number of teenagers.

    The midpoints are 1.75, 5.5, 9.5, 13.5, 17.5.

    \[\sum{fm} = (1.75)(3) + (5.5)(7) + (9.5)(12) + (13.5)(7) + (17.5)(9) = 409.75\]

    \[\bar{x} = \dfrac{\sum{fm}}{\sum{f}} = \dfrac{409.75}{38} \approx 10.78 \text{ hours}\]

     

    Weighted Mean

    A type of grouped frequency table calculation is weighted mean. The frequency counts in the formula:

    \[\text{Mean of Frequency Table} = \dfrac{\sum{fm}}{\sum{f}}\]

    instead become known as the "weight," \(w\), of each data value \(m\).

    \[\text{Weighted Mean} = \dfrac{\sum{wm}}{\sum{w}}\]

    This is typically something you have seen whenever you review your grade point average (GPA) at your school, your course grade if your instructor weights their grade book categories, and more. This idea is revisited later on with Expected Value.

    Example \(\PageIndex{7}\)

    A student receives the following report card for the semester:

    Spring Grades
    Course Credits Grade
    Math 4 A (4 points)
    English 3 C (2 points)
    Economics 3 B (3 points)

    Calculate the student's grade point average.

    Answer

    First, we need to identify which information represents the weight and which information represents the data values. Since we want to find the average grade point, the grade points in the Grade column are the data values. The weight of each grade is the number of credits associated with that grade.

    \[\bar{x} = \frac{(4)(4) + (3)(2) + (3)(3)}{10} = 3.1\]

    So this student has a 3.1 GPA for the spring semester.
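The GPA calculation can be written as a short Python sketch of the weighted mean formula. The tuple layout `(course, credits, grade points)` and the variable names are illustrative choices.

```python
# (course, credits, grade points) from the spring report card
courses = [
    ("Math", 4, 4),       # A
    ("English", 3, 2),    # C
    ("Economics", 3, 3),  # B
]

# Weighted mean: sum of (weight x data value) divided by the sum of the weights.
weighted_sum = sum(credits * points for _, credits, points in courses)
total_credits = sum(credits for _, credits, _ in courses)
gpa = weighted_sum / total_credits

# For comparison: the plain mean of the grade points ignores the credits.
unweighted = sum(points for _, _, points in courses) / len(courses)

print(gpa)         # 3.1
print(unweighted)  # 3.0
```

The weighted result is higher than the unweighted one because the best grade was earned in the course carrying the most credits.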

    Skewness and the Mean, Median, and Mode

    Consider the following data set.

    4; 5; 6; 6; 6; 7; 7; 7; 7; 7; 7; 8; 8; 8; 9; 10

    This data set can be represented by the following histogram. Each interval has width one, and each value is located in the middle of an interval.

    This histogram matches the supplied data. It consists of 7 adjacent bars with the x-axis split into intervals of 1 from 4 to 10. The heights of the bars peak in the middle and taper symmetrically to the right and left.
    Figure \(\PageIndex{1}\)

    The histogram displays a symmetrical distribution of data. A distribution is symmetrical if a vertical line can be drawn at some point in the histogram such that the shape to the left and the right of the vertical line are mirror images of each other. The mean, the median, and the mode are each seven for these data. In a perfectly symmetrical distribution, the mean and the median are the same. This example has one mode (unimodal), and the mode is the same as the mean and median. In a symmetrical distribution that has two modes (bimodal), the two modes would be different from the mean and median.

    The histogram for the data: 4; 5; 6; 6; 6; 7; 7; 7; 7; 8 is not symmetrical. The right-hand side seems "chopped off" compared to the left side. A distribution of this type is called skewed to the left because it is pulled out to the left.

    This histogram matches the supplied data. It consists of 5 adjacent bars with the x-axis split into intervals of 1 from 4 to 8. The peak is to the right, and the heights of the bars taper down to the left.
    Figure \(\PageIndex{2}\)

    The mean is 6.3, the median is 6.5, and the mode is seven. Notice that the mean is less than the median, and they are both less than the mode. The mean and the median both reflect the skewing, but the mean reflects it more so.

    The histogram for the data: 6; 7; 7; 7; 7; 8; 8; 8; 9; 10, is also not symmetrical. It is skewed to the right.

    This histogram matches the supplied data. It consists of 5 adjacent bars with the x-axis split into intervals of 1 from 6 to 10. The peak is to the left, and the heights of the bars taper down to the right.
    Figure \(\PageIndex{3}\)

    The mean is 7.7, the median is 7.5, and the mode is seven. Of the three statistics, the mean is the largest, while the mode is the smallest. Again, the mean reflects the skewing the most.

    Generally, if the distribution of data is skewed to the left, the mean is less than the median, which is often less than the mode. If the distribution of data is skewed to the right, the mode is often less than the median, which is less than the mean.
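The relationship between skewness and the three measures can be checked on the three data sets used in this section. The sketch below uses Python's standard `statistics` module; the dictionary keys and variable names are illustrative labels.

```python
import statistics

datasets = {
    "symmetric":    [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10],
    "skewed left":  [4, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    "skewed right": [6, 7, 7, 7, 7, 8, 8, 8, 9, 10],
}

# Collect (mean, median, mode) for each data set.
summary = {}
for name, data in datasets.items():
    summary[name] = (statistics.mean(data), statistics.median(data), statistics.mode(data))
    mean, median, mode = summary[name]
    print(f"{name:<13} mean = {mean:.1f}  median = {median:.1f}  mode = {mode}")
```

For these data the output should show the ordering described above: mean < median < mode for the left-skewed set, mode < median < mean for the right-skewed set, and all three equal for the symmetric set.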

    Skewness and symmetry become important when we discuss probability distributions in later chapters.

    Example \(\PageIndex{8}\)

    Statistics are used to compare and sometimes identify authors. The following lists show a simple random sample that compares the letter counts for three authors.

    • Terry: 7; 9; 3; 3; 3; 4; 1; 3; 2; 2
    • Davis: 3; 3; 3; 4; 1; 4; 3; 2; 3; 1
    • Maris: 2; 3; 4; 4; 4; 6; 6; 6; 8; 3
    1. Make a dot plot for the three authors and compare the shapes.
    2. Calculate the mean for each.
    3. Calculate the median for each.
    4. Describe any pattern you notice between the shape and the measures of center.

    Solution

    a.

    Figure \(\PageIndex{4}\): This dot plot matches the supplied data for Terry. The plot uses a number line from 1 to 10. It shows one x over 1, two x's over 2, four x's over 3, one x over 4, one x over 7, and one x over 9. There are no x's over the numbers 5, 6, 8, and 10.
    Figure \(\PageIndex{5}\): This dot plot matches the supplied data for Davis. The plot uses a number line from 1 to 10. It shows two x's over 1, one x over 2, five x's over 3, and two x's over 4. There are no x's over the numbers 5, 6, 7, 8, 9, and 10.
    Figure \(\PageIndex{6}\): This dot plot matches the supplied data for Maris. The plot uses a number line from 1 to 10. It shows one x over 2, two x's over 3, three x's over 4, three x's over 6, and one x over 8. There are no x's over the numbers 1, 5, 7, 9, and 10.
    • Terry’s mean is 3.7, Davis’ mean is 2.7, Maris’ mean is 4.6.
    • Terry’s median is three, Davis’ median is three. Maris’ median is four.
    • It appears that the median is always closest to the high point (the mode), while the mean tends to be farther out on the tail. In a symmetrical distribution, the mean and the median are both centrally located close to the high point of the distribution.

     

     

    Exercise \(\PageIndex{6}\)

    Discuss the mean, median, and mode for each of the following problems. Is there a pattern between the shape and measure of the center?

    a. Dotplot

    Figure \(\PageIndex{7}\): This dot plot matches the supplied data. The plot uses a number line from 0 to 14. It shows two x's over 0, four x's over 1, three x's over 2, one x over 3, two x's over the number 4, 5, 6, and 9, and 1 x each over 10 and 14. There are no x's over the numbers 7, 8, 11, 12, and 13.

    b. Stem and Leaf Graph

    Ages at Which Former U.S. Presidents Died
    4 | 6 9
    5 | 3 6 7 7 7 8
    6 | 0 0 3 3 4 4 5 6 7 7 7 8
    7 | 0 1 1 2 3 4 7 8 8 9
    8 | 0 1 3 5 8
    9 | 0 0 3 3

    c. Histogram

    Figure \(\PageIndex{8}\): This is a histogram titled Hours Spent Playing Video Games on Weekends. The x-axis shows the number of hours spent playing video games with bars showing values at intervals of 5. The y-axis shows the number of students. The first bar for 0 - 4.99 hours has a height of 2. The second bar from 5 - 9.99 has a height of 3. The third bar from 10 - 14.99 has a height of 4. The fourth bar from 15 - 19.99 has a height of 7. The fifth bar from 20 - 24.99 has a height of 9.
    Answer

    Discuss

    Pros and Cons for Measures of Center

    • Mean
      • Calculated by using all the values of data.
      • Varies less than the median or mode when samples are repeatedly taken from the same population.
      • Used in computing other statistics.
      • Always a single value and not necessarily one of the data values.
      • Affected by extremely high or low values, outliers.
    • Median
      • Used to find middle value of a data set.
      • Used to determine whether the data values fall into the upper or lower half of the distribution.
      • Affected less than the mean by extremely high or low values.
    • Mode
      • Used when we want the most frequently occurring data value.
      • Easiest measure of center to compute.
      • Can be used with qualitative or quantitative data.
      • May not exist or may be more than one value. 
    • Midrange
      • Easy to compute.
      • Gives a midpoint of the data set.
      • Strongly affected by extremely high or low values.

    Mean, median and mode can lead to information about the characteristics of a distribution.

    • If the mean, median and mode are all the same the distribution can appear symmetric.
    • If the mode is greatly different from the mean and median, the data can appear skewed. Graphically, this means the distribution is not symmetric.

     


    Review

    The mean and the median can be calculated to help you find the "center" of a data set. The mean is the most commonly used measure of the center, but the median is the better measure when a data set contains several outliers or extreme values. The mode will tell you the most frequently occurring datum (or data) in your data set. The mean, median, and mode are extremely helpful when you need to analyze your data, but if your data set consists of ranges which lack specific values, the mean may seem impossible to calculate. However, the mean can be approximated: add the lower boundary to the upper boundary and divide by two to find the midpoint of each interval, multiply each midpoint by the number of values found in the corresponding range, and divide the sum of these products by the total number of data values in the set.

    Looking at the distribution of data can reveal a lot about the relationship between the mean, the median, and the mode. There are three types of distributions. A left (or negative) skewed distribution has a shape like Figure \(\PageIndex{2}\). A right (or positive) skewed distribution has a shape like Figure \(\PageIndex{3}\). A symmetrical distribution looks like Figure \(\PageIndex{1}\).

    Formula Review

    \[\mu = \dfrac{\sum{fm}}{\sum{f}} \]

    where \(f\) = interval frequencies and \(m\) = interval midpoints.

    Glossary

    Frequency Table
    a data representation in which grouped data is displayed along with the corresponding frequencies
    Mean
    a number that measures the central tendency of the data; a common name for mean is 'average.' The term 'mean' is a shortened form of 'arithmetic mean.' By definition, the mean for a sample (denoted by \(\bar{x}\)) is \(\bar{x} = \dfrac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}\), and the mean for a population (denoted by \(\mu\)) is \(\mu = \dfrac{\text{Sum of all values in the population}}{\text{Number of values in the population}}\).
    Median
    a number that separates ordered data into halves; half the values are the same number or smaller than the median and half the values are the same number or larger than the median. The median may or may not be part of the data.
    Midpoint
    the mean of an interval in a frequency table
    Mode
    the value that appears most frequently in a set of data

    Contributors and Attributions

    • Barbara Illowsky and Susan Dean (De Anza College) with many other contributing authors. Content produced by OpenStax College is licensed under a Creative Commons Attribution License 4.0 license. Download for free at http://cnx.org/contents/30189442-699...b91b9de@18.114.


    This page titled 3.2: Measures of the Center of the Data is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by OpenStax via source content that was edited to the style and standards of the LibreTexts platform.