Statistics LibreTexts

3. Central Tendency


    Learning Objectives

    • Define mean, median, and mode, and explain when to use each.
    • Identify how central tendency and the shape of the distribution are related.

    Central Tendency

    Central tendency is a statistical measure that identifies a single score as the center of a distribution. The goal of central tendency is to find the single score that is most typical or most representative of the entire group. Central tendency attempts to identify the “average” or “typical” value in a distribution. This average value can then be used to provide a simple description of an entire population or a sample. Comparing the measures of central tendency with one another can also help you identify the shape of the distribution. The central tendency of a group of scores (a distribution) refers to the middle of the group of scores. You will learn about three measures of central tendency: mean, median, and mode. Each measure of central tendency uses its own method to come up with a single number describing the middle of a group of scores.

    The Mean

    The mean, also known as the arithmetic average, is computed by adding all the scores in the distribution and dividing by the number of scores. The mean for a population is identified by the Greek letter µ (pronounced “mew”), and the mean for a sample is identified by M or x̄ (read “x-bar”). The mean can be calculated for both a population and a sample.

    Equation: Population Mean

    \[\mu=\ \frac{\sum X}{N}\]

    Steps for calculating the population mean for the following numbers: 7, 8, 8, 7, 2, 1, 6, 9, 3, 8

    1.      Add up all of the numbers (7+8+8+7+2+1+6+9+3+8) = 59

    2.      Divide the calculated total by the number of scores, N = 10

    3.    Plug into the equation: \[\mu=\frac{\sum X}{N}\]   \[\mu=\frac{7+8+8+7+2+1+6+9+3+8}{10}\]

    a.       Add up all of the numbers in the numerator (top number)

    b.      Then divide by the denominator (bottom number)

    4.   Solve: \[\mu=\frac{59}{10}\]

    5.      Mean = 5.9

    Equation: Sample Mean

    \[M=\ \frac{\sum X}{n}\]
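    The population and sample mean formulas perform the same arithmetic; only the symbols differ (µ and N versus M and n). A minimal Python sketch of that arithmetic, applied to the ten scores from the worked example above (the function name `mean` is our own choice for illustration):

```python
def mean(scores):
    """Arithmetic mean: the sum of the scores (ΣX) divided by the number of scores (N or n)."""
    if not scores:
        raise ValueError("the mean is undefined for an empty set of scores")
    total = sum(scores)   # Step 1: add up all of the scores (ΣX)
    count = len(scores)   # Step 2: count the scores (N for a population, n for a sample)
    return total / count  # Steps 3-5: divide the total by the count


scores = [7, 8, 8, 7, 2, 1, 6, 9, 3, 8]
print(sum(scores))   # 59
print(mean(scores))  # 5.9
```

    Python's standard library also provides `statistics.mean`, which returns the same value for this list.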

    The mean is often used to provide context on where the average score falls. In some cases, you might be asked to combine two sets of scores and then find the overall mean; this is called the weighted mean. To calculate the overall mean, we need two values:

    1.      the overall sum of the scores for the combined group (∑X), and

    2.      the total number of scores in the combined group (n).

    The overall sum for the combined group can be found by adding the sum for the first sample (∑X1) and the sum for the second sample (∑X2).

    The total number of scores in the combined group can be found easily by adding the number of scores in the first sample (n1) and the number in the second sample (n2).

    Equation: Weighted Mean

    Overall mean = \[M=\frac{\sum X_1+\sum X_2}{n_1+n_2}\]

    Where \[\sum X_1=M_1\ast n_1\]

    and

    \[\sum X_2=M_2\ast n_2\]

     

    For example, suppose the first sample has M = 6 and n = 12, and the second sample has M = 7 and n = 8:

    Overall mean = M =\[\frac{\left(6\ast12\right)+(7\ast8)}{(12+8)}\]

    M =\[\frac{(72+56)}{(20)}\]

    M =\[\frac{(128)}{(20)}\]

    M = 6.4
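    The weighted-mean calculation above can be sketched in Python. Each group's sum is recovered from its mean and size (ΣX = M × n), and the combined sum is divided by the combined size. The function name `combined_mean` is hypothetical, used only for illustration:

```python
def combined_mean(m1, n1, m2, n2):
    """Overall (weighted) mean of two groups, given each group's mean and number of scores."""
    sum_x1 = m1 * n1                      # ΣX1 = M1 * n1
    sum_x2 = m2 * n2                      # ΣX2 = M2 * n2
    return (sum_x1 + sum_x2) / (n1 + n2)  # (ΣX1 + ΣX2) / (n1 + n2)


print(combined_mean(6, 12, 7, 8))  # 6.4
```

    Note that simply averaging the two means, (6 + 7) / 2 = 6.5, would give the wrong answer here, because the two samples have different sizes and the larger sample should count for more.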

    When to use the Mean

    Use the mean when you are dealing with interval or ratio data that are normally distributed.

    Example Video: 

    How to Calculate the Mean

    ***insert Panopto video***

     

    Your Turn

    Follow the steps provided in the video above and work on your own data set. Download the Admissions Rate file, find the total for the admission rates, and then calculate the mean. You will need to answer a series of questions within the video based on each step of the equation. You will have multiple attempts to answer the questions within the video. You must identify the correct answer before the video will progress.

    ***insert Panopto video with built-in quiz questions***

    Median

    The median is the middle value in a set of scores listed in order. The goal of the median is to locate the midpoint of the distribution. Unlike the mean, there are no specific symbols or notation to identify the median. Instead, the median is simply identified by the word median. In addition, the definition and computations for the median are identical for a sample and for a population. If the scores in a distribution are listed in order from smallest to largest, the median is the midpoint of the list. More specifically, the median is the point on the measurement scale below which 50% of the scores in the distribution are located.

    The simple technique of listing the scores in order and counting to the middle is sufficient to determine the median for most distributions and is always appropriate for discrete variables. When the scores are whole numbers, this technique will always produce a median that is either a whole number or halfway between two whole numbers. With discrete variables, you report the single middle score or, if the data set has an even number of scores, both middle scores. With a continuous variable, however, it is possible to divide a distribution precisely in half so that exactly 50% of the distribution is located below (and above) a specific point. Remember, finding the precise midpoint by dividing scores into fractional parts is sensible for a continuous variable; however, it is not appropriate for a discrete variable.

    Discrete Data Example: Number of children in a household on your block:

    5, 2, 4, 1, 0, 3, 6, 4

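    Ordered: 0, 1, 2, 3, 4, 4, 5, 6. With an even number of scores (n = 8), the two middle scores are reported: Median = 3 and 4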

    Continuous Data Example: Daily temperatures for the last 5 days (n is an odd number):

    75, 80, 85, 92, 95

    Median = 85

    Continuous Data Example: Daily temperatures for the last 6 days (n is an even number):

    75, 80, 85, 92, 95, 100

    Take the 2 middle values, add them up and divide by 2

    Median = (85 + 92) / 2

    Median = 88.5
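    The median rules above (report the middle score or scores for discrete data; average the two middle values for continuous data with an even n) can be sketched in Python as follows. The function names `middle_scores` and `median` are our own, chosen for illustration:

```python
def middle_scores(scores):
    """Sort the scores and return the single middle score (odd n) or the two middle scores (even n)."""
    ordered = sorted(scores)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median is undefined for an empty set of scores")
    mid = n // 2
    if n % 2 == 1:
        return [ordered[mid]]
    return [ordered[mid - 1], ordered[mid]]


def median(scores):
    """Median for continuous data: the middle score, or the average of the two middle scores."""
    middle = middle_scores(scores)
    return sum(middle) / len(middle)


print(middle_scores([5, 2, 4, 1, 0, 3, 6, 4]))  # [3, 4]  (discrete: report both)
print(median([75, 80, 85, 92, 95]))             # 85.0
print(median([75, 80, 85, 92, 95, 100]))        # 88.5
```

    For continuous data, Python's built-in `statistics.median` applies the same rule of averaging the two middle values when n is even.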

    When to use the Median

    Report the median when you are dealing with interval or ratio data that are not normally distributed (that is, the data are skewed). You also report the median when you are dealing with ordinal data.

    A box-and-whisker plot, or box plot, is a way to display the spread of the data.

    A box and whisker or box plot is constructed from five values: the minimum value, the first quartile (Q1), the median, the third quartile (Q3), and the maximum value. We use these values to compare how close other data values are to them.

    To construct a box plot, use a horizontal or vertical number line and a rectangular box. The smallest and largest data values label the endpoints of the axis. The first quartile marks one end of the box and the third quartile marks the other end of the box. Approximately the middle 50 percent of the data fall inside the box. The "whiskers" extend from the ends of the box to the smallest and largest data values. The median or second quartile can be between the first and third quartiles, or it can be one, or the other, or both. The box plot gives a good, quick picture of the data.

    Figure: Box-and-whisker plot. A line runs from the lowest value to the first quartile (Q1); a box spans Q1 to the third quartile (Q3), with a line inside it at the median; a second line runs from Q3 to the highest value in the data set.

    Image Retrieved from https://sites.google.com/site/pymath...ot-example.png

     

    Basic principles of a box plot:

    • The lower extreme is the lowest value in the data set.
    • The upper extreme is the highest value in the data set.
    • The median is your middle value.
    • Quarters: Each quarter has approximately 25% of the data.
    • Q1 = the middle value between the lower extreme and the median.
    • Q3 = the middle value between the median and the upper extreme.
    • Interquartile Range: IQR = Q3 – Q1.
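    Quartiles can be computed in several slightly different ways, and textbooks, calculators, and software do not always agree. The sketch below assumes one common convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, leaving out the overall median when n is odd. The function name `five_number_summary` is hypothetical. Applied to the six daily temperatures from the median section:

```python
def _median(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves convention."""
    ordered = sorted(scores)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two scores")
    half = n // 2
    lower = ordered[:half]               # scores below the median
    upper = ordered[half + (n % 2):]     # scores above the median (middle score skipped if n is odd)
    return (ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1])


temps = [75, 80, 85, 92, 95, 100]
low, q1, med, q3, high = five_number_summary(temps)
print(low, q1, med, q3, high)  # 75 80 88.5 95 100
print("IQR =", q3 - q1)        # IQR = 15
```

    Other methods, such as Python's `statistics.quantiles`, may return somewhat different Q1 and Q3 values for the same data.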

    Mode

    The mode is the score or category that has the greatest frequency. Much like the median, there is no symbol in reporting the mode. The mode is a useful measure of central tendency because it can be used to determine the typical or most frequent value for any scale of measurement, including a nominal scale. The mode also can be useful because it is the only measure of central tendency that corresponds to an actual score in the data; by definition, the mode is the most frequently occurring score. The mean and the median, on the other hand, are both calculated values and often produce an answer that does not equal any score in the distribution. Although a distribution will have only one mean and only one median, it is possible to have more than one mode. A distribution with two modes is said to be bimodal, and a distribution with more than two modes is called multimodal.  

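    In the earlier example of the number of children in each household on your block:

    5, 2, 4, 1, 0, 3, 6, 4

    Mode = 4 (it occurs twice; every other value occurs once)

    In the earlier example of daily temperatures for the last 6 days:

    75, 80, 85, 92, 95, 100

    There is no mode, because every value occurs exactly once.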
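    A short sketch of finding the mode(s) with Python's `collections.Counter`. It returns every score tied for the highest frequency, so a bimodal or multimodal set yields several values. It also assumes a convention matching the temperature example: when every value occurs equally often, there is no mode. The function name `modes` is our own:

```python
from collections import Counter


def modes(scores):
    """Return all scores tied for the highest frequency, or [] when every value occurs equally often."""
    counts = Counter(scores)            # value -> number of occurrences
    if not counts:
        return []
    highest = max(counts.values())
    # Assumed convention: if several distinct values all share the same frequency, nothing stands out.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(value for value, c in counts.items() if c == highest)


print(modes([5, 2, 4, 1, 0, 3, 6, 4]))   # [4]
print(modes([75, 80, 85, 92, 95, 100]))  # []
print(modes([1, 1, 2, 2, 3]))            # [1, 2]  (bimodal)
```

    Because it only counts occurrences, the same function works for nominal categories such as `["red", "blue", "red"]`. Python's `statistics.multimode` is similar, but it returns every value when all values occur equally often.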

    When to use the Mode

    You report the mode when you are dealing with nominal data or discrete variables.

    Shape of the Distribution

    A frequency distribution shows the pattern of frequencies over the various values. A frequency table or histogram describes a frequency distribution because each shows the pattern or shape of how the frequencies are spread out, or “distributed.” When describing the shape of the distribution, you can look at the number of peaks in the data, the symmetry, and the skew.

     

    A graph for a data set describes the distribution of the data, that is, the values the variable takes and the frequency of occurrence of each value. The distribution of the data (or so-called data distribution) is also described by a frequency table. A unimodal distribution is a frequency distribution with one value clearly having a larger frequency than any other. A bimodal distribution is a frequency distribution with two approximately equal frequencies, each clearly larger than any of the others. A multimodal distribution is a frequency distribution with two or more high frequencies separated by a lower frequency; a bimodal distribution is the special case of two high frequencies. A rectangular distribution is a frequency distribution in which all values have approximately the same frequency.

    Figure: Unimodal (1 peak) and bimodal (2 peaks) distributions. The unimodal graph has one bar clearly higher than the rest; the bimodal graph has two separated high points.

    Image from Sprinthall, R.C. (2003). Basic Statistical Analysis 7th ed. Boston, MA: Allyn and Bacon

     

    Figure: Multimodal distribution (multiple peaks separated from each other).

    Image Retrieved from https://www.cs.cmu.edu/afs/cs/academ...ticle-filters/

     

    Figure: Rectangular distribution (plateau); all values have roughly the same frequency.

    Image from Sprinthall, R.C. (2003). Basic Statistical Analysis 7th ed. Boston, MA: Allyn and Bacon

     

    The shape of the distribution is often described as symmetric (median = mean) or skewed. A distribution is symmetric if the side of the distribution below a central value is a mirror image of the side above that central value. The distribution is skewed if one side of the distribution stretches out longer than the other side. A distribution is skewed to the left (median > mean) if the left tail is longer than the right tail. A distribution is skewed to the right (median < mean) if the right tail is longer than the left tail. See below for the shape:
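    The mean-versus-median comparison above can be turned into a quick check. This is a rough rule of thumb that can mislead for some distributions, not a formal skewness statistic; `skew_direction` and its `tolerance` parameter are hypothetical names used only for illustration. For example, in 2, 3, 3, 4, 4, 4, 5, 20, the single high score of 20 pulls the mean (5.625) above the median (4), suggesting a right skew:

```python
import statistics


def skew_direction(scores, tolerance=0.0):
    """Rough shape check from the mean-median rule of thumb (not a formal skewness statistic)."""
    avg = statistics.mean(scores)
    mid = statistics.median(scores)
    if abs(avg - mid) <= tolerance:
        return "symmetric"                      # median = mean
    return "right" if avg > mid else "left"     # right: median < mean; left: median > mean


print(skew_direction([1, 2, 3, 4, 5]))              # symmetric
print(skew_direction([2, 3, 3, 4, 4, 4, 5, 20]))    # right
print(skew_direction([1, 20, 21, 21, 22]))          # left
```

    With real data the mean and median are rarely exactly equal, so a small positive `tolerance` can be used to treat nearly equal values as symmetric.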

    Figure: Symmetric, right-skewed, and left-skewed distributions. In the symmetric distribution, 50% of the scores fall on each side of the center, and the mean and median are the same value. In the right-skewed distribution, the peak is on the left with a tail to the right, and the median is lower than the mean. In the left-skewed distribution, the peak is on the right with a tail to the left, and the median is higher than the mean.

    Image from Aron, A., Coups, E.J., Aron, E.N. (2013). Statistics for Psychology 6th ed. Boston, MA: Pearson

     

    The spread of the data determines how sharp or flat the peak will be. A floor effect is a situation in which many scores pile up at the low end of a distribution (creating skewness to the right) because it is not possible to have any lower score. A ceiling effect is a situation in which many scores pile up at the high end of a distribution (creating skewness to the left) because it is not possible to have a higher score. Kurtosis is the extent to which a frequency distribution deviates from a normal curve in terms of whether its curve in the middle is more peaked or flatter than the normal curve. A leptokurtic distribution (standard deviation < 1/6 × range) has a thin, sharp peak; most of the data fall close together. A mesokurtic distribution (standard deviation = 1/6 × range) best describes a normal curve. A platykurtic distribution (standard deviation > 1/6 × range) has a flatter peak, meaning that the data are more spread out. The range is the maximum score minus the minimum score.
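    The standard-deviation-versus-range rule above can be sketched in Python. This follows the text's rule of thumb, not the moment-based kurtosis statistic that statistical software usually reports. The text does not say whether to use the population or sample standard deviation, so this sketch assumes the population value (`statistics.pstdev`). Because the two quantities are almost never exactly equal, the hypothetical `rel_tol` parameter (5% here, an arbitrary choice) decides how close counts as "equal":

```python
import math
import statistics


def kurtosis_label(scores, rel_tol=0.05):
    """Label the peak by comparing the standard deviation with range / 6 (the text's rule of thumb)."""
    spread = max(scores) - min(scores)         # range = maximum score - minimum score
    if spread == 0:
        raise ValueError("range is zero: every score is identical")
    sd = statistics.pstdev(scores)             # assumption: population standard deviation
    benchmark = spread / 6
    if math.isclose(sd, benchmark, rel_tol=rel_tol):
        return "mesokurtic"                    # SD roughly equals range / 6
    return "leptokurtic" if sd < benchmark else "platykurtic"


print(kurtosis_label([0] + [50] * 20 + [100]))   # leptokurtic (most scores bunched at 50)
print(kurtosis_label([0] + [3] * 16 + [6]))      # mesokurtic (SD = 1 = 6 / 6)
print(kurtosis_label([75, 80, 85, 92, 95, 100])) # platykurtic (scores spread out)
```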

     

    Figure: Kurtosis. A leptokurtic curve has a sharp peak, taller than the mesokurtic curve, with the data close together. A mesokurtic curve is similar to the normal distribution. A platykurtic curve has a lower peak than the mesokurtic curve, with the data more spread out.

    Image Retrieved from https://www.quora.com/What-does-a-le...tic-curve-mean

     

    Note

    Notations

    The letter N is used to specify how many scores are in a set for a population.

    The letter n identifies the number of scores in a sample.

    The Greek letter sigma, or Σ, is used to stand for summation. The expression ΣX means to add all the scores for variable X.

    Order of Mathematical Operations (PEMDAS)

    Parentheses

    Exponents

    Multiplication / Division

    Addition / Subtraction

    Steps

    1. Any calculation contained within parentheses is done first.
    2. Squaring (or raising to other exponents) is done second.
    3. Multiplying and/or dividing is done third. A series of multiplication and/or division operations should be done in order from left to right.
    4. Summation using the Σ notation is done next.
    5. Finally, any other addition and/or subtraction is done.
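    The order of operations matters most when summation and squaring appear together. A minimal Python illustration, reusing the ten scores from the population mean example: ΣX² squares each score first (step 2) and then sums (step 4), while (ΣX)² sums inside the parentheses first (step 1) and then squares:

```python
scores = [7, 8, 8, 7, 2, 1, 6, 9, 3, 8]

# ΣX²: square each score first, then add up the squares
sum_of_squares = sum(x ** 2 for x in scores)

# (ΣX)²: the parentheses come first, so add up the scores, then square the total
square_of_sum = sum(scores) ** 2

print(sum_of_squares)  # 421
print(square_of_sum)   # 3481
```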

     


    3. Central Tendency is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
