5.2: Three Types of Central Tendency and Why We Need Them
There are three types of central tendency: mean, median, and mode. You are likely familiar with the definitions of each:
- The mean is the sum of all the values divided by the number of values.
- The median is the midpoint of all the values.
- The mode is the most frequently occurring value.
There are pros and cons to each of these values, and these pros and cons are the reasons why we have all three.
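The three definitions above can be sketched with Python's standard `statistics` module. The scores below are made up purely for illustration; they are not data from this chapter.

```python
import statistics

# Hypothetical sample of seven scores (invented for illustration only).
scores = [1, 2, 2, 3, 4, 7, 9]

mean = statistics.mean(scores)      # sum of the values / number of values
median = statistics.median(scores)  # middle value once the scores are sorted
mode = statistics.mode(scores)      # most frequently occurring value

print("mean:", mean)
print("median:", median)
print("mode:", mode)
```

In this small sample the three measures land on three different numbers, which is one way to see that they do not answer quite the same question.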
5.2.1: The Mean – the Most Honest Central Tendency
The mean is calculated by adding up all the values and dividing that sum by the number of values. Done.
What is the implication of this calculation procedure? One of the pros of the mean is that it uses all the data. In other words, it is “honest” because it accounts for everything. No data are left out. Everything counts.
The con of the mean is that, because it involves all of the data, the mean is influenced by outliers. Outliers are scores at the extreme high or low ends. The outlier is like a magnet; it pulls the mean toward itself. The outlier then distorts the mean so that it is no longer an accurate or best representation of the overall performance of the sample. Rather, the mean reflects the performance of the outlier and does not really represent the group.
This situation is common. In a neighborhood, most of the houses are in a certain price range. But then someone comes along and builds a mansion-style house in the neighborhood that drives up the average price of the homes. In a college course, the students dread being graded on the curve because there is always someone who thinks they are so smart and gets a high grade and raises the curve, creating misery for everyone else in the course. In baseball, we have the batting average of the team, but there is one superstar. These days, there are sometimes two superstars with high batting averages that boost the overall batting average of the team. In many situations, outliers artificially drive up the mean, making the mean less of a representation of the group and more of a representation of the outlier.
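The mansion example can be illustrated with a small sketch. The prices are hypothetical (in thousands of dollars) and chosen only to show how a single extreme value pulls the mean.

```python
import statistics

# Hypothetical sale prices in thousands of dollars (invented for illustration).
neighborhood = [240, 250, 255, 260, 270]

# The same neighborhood after one mansion-style house is added.
with_mansion = neighborhood + [1500]

mean_before = statistics.mean(neighborhood)
mean_after = statistics.mean(with_mansion)

print("mean before the mansion:", mean_before)
print("mean after the mansion:", mean_after)

# After the mansion is added, the mean is higher than every ordinary house.
print("mean exceeds every ordinary house:", mean_after > max(neighborhood))
```

With these made-up numbers, the mean after the mansion is higher than the price of any of the five original houses, so it no longer describes a typical home in the neighborhood.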
In actuality, the presence of the outlier may be permissible from a conceptual standpoint. Continuing with the baseball example, a team does have one superstar. The opposing team has to contend with the overall batting average of the team, plus the superstar. In a given classroom, there is always a standout student. It is not like the standout student is an anomaly. Although I do not know anything about real estate, a person moving in with a high-priced mansion that drives up the housing prices is a situation that is likely to occur. Maybe, as a trend in the real estate market, you even want property values to increase, because you want your own property value to increase.
In psychology, outliers are possible. On a given hospital ward, there is one patient who might have severe symptoms far beyond those on that ward. Why is that patient there? Likely because there is nowhere else to admit the patient. On a given night for crisis calls, most of the calls might be routine, but a call might be received where the crisis is far beyond the usual crisis procedures. Conceptually, the issue and the context likely result in outliers. The presence of outliers may actually be expected as part of the sample that gets drawn from the population and phenomenon of interest.
Sometimes, however, an outlier is so far above or below the mean that it conceptually does not make sense to include it in the sample. For example, I do not know anything about HIV issues, but suppose we were sampling HIV viral loads in patients at a clinic that usually provides care to patients with moderate viral loads. If a patient arrives with a viral load that is extremely high, in the millions, then that patient’s data should not be included in the mean score. The patient might represent a treatment-resistant case or an unusual case of HIV, and the usual treatment protocols will not apply. However, if a patient has a viral load in the upper thousands, it might be high, but it does not exceed the range of expected levels and could be within the confines of the facility’s treatment offerings. If we were sampling persons convicted of driving under the influence or drunk driving, and the usual range of cases involves one to 10 convictions, and we obtain a person who has 300 convictions, then that person should not be included in the mean score. Someone who has 300 convictions over 365 days is clearly off the charts in terms of drinking and driving behavior. The best way to evaluate if an outlier is adversely affecting the mean score is to do so from a conceptual perspective.
If there is a huge outlier, you wonder whether the outlier really belongs in the sample. If a patient arrives at a hospital ward and is extremely belligerent and psychotic, perhaps that patient belongs in an entirely different facility and should not be counted as part of the sample of patients on the hospital ward. If a crisis call comes in and the authorities need to be notified because it is a medical crisis, not a psychological crisis, then that call event should not be considered part of the parameters of the crisis call hotline. The decision to remove an outlier, because it does affect what the sample mean represents, should be based on conceptual reasons, not just because it is an outlier per se.
In reality, outliers really don't dramatically damage the outcome of a statistical test. Notice that I stated the outcome of a statistical test, not the mean score. The mean score might be affected, but when these scores are entered into a statistical test, the results usually do not dramatically change. Honestly, I do not have any citations or statistical proof for my assertion. All I have is years of experience, and every time I receive a dataset, and there is an outlier, despite everyone’s concern about the presence of the outlier, I find when I conduct my statistical tests in every way possible to account for the outlier, the results do not dramatically change. I believe that when there is a sample that is a sufficient representation of the population, it really takes a huge outlier that is dramatically different from the population to shift the statistical result into something completely different. It is just simply rare for one single participant with a high outlier to completely alter a result.
The bottom line is that people think that outliers are an invalid representation of the sample, but in reality, this perception is really overblown.
Back to the mean. The mean is the most ubiquitous statistic. We use the mean because it does use all the data, and we want to ensure that we are accounting for all the variation when calculating our statistics. It is used all the time and is a common statistic in most statistical analyses and equations.
5.2.2: The Median – More Dishonest than the Mean
The median is the point in the distribution at which an equal number of scores (50% of scores) fall above and below it. The median does not depend on how extreme the scores are; it depends on their rank order and on how many scores you have. That point is not necessarily one of the numbers in your set of data.
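A minimal sketch of how the median can be located follows. The helper name `median_of` and the sample numbers are hypothetical; the convention of averaging the two middle scores when the count is even is the common textbook one and is assumed here.

```python
def median_of(values):
    """Return the median of a non-empty list of numbers."""
    if not values:
        raise ValueError("median_of() needs at least one value")
    ordered = sorted(values)          # the median depends on rank order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]           # odd count: the single middle score
    # Even count: halfway between the two middle scores.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_of([7, 1, 5, 3, 9]))    # odd number of scores
print(median_of([7, 1, 5, 3]))       # even number of scores
print(median_of([7, 1, 5, 3, 900]))  # largest score replaced by an extreme one
```

With four scores, the result falls between 3 and 5 and is not one of the scores in the data. Replacing the largest score with an extreme value leaves the median of the five-score list unchanged, because only the rank order matters.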
The median, then, is typically useful when there are outliers and you do not want the outliers to influence the value or the central tendency score that you want to represent your group or your sample. Essentially, the median ignores the outliers. Because the median ignores the outliers, you are manipulating the data by dropping information; you are not using all the data or information in the variable. The median is the opposite of the mean because while the mean uses all the data, the median does not. In this way, the median is more “dishonest” than the mean because the median ignores the extreme scores.
In practice, the median is used mostly for summary statistics. In contexts where you want to ignore the extreme scores, the median is often reported rather than the mean. I have never encountered a situation where the median, instead of the mean, was used as part of a statistical test computation. Instead, the mean is what we use for statistical test computation.
The median is typically used for variables with no upper or lower limits. This means that scores can be anything, which means extreme scores can occur. This situation occurs when there are ratio variables because, as you recall, there is a zero point, and the values can be unlimited. For example, length, money, weight, and time have no upper limit. What typically occurs with ratio variables is that there are gaps all along the continuum. The number of beers drunk can have a gap in the continuum. Most guests at a wedding just have a few drinks, such as five, but then there is one drunk uncle who has way too many at the open bar, such as 20. There will be a gap between drunk uncle and wedding guests in the number of drinks. So, instead of using the mean, which will include the drunk uncle’s drinks, using the median is helpful here because it will not account for the extreme score or the drunk uncle.
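The wedding example can be worked through with a short sketch. The individual drink counts are hypothetical, chosen so that most guests have about five drinks and the uncle has 20, as in the text.

```python
import statistics

# Hypothetical drink counts for seven ordinary wedding guests.
guests = [3, 4, 5, 5, 5, 6, 7]

# The same list with the drunk uncle's 20 drinks added.
with_uncle = guests + [20]

mean_guests = statistics.mean(guests)
median_guests = statistics.median(guests)
mean_all = statistics.mean(with_uncle)
median_all = statistics.median(with_uncle)

print("without uncle: mean", mean_guests, "median", median_guests)
print("with uncle:    mean", mean_all, "median", median_all)
```

In this made-up sample, adding the uncle raises the mean from 5 to 6.875, while the median stays at 5, which is why the median is the more useful summary of what a typical guest drank.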
Interval and ordinal variables rarely use the median because the ranges of those variables are fixed. For example, interval variables are usually Likert Scales, and we often say, “On a scale from one to five, rate your satisfaction.” The range is limited from one to five, so outliers are unlikely to occur. Ordinal variables typically have few and fixed ranks, such as first, second, and third place, or grades, such as A, B, C, D, and F. So, it is unlikely an outlier will occur in terms of an observation that is far away from the rest of the group because these variables have a small range.
Using the median because there are outliers or a skewed distribution might signal something wrong with the distribution or with the nature of the variable under investigation. We will discuss skewed distributions in the next section. For now, the median is not just a solution to representing a skewed distribution. Something might be awry with the distribution if you expected the distribution to be normal, but it turned out to be skewed, and you need to use the median rather than the mean. Using the median will not solve your problem of having a skewed distribution. A better approach is to examine the distribution and decide if it is meant to be skewed or if something is awry with how the data were collected. In this case, contacting a consultant would be best, especially if you thought the distribution was supposed to be a normal distribution.
5.2.3: The Mode – the Most Useless Number
The mode is the most frequently occurring number in the distribution. It is a useless statistic. I have never used the mode, I have never reported the mode, I have never discussed the mode, I have never read about the mode in research journal articles, I have never used the mode in a statistical calculation, I have never seen the mode except in statistical textbooks. It is totally worthless. Just like watching a Cubs game, it is a total waste of time. OUCH
The only time I needed to learn about the mode was when I had to answer questions about it in the statistics section of a licensing exam. The mode becomes an issue when answering questions about normal and skewed distributions. That is all you need to know about the mode.


