11.5: Histograms
Frequency distributions for interval or ratio data are based on defining ranges called classes and computing the frequencies, relative frequencies, and percentages of the observed data points falling within these classes. The graphical versions of these frequency distributions are called histograms. They are constructed in much the same way as those for nominal and ordinal data, except that the ranges of the classes are represented directly on the horizontal axis. This means that there are no spaces between the bars in the graph.
A histogram of a frequency distribution is a graph whose horizontal axis corresponds to the values of a quantitative variable and whose vertical axis corresponds to the observed frequency, relative frequency, or percentage for each class. The bars of the graph span the corresponding classes.
Let us consider constructing a histogram for a small set of data. Suppose that a small survey was conducted in which each respondent was asked how many times, in the previous two-week period, they had eaten at a fast-food restaurant. The data were observed to be 0, 2, 1, 5, 2, 2, 3, 4, 1, 2, 7, 1, 3, 4, 1, 0, 1, 4, 2, 1, 3, 3, 2, 1, 9, and 1. Based on these data, we will consider the class intervals defined by \(0\leq \text{fast food}<1\), \(1\leq \text{fast food}<2\), \(2\leq \text{fast food}<3\), \(3\leq \text{fast food}<4\), \(4\leq \text{fast food}<5\), \(5\leq \text{fast food}<6\), \(6\leq \text{fast food}<7\), \(7\leq \text{fast food}<8\), \(8\leq \text{fast food}<9\), and \(9\leq \text{fast food}<10\). The frequency distribution for these data is given in Table 11.2. Note, for example, that there are two instances where an individual reported that they had not had fast food over the two-week period, so the frequency for the class \(0\leq \text{fast food}<1\) is 2. Similarly, there are eight instances where an individual reported that they had fast food once over the two-week period, so the frequency for the class \(1\leq \text{fast food}<2\) is 8. The remaining frequencies are computed in a similar manner.
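The counting described above can be reproduced with a short script. The following is a minimal sketch in Python, assuming unit-width classes from 0 up to 10 as in the text; the names `fast_food` and `frequency_table` are illustrative and do not come from the original section.

```python
from collections import Counter

# Survey responses quoted in the text (26 respondents).
fast_food = [0, 2, 1, 5, 2, 2, 3, 4, 1, 2, 7, 1, 3,
             4, 1, 0, 1, 4, 2, 1, 3, 3, 2, 1, 9, 1]


def frequency_table(data, low=0, high=10):
    """Return (class_start, frequency, relative_frequency) rows
    for the unit-width classes [k, k + 1), k = low, ..., high - 1."""
    # int() drops the fractional part, which places a non-negative
    # value x into the class whose left endpoint is floor(x).
    counts = Counter(int(x) for x in data)
    n = len(data)
    return [(k, counts[k], counts[k] / n) for k in range(low, high)]


table = frequency_table(fast_food)
for start, freq, rel in table:
    print(f"{start} <= fast food < {start + 1}: "
          f"frequency {freq}, relative frequency {rel:.3f}, "
          f"percentage {100 * rel:.1f}%")
```

Each row gives the three quantities named in the definition of a frequency distribution: the frequency, the relative frequency (frequency divided by the number of respondents), and the percentage.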
Using this information, we can plot the histogram in Figure \(\PageIndex{1}\). For the respondents who reported that they had not eaten at a fast-food restaurant during the two-week period, the height of the rectangle is 2, corresponding to the frequency for that class. There were 8 people who reported that they ate at a fast-food restaurant once, and the height of the corresponding rectangle for this class is 8. The remaining rectangles are plotted in a similar manner. Note that nobody responded with 6 or with 8, so the frequency for each of those classes is 0 and no rectangle is plotted for them, leaving gaps in the graph.
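To make the relationship between class frequencies and rectangle heights concrete, the sketch below draws a rough text version of the histogram using only the Python standard library. It is an illustration rather than a reproduction of the figure: the bars run horizontally instead of vertically, and the function name `text_histogram` is hypothetical.

```python
# Survey responses quoted in the text (26 respondents).
fast_food = [0, 2, 1, 5, 2, 2, 3, 4, 1, 2, 7, 1, 3,
             4, 1, 0, 1, 4, 2, 1, 3, 3, 2, 1, 9, 1]


def text_histogram(data, low=0, high=10):
    """Return one text line per unit-width class [k, k + 1).
    The number of '#' characters equals the class frequency."""
    lines = []
    for k in range(low, high):
        freq = sum(1 for x in data if k <= x < k + 1)
        lines.append(f"[{k}, {k + 1}) | " + "#" * freq)
    return lines


bars = text_histogram(fast_food)
print("\n".join(bars))
```

Classes with a frequency of 0 produce an empty bar, which corresponds to a gap in the plotted histogram.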
When looking for trends in histograms, it is most important to look at the general shape that the rectangles form. Of particular interest are areas where the shape of the histogram has peaks, as these areas have a high concentration of data.
A peak on a histogram is called a mode.
Histograms are classified by the number of modes they have.
The modality of a histogram refers to how many “peaks” the histogram has. A histogram is called unimodal if there is one peak, bimodal if there are two peaks, multimodal if there are three or more peaks, and uniform if the histogram is relatively flat across the classes.
The histogram for the fast-food data shown in Figure \(\PageIndex{1}\) is unimodal. It is tempting to conclude that there are three peaks in this histogram. However, when analyzing histograms it is important to consider the general trend of the data and not small details like the two small peaks on the right-hand side of Figure \(\PageIndex{1}\). These small peaks are the result of a few data points, whereas the general shape of the graph is formed by most of the data. It is much safer to draw conclusions from features that result from most of the data than to rely on a few data points.
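The difference between counting every small bump and counting only the peaks formed by most of the data can be sketched in code. The rule below is an assumption made for illustration, not a method given in this section: a local peak is counted as a mode only if its height is at least a chosen share (`min_share`, a hypothetical parameter set here to 25%) of the tallest bar.

```python
# Class frequencies for the fast-food data, classes [0,1), [1,2), ..., [9,10).
frequencies = [2, 8, 6, 4, 3, 1, 0, 1, 0, 1]


def count_modes(freqs, min_share=0.25):
    """Count local peaks whose height is at least min_share of the tallest bar.
    min_share is an illustrative threshold, not a standard value."""
    tallest = max(freqs)
    padded = [0] + list(freqs) + [0]  # sentinels so end classes can be peaks
    modes = 0
    i = 1
    while i < len(padded) - 1:
        j = i
        # Walk across a flat run of equal bars so it is treated as one unit.
        while j + 1 < len(padded) - 1 and padded[j + 1] == padded[i]:
            j += 1
        is_peak = padded[i] > padded[i - 1] and padded[i] > padded[j + 1]
        if is_peak and padded[i] >= min_share * tallest:
            modes += 1
        i = j + 1
    return modes


print(count_modes(frequencies, min_share=0.0))  # every local peak
print(count_modes(frequencies))                 # only substantial peaks
```

With no threshold the function counts three local peaks, matching the tempting but misleading reading of the figure. With the threshold it counts one, matching the conclusion that the histogram is unimodal. The choice of threshold is a judgment call, just as the visual assessment is.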
Figure \(\PageIndex{2}\) shows a histogram based on data that have been simulated to represent the length of time that people in two groups have a headache. One group was given a new headache medication while the second group was given a placebo. As we can see in Figure \(\PageIndex{2}\), there are two modes: one for the length of time it takes for a headache to go away for those taking the new medication, and one for the length of time it takes for those taking a placebo. Therefore, the histogram shown in Figure \(\PageIndex{2}\) is bimodal.
In a unimodal histogram, the tail of the histogram is the range on each side of the peak corresponding to the thinner, tapering end of the histogram where data points are less frequent.
While the definition of the tail of a histogram is somewhat vague, knowing precisely where the tail of a histogram begins is rarely required for judging the overall shape of a histogram. Rather, it is the relative length and weight of the tails that are important in determining the shape characteristics of a histogram. The most important of these is the concept of symmetry, where the right and left tails of a histogram are roughly the same size. When the tails differ, the shape is described by comparing the lengths of the tails, resulting in what is called skewness.
The skewness of a unimodal histogram indicates whether there is a longer tail to the right of the peak, to the left of the peak, or if the tails are about equal on either side of the peak. If the longer tail is to the left, the histogram is left skewed. If the longer tail is to the right, then the histogram is right skewed. If the tails are about equal, the histogram is symmetric.
For the histogram given in Figure \(\PageIndex{1}\), which is based on the data of the number of times individuals visited a fast-food restaurant in the past two weeks, we can see that the tail on the right side of the peak is longer than on the left. Therefore, this histogram is skewed to the right. Next consider the income simulation histogram in Figure \(\PageIndex{3}\). From Figure \(\PageIndex{3}\) we again observe that the histogram is skewed to the right. This is consistent with what we know about income. While most workers will earn a salary within a certain range, some workers will earn significantly more, and a few workers will earn even more than that. Finally, in Figure \(\PageIndex{4}\) we have the histogram of the simulated student debts. In this case, the left tail is slightly longer than the right and we would conclude that this histogram is skewed to the left.
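The tail comparison for the fast-food histogram can be written out as a small calculation. This is a simplified sketch under stated assumptions: the histogram is unimodal, the peak is taken to be the tallest class, and each tail length is measured in classes from the peak to the farthest occupied class on that side. It compares tail lengths only and ignores tail weight; the function name `skew_direction` is hypothetical.

```python
# Class frequencies for the fast-food data, classes [0,1), [1,2), ..., [9,10).
frequencies = [2, 8, 6, 4, 3, 1, 0, 1, 0, 1]


def skew_direction(freqs):
    """Classify a unimodal histogram by comparing tail lengths (in classes)."""
    occupied = [i for i, f in enumerate(freqs) if f > 0]
    peak = freqs.index(max(freqs))       # first tallest class
    left_tail = peak - occupied[0]       # classes from leftmost data to peak
    right_tail = occupied[-1] - peak     # classes from peak to rightmost data
    if right_tail > left_tail:
        return "right skewed"
    if left_tail > right_tail:
        return "left skewed"
    return "symmetric"


print(skew_direction(frequencies))
```

For the fast-food data the peak is the class starting at 1, the left tail spans one class, and the right tail spans eight, so the function reports a right-skewed shape. In practice "about equal" tails are judged visually, so an exact equality test like this one is stricter than the definition in the text.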
Histograms are not only useful for visually depicting the frequencies of a set of data, but they can also help researchers visually determine whether there are unusual values in a set of data. Unusual values are called outliers.
The outliers in histograms are values that seem significantly different compared to the rest of the values on the histogram, usually identified as values far out in either of the tails of a histogram.
Consider a set of data extracted from the National Longitudinal Study, which is sponsored by the U.S. Bureau of Labor Statistics. The data contains the yearly income of approximately 7,000 individuals. A histogram of the income values is shown in Figure \(\PageIndex{5}\). As is typical of income data, the histogram is skewed to the right. In further consideration of the right tail, one can also observe that some values are separated from the main bulk of the income values. These values are examples of outliers.
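This section identifies outliers visually and gives no numerical cut-off. One widely used convention, introduced here as an assumption and not as the method of this section, is the 1.5 × IQR rule: a value is flagged if it lies more than 1.5 interquartile ranges below the first quartile or above the third quartile. Because the income data are not reproduced here, the sketch below applies the rule to the fast-food survey data instead; the function name `iqr_outliers` is hypothetical.

```python
import statistics

# Survey responses from the fast-food example (26 respondents).
fast_food = [0, 2, 1, 5, 2, 2, 3, 4, 1, 2, 7, 1, 3,
             4, 1, 0, 1, 4, 2, 1, 3, 3, 2, 1, 9, 1]


def iqr_outliers(data, k=1.5):
    """Return the values lying more than k interquartile ranges
    outside the quartiles (the common 1.5 * IQR convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartile cut points
    spread = q3 - q1
    low_fence = q1 - k * spread
    high_fence = q3 + k * spread
    return sorted(x for x in data if x < low_fence or x > high_fence)


print(iqr_outliers(fast_food))
```

On the fast-food data this rule flags the responses 7 and 9, the two values far out in the right tail. Different quartile conventions and different multipliers can change which values are flagged, so a numerical rule like this supplements the histogram and does not replace looking at it.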

