Statistics LibreTexts

4.3: Box plots

    Introduction

    Box plots, also called box-and-whisker plots, should be your routine choice for exploring ratio scale data. Like bar charts, box plots are used to compare ratio scale data collected for two or more groups. Box plots serve the same purpose as bar charts with error bars, but box plots provide more information.

    Purpose and design criteria

    Box plots are a useful tool for getting a sense of the central tendency and spread of data. They are useful diagnostic plots; use them during the initial stages of data analysis. All summary features of box plots are based on ranks (not sums), so they are less sensitive to extreme values (outliers). Box plots also reveal asymmetry in the data, whereas error bars based on standard deviations are symmetric about the mean by construction.
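
    To make the point about ranks versus sums concrete, here is a minimal sketch in base R. The numbers are made up (they are not the comet data) and the helper name summ is hypothetical. It compares sum-based summaries (mean, standard deviation) with rank-based summaries (median, IQR) before and after one extreme value is added.

```r
# Hypothetical values, for illustration only
x     <- c(4, 6, 7, 9, 10, 12, 15)
x_out <- c(x, 100)   # same values plus one extreme observation

# Hypothetical helper: sum-based and rank-based summaries side by side
summ <- function(v) {
  c(mean = mean(v), sd = sd(v), median = median(v), IQR = IQR(v))
}

result <- rbind(original = summ(x), with_outlier = summ(x_out))
print(round(result, 2))
```

    With these values the mean moves from 9 to about 20.4 when the extreme value is added, while the median only moves from 9 to 9.5.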

    The median splits each batch of numbers in half (center line). The “hinges” (the medians of the lower and upper halves) split the remaining halves in half again (the quartiles). The first and third quartiles bound the interquartile range, or IQR, which contains the middle 50% of the data (Fig. \(\PageIndex{1}\)). Outlier points can be identified, for example, with an asterisk or by id number (Fig. \(\PageIndex{1}\)).

    A box plot with all parts labeled. Q1 lies at the 25th percentile relative to the median, Q3 lies at the 75th percentile relative to the median, the plot minimum is Q1 minus 1.5 times the IQR, the plot maximum is Q3 plus 1.5 times the IQR, and outliers lie beyond the plot minimum and maximum.
    Figure \(\PageIndex{1}\): A box plot. Elements of box plot labeled.
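
    The elements labeled in the figure can be computed directly in base R. The sketch below uses made-up values (not the comet data), and the variable names are my own. Note that R's hinges from fivenum() and the quartiles from quantile() can differ slightly in small samples, and that boxplot.stats() applies the 1.5 × IQR rule to the hinges.

```r
# Hypothetical values, for illustration only
x <- c(2, 4, 5, 7, 8, 9, 11, 12, 14, 40)

med    <- median(x)                           # center line of the box
hinges <- fivenum(x)[c(2, 4)]                 # lower and upper hinges (Tukey)
q      <- unname(quantile(x, c(0.25, 0.75)))  # quartiles, R's default method
iqr    <- IQR(x)                              # Q3 - Q1, the middle 50% of the data

# Fences: values beyond these are flagged as potential outliers
lower_fence <- q[1] - 1.5 * iqr
upper_fence <- q[2] + 1.5 * iqr

# boxplot.stats() applies the same 1.5 rule, but based on the hinges
out <- boxplot.stats(x)$out
print(out)
```

    Here the only value flagged is 40; every other value lies inside both sets of fences.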

    We’ll use the data set described in the previous section, so if you have not already done so, get the data from Table 1, Chapter 4.2 into your R software.

    Note:

    See Chapter 4.10 — Graph software for additional box plot examples, but made with different R packages or software apps.

    R Code

    Command line

    We’ll provide code for the base graph shown in Figure \(\PageIndex{2A}\). At the R prompt, type

    boxplot(OliveMoment~Treatment)
    Box plots for Olive moment of each of the three treatments from the comet tail data. The treatment types are on the x-axis and the Olive moment is on the y-axis.
    Figure \(\PageIndex{2A}\): Box plot, default graph in base package.
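
    The command above assumes that R can find OliveMoment and Treatment, for example because the data frame is attached or is the active data set. A sketch of the same call with an explicit data argument follows. The column names come from the text, but the data frame name comet, the treatment labels, and all values are hypothetical stand-ins for Table 1, Chapter 4.2.

```r
# Hypothetical stand-in for the comet data; values are invented
comet <- data.frame(
  Treatment   = factor(rep(c("Cu", "Hazel", "Hazel+Cu"), each = 8),
                       levels = c("Cu", "Hazel", "Hazel+Cu")),
  OliveMoment = c(4, 6, 7, 9, 10, 12, 15, 40,
                  1, 2, 2, 3, 3, 4, 5, 6,
                  3, 5, 6, 6, 8, 9, 11, 30)
)

# data = tells boxplot() where to look up the variables in the formula
bp <- boxplot(OliveMoment ~ Treatment, data = comet,
              xlab = "Treatment", ylab = "Olive moment")

# boxplot() also returns, invisibly, the numbers behind the plot
bp$names   # group labels
bp$stats   # per group: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$out     # values drawn as individual outlier points
```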

    Boxplot is a common function offered in several packages. In the base installation of R, the function is boxplot(). The car package, which is installed as part of the R Commander installation, includes Boxplot(), which is a “wrapper function” for boxplot(). Note the difference: the base function name is all lower case, whereas in the car package the “B” is uppercase. One difference: base boxplot() permits horizontal orientation of the plot (Fig. \(\PageIndex{2B}\)).

    Note:

    A wrapper function is code that calls another function, often to simplify working with that function.

    boxplot(OliveMoment ~ Treatment,  horizontal=TRUE, col="steelblue")
    
    Same box plot graph as in 2A, but with treatments on the y-axis and Olive moment on the x-axis. Boxes have been colored blue.
    Figure \(\PageIndex{2B}\): Same graph, but with color and made horizontal; boxplot(), default graph in base package.
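
    The idea of a wrapper can be sketched with a hypothetical function, hbox(), that calls boxplot() with the horizontal and color settings from the command above as its defaults. This only illustrates the concept; it is not how Boxplot() in the car package is implemented. The data frame comet and its values are invented for the example.

```r
# Hypothetical stand-in for the comet data; values are invented
comet <- data.frame(
  Treatment   = factor(rep(c("Cu", "Hazel", "Hazel+Cu"), each = 8),
                       levels = c("Cu", "Hazel", "Hazel+Cu")),
  OliveMoment = c(4, 6, 7, 9, 10, 12, 15, 40,
                  1, 2, 2, 3, 3, 4, 5, 6,
                  3, 5, 6, 6, 8, 9, 11, 30)
)

# Hypothetical wrapper: same job as boxplot(), but with different defaults.
# Anything passed through ... is handed on to boxplot() unchanged.
hbox <- function(formula, data, horizontal = TRUE, col = "steelblue", ...) {
  boxplot(formula, data = data, horizontal = horizontal, col = col, ...)
}

res  <- hbox(OliveMoment ~ Treatment, data = comet)                # wrapper defaults
res2 <- hbox(OliveMoment ~ Treatment, data = comet, col = "grey")  # override one default
```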

    Base package boxplot() has additional features and options compared to Boxplot() in the car package; that is, not all boxplot() options are wrapped. For example, I had more success adding the original points to a boxplot() graph (Fig. \(\PageIndex{2C}\)) by following the function call with stripchart().

    stripchart(OliveMoment ~ Treatment, method = "overplot", pch = 19, add = TRUE)
    Same graph as in 2B, but showing the original data points for each treatment overlaid on their box plots.
    Figure \(\PageIndex{2C}\): Same graph, added original points; boxplot(), default graph in base package.
    Note:

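    The boxplot() and stripchart() functions are part of the graphics package in base R. The ggplot2 package, part of tidyverse, offers its own functions (geom_boxplot() and geom_jitter()) that are easily used to generate graphs like Fig. \(\PageIndex{2B}\) and Fig. \(\PageIndex{2C}\). In stripchart(), method = "overplot" draws tied points on top of one another; use method = "jitter" to jitter points and avoid overplotting. See below: Apply tidyverse-view to enhance look of boxplot graphic and Fig. \(\PageIndex{9}\).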

    Jittering adds random noise to points, which helps view the data better if many points are clustered together. Note however that jitter would add noise to the plot — if the objective is to show an association between two variables, jitter will reduce the apparent association, perhaps even compromising the intent of the graph. Beeswarm also can be used to better visualize clustered points, but uses a nonrandom algorithm to plot points.
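
    A minimal sketch of jittered points over a horizontal box plot in base R follows. The data frame comet and its values are invented, and the amount of jitter (0.15) is an arbitrary choice. Outlier symbols are suppressed in boxplot() with outline = FALSE so that extreme values are not drawn twice.

```r
# Hypothetical stand-in for the comet data; values are invented.
# The Hazel group contains tied values, which overplotting would hide.
comet <- data.frame(
  Treatment   = factor(rep(c("Cu", "Hazel", "Hazel+Cu"), each = 8),
                       levels = c("Cu", "Hazel", "Hazel+Cu")),
  OliveMoment = c(4, 6, 7, 9, 10, 12, 15, 40,
                  1, 2, 2, 3, 3, 4, 5, 6,
                  3, 5, 6, 6, 8, 9, 11, 30)
)

# Draw the boxes without outlier symbols; use ylim to keep all points in view
boxplot(OliveMoment ~ Treatment, data = comet, horizontal = TRUE,
        col = "steelblue", outline = FALSE,
        ylim = range(comet$OliveMoment))

# Add the raw data; jitter displaces points only along the group axis,
# so the Olive moment values themselves are not altered
stripchart(OliveMoment ~ Treatment, data = comet, method = "jitter",
           jitter = 0.15, pch = 19, add = TRUE)
```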

    Rcmdr: Graph → Boxplot…

    Select the response variable, then click on the “Plot by:” button

    Boxplot popup menu in R Commander, with the Data tab shown. "Olive Moment" is selected as the response variable.
    Figure \(\PageIndex{3}\): Boxplot popup menu in R Commander. Select the response variable and set the “Plot by:” option.

    Next, select the Groups (Factor) variables (Fig. \(\PageIndex{4}\)). Click OK to proceed.

    Groups popup menu in R Commander. "Treatment" is selected as the groups variable.
    Figure \(\PageIndex{4}\): Select the group variable.

    Back in the Box Plot menu, click the “Options” tab to add details to the plot, including a graph title and how outliers are noted (Fig. \(\PageIndex{5}\)).

    Boxplot popup menu in R Commander, with the Options tab shown. In the Identify Outliers menu, the "Automatically" option is checked.
    Figure \(\PageIndex{5}\): Options tab of boxplot popup. Enter axes labels and a title.

    And here is the resulting box plot (Fig. \(\PageIndex{6}\)).

    Resulting box plot from the R Commander options selected above.
    Figure \(\PageIndex{6}\): Resulting box plot from car package implemented in R Commander. Outliers are identified by row id number.
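
    Row numbers of flagged points can also be recovered without R Commander. The sketch below is one possible base R approach, not the method used by the car package: it applies boxplot.stats() within each treatment group and reports the rows it flags. The data frame comet and its values are invented.

```r
# Hypothetical stand-in for the comet data; values are invented
comet <- data.frame(
  Treatment   = factor(rep(c("Cu", "Hazel", "Hazel+Cu"), each = 8),
                       levels = c("Cu", "Hazel", "Hazel+Cu")),
  OliveMoment = c(4, 6, 7, 9, 10, 12, 15, 40,
                  1, 2, 2, 3, 3, 4, 5, 6,
                  3, 5, 6, 6, 8, 9, 11, 30)
)

# Within each group, mark values that boxplot.stats() reports as outliers.
# ave() returns 1 for flagged rows and 0 otherwise, in the original row order.
is_out <- ave(comet$OliveMoment, comet$Treatment,
              FUN = function(v) v %in% boxplot.stats(v)$out) == 1

outlier_rows <- which(is_out)
print(comet[outlier_rows, ])
```

    With these invented values, rows 8 and 24 are flagged.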

    The graph is functional, if not particularly compelling. The data set was “olive moments” from Comet Assays of an immortalized rat lung cell line exposed to dilute copper solution (Cu), Hazel tea (Hazel), or Hazel & Copper solution.

    Apply Tidyverse-view to enhance look of boxplot graphic

    Load the ggplot2 package via the Rcmdr plugin to add options to your graph. As a reminder, to install Rcmdr plugins you must first download and install them from an R mirror like any other package, then load the plugin via Rcmdr Tools → Load Rcmdr plug-in(s)… (Fig. \(\PageIndex{7}\), Fig. \(\PageIndex{8}\)).

    The Load Plugins menu in R Commander, with the KMggplot2 plugin selected.
    Figure \(\PageIndex{7}\): Screenshot of Load Rcmdr plug-ins menu, KMggplot2 selected. Click OK to proceed (see Fig. \(\PageIndex{8}\)).
    Message to restart R Commander to make new plugins available.
    Figure \(\PageIndex{8}\): To complete installation of the plug-in, restart R Commander.

    Significant improvement, albeit with an “eye of the beholder” caveat, can be made over the base package. For example, ggplot2 provides additional themes to improve on the basic box plot. Figure \(\PageIndex{9}\) shows the options available in the Rcmdr plugin KMggplot2, and the default box plot is shown in Figure \(\PageIndex{10}\).

    Figure \(\PageIndex{9}\): Menu of KMggplot2. A title was added, all else remained set to defaults.

    The next series of plots explores available formats for the charts.

    Default box plot format from KMggplot2.
    Figure \(\PageIndex{10}\): Default box plot from KMggplot2.
    Box plot of the same data using the "Economist" theme from KMggplot2.
    Figure \(\PageIndex{11}\): “Economist” theme box plot from KMggplot2.

    And finally, since the box plot is often used to explore data sets, some recommend including the actual data points on a box plot to facilitate pattern recognition. This can be accomplished in the KMggplot2 plugin by checking “Jitter” under the Add data points option (see Fig. \(\PageIndex{9}\)). Jitter helps to visualize overlapping points at the expense of accurate representation. I also selected the Tufte theme, which results in the image displayed in Figure \(\PageIndex{12}\).

    Box plot of the same data using the Tufte theme and with original data points overlaid on the boxes.
    Figure \(\PageIndex{12}\): Tufte theme and data points added to the box plot.
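
    A similar graph can be sketched at the command line with ggplot2 directly, assuming the package is installed. The data frame comet and its values are invented. As far as I know, the Economist and Tufte themes are supplied by the separate ggthemes package (theme_economist(), theme_tufte()); to avoid that extra dependency this sketch uses theme_bw(), which ships with ggplot2.

```r
library(ggplot2)

# Hypothetical stand-in for the comet data; values are invented
comet <- data.frame(
  Treatment   = factor(rep(c("Cu", "Hazel", "Hazel+Cu"), each = 8),
                       levels = c("Cu", "Hazel", "Hazel+Cu")),
  OliveMoment = c(4, 6, 7, 9, 10, 12, 15, 40,
                  1, 2, 2, 3, 3, 4, 5, 6,
                  3, 5, 6, 6, 8, 9, 11, 30)
)

p <- ggplot(comet, aes(x = Treatment, y = OliveMoment)) +
  geom_boxplot(outlier.shape = NA) +        # hide outlier symbols; the raw points show them
  geom_jitter(width = 0.15, height = 0) +   # jitter sideways only, leaving the values unchanged
  labs(title = "Comet assay (hypothetical values)",
       x = "Treatment", y = "Olive moment") +
  theme_bw()                                # swap in another theme to change the style

print(p)
```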

    Conclusions

    As part of your move from the world of Microsoft Excel graphics to graphs recommended by statisticians, the box plot is used to replace the bar charts plus error bars that you may have learned in previous classes. The second conclusion? I presented a number of versions of the same graph, differing only by style. Pick a style of graphics and be consistent.


    Questions

    1. Why is a box plot preferred over a bar chart for ratio scale data, even if an appropriate error bar is included?
    2. With your comet data (Table 1, Chapter 4.2), explore the different themes available in the box plot commands available to you in Rcmdr. Which theme do you prefer and why?

    This page titled 4.3: Box plots is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Michael R Dohm via source content that was edited to the style and standards of the LibreTexts platform.
