
11.6: Scatterplots


    A scatterplot is a plot of one variable against another, where both variables are measured on at least an ordinal scale. In a scatterplot only the points are plotted; they are not connected with lines. An example of a scatterplot is given in Figure \(\PageIndex{1}\). The data used to produce this plot are from a sample of 1,078 sons' heights plotted against their fathers' heights, a dataset studied by the famous statistician Karl Pearson in 1903. Both measurements are in inches. Figure \(\PageIndex{1}\) shows a large cloud of points. Each point is plotted so that its position along the horizontal axis corresponds to the height of a father in the data set, and its position along the vertical axis corresponds to the height of that father's son. Note that the data are paired: each father's height is paired with exactly one height, that of his son. Having two paired variables is a requirement for constructing a scatterplot.

    Definition: Scatterplot

    A scatterplot is a plot of the individual values of two paired variables, where each pair of values is represented by a point on the plot.
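To make the definition concrete, here is a minimal Python sketch that draws a text-mode scatterplot from two paired lists. The function name `ascii_scatter` and its `width`, `height`, and `mark` parameters are illustrative assumptions, not part of any library; a real analysis would more likely use a plotting package. The sketch enforces the pairing requirement by rejecting lists of unequal length, and it places one mark per pair without connecting the marks. Pairs that fall in the same character cell share a single mark, so a large dataset such as Pearson's would show fewer marks than pairs.

```python
def ascii_scatter(xs, ys, width=40, height=12, mark="*"):
    """Render paired values as a text scatterplot: one mark per (x, y) pair, no connecting lines."""
    # A scatterplot requires paired data: every x needs exactly one matching y.
    if len(xs) != len(ys):
        raise ValueError("a scatterplot needs paired data: xs and ys must have the same length")
    if not xs:
        raise ValueError("no data to plot")
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Avoid division by zero when every value on an axis is identical.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale x onto columns 0..width-1 (left to right).
        col = round((x - x_min) / x_span * (width - 1))
        # Row 0 is the top line, so larger y values map to smaller row indices.
        row = (height - 1) - round((y - y_min) / y_span * (height - 1))
        grid[row][col] = mark
    return "\n".join("".join(r) for r in grid)


# Hypothetical toy (father, son) heights in inches, for illustration only; not Pearson's data.
print(ascii_scatter([64, 66, 68, 70, 72], [65, 68, 67, 71, 70]))
```

The horizontal axis holds the first variable and the vertical axis the second, matching the convention used for Figure \(\PageIndex{1}\).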

    Considering the plot in Figure \(\PageIndex{1}\), we can see that while there is considerable variation in the heights of the sons, even when the fathers' heights are quite close, there is a general pattern in the plot. For example, when the height of the father is in the range of about 60–62 inches, it would be quite unusual for the son to have a height greater than 70 inches. However, if the height of the father is around 70 inches, then about half of the sons have a height greater than 70 inches. Hence, while a father's height does not determine his son's height exactly, there is an association: shorter fathers tend to have shorter sons, and taller fathers tend to have taller sons.
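The visual comparison above can be turned into a number: keep only the pairs whose father's height lies in a chosen range, then compute the share of those sons who are taller than a threshold. The Python sketch below does this under the assumption that the data are stored as (father, son) tuples; the function name `share_above` and the toy pairs are hypothetical and are not taken from Pearson's data.

```python
def share_above(pairs, x_low, x_high, y_threshold):
    """Proportion of pairs with x in [x_low, x_high] whose y exceeds y_threshold.

    Returns None when no pair falls in the x range.
    """
    # Keep the sons' heights for fathers inside the chosen height range.
    ys_in_range = [y for x, y in pairs if x_low <= x <= x_high]
    if not ys_in_range:
        return None
    # True counts as 1 and False as 0, so the sum is the number above the threshold.
    return sum(y > y_threshold for y in ys_in_range) / len(ys_in_range)


# Hypothetical toy pairs (father's height, son's height) in inches.
toy_pairs = [(60.5, 64), (61, 66), (62, 65), (70, 72), (70, 69), (71, 70.5), (69.5, 68)]
print(share_above(toy_pairs, 60, 62, 70))  # 0.0  (fathers 60-62 in.)
print(share_above(toy_pairs, 69, 71, 70))  # 0.5  (fathers about 70 in.)
```

The choice of range and threshold is up to the analyst; narrow ranges give sharper comparisons but contain fewer pairs.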

    Figure \(\PageIndex{1}\): The heights of 1,078 sons plotted against their fathers' heights, a dataset studied by the famous statistician Karl Pearson in 1903. Both measurements are in inches (Public domain image created by Alan M. Polansky)

    Another interesting example is based on observations of the geyser Old Faithful in Yellowstone National Park in the United States. The data discussed here are based on 299 measurements of the time between successive eruptions of the geyser and the duration of the subsequent eruption (Azzalini and Bowman 1990). Both measurements are recorded in minutes. A plot of the data is given in Figure \(\PageIndex{2}\).

    Figure \(\PageIndex{2}\): Observations of the waiting time between eruptions and the duration of the subsequent eruption of the geyser Old Faithful in Yellowstone National Park, Wyoming, United States, from Azzalini and Bowman (1990) (Public domain image created by Alan M. Polansky)

    There are many interesting details in the scatterplot shown in Figure \(\PageIndex{2}\). The trend in the plot has some notable features. For example, when the waiting time is less than about 70 minutes, the duration is almost always at least 4 minutes. When the waiting time is longer than 70 minutes, the duration can be anywhere between roughly 1 and 5 minutes. The data also show some anomalous behavior in the form of two apparent outliers. The first outlier occurs at a waiting time of around 80 minutes, with a duration of less than 1 minute; this duration is considerably shorter than any of the other durations observed in the data. The second outlier occurs at a waiting time of 100 minutes, which is unusually long compared to the rest of the observed waiting times.
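The description of the geyser plot compares the spread of durations on either side of a waiting time of about 70 minutes. A minimal Python sketch of that comparison splits the (waiting time, duration) pairs at a cutoff and reports the smallest and largest duration on each side. The function name `y_range_by_x_cutoff` and the toy pairs are hypothetical and are not taken from the Azzalini and Bowman data.

```python
def y_range_by_x_cutoff(pairs, x_cutoff):
    """Split pairs at x_cutoff and return (min y, max y) for each side.

    Keys are "below" (x < x_cutoff) and "at_or_above" (x >= x_cutoff);
    a side with no pairs maps to None.
    """
    below = [y for x, y in pairs if x < x_cutoff]
    above = [y for x, y in pairs if x >= x_cutoff]

    def span(ys):
        # Smallest and largest value, or None for an empty group.
        return (min(ys), max(ys)) if ys else None

    return {"below": span(below), "at_or_above": span(above)}


# Hypothetical toy pairs (waiting time, duration) in minutes.
toy = [(55, 4.2), (62, 4.5), (68, 4.0), (75, 1.8), (80, 4.4), (85, 2.0)]
print(y_range_by_x_cutoff(toy, 70))
# {'below': (4.0, 4.5), 'at_or_above': (1.8, 4.4)}
```

The minimum and maximum are exactly the values a single outlier moves, so one stray point can widen the reported range on its side. This is one reason a description such as "almost always at least 4 minutes" is safer than quoting a strict range.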

    Another interesting aspect of these data can be observed when we consider duration times near 2 minutes and 5 minutes. The plot reveals an unusually large number of observations that equal exactly 2 minutes or exactly 5 minutes, which is the reason for the horizontal lines of points at these heights. A further look at the history of the data reveals that this is because some of the duration observations were rounded to the nearest minute.
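Rounding of this kind can also be checked numerically. When durations are recorded to fractions of a minute, exact repeats of a whole-number value should be fairly rare, so a large share of whole-number values concentrated on a few values hints that some measurements were rounded. The Python sketch below reports that share together with the most frequent exact values; the function name `heaping_report` and the toy durations are hypothetical.

```python
from collections import Counter


def heaping_report(values, top=2):
    """Summarize how often measurements land exactly on whole numbers.

    Returns the share of values that are whole numbers and the `top`
    most frequent exact values with their counts.
    """
    if not values:
        raise ValueError("no values to summarize")
    # float(...).is_integer() is True for values such as 2.0 or 5.
    whole = [v for v in values if float(v).is_integer()]
    share_whole = len(whole) / len(values)
    # Counter tallies identical values; most_common sorts them by frequency.
    return share_whole, Counter(values).most_common(top)


# Hypothetical toy eruption durations in minutes.
durations = [2.0, 4.1, 2.0, 3.9, 5.0, 2.0, 4.3, 5.0]
print(heaping_report(durations))  # (0.625, [(2.0, 3), (5.0, 2)])
```

On a scatterplot the same effect appears as horizontal rows of points, because many pairs share an identical vertical coordinate.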


    This page titled 11.6: Scatterplots is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by .
