Statistics LibreTexts

2.1: Descriptive Statistics- Dotplots and Histograms


    During the statistical analysis process, we ask a question, collect data, summarize and analyze the data, and finally, draw a conclusion. Descriptive statistics help us to summarize and analyze data. We will learn about numerical and graphical ways to describe and present data.

    In this section, we will summarize and analyze frequency distributions of quantitative variables to investigate a question about ages of students at various types of academic institutions. A frequency distribution of a variable provides two important facts about the variable: all values the variable takes on, and how often (or how frequently) the variable takes on each given value.

    A quantitative variable can be measured or counted, and its data values are expressed as numbers. A categorical (or qualitative) variable cannot be measured or counted; instead, its values express membership in a group called a category.

    Distributions of Age

    A professor at Chaffey College is curious about the typical age of students who enroll at public two-year institutions compared to public four-year institutions and for-profit institutions.

    1. Make a prediction: what is the typical age of students at each type of institution? Why do you think this?
    2. The variable we are discussing today is age. Is this variable quantitative or qualitative? Justify your answer.

    Dotplots

    The professor randomly surveys 25 students in her general education classes at Chaffey College (a public two-year institution). She then asks her colleague at a nearby public four-year institution to randomly survey 25 students. Below are the resulting dotplots.

    Ages of Students at a Public Two-Year Institution

    [Figure: dotplot]

    Ages of Students at a Public Four-Year Institution

    [Figure: dotplot]

    1. What does a dot represent in these dotplots?
    2. How many students from the two-year sample are older than 25?
    3. What proportion of students from the two-year sample are older than 25?
    4. How many students from the four-year sample are older than 25?
    5. What proportion of students from the four-year sample are older than 25?
    6. What is the most frequent age in the two-year sample?
    7. What is the most frequent age in the four-year sample?
    8. What is the typical age of a student in the two-year sample? What is the typical age of a student in the four-year sample? Compare these using the dotplots.
    9. Which sample has more variation? Explain using the dotplots.

    Both of these distributions are right-skewed: each has a long right tail. In other words, the higher ages are less likely to occur.

    The professor from Chaffey College then asks a colleague at a for-profit institution to randomly survey 25 students. The dotplot is given below. This distribution is closer to a rough bell shape, in which the graph is symmetric, with one peak in the middle and two equal tails, and values in the tails are less likely to occur. We could also say it might have a slight right skew. The typical age of a student at the for-profit institution (around 32 years) appears to be higher than at the public institutions.

    Ages of Students at a For-Profit Institution

    [Figure: dotplot]

    Histograms

    Sometimes, the type of data we collect can influence what type of graph we use to summarize the data. Often, with a variable like age, we want to group students into different ranges of ages, especially if we have a large sample of data. In this case, we use a histogram to summarize the data graphically.

    Here are a dotplot and a histogram of the ages of 200 students at public four-year institutions. The histogram is easier to read and to analyze than the dotplot.

    [Figures: dotplot and histogram of the 200 ages]

    Let’s return to our example of ages at a for-profit institution. Here is the sample of 25 ages: [33,33,30,31,39,27,35,36,37,23,35,41,42,36,34,28,28,29,26,27,29,30,31,32,34]
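    As a reminder of how a dotplot is built from a list like this one, the following minimal Python sketch prints a text version with one "o" per student next to each age. The function name text_dotplot and the dot symbol are illustrative assumptions, and the plot is rotated relative to the figures above (ages run down the page instead of along a horizontal axis).

```python
from collections import Counter

# The 25 ages from the for-profit sample listed above.
ages = [33, 33, 30, 31, 39, 27, 35, 36, 37, 23, 35, 41, 42, 36, 34,
        28, 28, 29, 26, 27, 29, 30, 31, 32, 34]


def text_dotplot(values, dot="o"):
    """Return one line per age from the lowest to the highest value.

    Each line shows the age followed by one dot per data value equal
    to that age (ages that never occur get an empty row).
    """
    counts = Counter(values)
    lines = []
    for value in range(min(values), max(values) + 1):
        lines.append(f"{value:>3} {dot * counts[value]}")
    return lines


print("\n".join(text_dotplot(ages)))
```

    Each dot stands for one student, so the total number of dots equals the sample size.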

    To help us find patterns within the for-profit data set, we will group the data into ranges of ages called bins. For this example, we will use intervals of size 5 so each bin will contain 5 ages (15-19, 20-24, etc.). The first bin starts with a value slightly lower than the lowest age in the set. We will create the frequency distribution table below prior to graphing the histogram.
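    The binning rule can be sketched in Python as follows. This is only an illustration: the helper name bin_label and its parameters start (the lower edge of the first bin, 15) and width (5 ages per bin) are assumptions chosen to match the bins described above, and ages are assumed to be whole numbers.

```python
def bin_label(age, start=15, width=5):
    """Return the label of the bin (for example '20-24') containing `age`.

    `start` is the lowest age in the first bin and `width` is the number
    of whole-year ages covered by each bin.
    """
    if age < start:
        raise ValueError("age is below the first bin")
    # Integer division gives the zero-based position of the bin.
    index = (age - start) // width
    low = start + index * width
    high = low + width - 1
    return f"{low}-{high}"


print(bin_label(23))  # lowest age in the for-profit sample -> 20-24
```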

    Bin      Tally   Frequency   Relative Frequency   Relative Frequency   Relative Frequency
                                 (as a fraction)      (as a decimal)       (as a percent)
    15-19            0           0/25                 0                    0%
    20-24    |       1           1/25                 0.04                 4%
    25-29
    30-34
    35-39
    40-44
    Total:

    For each data value in the set, determine the bin it falls into. For example, the lowest age in the set is 23, so it belongs in the 20-24 bin, and a tally mark (|) has been written in the tally column of that row. Continue making tally marks until every data value has been assigned to a bin. When a tally reaches its fifth mark, draw that mark as a horizontal stroke across the previous four, so that each crossed group stands for 5.

    The frequency is the number of data values in each bin, or the number of tally marks for a given bin. Write the frequency as a number in the frequency column. We compute the relative frequency by dividing the frequency by the total number of data values (sample size). We can write the relative frequency as a fraction, decimal, and percent.
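    The frequency and relative frequency calculations can be sketched in Python as a way to check a hand-made table. This is a minimal sketch, not part of the original activity: the function name frequency_table, the constants START and WIDTH, and the row layout are assumptions, and the bins are assumed to start at 15 with 5 ages each. Only the two rows already filled in above are printed.

```python
from collections import Counter

# The 25 ages from the for-profit sample.
ages = [33, 33, 30, 31, 39, 27, 35, 36, 37, 23, 35, 41, 42, 36, 34,
        28, 28, 29, 26, 27, 29, 30, 31, 32, 34]

START, WIDTH = 15, 5  # first bin is 15-19; each bin covers 5 ages


def frequency_table(values, start=START, width=WIDTH):
    """Return rows of (bin, frequency, fraction, decimal, percent)."""
    n = len(values)  # sample size
    # Count how many values fall into each bin position (0, 1, 2, ...).
    counts = Counter((v - start) // width for v in values)
    rows = []
    for index in range(max(counts) + 1):
        low = start + index * width
        freq = counts.get(index, 0)
        rows.append((f"{low}-{low + width - 1}",  # bin label
                     freq,                        # frequency
                     f"{freq}/{n}",               # relative frequency, fraction
                     freq / n,                    # relative frequency, decimal
                     f"{100 * freq / n:g}%"))     # relative frequency, percent
    return rows


table = frequency_table(ages)
for row in table[:2]:
    print(row)
```

    The frequencies in the full table should add up to the sample size, which is a quick check that no data value was missed or counted twice.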

    1. Now, use the table to create a frequency histogram. Each bin in the distribution is represented by a vertical bar. The height of that bar is the frequency of the bin. Draw the bars so that each adjacent bar is touching (there are no gaps between adjacent bars).

    [Figure: frequency histogram]
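    To check a hand-drawn histogram, a rough text version can be sketched in Python. In this sketch the bars run horizontally, so the length of each bar (rather than its height) is the frequency of the bin, and consecutive rows leave no gap between bins. The function name text_histogram, the "#" symbol, and the bin settings (start at 15, 5 ages per bin) are illustrative assumptions.

```python
from collections import Counter

# The 25 ages from the for-profit sample.
ages = [33, 33, 30, 31, 39, 27, 35, 36, 37, 23, 35, 41, 42, 36, 34,
        28, 28, 29, 26, 27, 29, 30, 31, 32, 34]


def text_histogram(values, start=15, width=5, symbol="#"):
    """Return one line per bin: the bin label, then one symbol per value."""
    counts = Counter((v - start) // width for v in values)
    lines = []
    for index in range(max(counts) + 1):
        low = start + index * width
        label = f"{low}-{low + width - 1}"
        # An empty bin still gets a row, so the axis has no missing bins.
        lines.append(f"{label} | {symbol * counts[index]}")
    return lines


for line in text_histogram(ages):
    print(line)
```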

    1. What is the sum of all heights of bars in the histogram? What does this sum represent?
    2. What does the height of the second bar represent?

    Summary: Center, Shape, and Spread

    The center of a distribution is the typical value in the data set, or the single value that best represents the distribution. The shape of a distribution is the overall pattern of the distribution. There are four common shapes we might use to describe a distribution.

    [Figure: Bell-Shaped Distribution]

    [Figure: Skewed-Left Distribution]

    [Figure: Skewed-Right Distribution]

    [Figure: Uniform Distribution]

    A uniform distribution is one in which every data value is equally likely to occur.

    We can use these graphs to help us identify potential outliers. An outlier is a data value that is much higher or lower than most other values.

    The spread of a distribution describes the variation within a data set, that is, how far apart the data values are. We often consider the range of values in the set, which is found by subtracting the lowest data value from the highest data value.
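    The range calculation can be sketched in Python using the for-profit sample. The function name value_range is an illustrative assumption; the sketch simply subtracts the lowest data value from the highest, as described above.

```python
# The 25 ages from the for-profit sample.
ages = [33, 33, 30, 31, 39, 27, 35, 36, 37, 23, 35, 41, 42, 36, 34,
        28, 28, 29, 26, 27, 29, 30, 31, 32, 34]


def value_range(values):
    """Return the range: highest data value minus lowest data value."""
    return max(values) - min(values)


print("lowest:", min(ages))           # 23
print("highest:", max(ages))          # 42
print("range:", value_range(ages))    # 42 - 23 = 19
```

    The range uses only the two most extreme values, so a single outlier can change it considerably.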

    1. What are the center, shape, and spread of the frequency histogram?
    2. How many students are older than 30 in the for-profit sample?
    3. A relative frequency histogram displays the relative frequencies for the bins instead of the frequencies.

      [Figure: relative frequency histogram]

      Compare the relative frequency histogram to the frequency histogram you graphed earlier. Are there any similarities or differences between the two histograms?

    4. What proportion of students are older than 30 in the for-profit sample?

    This page titled 2.1: Descriptive Statistics- Dotplots and Histograms is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Hannah Seidler-Wright.
