Statistics LibreTexts

2.2: Quantitative Data

    Learning Objectives
    • Organize data into intervals using grouped frequency distributions to simplify large data sets.
    • Visualize the organized data using histograms, frequency polygons, and ogives.
    • Identify patterns, trends, and cumulative totals to support better data interpretation.

    There are two ways of organizing quantitative data: the grouped frequency distribution and the cumulative frequency distribution. These distributions are represented by the following graphs: histogram, frequency polygon, and ogive. The graphs help describe the shape of the distribution, and the shape determines what type of analysis can be conducted.

    The histogram is similar to a bar graph, but there are some major differences. The first difference is that in a bar graph, the categories can be put in any order on the horizontal axis because there is no set order for the data values. You can’t say how the data are distributed based on the shape, since the shape changes when the categories are put in a different order. With a histogram, the data are in a specific order because the data are expressed with numbers. With quantitative data, you can talk about a distribution since the shape only changes slightly depending on how many classes you set up.

    This leads to the second difference from bar graphs. In a bar graph, the categories that were made in the frequency table were determined by the whims of the researcher. In quantitative data, the groups are numerical intervals, and the frequencies are determined by the number of data values in each group (or class).

    The third difference in the histogram is that the bars must touch, and there is no gap between them. The reason that bar graphs have gaps is that the data are categorical and not numeric. To create a histogram, you must first create the frequency distribution. The idea of a frequency distribution is to take the interval that the data spans and divide it up into equal subintervals called classes.

    Frequency Distribution

    Summary of the Steps Involved in Making a Frequency Distribution
    1. Find the Range = Largest value – Smallest value
    2. Pick the number of classes to use, or the number will be provided. Usually, the number of classes is between five and twenty. Five classes are used if there are a small number of data points, and twenty classes if there are a large number of data points (over 1000 data points). (Note: categories will be called classes from now on.)
    3. Class width = \(\dfrac{\text { Range }}{\# \text { Classes }}\). Always round up to the next integer (even if the answer is already a whole number, go to the next integer). If you don’t do this, your last class will not contain your largest data value, and the last data value will have no class to go to. If you round up, then your largest data value will fall in the last class.
    4. Create the classes. Each class has limits that determine which values fall in each class. To find the class limits, set the smallest value as the lower class limit for the first class. Then add the class width to the lower class limit to get the next lower class limit. Repeat until you get all the classes. The upper-class limit for a class is one less than the lower limit for the next class.
    5. For the classes to touch, one class needs to start where the previous one ends. This is known as the class boundary. To find the class boundaries, subtract 0.5 from the lower-class limit and add 0.5 to the upper-class limit.
    6. Sometimes it is useful to find the class midpoint. The process is
      Midpoint \(=\dfrac{\text { Lower limit }+\text { Upper limit }}{2}\)
    7. To determine the number of data points that fall in each class, count how many are within the range, including the lower and upper limits. Utilizing tally marks may help count the data values. The frequency for a class is the number of data values that fall in the class.
    Note

    The above description is for data values that are whole numbers. If your data value has decimal places, then your class width should be rounded up to the nearest value with the same number of decimal places as the original data. In addition, your class boundaries should have one more decimal place than the original data. For example, if your data has one decimal place, then the class width would have one decimal place, and the class boundaries are formed by adding and subtracting 0.05 from each class limit.
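The rounding rule in step 3 and in the note above can be sketched in Python. This is only an illustration: the function name `class_width` and its `decimals` parameter are assumptions made for this sketch, and the sketch assumes that the "go to the next value even if the quotient is exact" rule also applies to data with decimal places. The inputs other than 2200 and 7 are made-up values.

```python
from fractions import Fraction
from math import floor


def class_width(data_range, n_classes, decimals=0):
    """Class width rounded up to the next value with `decimals` decimal places.

    An exact quotient is still bumped up by one unit, following the
    "always round up to the next integer" rule for whole-number data.
    """
    scale = 10 ** decimals
    # Fraction(str(...)) keeps decimal inputs such as "3.7" exact.
    exact = Fraction(str(data_range)) / n_classes
    return Fraction(floor(exact * scale) + 1, scale)


print(class_width(2200, 7))                       # range 2200 with 7 classes
print(class_width(20, 5))                         # exact quotient 4 is bumped to 5
print(float(class_width("3.7", 5, decimals=1)))   # one-decimal data
```

Using `Fraction` avoids floating-point surprises when the data have decimal places; for whole-number data, plain integer division plus one gives the same width.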

    Example \(\PageIndex{1}\) creating a frequency table

    The table contains the monthly rent for 24 students from a statistics course. Create a frequency distribution using 7 classes.

    1500 1350 350 1200 850 900
    1500 1150 1500 900 1400 1100
    1250 600 610 960 890 1325
    900 800 2550 495 1200 690
    Table \(\PageIndex{1}\): Data of Monthly Rent
    Solution
    1. Find the range:
      Largest value - Smallest value \(= 2550-350=2200\)
    2. Pick the number of classes:
      The direction says to use 7 classes.
    3. Find the class width:
      Class Width \(=\dfrac{\text { Range }}{7}=\dfrac{2200}{7} \approx 314.286\)
      Round up to 315
      Always round up to the next integer, even if the width is already an integer.
    4. Find the class limits:
      Start at the smallest value. This is the lower class limit for the first class. Add the width to get the lower limit of the next class. Keep adding the width to get all the lower limits.
      First Lower-Limit \(=350\) (Typically the lowest value in the data),
      Second Lower-Limit \(=350+315=665\),
      Third Lower-Limit \(=665+315=980\),
      Fourth Lower-Limit \(=980+315=1295\)
      The upper limit is one less than the next lower limit, so for the first class, the upper class limit would be \(665-1=664\). To get the rest of the upper-class limits, repeatedly add the class width to the previous upper limit.
      First Upper-Limit \(=664\),
      Second Upper-Limit \(=664+315=979\),
      Third Upper-Limit \(=979+315=1294\),
      Fourth Upper-Limit \(=1294+315=1609\)
      When you have all 7 classes, make sure the last upper limit, in this case 2554, is at least as large as the largest value in the data, in this case 2550. If not, you made a mistake somewhere.
    5. Find the class boundaries: Subtract 0.5 from the lower-class limit to get the lower-class boundary.
      First Lower-Class Boundary \(=350-0.5=349.5\),
      Second Lower-Class Boundary \(=665-0.5=664.5\),
      Third Lower-Class Boundary \(=980-0.5=979.5\),
      Fourth Lower-Class Boundary \(=1295-0.5=1294.5\)
      Add 0.5 to the upper-class limit to get the upper-class boundary.
      First Upper-Class Boundary \(=664+0.5=664.5\),
      Second Upper-Class Boundary \(=979+0.5=979.5\),
      Third Upper-Class Boundary \(=1294+0.5=1294.5\),
      Fourth Upper-Class Boundary \(=1609+0.5=1609.5\)
      Every value in the data should fall into exactly one of the classes. No data values should fall right on the boundary of two classes.
    6. Find the class midpoints:
      First Midpoint \(=\dfrac{\text { Lower limit }+\text { Upper limit }}{2}=\dfrac{350+664}{2}=507\),
      Second Midpoint \(=\dfrac{665+979}{2}=822\)
    7. Tally and find the frequency of the data:
      Go through the data and put a tally mark in the appropriate class for each piece of data by looking to see which class boundaries the data value is between. Fill in the frequency by changing each of the tallies into a number. Be sure the total of the frequencies is the same as the number of data points.
    Class Limits Class Boundaries Class Midpoint Tally Frequency
    350-664 349.5-664.5 507 |||| 4
    665-979 664.5-979.5 822 \(\cancel{||||}\) ||| 8
    980-1294 979.5-1294.5 1137 \(\cancel{||||}\) 5
    1295-1609 1294.5-1609.5 1452 \(\cancel{||||}\) | 6
    1610-1924 1609.5-1924.5 1767   0
    1925-2239 1924.5-2239.5 2082   0
    2240-2554 2239.5-2554.5 2397 | 1
    Table \(\PageIndex{2}\): Frequency Distribution for Monthly Rent
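The seven steps of this example can be checked with a short Python sketch. It assumes whole-number data, as in the rent example, and the function name `frequency_table` is made up for this illustration.

```python
rents = [1500, 1350, 350, 1200, 850, 900,
         1500, 1150, 1500, 900, 1400, 1100,
         1250, 600, 610, 960, 890, 1325,
         900, 800, 2550, 495, 1200, 690]


def frequency_table(data, n_classes):
    """Return rows of (lower, upper, lower_bd, upper_bd, midpoint, frequency)."""
    # Integer division plus one always rounds up to the next integer,
    # even when the quotient is already a whole number.
    width = (max(data) - min(data)) // n_classes + 1
    rows = []
    for i in range(n_classes):
        lower = min(data) + i * width
        upper = lower + width - 1          # one less than the next lower limit
        freq = sum(lower <= x <= upper for x in data)
        rows.append((lower, upper, lower - 0.5, upper + 0.5,
                     (lower + upper) / 2, freq))
    return rows


table = frequency_table(rents, 7)
for row in table:
    print(row)

# The frequencies should add up to the number of data points.
print(sum(row[-1] for row in table) == len(rents))
```

Each printed row should match one line of the frequency distribution above, without the tally column.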

    Histogram

    It is difficult to determine the basic shape of the distribution by looking at the frequency distribution. It would be easier to look at a graph. The graph of a frequency distribution for quantitative data is called a frequency histogram or histogram for short.

    Definition \(\PageIndex{1}\): Histogram

    A Histogram is a graph of the frequencies on the vertical axis and the class boundaries on the horizontal axis. Rectangles where the height is the frequency and the width is the class width are drawn for each class.

    Example \(\PageIndex{2}\): Drawing a Histogram

    Draw a histogram for the distribution from Example \(\PageIndex{1}\).

    Solution

    The class boundaries are plotted on the horizontal axis, and the frequencies are plotted on the vertical axis. You can plot the midpoints of the classes instead of the class boundaries. Figure \(\PageIndex{1}\) was created using the midpoints because it was easier to do with the software that created the graph.

    Frequency histogram of monthly payment.

    Figure \(\PageIndex{1}\): Histogram for Monthly Rent

    Reviewing the graph, you can see that most of the students pay around $750 per month for rent, with about $1500 being the other common value. You can see from the graph that most students pay between $600 and $1600 per month for rent. Of course, these values are just estimates from the graph. There is a large gap between the $1500 class and the highest data value. This seems to say that one student is paying a great deal more than everyone else. This value could be considered an outlier. An outlier is a data value that is far from the rest of the values. It may be an unusual value or a mistake. It is a data value that should be investigated. In this case, the student lives in a very expensive part of town, so the value is not a mistake; it is just very unusual. Other aspects can be discussed, but first, some other concepts need to be introduced.
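When graphing software is not at hand, a rough text version of the histogram can be sketched in Python. This is only an approximation of the figure: the bars run horizontally, one row per class, and the helper name `text_histogram` is made up for this illustration.

```python
classes = ["350-664", "665-979", "980-1294", "1295-1609",
           "1610-1924", "1925-2239", "2240-2554"]
frequencies = [4, 8, 5, 6, 0, 0, 1]


def text_histogram(labels, freqs, symbol="#"):
    """Return one line per class: the label, a bar of symbols, and the count."""
    label_width = max(len(label) for label in labels)
    return [f"{label:>{label_width}} | {symbol * f} {f}"
            for label, f in zip(labels, freqs)]


for line in text_histogram(classes, frequencies):
    print(line)
```

The two empty rows make the gap between the main body of the data and the single highest rent easy to see.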

    Frequencies are helpful, but it is also useful to understand how large each class is relative to the total. To find this, you can divide the frequency by the total to create a relative frequency. If you have the relative frequencies for all of the classes, then you have a relative frequency distribution.

    Definition \(\PageIndex{2}\)
    Relative Frequency Distribution

    A variation on a frequency distribution is a relative frequency distribution. Instead of giving the frequencies for each class, the relative frequencies are calculated.

    Relative frequency \(=\dfrac{\text { frequency }}{\# \text { of data points }}\)

    This gives you the proportion of the data that falls in each class, which can also be expressed as a percentage.

    Example \(\PageIndex{3}\) creating a relative frequency table

    Find the relative frequency for the monthly rent data.

    Solution

    From Example \(\PageIndex{1}\), the frequency distribution is reproduced in Table \(\PageIndex{2}\).

    Class Limits Class Boundaries Class Midpoint Frequency
    350-664 349.5-664.5 507 4
    665-979 664.5-979.5 822 8
    980-1294 979.5-1294.5 1137 5
    1295-1609 1294.5-1609.5 1452 6
    1610-1924 1609.5-1924.5 1767 0
    1925-2239 1924.5-2239.5 2082 0
    2240-2554 2239.5-2554.5 2397 1
    Table \(\PageIndex{2}\): Frequency Distribution for Monthly Rent

    Divide each frequency by the number of data points.

    \(\dfrac{4}{24}=0.17, \dfrac{8}{24}=0.33, \dfrac{5}{24}=0.21, \ldots\)

    Class Limits Class Boundaries Class Midpoint Frequency Relative Frequency
    350-664 349.5-664.5 507 4 0.17
    665-979 664.5-979.5 822 8 0.33
    980-1294 979.5-1294.5 1137 5 0.21
    1295-1609 1294.5-1609.5 1452 6 0.25
    1610-1924 1609.5-1924.5 1767 0 0
    1925-2239 1924.5-2239.5 2082 0 0
    2240-2554 2239.5-2554.5 2397 1 0.04
    Total     24 1
    Table \(\PageIndex{3}\): Relative Frequency Distribution for Monthly Rent

    The relative frequencies should add up to 1 or 100%. (This might be off a little due to rounding errors.)
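The division by the number of data points can be sketched in Python as follows. The function name `relative_frequencies` and the rounding to two decimal places are choices made for this illustration, matching the table above.

```python
frequencies = [4, 8, 5, 6, 0, 0, 1]


def relative_frequencies(freqs, ndigits=2):
    """Divide each frequency by the total and round to `ndigits` places."""
    total = sum(freqs)
    return [round(f / total, ndigits) for f in freqs]


rel = relative_frequencies(frequencies)
print(rel)

# The rounded values should add up to 1, or very close to it.
print(round(sum(rel), 2))
```

Because each value is rounded separately, the sum can drift slightly from 1 for other data sets, so comparing with a small tolerance is safer than testing for exact equality.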

    The graph of the relative frequency is known as a relative frequency histogram. It looks identical to the frequency histogram, but the vertical axis is relative frequency instead of just frequencies.

    Example \(\PageIndex{4}\) drawing a relative frequency histogram

    Draw a relative frequency histogram for the monthly rent distribution from Example \(\PageIndex{1}\).

    Solution

    The class boundaries are plotted on the horizontal axis, and the relative frequencies are plotted on the vertical axis. (This is not easy to do in R, so use another technology to graph a relative frequency histogram.)

    Relative frequency histogram of monthly payment

    Figure \(\PageIndex{2}\): Relative Frequency Histogram for Monthly Rent

    Notice the shape is the same as the frequency distribution.

    Frequency Polygon

    The frequency polygon is used to help understand the shape of a frequency distribution. It is typically graphed by plotting a point for each class at the class midpoint, with height equal to the class frequency, and connecting the points with straight lines. Each midpoint is computed by adding the two class limits (or the two class boundaries) of a class and then dividing by 2. Also, the first and last points are anchored to the x-axis, where hypothetical midpoints would be located. The first anchor is the first midpoint value minus the class width, and the last anchor is the last midpoint value plus the class width.

    Example \(\PageIndex{5}\)

    Create a frequency polygon for the data in Example \(\PageIndex{1}\). The grouped frequency distribution for this example is provided below.

    Grouped Frequency Distribution.
    Class Limits Class Boundaries Class Midpoint Frequency
    350-664 349.5-664.5 507 4
    665-979 664.5-979.5 822 8
    980-1294 979.5-1294.5 1137 5
    1295-1609 1294.5-1609.5 1452 6
    1610-1924 1609.5-1924.5 1767 0
    1925-2239 1924.5-2239.5 2082 0
    2240-2554 2239.5-2554.5 2397 1
    Total     24
    Solution
    Frequency polygon for monthly rent.
    Figure \(\PageIndex{3}\): Frequency Polygon for Monthly Rent
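The points that the frequency polygon connects, including the two anchors on the x-axis, can be listed with a short Python sketch. The midpoints are recomputed from the class limits, and the function name `polygon_points` is made up for this illustration.

```python
class_limits = [(350, 664), (665, 979), (980, 1294), (1295, 1609),
                (1610, 1924), (1925, 2239), (2240, 2554)]
frequencies = [4, 8, 5, 6, 0, 0, 1]
class_width = 315


def polygon_points(limits, freqs, width):
    """Return (x, y) points for a frequency polygon, anchored to the x-axis."""
    mids = [(lower + upper) / 2 for lower, upper in limits]
    start = (mids[0] - width, 0)     # hypothetical midpoint before the first class
    end = (mids[-1] + width, 0)      # hypothetical midpoint after the last class
    return [start, *zip(mids, freqs), end]


points = polygon_points(class_limits, frequencies, class_width)
for x, y in points:
    print(x, y)
```

Connecting these points in order with straight lines gives the polygon; the first and last points have height 0, which closes the shape against the horizontal axis.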

    Cumulative Frequency Distribution

    Another useful piece of information is how many data points fall below a particular class boundary. As an example, a teacher may want to know how many students received below 80%, a doctor may want to know how many adults have cholesterol below 160, or a manager may want to know how many stores gross less than $2000 per day. This is known as a cumulative frequency. If you want to know what percent of the data falls below a certain class boundary, then this would be a cumulative relative frequency. For cumulative frequencies, you are finding how many data values fall below the upper class boundary of each class.

    To create a cumulative frequency distribution, count the number of data points that are below the upper-class boundary, starting with the first class and working up to the top class. The last upper-class boundary should have all of the data points below it. Also include the number of data points below the lowest class boundary, which is zero.

    Example \(\PageIndex{6}\) creating a cumulative frequency distribution

    Create a cumulative frequency distribution for the data in Example \(\PageIndex{1}\).

    Solution

    The frequency distribution for the data is in Table \(\PageIndex{2}\).

    Class Limits Class Boundaries Class Midpoint Frequency
    350-664 349.5-664.5 507 4
    665-979 664.5-979.5 822 8
    980-1294 979.5-1294.5 1137 5
    1295-1609 1294.5-1609.5 1452 6
    1610-1924 1609.5-1924.5 1767 0
    1925-2239 1924.5-2239.5 2082 0
    2240-2554 2239.5-2554.5 2397 1
    Table \(\PageIndex{2}\): Frequency Distribution for Monthly Rent

    Now, ask yourself how many data points fall below each class boundary. Below 349.5, there are 0 data points. Below 664.5, there are 4 data points; below 979.5, there are 4 + 8 = 12 data points; below 1294.5, there are 4 + 8 + 5 = 17 data points. Continue this process until you reach the last upper class boundary. To find the cumulative relative frequency values, divide each cumulative frequency by the total number of data points. For this example, the total is 24. This is summarized in Table \(\PageIndex{4}\).

    Class Boundaries Cumulative Frequency Cumulative Relative Frequency
    < (less than) 349.5 0 0
    < 664.5 4 0.17
    < 979.5 12 0.5
    < 1294.5 17 0.71
    < 1609.5 23 0.96
    < 1924.5 23 0.96
    < 2239.5 23 0.96
    < 2554.5 24 1
    Table \(\PageIndex{4}\): Cumulative Distribution for Monthly Rent
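The running totals in the table above can be reproduced with `itertools.accumulate` from the Python standard library. The function name `cumulative_table` is made up for this sketch, and the rounding to two decimal places follows the table.

```python
from itertools import accumulate

lowest_boundary = 349.5
upper_boundaries = [664.5, 979.5, 1294.5, 1609.5, 1924.5, 2239.5, 2554.5]
frequencies = [4, 8, 5, 6, 0, 0, 1]


def cumulative_table(lowest_bd, upper_bds, freqs):
    """Return rows of (boundary, cumulative frequency, cumulative relative frequency)."""
    total = sum(freqs)
    # Start at 0 because no data values fall below the lowest class boundary.
    cumulative = [0, *accumulate(freqs)]
    boundaries = [lowest_bd, *upper_bds]
    return [(b, c, round(c / total, 2)) for b, c in zip(boundaries, cumulative)]


rows = cumulative_table(lowest_boundary, upper_boundaries, frequencies)
for boundary, cum, cum_rel in rows:
    print(f"< {boundary}: {cum} ({cum_rel})")
```

The last row should always show the total number of data points and a cumulative relative frequency of 1.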

    Ogive

    Again, it is hard to look at the data the way it is. A graph would be useful. The graph for cumulative frequency is called an ogive (o-jive). To create an ogive, first create a scale on both the x and y axes that will fit the data. Then plot the points of the class boundary values versus the cumulative frequencies from left to right. Finally, connect the plotted points with straight lines.

    Example \(\PageIndex{7}\) drawing an ogive

    Draw an ogive for the data in Example \(\PageIndex{1}\).

    Solution
    Ogive for monthly rent - cumulative frequency.

    Figure \(\PageIndex{4}\): Ogive for Monthly Rent
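Reading a value off an ogive amounts to moving along the straight line between two plotted points. A minimal Python sketch of that reading is given below. The function name `read_ogive` is made up for this illustration, and the result is only an estimate because it treats the data as evenly spread within each class.

```python
boundaries = [349.5, 664.5, 979.5, 1294.5, 1609.5, 1924.5, 2239.5, 2554.5]
cumulative = [0, 4, 12, 17, 23, 23, 23, 24]


def read_ogive(x, bds, cum):
    """Estimate the cumulative frequency below x by linear interpolation."""
    if x <= bds[0]:
        return 0.0
    if x >= bds[-1]:
        return float(cum[-1])
    points = list(zip(bds, cum))
    # Walk through consecutive pairs of plotted points (the line segments).
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


print(read_ogive(979.5, boundaries, cumulative))   # at a plotted point
print(read_ogive(822, boundaries, cumulative))     # halfway through the second class
```

At a class boundary the function returns the tabulated cumulative frequency; between boundaries it returns the height of the connecting line segment.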

    Shapes of the Distribution

    When you look at a distribution, look at the basic shape. Some basic shapes are seen in histograms. Realize, though, that some distributions have no recognizable shape. The common shapes are symmetric, skewed, and uniform. Another feature of interest is how many peaks a graph has, which the term modal describes.

    Symmetrical means that you can fold the graph in half down the middle, and the two sides will line up. You can think of the two sides as being mirror images of each other. Skewed means one “tail” of the graph is longer than the other. The graph is skewed in the direction of the longer tail (backward from what you would expect). A uniform graph has all the bars at the same height.

    Modal refers to the number of peaks. Unimodal has one peak, and bimodal has two peaks. Usually, if a graph has more than two peaks, the modal information is no longer of interest.

    Other important features to consider are gaps between bars, a repetitive pattern, how spread out the data is, and where the center of the graph is.
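These shape descriptions can be turned into very simple checks on a list of class frequencies. The Python sketch below is only illustrative: the function names and the three frequency lists are made up, and the tests are strict, whereas real histograms are usually judged to be only roughly symmetric or roughly uniform.

```python
def count_peaks(freqs):
    """Count bars that are strictly taller than both of their neighbours."""
    padded = [0, *freqs, 0]
    return sum(padded[i - 1] < padded[i] > padded[i + 1]
               for i in range(1, len(padded) - 1))


def is_symmetric(freqs):
    """True if the bar heights read the same from left to right and right to left."""
    return list(freqs) == list(freqs)[::-1]


def is_uniform(freqs):
    """True if every bar has the same height."""
    return len(set(freqs)) == 1


# Hypothetical frequency lists, not taken from the rent example.
one_hump = [1, 3, 6, 3, 1]
two_humps = [2, 6, 2, 6, 2]
flat = [4, 4, 4, 4]

print(count_peaks(one_hump), is_symmetric(one_hump))     # unimodal and symmetric
print(count_peaks(two_humps), is_symmetric(two_humps))   # bimodal and symmetric
print(is_uniform(flat))                                  # uniform
```

For real data, a looser comparison would be needed, for example allowing mirrored bars to differ by a small amount before calling the shape asymmetric.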

    Examples of Graphs:

    This graph is roughly symmetric and unimodal:

    Graph of a unimodal and symmetrical distribution
    Figure \(\PageIndex{5}\): Graph of a Unimodal and Symmetrical Distribution

    This graph is symmetric and bimodal:

    Graph of a bimodal and symmetrical distribution.
    Figure \(\PageIndex{6}\): Graph of a Bimodal and Symmetrical Distribution

    This graph is skewed to the right:

    Graph of a right (Positive) skewed distribution
    Figure \(\PageIndex{7}\): Graph of a Right (Positive) Skewed Distribution

    This graph is skewed to the left and has a gap:

    Graph of a left (negative) skewed distribution.
    Figure \(\PageIndex{8}\): Graph of a Left (Negative) Skewed Distribution

    This graph is uniform since all the bars are the same height:

    Graph of a uniform distribution.
    Figure \(\PageIndex{9}\): Graph of a Uniform Distribution

    There are other types of graphs for quantitative data. They will be explored in the next section.

    Attributions

    "2.2: Quantitative Data" by Kathryn Kozak is licensed CC BY-SA 4.0


    This page titled 2.2: Quantitative Data is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Toros Berberyan, Tracy Nguyen, and Alfie Swan via source content that was edited to the style and standards of the LibreTexts platform.