
2.1: Qualitative Data

    Learning Objectives
    • Explore qualitative (categorical) data representing non-numerical categories such as colors, brands, or types.
    • Use bar graphs, Pareto charts, and pie graphs to visualize categorical data.
    • Highlight frequencies, proportions, and key patterns within the categories for clearer analysis.

    Remember, qualitative data are words describing an individual's characteristics. Several different graphs can be used to represent qualitative data, including bar graphs, Pareto charts, and pie charts.

    Pie charts and bar graphs are the most common ways of displaying qualitative data. A spreadsheet program like Microsoft (MS) Excel can create both graphs. The first step for both graphs is to generate a frequency or relative frequency table. A frequency table summarizes the data and lists the number of occurrences of each data type.

    Categorical Frequency Distribution

    Example \(\PageIndex{1}\)

    Suppose you have the following raw data for different car types for students parked in a college parking lot.

    Ford, Chevy, Honda, Toyota, Other, Toyota, Nissan, Nissan, Chevy, Toyota, Honda, Chevy, Toyota, Nissan, Ford, Toyota, Other, Nissan, Chevy, Ford, Nissan, Toyota, Nissan, Ford, Chevy, Toyota, Nissan, Honda, Other, Chevy, Chevy, Honda, Toyota, Chevy, Ford, Nissan, Other, Toyota, Chevy, Honda, Chevy, Toyota, Chevy, Chevy, Nissan, Honda, Toyota, Toyota, Nissan, Other

    Solution

    A raw list of data is hard to read and analyze, so you need to summarize it. First, decide on the categories. Second, list the car types in the first column. Third, count the number of cars of each type in the list and record the counts in the frequency column. For example, there are 5 Fords, 12 Chevys, 6 Hondas, 12 Toyotas, 10 Nissans, and 5 Others. Finally, add all the frequency values; the total of the frequency column should equal the number of observations in the data. The total number of data values is denoted as n. In this example, n = 50. The resulting frequency distribution is shown in the table below.

    Category Frequency
    Ford 5
    Chevy 12
    Honda 6
    Toyota 12
    Nissan 10
    Other 5
    Total 50
    Table \(\PageIndex{1}\): Frequency Table for Type of Car Data
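
    The tally described above can also be done with a short script. The following is a minimal Python sketch, not part of the original example; the names RAW_DATA, CATEGORIES, cars, and frequency are chosen here purely for illustration.

```python
from collections import Counter

# Raw data from Example 1, copied in the order listed in the text.
RAW_DATA = """
Ford, Chevy, Honda, Toyota, Other, Toyota, Nissan, Nissan, Chevy, Toyota,
Honda, Chevy, Toyota, Nissan, Ford, Toyota, Other, Nissan, Chevy, Ford,
Nissan, Toyota, Nissan, Ford, Chevy, Toyota, Nissan, Honda, Other, Chevy,
Chevy, Honda, Toyota, Chevy, Ford, Nissan, Other, Toyota, Chevy, Honda,
Chevy, Toyota, Chevy, Chevy, Nissan, Honda, Toyota, Toyota, Nissan, Other
"""

# Row order used in the frequency table (illustrative name).
CATEGORIES = ["Ford", "Chevy", "Honda", "Toyota", "Nissan", "Other"]

# Split on commas and strip the surrounding spaces and newlines.
cars = [entry.strip() for entry in RAW_DATA.split(",")]

# Counter tallies how many times each car type occurs.
frequency = Counter(cars)
n = sum(frequency.values())  # total number of observations

print(f"{'Category':<10}{'Frequency':>9}")
for category in CATEGORIES:
    print(f"{category:<10}{frequency[category]:>9}")
print(f"{'Total':<10}{n:>9}")
```

    Listing the categories explicitly keeps the rows in the same order as the table, since a Counter otherwise reports categories in the order they first appear in the data.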

    A relative frequency is a proportion, that is, a percentage expressed in decimal form. Because raw counts are hard to compare across categories or samples, it is helpful to add a third column that lists the relative frequency of each category. The relative frequency of a category is its frequency divided by the total. For example, the relative frequency of the Ford category is computed as follows:

    relative frequency \(= \dfrac{5}{50} = 0.10\)

    This can be written as a decimal, fraction, or percent. Based on the explanation above, the frequency distribution with a relative frequency column should look like the table below.

    Category Frequency Relative Frequency
    Ford 5 0.10
    Chevy 12 0.24
    Honda 6 0.12
    Toyota 12 0.24
    Nissan 10 0.20
    Other 5 0.10
    Total 50 1.00
    Table \(\PageIndex{2}\): Relative Frequency Table for Type of Car Data

    The relative frequency column should add up to 1.00. It might be off a little due to rounding.
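
    As a rough check of the relative frequency column, the division can be scripted. This is a sketch under the assumption that the frequencies are typed in from the frequency table; the variable names are illustrative only.

```python
# Frequencies taken from the frequency table for the car data.
frequency = {"Ford": 5, "Chevy": 12, "Honda": 6,
             "Toyota": 12, "Nissan": 10, "Other": 5}

n = sum(frequency.values())  # total number of observations

# Relative frequency = frequency / total.
relative = {category: count / n for category, count in frequency.items()}

for category, count in frequency.items():
    print(f"{category:<8}{count:>4}{relative[category]:>8.2f}")

# The relative frequencies should add up to 1 (up to rounding error).
total_relative = sum(relative.values())
print(f"{'Total':<8}{n:>4}{total_relative:>8.2f}")
```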

    We can use the frequency distribution to display the data using different types of graphs. These graphs include the bar graph, the pie graph, and the Pareto chart.

    Bar Graphs

    Bar graphs show the frequencies on the y-axis (vertical axis) and the categories on the x-axis (horizontal axis). For each category, draw a bar with a height equal to its frequency. All bars should have the same width, and the spaces between them should be the same.

    Example \(\PageIndex{2}\): Drawing a Bar Graph

    Draw a bar graph of the data in Example \(\PageIndex{1}\).

    Solution
    Category Frequency Relative Frequency
    Ford 5 0.10
    Chevy 12 0.24
    Honda 6 0.12
    Toyota 12 0.24
    Nissan 10 0.20
    Other 5 0.10
    Total 50 1.00
    Table \(\PageIndex{2}\): Relative Frequency Table for Type of Car Data

    To construct a bar graph using frequencies:

    1. Put the frequency scale on the y-axis and the categories on the x-axis.
    2. Draw a bar above each category with a height equal to its frequency.
    3. Label the y-axis, x-axis, and the graph using appropriate titles.

    For this example, if the steps are done correctly, you should have a graph similar to the one below.

    Figure \(\PageIndex{1}\): The Bar Graph for Type of Car Data Using Frequencies.

    From the graph, you can see that Toyota and Chevy are the most common cars in the sample, with Nissan not far behind. Ford is the least common named car type. The Other category has the same frequency as Ford, but because it pools all the remaining car types, any single type within it is presumably less common than Ford.

    Some key features of a bar graph:
    • Equal spacing between bars.
    • Bars are the same width.
    • There should be labels on each axis and a title for the graph.
    • There should be a scaling on the y-axis, and the categories should be listed on the x-axis.
    • The bars don’t touch.
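
    The idea of a frequency bar graph can be imitated in plain text, which is sometimes handy for a quick look at the data. The sketch below is only an illustration: it draws horizontal bars rather than the vertical bars described above, and the function name text_bar_graph is made up for this example.

```python
# Frequencies taken from the frequency table for the car data.
frequency = {"Ford": 5, "Chevy": 12, "Honda": 6,
             "Toyota": 12, "Nissan": 10, "Other": 5}


def text_bar_graph(freq, symbol="#"):
    """Return one text line per category; bar length equals the frequency."""
    width = max(len(category) for category in freq)
    return [
        f"{category:<{width}} | {symbol * count} {count}"
        for category, count in freq.items()
    ]


lines = text_bar_graph(frequency)
for line in lines:
    print(line)
```

    Each bar uses the same symbol and every category sits on its own line, which mirrors the equal-width and equal-spacing features listed above.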

    You can also draw a bar graph using relative frequency on the y-axis. This is useful when comparing two samples with different sample sizes because it is better to compare by percentages/relative frequencies to see which categories are the most common. The relative frequency graph and the frequency graph should look the same, except for the scaling on the y-axis.

    To construct a bar graph using relative frequencies:

    1. Put the relative frequency scale on the y-axis and the categories on the x-axis.
    2. Draw a bar above each category with a height equal to its relative frequency.
    3. Label the y-axis, x-axis, and the graph using appropriate titles.

    For this example, if the steps are done correctly, you should have a graph similar to the one below.

    Figure \(\PageIndex{2}\): Bar Graph for Type of Car Data Using Relative Frequencies

    Pareto Chart

    A second type of graph for categorical data is the Pareto chart, a bar graph with the bars sorted from the highest frequency to the lowest, starting from the left. It is used to highlight the vital few items, which in this case are the most common car types: Chevy and Toyota. This is especially useful in business applications, where you want to know what services your customers like the most, what processes result in more injuries, which issues employees find more important, and other types of questions like these. Here is the Pareto chart for the data in Example \(\PageIndex{1}\).

    Figure \(\PageIndex{3}\): Pareto Chart for Type of Car Data
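
    The ordering behind a Pareto chart amounts to sorting the categories by frequency, largest first. The sketch below shows one way to do that in Python. The cumulative percentage column is an assumption added here: many Pareto charts include it, but the text above does not use it.

```python
# Frequencies taken from the frequency table for the car data.
frequency = {"Ford": 5, "Chevy": 12, "Honda": 6,
             "Toyota": 12, "Nissan": 10, "Other": 5}

# Sort from highest to lowest frequency. Python's sort is stable,
# so tied categories (Chevy/Toyota, Ford/Other) keep their original order.
pareto = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

n = sum(frequency.values())
running = 0
rows = []
for category, count in pareto:
    running += count
    # Cumulative percent: share of all observations covered so far.
    rows.append((category, count, 100 * running / n))

for category, count, cumulative in rows:
    print(f"{category:<8}{count:>4}{cumulative:>8.1f}%")
```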

    Pie Chart

    The pie chart is a circle divided into sections according to the percentage of frequencies in each category. The formula for these percentages is \(P = \dfrac{f}{n} \cdot 100\%\), where P stands for the percent, f for the frequency of each category, and n for the sum of all the frequencies. To draw each section, use the percentages (recall that a quarter of the circle equals 25%).

    Example \(\PageIndex{1}\)

    A random extended family has been selected, and its members are grouped by age in the table below. Draw a pie chart of the data.

    Age of Family Members
    Categories Frequency
    Children 10
    Young Adults 12
    Middle Age 19
    Elderly 8

    Table \(\PageIndex{5}\): Age of Family Members

    Solution

    Guidelines for Creating a Pie Chart

    1. Compute the relative frequencies by dividing each frequency by the total frequency.
    2. Multiply each relative frequency by 100 to get the percentages.
    3. Draw a circle.
    4. Draw and label each section according to the percentage of each category.
    Age of Family Members
    Categories Frequency Relative Frequency (Rounded to Two Decimal Places) Percent
    Children 10 10/49 ≈ 0.20 0.20 \(\cdot\) 100 = 20%
    Young Adults 12 12/49 ≈ 0.24 0.24 \(\cdot\) 100 = 24%
    Middle Age 19 19/49 ≈ 0.39 0.39 \(\cdot\) 100 = 39%
    Elderly 8 8/49 ≈ 0.16 0.16 \(\cdot\) 100 = 16%

    Table \(\PageIndex{6}\): Age of Family Members With Relative Frequencies and Percents
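
    The first two guideline steps can be sketched in a few lines of Python. This is an illustration only, with made-up variable names; it rounds each relative frequency to two decimal places before converting to a percent, as the table does.

```python
# Frequencies from the Age of Family Members table.
ages = {"Children": 10, "Young Adults": 12, "Middle Age": 19, "Elderly": 8}

n = sum(ages.values())  # total frequency

percent = {}
for group, f in ages.items():
    relative = round(f / n, 2)              # step 1: relative frequency
    percent[group] = round(relative * 100)  # step 2: percent
    print(f"{group:<13}{f:>3}{relative:>6.2f}{percent[group]:>5}%")

# Rounded percents need not add up to exactly 100.
print("Sum of percents:", sum(percent.values()))
```

    Because each value is rounded, the percents here add up to 99 rather than 100, which is the kind of small rounding discrepancy mentioned earlier for relative frequencies.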

    Description of this pie chart is provided below.
    Figure \(\PageIndex{10}\): Pie Chart of Ages of Family Members in Percents.

    We have a pie chart with four sections. The children section is represented by 20%. The young adults section is represented by 24%. The middle age section is represented by 39%. Finally, the elderly section is represented by 16%. The size of the slices is based on the percentages.

    Example \(\PageIndex{3}\): Drawing a Pie Chart

    Draw a pie chart of the data in Example \(\PageIndex{1}\).

    First, you need the relative frequencies.

    Category Frequency Relative Frequency
    Ford 5 0.10
    Chevy 12 0.24
    Honda 6 0.12
    Toyota 12 0.24
    Nissan 10 0.20
    Other 5 0.10
    Total 50 1.00
    Table \(\PageIndex{2}\): Relative Frequency Table for Type of Car Data

    Second, multiply each relative frequency by 360° to obtain the angle measure for each category.

    Category Relative Frequency Angle (in degrees (°))
    Ford 0.10 36.0
    Chevy 0.24 86.4
    Honda 0.12 43.2
    Toyota 0.24 86.4
    Nissan 0.20 72.0
    Other 0.10 36.0
    Total 1.00 360.0
    Table \(\PageIndex{3}\): Pie Graph Angles for Type of Car Data
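
    The angle column can be reproduced by multiplying each relative frequency by 360. The following is a small sketch with illustrative names; the results are rounded to one decimal place to avoid floating-point noise.

```python
# Relative frequencies from the relative frequency table for the car data.
relative = {"Ford": 0.10, "Chevy": 0.24, "Honda": 0.12,
            "Toyota": 0.24, "Nissan": 0.20, "Other": 0.10}

# Angle of each slice in degrees = relative frequency * 360.
angles = {category: round(rel * 360, 1) for category, rel in relative.items()}

for category, angle in angles.items():
    print(f"{category:<8}{relative[category]:>6.2f}{angle:>8.1f}")

# The slices should fill the whole circle.
total_angle = round(sum(angles.values()), 1)
print(f"{'Total':<8}{'':>6}{total_angle:>8.1f}")
```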

    Now draw the pie graph using a compass, protractor, and straight edge. Using technology is preferred; in that case, there is no need to compute the relative frequencies or the angles by hand.

    The pie graph for this example should be like the one below.

    Figure \(\PageIndex{4}\): Pie Chart for Type of Car Data

    As you can see from the graph, Toyota and Chevy are the most common car types in the sample. Ford and Other have the smallest slices, at 10% each; among the named car types, Ford is the least common.

    Attributions

    "2.1: Qualitative Data" by Kathryn Kozak is licensed CC BY-SA 4.0

    Exercises

    1. The table below shows the percentage of Arizona workers aged 16 and older who carpool, drive alone, take public transportation, or use other means of transportation.
    Table 2.1.4: Data on Travel Mode for Arizona Workers
    Transportation type Percentage
    Carpool 11.6%
    Private Vehicle (Alone) 75.8%
    Public Transportation 2.0%
    Other 10.6%
    a. Create a bar graph with the information provided above.
    b. Create a pie chart with the information provided above.
    c. What transportation type is the most common for Arizona workers?
    d. Which known transportation type is used by the fewest Arizona workers?

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
    MyOpenMath is a free online learning platform designed to support math instruction through automated homework, quizzes, and assessments. You must register for MyOpenMath and sign in to view the question.


    2. The number of deaths in the United States due to carbon monoxide (CO) poisoning from generators from the years 1999 to 2011 is provided below.
    Table 2.1.5: Data of Number of Deaths Due to CO Poisoning
    Region Number of Deaths from CO from Generators
    Urban Core 401
    Sub-Urban 97
    Large Rural 86
    Small Rural/Isolated 111
    a. Create a bar graph with the information provided above.
    b. Find the relative frequency and percentage for each row of the table. Round relative frequency to two decimal places and percentage to the nearest whole number.
    c. Use the information from the table above and part b to create a pie chart.
    d. Which region has the most deaths due to CO from generators?
    e. Which region has the fewest deaths due to CO from generators?


    3. In Connecticut, households use gas, fuel oil, or electricity as a heating source. The table below shows the percentage of households that use one of these as their principal heating source ("Electricity usage," 2013), ("Fuel oil usage," 2013), ("Gas usage," 2013).
    Table 2.1.6: Data of Household Heating Sources
    Heating Source Percentage
    Electricity 15.3%
    Fuel Oil 46.3%
    Gas 35.6%
    Other 2.9%
    a. Create a bar graph with the information provided above.
    b. Create a pie chart with the information provided above.
    c. Which heating source is the most common in Connecticut?
    d. Which known heating source is the least commonly used in Connecticut?


    4. Eyeglassomatic manufactures eyeglasses for different retailers. They test to see how many defective lenses they made from January 1 to March 31. The data below give the defect type and the number of defects.
    Table 2.1.7: Data on Glass Production Defect Types
    Defect type Number of Defects
    Flaked 1992
    Wrong axis 1838
    Chamfer wrong 1596
    Right shape - big 1105
    Lost in lab 976
    Scratch 5865
    Right shaped - small 4613
    Spots/bubble - intern 976
    Crazing, cracks 1546
    Wrong shape 1485
    Spots and bubbles 1371
    Wrong height 1130
    Wrong PD 1398
    a. Create a Pareto chart with the information provided above.
    b. Describe what the Pareto chart tells you about what possibly causes the most defects.


    5. A recent survey conducted at a local university revealed students' preferred study locations on campus. Here are the results, highlighting the percentage of students who favor each area for studying.
    Table 2.1.8: Data on Study Areas
    Study Area Popularity (%)
    Library 45%
    Cafeteria 15%
    Outdoor Quad 20%
    Dorm Lounge 10%
    Computer Lab 25%
    STEM Center 18%
    Coffee Shop 12%
    a. Create a Pareto chart with the information provided above.
    b. Describe what the Pareto chart tells us about the most popular study area.


    Answers

    If you are an instructor and want the solutions to all the exercise questions for each section, please email Toros Berberyan.


    This page titled 2.1: Qualitative Data is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Toros Berberyan, Tracy Nguyen, and Alfie Swan via source content that was edited to the style and standards of the LibreTexts platform.