2.1: Qualitative Data
- Explore qualitative (categorical) data representing non-numerical categories such as colors, brands, or types.
- Use bar graphs, Pareto charts, and pie graphs to visualize categorical data.
- Highlight frequencies, proportions, and key patterns within the categories for clearer analysis.
Remember, qualitative data are words describing an individual's characteristics. Several different graphs can be used to represent qualitative data, including bar graphs, Pareto charts, and pie charts.
Pie charts and bar graphs are the most common ways of displaying qualitative data. A spreadsheet program like Microsoft (MS) Excel can create both graphs. The first step for both graphs is to generate a frequency or relative frequency table. A frequency table summarizes the data by listing each category together with the number of times it occurs.
Categorical Frequency Distribution
Suppose you have the following raw data on the types of cars that students parked in a college parking lot. Construct a frequency distribution for the data.
Ford, Chevy, Honda, Toyota, Other, Toyota, Nissan, Nissan, Chevy, Toyota, Honda, Chevy, Toyota, Nissan, Ford, Toyota, Other, Nissan, Chevy, Ford, Nissan, Toyota, Nissan, Ford, Chevy, Toyota, Nissan, Honda, Other, Chevy, Chevy, Honda, Toyota, Chevy, Ford, Nissan, Other, Toyota, Chevy, Honda, Chevy, Toyota, Chevy, Chevy, Nissan, Honda, Toyota, Toyota, Nissan, Other
Solution
A list of data is too hard to look at and analyze, so you need to summarize it. First, decide on the categories. Second, list the car types in the first column. Third, fill in the frequency column by counting the number of cars of each type in the list. For example, there are 5 Fords, 12 Chevys, 6 Hondas, 12 Toyotas, 10 Nissans, and 5 Others. Finally, add all the frequency values; the total of the frequency column should equal the number of observations in the data. The total number of data values is denoted as n. In this example, n = 50. Based on the explanation above, the frequency distribution will look like the table below.
| Category | Frequency |
|---|---|
| Ford | 5 |
| Chevy | 12 |
| Honda | 6 |
| Toyota | 12 |
| Nissan | 10 |
| Other | 5 |
| Total | 50 |
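The counting step can also be done by software. The following is a minimal sketch in Python, assuming the raw data have been typed in as a comma-separated string; the variable names (`raw`, `cars`, `frequency`) are illustrative and not part of the original example.

```python
from collections import Counter

# Raw data from the example: one car type per parked car.
raw = (
    "Ford, Chevy, Honda, Toyota, Other, Toyota, Nissan, Nissan, Chevy, Toyota, "
    "Honda, Chevy, Toyota, Nissan, Ford, Toyota, Other, Nissan, Chevy, Ford, "
    "Nissan, Toyota, Nissan, Ford, Chevy, Toyota, Nissan, Honda, Other, Chevy, "
    "Chevy, Honda, Toyota, Chevy, Ford, Nissan, Other, Toyota, Chevy, Honda, "
    "Chevy, Toyota, Chevy, Chevy, Nissan, Honda, Toyota, Toyota, Nissan, Other"
)
cars = [make.strip() for make in raw.split(",")]

# Counter tallies how many times each category occurs.
frequency = Counter(cars)
n = sum(frequency.values())  # total number of observations

# Print the frequency table in the same category order as the text.
for category in ["Ford", "Chevy", "Honda", "Toyota", "Nissan", "Other"]:
    print(f"{category:<8}{frequency[category]:>3}")
print(f"{'Total':<8}{n:>3}")
```

The printed counts should agree with the frequency table, which is a quick way to check a hand tally.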
A relative frequency is the proportion of all observations that fall in a category, expressed in decimal form. Raw counts alone do not show how large each category is compared with the whole, so it is better to create a third column that lists the relative frequency of each category. The relative frequency per category is the frequency divided by the total. For example, the Ford category is computed as follows:
relative frequency \(= \dfrac{5}{50} = 0.10\)
This can be written as a decimal, fraction, or percent. Based on the explanation above, the frequency distribution with a relative frequency column should look like the table below.
| Category | Frequency | Relative Frequency |
|---|---|---|
| Ford | 5 | 0.10 |
| Chevy | 12 | 0.24 |
| Honda | 6 | 0.12 |
| Toyota | 12 | 0.24 |
| Nissan | 10 | 0.20 |
| Other | 5 | 0.10 |
| Total | 50 | 1.00 |
The relative frequency column should add up to 1.00. It might be off a little due to rounding.
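As a rough sketch of the same calculation in Python, each frequency is divided by the total n. The dictionary below simply restates the frequency table; its name and layout are assumptions made for illustration.

```python
# Frequencies taken from the frequency table in the example.
frequency = {"Ford": 5, "Chevy": 12, "Honda": 6,
             "Toyota": 12, "Nissan": 10, "Other": 5}
n = sum(frequency.values())  # total number of observations

# Relative frequency = frequency / total.
relative_frequency = {category: f / n for category, f in frequency.items()}

for category, rf in relative_frequency.items():
    print(f"{category:<8}{rf:.2f}")
print(f"{'Total':<8}{sum(relative_frequency.values()):.2f}")
```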
We can use the frequency distribution to display the data using different types of graphs. These graphs include the bar graph, the pie graph, and the Pareto chart.
Bar Graphs
Bar graphs consist of the frequencies on the y-axis (vertical axis) and the categories on the x-axis (horizontal axis). For each category, draw a bar with a height equal to its frequency. All bars should have the same width, and the spaces between them should be the same.
Draw a bar graph of the data in Example \(\PageIndex{1}\).
Solution
| Category | Frequency | Relative Frequency |
|---|---|---|
| Ford | 5 | 0.10 |
| Chevy | 12 | 0.24 |
| Honda | 6 | 0.12 |
| Toyota | 12 | 0.24 |
| Nissan | 10 | 0.20 |
| Other | 5 | 0.10 |
| Total | 50 | 1.00 |
To construct a bar graph using frequencies:
- Put the frequency scale on the y-axis and the categories on the x-axis.
- Draw a bar above each category with a height equal to its frequency.
- Label the y-axis, x-axis, and the graph using appropriate titles.
For this example, if the steps are done correctly, you should have a graph similar to the one below.
Notice from the graph that Toyota and Chevy are the most common cars, with Nissan not far behind. Ford is the least common of the named car types. The Other category has the same frequency as Ford, but because it combines several car types, each of those types individually is less common than Ford.
Some key features of a bar graph:
- Equal spacing between bars.
- Bars are the same width.
- There should be labels on each axis and a title for the graph.
- There should be a scaling on the y-axis, and the categories should be listed on the x-axis.
- The bars don’t touch.
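To illustrate the idea that each bar's length equals its frequency, here is a small text-only sketch in Python. It is not how the textbook's graph was produced: the helper `text_bar_chart` is hypothetical, and it draws horizontal bars (categories down the side) rather than the vertical bars described above, which keeps the example free of plotting libraries.

```python
def text_bar_chart(frequency, title):
    """Return the lines of a horizontal text bar chart, one '#' per observation."""
    width = max(len(category) for category in frequency)
    lines = [title]
    for category, f in frequency.items():
        # Every bar uses the same symbol and spacing; only its length varies.
        lines.append(f"{category:<{width}} | {'#' * f} {f}")
    return lines


# Frequencies taken from the frequency table in the example.
frequency = {"Ford": 5, "Chevy": 12, "Honda": 6,
             "Toyota": 12, "Nissan": 10, "Other": 5}

chart = text_bar_chart(frequency, "Car types in a college parking lot")
print("\n".join(chart))
```

A spreadsheet or plotting package would normally be used for a presentation-quality graph; the sketch only shows the mapping from frequency to bar length.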
You can also draw a bar graph using relative frequency on the y-axis. This is useful when comparing two samples with different sample sizes because it is better to compare by percentages/relative frequencies to see which categories are the most common. The relative frequency graph and the frequency graph should look the same, except for the scaling on the y-axis.
To construct a bar graph using relative frequencies:
- Put the relative frequency scale on the y-axis and the categories on the x-axis.
- Draw a bar above each category with a height equal to its relative frequency.
- Label the y-axis, x-axis, and the graph using appropriate titles.
For this example, if the steps are done correctly, you should have a graph similar to the one below.
Pareto Chart
A second type of categorical data graph is a Pareto chart, which is a bar graph with the bars sorted from the highest frequency to the lowest frequency, starting from the left. It is used to represent the vital few items. In this case, it will be the most popular car types. These are Chevy and Toyota. This is especially useful in business applications, where you want to know what services your customers like the most, what processes result in more injuries, which issues employees find more important, and other types of questions like these. Here is the Pareto chart for the data in Example \(\PageIndex{1}\).
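The sorting step that turns a bar graph into a Pareto chart can be sketched in Python as follows. The dictionary restates the frequency table, and the variable names are illustrative assumptions.

```python
# Frequencies taken from the frequency table in the example.
frequency = {"Ford": 5, "Chevy": 12, "Honda": 6,
             "Toyota": 12, "Nissan": 10, "Other": 5}

# Sort categories from highest to lowest frequency.
# sorted() is stable, so tied categories keep their original relative order.
pareto = sorted(frequency.items(), key=lambda item: item[1], reverse=True)

for category, f in pareto:
    print(f"{category:<8}{f:>3}")
```

The bars would then be drawn left to right in this order, starting with Chevy and Toyota.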
Pie Chart
The pie chart is a circle divided into sections according to the percentage of frequencies in each category. The formula for these percentages is \(P = \dfrac{f}{n} \cdot 100\%\), where \(P\) stands for the percent, \(f\) for the frequency of each category, and \(n\) for the sum of all the frequencies. To draw each section, use the percentages (recall that a quarter of the circle equals 25%).
An extended family has been selected at random, and its members are counted by age group in the table below. Draw a pie chart of the data.
| Categories | Frequency |
|---|---|
| Children | 10 |
| Young Adults | 12 |
| Middle Age | 19 |
| Elderly | 8 |
Table \(\PageIndex{5}\): Age of Family Members
Solution
Guidelines for Creating a Pie Chart
- Compute the relative frequencies by dividing each frequency by the total frequency.
- Multiply each relative frequency by 100 to get the percentages.
- Draw a circle.
- Draw and label each section according to the percentage of each category.
| Categories | Frequency | Relative Frequency (Rounded to Two Decimal Places) | Percent |
|---|---|---|---|
| Children | 10 | 10/49 ≈ 0.20 | 0.20 \(\cdot\) 100 = 20% |
| Young Adults | 12 | 12/49 ≈ 0.24 | 0.24 \(\cdot\) 100 = 24% |
| Middle Age | 19 | 19/49 ≈ 0.39 | 0.39 \(\cdot\) 100 = 39% |
| Elderly | 8 | 8/49 ≈ 0.16 | 0.16 \(\cdot\) 100 = 16% |
Table \(\PageIndex{6}\): Age of Family Members With Relative Frequencies and Percents
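The table's arithmetic can be reproduced with a short Python sketch. It follows the same two steps (round the relative frequency to two decimal places, then multiply by 100); the variable names are assumptions made for illustration.

```python
# Frequencies from the Age of Family Members table.
frequency = {"Children": 10, "Young Adults": 12, "Middle Age": 19, "Elderly": 8}
n = sum(frequency.values())  # total number of family members

percent = {}
for category, f in frequency.items():
    relative = round(f / n, 2)                 # relative frequency, two decimals
    percent[category] = round(relative * 100)  # percent as a whole number
    print(f"{category:<13}{f:>3}  {relative:.2f}  {percent[category]}%")

# Because each relative frequency was rounded, the total can miss 100%.
print("Total percent:", sum(percent.values()))
```

Here the rounded percentages add to 99% rather than 100%, which is the kind of small rounding discrepancy to expect.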
We have a pie chart with four sections. The children section is represented by 20%. The young adults section is represented by 24%. The middle age section is represented by 39%. Finally, the elderly section is represented by 16%. The size of the slices is based on the percentages. The percentages add up to 99% rather than 100% because each relative frequency was rounded.
Draw a pie chart of the data in Example \(\PageIndex{1}\).
First, you need the relative frequencies.
| Category | Frequency | Relative Frequency |
|---|---|---|
| Ford | 5 | 0.10 |
| Chevy | 12 | 0.24 |
| Honda | 6 | 0.12 |
| Toyota | 12 | 0.24 |
| Nissan | 10 | 0.20 |
| Other | 5 | 0.10 |
| Total | 50 | 1.00 |
Second, multiply each relative frequency by 360° to obtain the angle measure for each category.
| Category | Relative Frequency | Angle (in degrees (°)) |
|---|---|---|
| Ford | 0.10 | 36.0 |
| Chevy | 0.24 | 86.4 |
| Honda | 0.12 | 43.2 |
| Toyota | 0.24 | 86.4 |
| Nissan | 0.20 | 72.0 |
| Other | 0.10 | 36.0 |
| Total | 1.00 | 360.0 |
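The angle column can be checked with a few lines of Python. This is only a sketch of the multiplication by 360°; the dictionary restates the relative frequencies from the table, and the names are illustrative.

```python
# Relative frequencies taken from the table in the example.
relative_frequency = {"Ford": 0.10, "Chevy": 0.24, "Honda": 0.12,
                      "Toyota": 0.24, "Nissan": 0.20, "Other": 0.10}

# Angle of each slice = relative frequency * 360 degrees, rounded to one decimal.
angle = {category: round(rf * 360, 1) for category, rf in relative_frequency.items()}

for category, degrees in angle.items():
    print(f"{category:<8}{degrees:>6.1f}")
print(f"{'Total':<8}{sum(angle.values()):>6.1f}")
```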
Now draw the pie graph using a compass, protractor, and straightedge, or preferably with technology. If you use technology, you do not need to compute the relative frequencies or the angles yourself.
The pie graph for this example should be like the one below.
As you can see from the graph, Toyota and Chevy are the most common car types, each making up 24% of the sample. Ford and Other have the smallest slices at 10% each. Because Other combines several car types, Ford is the least common of the known car types in the sample.
Authors
"2.1: Qualitative Data" by Toros Berberyan, Tracy Nguyen, and Alfie Swan is licensed under CC BY-SA 4.0
Attributions
"2.1: Qualitative Data" by Kathryn Kozak is licensed CC BY-SA 4.0


