Statistics LibreTexts

5.2: Quantitative and Qualitative Data


    When a population has been identified and members or items from it have been observed, researchers record the characteristics or measurements of those items that bear on the issue being studied. These are the data from which we hope to learn valuable information about the issue. Data can come in many different forms, including oral histories, photographs, demographics, scientific measurements, and visual graphs. A study of the conditions of African American families in the post-Civil War era may collect observations in the form of oral or written family histories, photographs, and demographic data. A study of economic conditions, race, and gender may consider demographic data such as income, race, consumer debt, loan default rates, and credit ratings. A study of health access and gender identity may use demographics, measures of health, and data on the frequency of doctor visits. In the pharmaceutical industry, studies of the efficacy of new drug treatments often include observations on patients’ health histories, demographic information, health indicators, measures of the effectiveness of the treatment, and observed side-effects. A study of a new fertilizer may include information about soil nutrients, growth rates, and crop yields.

    It is important to note that a single study may have all types of data, and it is up to the researchers to understand how to bring the information contained in the data together in a way that provides reliable answers to the important questions about the issues they hope to study. The form of the data has a great bearing on which methods can be used to extract this information. Therefore, it is very important that researchers correctly classify the type of data that has been observed. It is equally important for those who read this research to consider the form of the data that was observed and to critically evaluate the decisions the researchers made about how information was extracted from the data.

    The first major way to classify data concerns whether the data have a natural numeric form or a more abstract form with no natural numeric representation. Data that have a natural numeric form, that is, data that can be represented easily and uniquely as numbers or values, are known as quantitative data.

    Definition: Quantitative Data

    Data observed from a variable that have a natural and unique numeric value are called quantitative data.

    Some types of data are based on measurements. These types of data are almost always quantitative. They include variables corresponding to scientific measurements such as height, weight, temperature, length, and time. For each of these variables there is a natural system of measurements that provides unique information about the observed values. For example, if someone states that a lecture lasted 62 minutes, this measurement means the same thing to each person who might observe the variable. Similarly, if a length is reported as 1.77 meters, this measurement means the same to everyone without any additional information.

    Some data may be represented numerically, but the representation may not be unique. For example, consider a study where a variable corresponding to relationship status is observed. The actual observed values may be categorized as single, cohabitation, married, divorced, or separated. For simplicity, a researcher may represent this data numerically (see example in Table 5.1). Now suppose you are walking down the street, and someone asks you your relationship status. If you respond that your relationship status is 3, will the other person automatically know how to interpret that data? Unless they have access to the additional information contained in Table 5.1, they will not know what the data means. In fact, another researcher doing a study may represent the same categories in a completely different way, as shown in Table 5.2. Note that one of these representations is not better than the other. One is not more correct than the other. Both representations are equally valid, but the actual data values are completely different.

    Table 5.1. A possible numerical representation of relationship status.

    Category        Representation
    Single          1
    Cohabitation    2
    Married         3
    Divorced        4
    Separated       5

    Table 5.2. A different possible numerical representation of relationship status.

    Category        Representation
    Single          0
    Cohabitation    1
    Married         2
    Divorced        3
    Separated       4
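The dependence on a lookup table can be sketched in a few lines of Python. This is only an illustration: the dictionaries transcribe Tables 5.1 and 5.2, while the names `CODEBOOK_5_1`, `CODEBOOK_5_2`, and `decode` are hypothetical and chosen for this example.

```python
# Codebooks transcribed from Tables 5.1 and 5.2 (names are illustrative).
CODEBOOK_5_1 = {1: "Single", 2: "Cohabitation", 3: "Married",
                4: "Divorced", 5: "Separated"}
CODEBOOK_5_2 = {0: "Single", 1: "Cohabitation", 2: "Married",
                3: "Divorced", 4: "Separated"}


def decode(code, codebook):
    """Translate a numeric code back to its category using one codebook."""
    return codebook[code]


# The same number means different things under the two representations.
print(decode(3, CODEBOOK_5_1))  # Married
print(decode(3, CODEBOOK_5_2))  # Divorced
```

The number 3 carries no meaning by itself; the category is recovered only once a specific codebook is supplied.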

    The example based on relationship status demonstrates why the qualifier unique is given in the definition of quantitative data. In the example of the length of a lecture, an observation of 62 minutes means the same to everyone. Someone may report the length alternatively as 3,720 seconds, or about 1.03 hours, but the meaning is the same because there is a mathematical method for converting between the units of measurement. All these measurements represent the same idea. For the relationship status data, there is not a unique representation. You must have access to Table 5.1 or Table 5.2, whichever was used, to decode the numerical representations of the relationship statuses.
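The unit conversions for the lecture example can be checked with a short Python sketch. The function names are hypothetical; the only facts used are that a minute has 60 seconds and an hour has 60 minutes.

```python
def minutes_to_seconds(minutes):
    """Convert a duration in minutes to seconds."""
    return minutes * 60


def minutes_to_hours(minutes):
    """Convert a duration in minutes to hours."""
    return minutes / 60


lecture_minutes = 62
print(minutes_to_seconds(lecture_minutes))          # 3720
print(round(minutes_to_hours(lecture_minutes), 2))  # 1.03
```

Unlike a codebook, these conversions follow from fixed arithmetic rules, so anyone can move between the representations without extra information from the researcher.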

    These examples highlight the difference between data that can be naturally and uniquely represented numerically, that is, quantitative data, and data that do not have a natural unique numerical representation, which are called qualitative data.

    Definition: Qualitative Data

    Data observed from a variable that does not have a natural and unique numeric value are called qualitative data.

    The names of the two types of data are very close to one another, which can cause some confusion. Looking at the leading part of each word can help in remembering the two types. The word quantitative starts with quant, which is close to the word quantity, which can be thought of as a numerical value. Similarly, the word qualitative starts with qual, which is close to the word quality, which can be thought of as a non-numerical quality of an individual or item.

    Relationship status is one example of a qualitative variable. Others include ethnicity, gender, housing type, and employment status. Data of this type can often be represented as categories. It is important to realize that it is sometimes difficult to determine whether data is quantitative or qualitative. Further, how the data is represented may change how the variable is categorized. For example, suppose that a study observes the type of vehicles that pass through a toll booth on a highway. It would be quite natural to categorize data using categories like motorcycle, car, truck, van, mini-van, and SUV. In this case the data are qualitative as there is no natural numbering system. However, the researchers might be really interested in the toll collected from each vehicle, which is a function of the vehicle type. If the researchers instead track the collected toll, then the variable would be quantitative.
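The toll booth example can be sketched as follows. The toll amounts and the observed vehicles are invented purely for illustration (the text gives no actual toll schedule), and the names `TOLL_BY_VEHICLE` and `observed_vehicles` are hypothetical.

```python
# Hypothetical toll schedule in dollars; these amounts are NOT from the text.
TOLL_BY_VEHICLE = {
    "motorcycle": 1.00,
    "car": 2.00,
    "van": 3.00,
    "mini-van": 3.00,
    "SUV": 3.00,
    "truck": 5.00,
}

# Qualitative observations: the vehicle type of each passing vehicle (made up).
observed_vehicles = ["car", "truck", "car", "motorcycle"]

# Quantitative observations: the toll collected from each vehicle.
tolls = [TOLL_BY_VEHICLE[v] for v in observed_vehicles]

print(tolls)                    # [2.0, 5.0, 2.0, 1.0]
print(sum(tolls) / len(tolls))  # 2.5
```

An average toll is meaningful because the tolls are amounts of money, whereas an "average vehicle type" has no interpretation.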

    Conversely, a variable like age might seem to be automatically quantitative since it is a length of time. But some studies might categorize age using broad categories such as Young (0-12), Young Adult (13-21), Adult (22-64), and Senior (65 and over). Even though the categories have some numerical structure to them in that we know the people in the “Young” category have an age smaller than those in the “Adult” category, there are no natural numbers that would be associated with the categories. If we observed someone in the study and we just know that they are “Young”, what number do we assign to their age? There is not a logical answer unless we have more information.
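A minimal Python sketch of this categorization, assuming ages are recorded in whole years, uses the cut-offs quoted above. The function name `age_category` is hypothetical.

```python
def age_category(age):
    """Map an age in whole years to the broad categories quoted in the text."""
    if age < 0:
        raise ValueError("age cannot be negative")
    if age <= 12:
        return "Young"
    if age <= 21:
        return "Young Adult"
    if age <= 64:
        return "Adult"
    return "Senior"


# Two different ages collapse into the same category.
print(age_category(3), age_category(11))  # Young Young
print(age_category(40))                   # Adult
```

The mapping runs in one direction only: given the label "Young", there is no way to tell whether the person was 3 or 11, which is why the categorized variable is treated as qualitative.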

    Some information and observations are even more abstract and have a less formal framework than the examples given above. A study of gender attitudes in the early part of the 20th century may have data that consist solely of diary entries of young women during the time. This type of qualitative research may rely on experts reading the diaries and offering opinions on gender attitudes based on these readings. However, it is also quite plausible for this type of data to be transformed into numerical data. For example, one could count the number of times certain key words such as “subservient” or “duty” appear per page in the diaries. In this case the original data would be qualitative, and quite abstract, but the transformed variables would be quantitative.
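One way such a transformation could look is sketched below in Python. The two "pages" are invented sentences, not real diary entries, and the names `KEYWORDS` and `keyword_counts` are hypothetical; a real study would need to decide how to handle word variants, spelling, and context.

```python
import re
from collections import Counter

# Key words mentioned in the text.
KEYWORDS = {"subservient", "duty"}


def keyword_counts(page_text):
    """Count how often each key word appears on one page (case-insensitive)."""
    words = re.findall(r"[a-z']+", page_text.lower())
    return Counter(w for w in words if w in KEYWORDS)


# Invented example pages, used only to exercise the function.
pages = [
    "It is my duty to help at home. Duty comes first.",
    "I refuse to be subservient.",
]

per_page_totals = [sum(keyword_counts(p).values()) for p in pages]
print(per_page_totals)  # [2, 1]
```

The text on each page is qualitative, but the resulting counts per page are ordinary numbers that can be summarized and compared.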

    An example of such a qualitative study was based on interviews and observations at two juvenile justice centers for girls. The purpose of the study was to explore how the state failed to meet the material needs of young women on their release from the system, their future expectations about state support, and their sense of personal responsibility (Myers, 2017). The data gathered for the study was based on interviews with inmates in the system, and the conclusions from the study were based on expert sociological opinions of the content of the interviews.


    This page titled 5.2: Quantitative and Qualitative Data is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by .