1.2: Data Types - Categorical vs. Numerical
Classifying Data
Now that we've seen what a dataset might look like, it's time to develop the language we need to describe data accurately and consistently. At the heart of most datasets are variables. In this section, we'll define what a variable is, explain the main types of variables you’ll encounter in statistics, and introduce key subcategories that will help you prepare for more advanced topics later on.
What Is a Variable?
Definition: Variable
A variable is a characteristic or attribute that can take on different values for different individuals or observations. In a dataset, each column usually represents a variable, and each row provides the values for a single observation.
Variables help describe the individuals in our dataset, whether they are people, plants, countries, experiments, or ideas. We classify variables based on the kind of information they represent.
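To make the row-and-column picture concrete, here is a minimal Python sketch. The three-tree dataset and its column names (`species`, `health`, `height_m`) are invented for illustration; they are assumptions, not data from the text.

```python
# A tiny, made-up dataset: each dict is one observation (a row),
# and each key is one variable (a column).
trees = [
    {"species": "Oak", "health": "good", "height_m": 12.4},
    {"species": "Maple", "health": "fair", "height_m": 9.8},
    {"species": "Birch", "health": "excellent", "height_m": 7.1},
]

# The variables are the keys shared by every observation.
variables = list(trees[0].keys())

# Reading down one column collects that variable's value
# for every observation.
species_column = [row["species"] for row in trees]

print(variables)
print(species_column)
```

Each row answers "what do we know about this one tree?", while each column answers "how does this one characteristic vary across trees?".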
Examples of Variables
Here are just a few examples of variables you might encounter in different kinds of data:
- In a housing dataset: Price of the home, square footage, number of bedrooms, ZIP code, listing date
- In a student survey: Age, preferred learning method (online/in-person), major, number of credits completed, satisfaction with campus resources
- In a fitness app: Daily step count, heart rate, type of workout, duration of workout, calories burned
- In a food delivery service: Order total, payment method, delivery time, customer tip, restaurant type
- In a wildlife field study: Species observed, habitat type, weight of animal, date of observation, number of sightings
In each case, variables describe the features or measurements of the observations — whether those observations are people, houses, animals, or transactions.
As we move forward, we’ll learn to classify these variables based on the kind of information they represent: names or labels vs. counts and measurements. This classification guides which graphs and summaries we use later on.
Main Types of Variables
Variables generally fall into one of two main types:
- Categorical variables describe qualities or characteristics using labels. They place individuals into distinct groups or categories.
- Numerical variables represent amounts or quantities. They are used to measure or count something.
Categorical Variables
Categorical variables (also known as qualitative variables) take on values that are labels or names. These values reflect categories, not quantities. Even if we use numbers to code them (like 1 = Red, 2 = Blue), the numbers are just labels, not measurements.
Example: The tree species in the Tree Health Study (“Maple”, “Oak”, “Birch”) is a categorical variable.
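The point about numeric codes can be checked with a short Python sketch. The coding scheme (1 = Red, 2 = Blue) comes from the text; the list of responses is made up for illustration.

```python
from collections import Counter
from statistics import mean

# Codes from the text: 1 = Red, 2 = Blue. The responses are invented.
color_labels = {1: "Red", 2: "Blue"}
coded_responses = [1, 2, 2, 1, 2]

# Python will happily average the codes, but the result (1.6)
# does not correspond to any color.
average_code = mean(coded_responses)

# Counting how often each label occurs is the meaningful summary.
label_counts = Counter(color_labels[code] for code in coded_responses)

print(average_code)
print(label_counts)
```

The software cannot tell that the numbers are only labels, so it is up to the analyst to know the variable's type before computing anything.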
Nominal vs. Ordinal
Categorical variables can be further classified into nominal or ordinal types:
- Nominal variables have categories with no inherent order.
- Ordinal variables have categories with a meaningful order, but the distances between categories are not necessarily equal.
Examples of Nominal Categorical Variables
- Tree species: oak, maple, pine, birch
- Favorite color: red, green, blue
- City of birth: Atlanta, Boston, Seattle
- Marital status: single, married, divorced
- Blood type: A, B, AB, O
Examples of Ordinal Categorical Variables
- Tree health rating: poor, fair, good, excellent
- Spicy level on a menu: mild, medium, hot
- Education level: High School, Bachelor’s, Master’s, PhD
- Customer satisfaction: very dissatisfied to very satisfied
- Fitness level: beginner, intermediate, advanced
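One practical difference between nominal and ordinal categories is that ordinal values can be ranked. The sketch below uses the tree health scale from the list above; the sample ratings and the helper name `rank` are hypothetical.

```python
# Ordered scale from the tree health example, lowest to highest.
HEALTH_SCALE = ["poor", "fair", "good", "excellent"]

def rank(rating):
    """Return the position of a rating on the ordered scale (0 = lowest)."""
    return HEALTH_SCALE.index(rating)

# Made-up ratings for five trees.
ratings = ["good", "poor", "excellent", "fair", "good"]

# Sorting by rank is meaningful because the categories have an order.
sorted_ratings = sorted(ratings, key=rank)

# With an odd number of ratings, the middle value of the sorted list
# is the median rating.
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(sorted_ratings)
print(median_rating)
```

The ranks only say which rating is higher. They do not say that the gap between "poor" and "fair" equals the gap between "good" and "excellent", so averaging the ranks can be misleading. A nominal variable such as tree species has no such scale to sort by.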
Numerical Variables
Numerical (also known as quantitative) variables represent a measurable quantity. These values are numbers where basic math makes sense: you can average them, compare differences, and perform meaningful computations.
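A helpful question to ask: If we added the values together, would the result make any sense? Adding two daily step counts gives a total number of steps, but adding two ZIP codes gives nothing meaningful.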
Discrete vs. Continuous
Numerical variables can also be further classified into discrete or continuous types.
- Discrete variables can only take on certain values, often only whole numbers.
  - These are often counts, but they also include examples like shoe size, where "half" sizes exist but nothing in between.
- Continuous variables can take on any value on a number line within a range.
  - These are typically based on measurements. An example would be the length of an object, which could be 1 inch, 2 inches, or anything in between, like 1.5482675 inches. Even though we typically round continuous data values when they are recorded, these variables could take any value, not just whole numbers.
Examples of Discrete Variables
- Number of trees in a plot
- Number of visitors to a park on a given day
- Number of siblings a person has
- Number of pets in a household
- Number of books read in a year
Examples of Continuous Variables
- Canopy area in square feet
- Air temperature in Celsius
- Wind speed in miles per hour
- Time it takes a seed to sprout (days)
- Human height in centimeters
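The contrast between discrete and continuous values can be sketched in a few lines of Python. The shoe-size range (6 to 13) and the helper name `is_valid_shoe_size` are assumptions made for this illustration; the length 1.5482675 inches is the example value from the text.

```python
# Discrete: shoe sizes move in half-size steps, so only certain values occur.
# The 6-to-13 range is an assumption made for this sketch.
SHOE_SIZES = [half_steps / 2 for half_steps in range(12, 27)]  # 6.0, 6.5, ..., 13.0

def is_valid_shoe_size(value):
    """Return True if value is one of the allowed half-size steps."""
    return value in SHOE_SIZES

# Continuous: a length can be any value in a range.
# Recording it to one decimal place rounds it, but the underlying
# quantity is still continuous.
true_length_in = 1.5482675
recorded_length_in = round(true_length_in, 1)

print(is_valid_shoe_size(7.5))   # a half size exists
print(is_valid_shoe_size(7.25))  # nothing between 7 and 7.5
print(recorded_length_in)
```

A discrete variable can be listed value by value, as `SHOE_SIZES` is here. No such list exists for a continuous variable, because between any two lengths there is always another possible length.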
Sometimes, it may be tricky to tell which category a variable belongs to right away. For example, age could be described as a continuous variable (like 21.7 years old), but if grouped into age ranges ("Under 18", "18–34", "35–49", etc.), it becomes categorical and ordinal.
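Here is a minimal sketch of that reclassification in Python. The first three ranges follow the ones named in the text (written with plain hyphens); the final "50 and over" group, the function name `age_group`, and the sample ages are assumptions added to make the example complete.

```python
def age_group(age):
    """Map an age in years to an ordered age range.

    The "50 and over" label is an assumption; the text lists only
    the first three ranges.
    """
    if age < 18:
        return "Under 18"
    elif age < 35:
        return "18-34"
    elif age < 50:
        return "35-49"
    return "50 and over"

# Made-up ages recorded as a continuous variable.
ages = [21.7, 16.2, 47.0, 63.5]

# After grouping, the same variable is categorical and ordinal.
groups = [age_group(age) for age in ages]

print(groups)
```

Grouping discards detail: once 21.7 becomes "18-34", the exact age cannot be recovered, so it is usually safer to record the numerical value and group it later if needed.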
Why This Matters
Knowing the type of each variable is essential for choosing the right statistical tools later. Some graphs and summaries only make sense for numerical data (like histograms), while others are designed specifically for categorical data (like bar charts). Statistical tests also depend heavily on whether the variables involved are categorical or numerical.
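As a rough illustration of how the variable type drives the choice of summary, the Python sketch below returns category counts for a categorical variable and a few numerical summaries for a numerical one. The function name `summarize`, the `kind` argument, and the sample values are all hypothetical.

```python
from collections import Counter
from statistics import mean

def summarize(values, kind):
    """Return a simple summary suited to the variable type.

    kind is "categorical" or "numerical" (names chosen for this sketch).
    """
    if kind == "categorical":
        # Counts per category: the numbers a bar chart would display.
        return dict(Counter(values))
    if kind == "numerical":
        # Basic numerical summaries: center and range.
        return {"mean": mean(values), "min": min(values), "max": max(values)}
    raise ValueError(f"unknown variable type: {kind}")

# Made-up values for one categorical and one numerical variable.
species_summary = summarize(["Oak", "Maple", "Oak", "Birch"], "categorical")
height_summary = summarize([12.5, 9.5, 7.0, 11.0], "numerical")

print(species_summary)
print(height_summary)
```

Asking for the mean of the species list would fail, and asking for category counts of the heights would produce a long list of values that each occur once. Neither is useful, which is why the type has to be identified first.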
Activity: Think of Your Own Examples
Now it’s your turn! Try to come up with at least one new example for each of the four subtypes of variables below:
| Variable Subtype | Your Example |
|---|---|
| Nominal (Categorical) | _________________________ |
| Ordinal (Categorical) | _________________________ |
| Discrete (Numerical) | _________________________ |
| Continuous (Numerical) | _________________________ |
Bonus challenge: Think of a variable you might need to reclassify depending on context (like “age” or “income level”). Describe both ways it could be used.
What’s Next?
Now that we understand what different types of variables are and how to describe them, we're ready to think about how we collect data. In the next section, we'll explore the ideas of populations and samples, and how we design studies using sampling methods that help us draw meaningful and fair conclusions.


