3.1: Data types
Introduction
What are data? Data are collections of facts, information, or statistics about an object. Data are either quantitative (numbers) or qualitative (observed properties that cannot be summarized by numbers). Data are measured and analyzed for research or reports, and serve as evidence for or against a hypothesis or as input to decision making in areas such as medicine and policy. Measurement implies a systematic effort to assign a numerical value to the thing that is measured; measurement units are standard quantities used to describe the same kinds of things. Examples of measurement units include the kilogram (mass), meter (length), liter (volume), and degree Celsius (temperature).
Data also implies a means to code or structure information so that it can be analyzed. Raw data refers to an unprocessed collection of information about an object, which must go through data processing to be useful in later steps. If you look more closely, you’ll see that considerable effort is made to standardize data formats for analytical purposes. Good examples of such standards are available in clinical research and genomics.
In statistics, we recognize two data types: quantitative and qualitative. We will return to data types repeatedly throughout our statistics journey, because knowing which type you have directs you to the statistical tests that are available to you. In brief, quantitative data types imply estimation of parameters about a population, and hence point the user toward parametric statistics; qualitative data types do not lead to estimates of parameters, but provide counts of observations in categories.
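As a minimal sketch of this distinction, the R code below summarizes a quantitative variable by estimating parameters and a qualitative variable by counting observations per category. The variable names and all values are invented for illustration and do not come from any data set in this book.

```r
# Hypothetical values, invented for illustration only
shell_length_mm <- c(31.2, 28.5, 35.0, 30.1, 29.7)        # quantitative
petal_color <- c("blue", "pink", "blue", "blue", "pink")  # qualitative

# Quantitative: estimate parameters such as the mean and standard deviation
mean_length <- mean(shell_length_mm)
sd_length <- sd(shell_length_mm)

# Qualitative: count the observations in each category
color_counts <- table(petal_color)

print(mean_length)
print(sd_length)
print(color_counts)
```

Calling mean() on petal_color would not give a usable answer, which is one practical reason to settle the data type before choosing a summary.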
Quantitative data
Discrete: countable or meristic, example: five Conus shells (Fig. \(\PageIndex{1}\))

Interval: example: degrees Celsius (Fig. \(\PageIndex{2}\))

Ratio, true zero, examples: body mass, capillary blood glucose reading (Fig. \(\PageIndex{3}\)), temperature in kelvins, relative humidity (Fig. \(\PageIndex{4}\)).
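A short worked example in R may help separate interval from ratio scales. The two temperatures are arbitrary values chosen for illustration. Differences agree on both scales, but a ratio is only interpretable on the scale with a true zero.

```r
# Two arbitrary temperatures on the Celsius (interval) scale
temp_c <- c(10, 20)

# The same temperatures on the Kelvin (ratio) scale
temp_k <- temp_c + 273.15

# Differences are meaningful on both scales and agree
diff_c <- temp_c[2] - temp_c[1]
diff_k <- temp_k[2] - temp_k[1]

# Ratios are only meaningful where zero means "none of the quantity"
ratio_c <- temp_c[2] / temp_c[1]  # 2, but 20 C is not "twice as hot" as 10 C
ratio_k <- temp_k[2] / temp_k[1]  # about 1.035

print(c(diff_c = diff_c, diff_k = diff_k))
print(c(ratio_c = ratio_c, ratio_k = ratio_k))
```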


Qualitative data
Binomial, yes/no, example: a person either has the condition or they do not; hydrangea petals may or may not be blue (Fig. \(\PageIndex{5}\)).

Nominal, example: names of species. Wolves and dogs are members of Canis lupus and Canis familiaris, respectively; house cats are not (Fig. \(\PageIndex{6}\)).

Identifying variables, or id numbers, are unique identification numbers or other labels for each record (individual) in the data set. These variables are of the categorical, nominal data type. Examples of id numbers include Social Security numbers, student identification numbers, driver’s license numbers, etc. Note that id numbers would only rarely be considered objects of study because they are typically assigned by researchers to subjects and are not properties of the subjects. Exceptions may include testing for impacts of anonymization procedures (for example, see Koll et al. 2022).
Ordinal, ranked, example: Likert scale:
- Strongly disagree
- Disagree
- No opinion
- Agree
- Strongly agree
Although it is common practice, caution is warranted when converting Likert categories to a numerical scale, for example, Strongly agree = 4, Strongly disagree = -4, and so on. Because the scale is ordinal, the arithmetic difference between 4 and -4 is not a meaningful measure of distance: the codes convey rank only, not magnitude.
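One way to respect the ranking in R without implying distances is to store the responses as an ordered factor. The sketch below uses made-up responses, and the object names are chosen for illustration.

```r
# Category labels in rank order, lowest to highest
likert_levels <- c("Strongly disagree", "Disagree", "No opinion",
                   "Agree", "Strongly agree")

# Hypothetical responses, invented for illustration only
responses <- c("Agree", "Strongly agree", "Disagree", "Agree", "No opinion")

# An ordered factor keeps the ranking but assigns no distances
likert <- factor(responses, levels = likert_levels, ordered = TRUE)

# Rank comparisons are allowed
agree_below <- likert[1] < likert[2]  # TRUE: Agree ranks below Strongly agree

# Counts per category and the median rank are appropriate summaries
category_counts <- table(likert)
median_rank <- median(as.integer(likert))
median_label <- likert_levels[median_rank]

# Arithmetic on the categories is refused: mean() returns NA with a warning
mean_attempt <- suppressWarnings(mean(likert))

print(category_counts)
print(median_label)
print(mean_attempt)
```

With these five responses the median category is "Agree". The number of responses is odd here, so the median rank is a whole number that can be used to look up a label.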
Biologists should know their data types before proceeding with an experiment.
Examples to try
In R, load the data set diabetic (survival package, which is loaded as part of R Commander), then view the variables.
For more about R data sets, see Part 6: Working with an included data set in Mike’s Workbook for Biostatistics.
R code
data(diabetic, package="survival")
In R Commander (Fig. \(\PageIndex{7}\)):
Rcmdr: Data → Data in packages → Read data set from an attached package… Double-click survival; the list of data sets should appear in the right-hand panel. Select diabetic, then click the OK button.

View the data by clicking on Rcmdr’s View data set button, or, better, submit the following command in R:
head(diabetic)
R output:
  id laser age   eye trt risk  time status
1  5 argon  28  left   0    9 46.23      0
2  5 argon  28 right   1    9 46.23      0
3 14 xenon  12  left   1    8 42.50      0
4 14 xenon  12 right   0    6 31.30      1
5 16 xenon   9  left   1   11 42.27      0
6 16 xenon   9 right   0   11 42.27      0
The command head() displays by default the first six rows of a data frame.
It’s a good idea to read up on the data set. Data sets included with R packages often provide a help page. Submit the following command in R to load the help page.
help(diabetic, package="survival")
The data set comprises subjects with high-risk diabetic retinopathy; “each patient had one eye randomized to laser treatment and the other eye received no treatment.”
What are the data types for the variables? I’ll give you a couple to start. The first column, with entries 1 – 6, is called the index variable; it’s row 1, row 2, etc. of the data set and technically is not a data set variable (since its assignment is arbitrary). R adds this for you. Next, consider the variable labeled id. We clearly see numbers, so we might think meristic, but because these are labels for the subjects, the proper data type is nominal! Try identifying the data types and example units of measurement for the rest on your own, then open the hidden text immediately below to see the best answers.
Answers to Examples to Try
- laser: binomial, there were two types (xenon or argon)
- age: ratio, years
- eye: binomial
- trt: binomial, no treatment (0) or laser (1)
- risk: ordinal
- time: ratio, time to event, number of months
- status: binomial
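How R stores a column does not always match its statistical data type, so it can help to check and recode. The sketch below is only an illustration: the data frame d is a hand-typed copy of the six rows displayed by head(diabetic), so it runs without the survival package, and the recoding choices follow the answers listed above. The name d is an assumption made for this example.

```r
# Hand-typed copy of the six rows displayed by head(diabetic)
d <- data.frame(
  id     = c(5, 5, 14, 14, 16, 16),
  laser  = c("argon", "argon", "xenon", "xenon", "xenon", "xenon"),
  age    = c(28, 28, 12, 12, 9, 9),
  eye    = c("left", "right", "left", "right", "left", "right"),
  trt    = c(0, 1, 1, 0, 1, 0),
  risk   = c(9, 9, 8, 6, 11, 11),
  time   = c(46.23, 46.23, 42.50, 31.30, 42.27, 42.27),
  status = c(0, 0, 0, 1, 0, 0)
)

# How R stores each column before recoding
storage_before <- sapply(d, class)
print(storage_before)

# Recode so that storage matches the statistical data type
d$id     <- factor(d$id)                     # nominal: a label, not a count
d$laser  <- factor(d$laser)                  # binomial
d$eye    <- factor(d$eye)                    # binomial
d$trt    <- factor(d$trt, levels = c(0, 1),
                   labels = c("no treatment", "laser"))  # binomial
d$risk   <- factor(d$risk, ordered = TRUE)   # ordinal: ranked categories
d$status <- factor(d$status, levels = c(0, 1))  # binomial; see the help page for the coding

# age and time stay numeric (ratio)
str(d)
```

Note that the ordered factor for risk only contains the levels present in these six rows; the full data set may include others.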
Questions
Assign the data type and examples of units of measurement for each kind of measurement.
- Darts tossed, Distance from center.
- Shells, width, length.
- Infrared temperature device readings.
- Body weight.
- Lung volume.
- Tomato color morphs (green, yellow).
- Tomato root length, stem length.
- Systolic blood pressure.
- Blood arsenic levels.
- Body Mass Index.
- Body Mass Index scale, for example NIH: underweight, normal, overweight, obese.