3.5: Nuances in Determining if a Variable Should be a Certain Type
The sloppiness vs. precision continuum does not mean that one variable type (in this case, ratio) is better or more valued than the others (in this case, ordinal or interval). Sloppiness and precision are descriptions, nothing more. Variables are not inherently sloppy or precise. Sometimes, a “sloppy” variable is the best we can do. The goal of having a variable is to describe variation, and the process is how we divide that variation. Deciding how to divide the variation has its own nuances.
What can trip you up is thinking that a variable is inherently one type or another. The type comes from how you divide up the variable into sections or demarcations. You decide how to demarcate the variation you see into something that matters for your outcome of interest.
Let us take age. The first reaction is – “That’s got to be a ratio variable; there is a true zero point, and age can be divided into equal units called years. And years can be subdivided into months, days, and hours. So, age must be a ratio variable.”
That is true, but you can divide the variable in different ways to suit your purposes. So, you can divide a continuous variable such as age into either an ordinal or a ratio variable.
For age, you could ask the participants in your research study to state their age as follows: “Write your age here: ____.” Or you can have your participants pick the age range that they are in: 17 and under, 18 to 20, or 21 and up. Or you can ask them for their perception of their age range: young adult, middle age, retirement age. The first version divides age into a ratio variable. Each response is placed along a continuum of years: 17, 18, 19, 20, 21, 22. The second version divides age into an ordinal variable. Each response falls into one of three ranks: 1 = 17 and under, 2 = 18 to 20, and 3 = 21 and up. What matters is that in the ordinal version, the value that is entered into the dataset as the age variable becomes “1,” “2,” or “3.” The statistical processor is going to compute its statistics using “1,” “2,” and “3” and not the actual age.
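To make the difference concrete, here is a minimal Python sketch of the two codings. The list of ages is made-up illustration data, and the function name `age_to_rank` is a hypothetical helper, not something from a statistics package. It shows that once age is entered as ranks, any statistic is computed on 1, 2, and 3 rather than on years.

```python
from statistics import mean

# Made-up responses to "Write your age here: ____" (the ratio version).
ages = [17, 18, 19, 20, 21, 22]

def age_to_rank(age):
    """Map an age in years to the three ordinal codes used in the text."""
    if age <= 17:
        return 1  # 17 and under
    if age <= 20:
        return 2  # 18 to 20
    return 3      # 21 and up

# The ordinal version: what would actually be entered into the dataset.
ranks = [age_to_rank(a) for a in ages]

mean_age = mean(ages)    # computed on actual years
mean_rank = mean(ranks)  # computed on the codes 1, 2, 3, not on years

print("ranks:", ranks)
print("mean of ages:", mean_age)
print("mean of ranks:", mean_rank)
```

Notice that ages 18, 19, and 20 all collapse to the same code, so any year-to-year difference inside that range is no longer visible to the analysis.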
Does this matter? It matters depending on the outcome you are interested in. If you need the actual age, such as 17, 18, 19, 20, 21, and 22, in this case, you are saying that increasing the age from 17 to 18, then 18 to 19, and so forth, matters for what outcome you want. In this case, you might be interested in career choices. Students at 17 are about to graduate high school, so they have their career outlook. Their career outlook might be different when the students graduate at age 18. Then, from 18 to 19, they go to college or work, and their career or employment paths might be different at age 20 when they are in college or have two years of work post-high school. If you think your outcome variable of career or work aspirations changes as a result of age, and the changes in age from year to year from age 17 to 21 matter from a conceptual standpoint, then yes, consider “age” as a ratio variable because you need the variation of “age” to be divided along a ratio scale. The scale has equal units, in this case by year, and the number of years can stand independently without additional context.
Does scaling the variable age as an ordinal variable matter? Perhaps. You might be only interested in ranking the ages 17 and under, 18 to 20, and 21 and up. Why? Well, let’s extend our previous example. If you are examining career aspirations of a population of high school students, maybe you need to know whether the student graduated high school at age 17 or under; graduated high school and went to college or work; or did neither, such as going on a European vacation. If the student is age 21, this presumably means the student has graduated, is working full time, or is still in school. Age 21 is also the legal age to drink alcohol and the age at which most people would consider someone an adult, so that age range has something meaningful to say about career choices. So, while “age” is the variable, how you divide it depends on what variations you need from “age” to predict what you want.
Age as an interval scale does seem hard to conceive. Age as a categorical variable does seem hard to conceive. So, no examples will be given here.
How about gender, though? Gender is often conveniently thought of as a categorical variable in a binary form: female and male. But as our thinking of gender advances, we know that the male-female binary is not the only option. Gender can have more than two categories. There are cis-gender male, cis-gender female, transgender male to female, transgender female to male, and non-binary gender. In this context, the variable gender can still be categorical. The codes for the expanded gender options range from 1 to 5. But these numbers are just codes; there is no inherent value in any of the numbers themselves. The numbers are entered into the dataset and the statistics processor as codes, and the statistics processor will not analyze the codes as being higher or lower than the others. There is no such thing as cis-gender female being greater in value than cis-gender male, or transgender female to male being greater in value than transgender male to female.
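One way to see that such codes carry no inherent value is to reassign the numbers and check what changes. The sketch below is a hedged illustration only: the responses are made-up data, and the particular assignment of numbers to categories in `codebook` is an assumption, since the text only says the codes run from 1 to 5.

```python
from collections import Counter
from statistics import mean

# Assumed codebook; the numbers are labels only.
codebook = {
    1: "cis-gender male",
    2: "cis-gender female",
    3: "transgender male to female",
    4: "transgender female to male",
    5: "non-binary",
}

responses = [1, 2, 2, 5, 3, 1, 4, 2]  # made-up coded responses

# Category frequencies: the appropriate summary for categorical codes.
counts = Counter(codebook[c] for c in responses)

# An equally valid codebook that attaches the same labels to different numbers.
recode = {1: 5, 2: 4, 3: 3, 4: 2, 5: 1}
alt_codebook = {recode[code]: label for code, label in codebook.items()}
recoded = [recode[c] for c in responses]
alt_counts = Counter(alt_codebook[c] for c in recoded)

print(counts == alt_counts)            # frequencies are unaffected by relabeling
print(mean(responses), mean(recoded))  # the "average code" shifts, so it means nothing
```

The frequencies survive the relabeling, while the average of the codes changes with it. That is the practical sense in which the codes are not higher or lower than one another.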
Could gender be thought of as a continuous variable? For that possibility, we need to consider whether the values range from low to high for something like gender. One scenario is considering gender as a continuum and how people perceive their gender qualities. Perhaps the continuum is masculinity to femininity. Some males might think of themselves as masculine; some females might think of themselves as masculine; some males might think of themselves as feminine; and some females might think of themselves as feminine. In this case, we could ask participants to rate their masculinity-femininity, ranging from low to high. What would the range be? We could use one to five, one to four, one to 10, or one to 50. Does the range matter? It should. If you have a conceptual reason to set the range at one to five, and you can say what a one represents in terms of masculinity or femininity compared to a two, then you should set your range accordingly. You could set the scale from one to 10. In this case, you need to consider what a 9 means compared to a 10 for masculinity or femininity. If you set the scale from one to 50, then what does 45, 46, 47, 48, 49, or 50 mean in terms of the level of masculinity or femininity?
If you haven’t surmised by now, this example calls for an interval scale. The identification of 1 to 5, 1 to 10, or 1 to 50 as the range is arbitrary. You do not know what a 1 or 5 or 10 or 50 means in terms of level of masculinity or femininity unless someone tells you. Does your choice of interval range matter? Depending on your outcome, yes, it could. Does it matter if the range is 1 to 10 vs. 1 to 50? Does it matter if your participants indicate a 45 vs. an 8? If you set the ranges as 1 to 5, 1 to 10, or 1 to 50, then you need to conceptualize what those ranges represent to defend your choice.
It should be noted here that the range is from masculinity to femininity. This range, as written, implies that the low end of masculinity is femininity, and the low end of femininity is masculinity. As far as anyone can tell, there is no verifiable evidence that the opposite pole of masculinity is femininity. It is quite likely that gender is not one continuum, but two continuums, masculinity and femininity. In turn, gender is now reconceived as gender expression, and now there are two types or two categories of gender expression: masculinity and femininity. Each type could have its own interval scale: 1 to 5, 1 to 10, or 1 to 50. Does this matter? Conceptually, it is best to think of your variable of gender expression as two types because people may consider themselves exclusively masculine, exclusively feminine, both masculine and feminine, or neither masculine nor feminine. If that structure of gender expression matters for your outcome, then it is best to think of dividing the continuum of gender expression in that fashion.
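A short sketch can show what a single masculinity-to-femininity scale loses. Everything here is an illustrative assumption: the 1 to 5 range, the midpoint of 3 used as a cut point, the four made-up participants, and the helper names `classify` and `bipolar_score`. The single-scale score is modeled simply as femininity minus masculinity.

```python
def classify(masculinity, femininity, midpoint=3):
    """Describe a participant using two separate 1-to-5 ratings.

    The midpoint cut is an assumption made only for this illustration.
    """
    high_m = masculinity > midpoint
    high_f = femininity > midpoint
    if high_m and high_f:
        return "both"
    if high_m:
        return "masculine"
    if high_f:
        return "feminine"
    return "neither"

def bipolar_score(masculinity, femininity):
    """Collapse the two ratings onto one masculinity-to-femininity line."""
    return femininity - masculinity

# Made-up participants: (masculinity rating, femininity rating).
participants = {"A": (5, 1), "B": (1, 5), "C": (5, 5), "D": (1, 1)}

for name, (m, f) in participants.items():
    print(name, classify(m, f), bipolar_score(m, f))
```

Participants C and D receive the same score on the single scale even though one rated both qualities high and the other rated both low. Keeping two scales preserves that distinction.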
These discussions are meant to illustrate that variables are not inherently categorical or continuous; if continuous, they are not inherently ordinal, interval, or ratio. The demarcations and scaling of your variable should follow your conceptualization of the variation of the variable and how your variable relates to the variations in your outcome variable. Variables can be categorical (dichotomous or with several categories) or continuous (in ordinal, interval, or ratio form). The type of variable follows from your conceptualization of both the predictor variable and the outcome variable: it depends on how you need the predictor to vary in order to predict your outcome, and therefore on how you demarcate, or subdivide, the variation of that variable.
There is no single “right” variable type or data type for a construct.
Always think conceptually!!!


