3.1: Variables - Not Always Obvious


    When we want to study variation, we must organize it. We use variables as the mechanism to organize the variation in the construct we are observing. If someone dumped a pile of ingredients on the table and asked you to make something out of it, it would be impossible because the pile is a mess. Suppose the dumped pile consisted of sugar, flour, powdered sugar, brown sugar, confectioner’s sugar, cake flour, wheat flour, milk, cream, bread flour, sour cream, eggs, salt, baking powder, baking soda, and red food coloring, all mixed up. No one would dare touch that pile of ingredients. The most apparent reason the pile is impossible to work with is that you have no idea which ingredient is which or how each ingredient would be used to create the final product, whatever that could turn out to be.

    Variables are like containers that sort ingredients. We have sugar and flour; we figure out what type of sugar and flour we have, then sort the sugar and flour into containers. If we have powdered sugar, regular sugar, and brown sugar, then we need three containers, and if we are meticulously organized, we put those three containers on a shelf away from the flour. If we have regular flour, bread flour, and wheat flour, then we need three more containers, and we put those three containers on a separate shelf, away from the sugar. This process of sorting and keeping the ingredients separate from each other is the process of sorting variables by type. The initial process of sorting by type starts with your definition of each ingredient. We define flour and sugar as different, then sort them according to their differences.

    It is important to note that the first step is sorting. You cannot mix up the ingredients. To state the obvious, each ingredient serves a different purpose in creating the outcome. No one would argue that sugar and flour are the same ingredient or that they can be put in the same container. However, you might notice that the powdered sugar and the confectioner’s sugar look the same and put them into the same container. Is that wrong to do? I will address this question when we discuss variable types.

    Once we sort the ingredients, we decide how much sugar and flour we have. This second step is measurement: we want to know how much of each ingredient is in each container. It contrasts with the first step, which is sorting by type or by definition.

    So, creating a variable involves two processes. We start by observing a pile of stuff. First, we sort that stuff into types, according to how we define each type. Second, we measure how much of each type we have. Using this process, we find we have four cups of regular flour, two cups of sugar, one cup of milk, one egg, and one teaspoon each of salt and baking powder, which is enough to make a waffle, at least the way I make it.
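The two steps of sorting and then measuring can be sketched in a few lines of Python. This is only an illustration: the `pile` list and the `sort_and_measure` function are hypothetical names invented here, and the amounts are chosen to match the flour, sugar, and milk totals in the waffle example.

```python
from collections import defaultdict

# Hypothetical pile of observations: (ingredient type, amount in cups).
# The pile arrives unsorted, with the same type appearing more than once.
pile = [
    ("regular flour", 2.0),
    ("sugar", 1.0),
    ("regular flour", 2.0),
    ("milk", 1.0),
    ("sugar", 1.0),
]


def sort_and_measure(observations):
    """Step 1: sort each observation into a container by its type.
    Step 2: measure how much ends up in each container."""
    containers = defaultdict(float)
    for kind, amount in observations:
        containers[kind] += amount
    return dict(containers)


totals = sort_and_measure(pile)
print(totals)
```

The keys of `totals` play the role of the sorting step (the type definitions), and the values play the role of the measurement step (how much of each type we have).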

    The example sounds silly, but it becomes interesting when you think about psychological variables that overlap in content and are hard to measure. It is not always obvious what variable best represents a concept. Gender used to be considered a male/female-only variable; it was thought of as a binary concept. But society evolves, and our understanding evolves with it. We now know that gender is not quite a binary concept, especially when you consider the continuum of biological variation and the expression of gender identity. Focusing on gender identity for now, we expand our idea of gender from a binary male-female to male, female, transgender male to female, and transgender female to male. You can rightfully question whether these are the only four categories needed to represent the continuum of gender identity and whether someone fits into just one of those categories. This example illustrates the difficulty of selecting a variable to represent our observations of a concept, such as gender.

    The difficulty compounds when observations overlap in content and we must determine how many variables should represent the pile of observations. Self-harm and suicide behaviors are considered two separate issues, but sorting through the items to determine which behaviors indicate self-harm and which indicate suicide can be difficult. We then need to decide if there should be one variable or two for this pile of observations. Abuse types become a challenge when you think about distinct types of abuse and the overlap between them. Emotional abuse, physical abuse, and sexual abuse likely overlap in terms of the survivor’s experience. Some researchers separate these issues; some collapse them into a single score. We then need to decide if there should be three variables or one overall variable. Most psychological assessments have a total score, followed by subscale scores. So, you might have one variable or several variables, but then those variables could overlap in terms of shared items or associations. It is not always obvious what variables we should use to sort through our observations of a construct.

    Does this problem matter? Possibly. It matters if you think that merging or separating the observations will affect the outcome. Imagine if you mixed up your types of flour. You would not know if you had regular or bread flour in your measuring cup. To most, that might not matter, especially if you are me and making waffles the way I make waffles. The end product will be a waffle, and I highly doubt that my son will notice the difference in taste and texture. However, it definitely matters for an experienced chef because mixing those two flours together affects the outcome. The professional chef will know what to do with those two types of flour and how they affect the outcome, which is the waffle. Will my son notice it? Will having a delicious waffle or my waffle affect the outcome of his day? Let us hope not.

    This issue occurs in psychology research. We want to be clear about which variable is affecting what outcome. We will revisit this issue later. But for now, it should be noted that while some variables seem evident in terms of type and measurement, such as gender, age, or depression, it is not always obvious what variables we have on hand for our statistical analysis and how these variables will affect our understanding of the outcome we want.

    3.1.1: Variance – a Fancy Term

    Each variable has a variance. Variance is the term we use to describe variation. Let us take a moment to connect the concepts of variation and numbers. Statistics and its numbers can seem confusing and bewildering when we do not know how to interpret them. Statistics is nothing more than numbers. There are many ways we use numbers in statistics. For now, to describe variance, all a number does is represent our perception of the variation of “it,” or the construct. A variable is simply a way to organize the variation of a construct. Recall that all constructs vary. We use numbers to describe this variation. A variable will have an array of numbers, and all that array does is organize the numbers. Person A will have a value of “10,” which is greater than person B’s value of “9.” When the construct varies, the numbers that represent our perception of the variation vary. In this case, person A is greater than person B on whatever construct the variable represents. In the spirit of de-mystifying statistical datasets, at the end of the day, a variable is simply a collection of numbers that shows which persons have greater or lesser amounts of the construct.

    The opposite of a variable is a constant. Generally, true constants are rare. It bears mentioning that if a variable does not vary much at all, it functions as a constant. Constants really do not help us understand variation. We need variation, so you need variables that do vary.

    The scenario of having variables that are essentially constants can happen. Suppose you want to examine male and female mental health experiences in the police force. There are more males than females in a given police district. But not just more males; it is likely that there are two or three females compared to 40 males within a given district. With only three females, it might as well be treated as a constant because there is limited variation among the females, thereby making the comparison between males and females in the police department moot. What to do? It depends on your conceptualization of the problem, but you will likely need to extend your recruitment of females if having a male-female comparison is important for your research study.
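One way to see why such a lopsided variable behaves almost like a constant is to code it as 0 and 1 and compute its variance. The sketch below is an illustration only: `indicator_variance` is a hypothetical helper, the 3 and 40 counts come from the example above, and the 20/20 split is an assumed comparison case.

```python
from statistics import pvariance


def indicator_variance(n_female, n_male):
    """Code each female as 1 and each male as 0, then return the
    population variance of those codes."""
    codes = [1] * n_female + [0] * n_male
    return pvariance(codes)


# District from the example: 3 females and 40 males.
skewed = indicator_variance(3, 40)

# Assumed comparison: an evenly split sample of 20 and 20.
balanced = indicator_variance(20, 20)

print(f"skewed sample variance:   {skewed:.3f}")
print(f"balanced sample variance: {balanced:.3f}")
```

The lopsided sample has a much smaller variance than the evenly split one, which is another way of saying that the variable barely varies in that district.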

    Back to variance. Variance is a way to communicate how much a variable varies and what the quality of the variation is. In general, you want a good range of variation, or more variance. More variance means more variation in the variable. More variation means we can detect differences among entities. We can see how someone is more depressed, less depressed, clinically depressed, or mildly depressed. We want to know what type of variation we are examining. For example, is the depression a chronic type, or is the depression more of an expected response to a situation, such as a death in the family? We also want to know the quality of that variation. For a given group of, say, clients attending a counselling center, is the variation all clumped together, with everyone having low depression and only a few clients with elevated levels of depression? We will address how to answer these questions when we examine central tendency and variability measures, including means, standard deviations, and distributions.

    When using statistics, you will encounter the idea of partitioning the variance. What that means is that we want to divide the variability into types. For now, we are usually interested in two kinds of variability: true variance and error variance. True variance is variation that comes from actual differences in the variable. Quite simply, age varies because there are differences in age among people. Error variance is any variation that is not due to actual differences in the variable. Notice that error variance is not always just an error. There could be something else going on. Things can vary simply by accident. A person can mistype their age into a form. Or someone can round up their age; instead of saying they are 20 years and 11 months, they may say they are 21.
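A small simulation can make the idea of partitioning concrete. This is a sketch under loudly stated assumptions, not a description of any real dataset: the true ages and the reporting errors are both drawn from made-up normal distributions, and the errors are assumed to be unrelated to the true ages. Real errors, such as rounding up, need not behave this way.

```python
import random
from statistics import pvariance

rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 10_000

# Assumed for illustration: true ages spread around 30 with SD 5,
# and reporting errors (typos, rounding) spread around 0 with SD 1.
true_age = [rng.gauss(30, 5) for _ in range(n)]
error = [rng.gauss(0, 1) for _ in range(n)]

# What we actually observe is the true value plus the error.
reported_age = [t + e for t, e in zip(true_age, error)]

true_var = pvariance(true_age)
error_var = pvariance(error)
total_var = pvariance(reported_age)

print(f"true variance:     {true_var:.2f}")
print(f"error variance:    {error_var:.2f}")
print(f"observed variance: {total_var:.2f}")
print(f"true + error:      {true_var + error_var:.2f}")
```

When the error is unrelated to the true values, the observed variance comes out close to the sum of the true variance and the error variance, which is what lets us talk about dividing the total into those two parts.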

    For now, we introduce the concept of variance to describe variation so we can turn to how we use variables to describe the variance.

    3.1.2: Variables – Way Too Much to Know about Variables

    All research is based on variables. Variables represent the variation that we see. Any issue, phenomenon, or construct that varies is a variable. We relate variables to each other so that we can say how phenomena are related to each other. For example, we want to know if the variable time is associated with the variable emotional development. A research question could be: As youth mature over time, do their emotions become more sophisticated? Another example is that we want to know if the number of sessions attended is associated with successful therapy outcomes. A research question could be: As clients attend more sessions, does their mental health improve?

    A variable is a way to describe a phenomenon's characteristics, which have different values. These values are essentially numbers, and we assign numbers to represent our observations of the level by amount, intensity, or type of a phenomenon.

    Any issue, construct, or phenomenon can be represented with variables. Depression is a variable. The number of hours spent studying is a variable. Typically, we encounter issues that have more than one variable, such as drug use. We can think of variations in drug use by drug type, how often it is used, and how long a person has used it. Or we think of an issue and think of various kinds within that issue. For example, attachment has distinctive styles: Secure, Anxious, Ambivalent, and Undefined. Or we can take an issue and subdivide it. For example, intelligence can be considered a general construct, or we can subdivide it into related types, such as Verbal, Math, and Reasoning. We can also think of distinct types of intelligence, such as emotional or streetwise intelligence.

    3.1.3: Variables – Why you Give a Rat’s Ass….

    Knowing how a variable varies helps you organize your understanding of “it”, or the construct, and the variation of the construct. Knowing this information helps you with the following:

    • Deciding what statistical analysis to use.
    • Knowing how to organize and conceptualize your own research question and design in advance.
    • Knowing how the variables can be associated with each other. Statistical analysis aims to understand the variation by learning how “it” varies and how other “it’s” vary in association with it. Put differently, we want to know in what ways people are different from each other and if something can help us understand why people are different from each other.

    This page titled 3.1: Variables - Not Always Obvious is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Peter Ji.