Statistics LibreTexts

3.2: How to Organize Variation With Variables

    There are two broad decisions you need to make when considering your variables for a research study or when understanding the variables you have for your statistical analysis:

    1. Identify your independent variables and dependent variables.
    2. Decide how to scale the variables: categorical or continuous.
      1. If continuous: ordinal, interval, or ratio.
      2. If ordinal or interval, decide if the variable is scaled by amount or intensity.
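
The two decisions in the list above can be recorded as a small data structure. The following is a minimal sketch in Python, not a standard API: the class name `Variable`, its field names, and the two example variables are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Optional

ROLES = {"IV", "DV"}
SCALES = {"categorical", "ordinal", "interval", "ratio"}
LEVEL_KINDS = {"amount", "intensity"}


@dataclass
class Variable:
    """Hypothetical record of the two decisions made for one variable."""
    name: str
    role: str                         # first decision: "IV" or "DV"
    scale: str                        # second decision: one of SCALES
    level_kind: Optional[str] = None  # "amount" or "intensity", if applicable

    def __post_init__(self):
        if self.role not in ROLES:
            raise ValueError(f"role must be one of {sorted(ROLES)}")
        if self.scale not in SCALES:
            raise ValueError(f"scale must be one of {sorted(SCALES)}")
        if self.level_kind is not None:
            # The list asks about amount vs. intensity only for
            # ordinal or interval variables.
            if self.scale not in {"ordinal", "interval"}:
                raise ValueError("level_kind applies only to ordinal or interval scales")
            if self.level_kind not in LEVEL_KINDS:
                raise ValueError(f"level_kind must be one of {sorted(LEVEL_KINDS)}")

    @property
    def is_continuous(self) -> bool:
        # In this section's scheme, every non-categorical scale is continuous.
        return self.scale != "categorical"


# Illustrative variables; the scale chosen for each is an assumption.
treatment = Variable("treatment condition", role="IV", scale="categorical")
ptsd = Variable("PTSD level", role="DV", scale="ordinal", level_kind="intensity")
```

Writing the decisions down this way forces both questions to be answered for every variable before any analysis starts.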

    First Decision: Identify your Independent and Dependent Variables

    Statistical analyses are based on independent and dependent variables. Let us start with the independent variable (IV). A basic definition of the IV is the variable you change, or one that changes independently. Other terms for the independent variable include input variable and predictor variable. Some examples of IVs: treatment condition has two types, control and treatment. Gender has two kinds, male and female (assuming a binary). Time can be a variable, with a range of more or less time. Self-esteem can be a variable that ranges from low to high self-esteem.

    Dependent variables (DVs) vary based on the variations in the independent variable. Other terms for the dependent variable include the outcome variable and criterion variable. Some examples of DVs: substance use recovery has two types, no relapse and relapse. PTSD level can range from none to severe.

    It is not always clear what the independent variable and the dependent variable are. IVs and DVs can sometimes be interchangeable. What leads to what? How can you tell the difference between IV and DV?

    The answer is always to think conceptually. Your conceptualization addresses this question: how are these variables put together according to your intentions? Conceptualization is synonymous with theory, models, frameworks, or orientation. Briefly speaking, conceptualization is how you make sense of all the information about how the world works. We have so much information to put together, and we need a guide on how to accomplish that task. There are many conceptualizations. In psychology, we have theoretical orientations. If we use a cognitive-behavioral orientation, we focus on thoughts and behaviors, and we use interventions to amend those thoughts and change those behaviors. If we use a psychodynamic orientation, we focus on past experiences and see how those experiences shape our present experiences. For statistics, use your conceptualization of how one variable relates to the other, and use your knowledge of the temporal order of the variables: which variable changes first and which changes afterwards.

    A classic example is a research design with a treatment and an outcome. Patients are put into a treatment group, which receives medication, or a control group, which receives no treatment, and the result is less stress. In this example, the IV is the treatment condition; the DV is stress. The treatment is the IV because, both logically and in temporal order, it precedes the stress reduction. IVs and DVs are usually easier to spot in experimental designs.
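
The treatment example can be laid out as data, with the IV as the grouping column and the DV as the measured column. This is only a sketch: the stress scores below are invented for illustration, and the function name `mean_dv_by_iv` is an assumption, not something from the text.

```python
from statistics import mean

# Hypothetical records: (treatment condition [IV], stress score [DV]).
# The scores are made up purely to show the layout.
records = [
    ("treatment", 4), ("treatment", 5), ("treatment", 3),
    ("control", 7), ("control", 8), ("control", 6),
]


def mean_dv_by_iv(rows):
    """Group the DV values by each level of the IV and average them."""
    groups = {}
    for condition, stress in rows:
        groups.setdefault(condition, []).append(stress)
    return {condition: mean(scores) for condition, scores in groups.items()}


group_means = mean_dv_by_iv(records)
print(group_means)
```

The IV defines the groups and the DV is what gets summarized within each group, which mirrors the temporal order of treatment first and outcome second.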

    Descriptive designs can make variable identification interesting. In descriptive designs, the intent is to describe relationships among variables. Nothing is manipulated, as it would be in an experimental design. Usually, these variables occur concurrently, and it is not entirely clear which one precedes the other. For example, depression and substance abuse are associated. But which one is the IV, and which is the DV? Depression could be the IV because when people get depressed, they abuse substances, which is the outcome and hence the DV. Alternatively, substance abuse could be the IV because when people abuse substances, the substance makes them depressed, or they feel depressed because of abusing substances. In that scenario, substance abuse is the IV, and depression is the outcome, hence the DV. Both scenarios are plausible. So, how do we decide which is the IV and which is the DV?

    The answer is to use your conceptualization of the issue. In this case, we will use the word “model.” Using a risk factor model, you might be interested in what risk factors are related to your DV. So, depression might be the IV because you want to know if depression is a risk factor that leads to people using inappropriate coping skills, such as alcohol. Using a drug reaction model, you might be interested in the effects of alcohol. So, depression might be the DV because you want to know if alcohol drinking, the type and amount of alcohol drinking, leads to depression as an effect of drinking. The guiding principle here is that variables do not just present themselves automatically as IVs and DVs. Once you have conceptualized the issue, you can decide which “it” is the IV or the DV.

    Some guidelines can help you decide which variables are the IV and the DV. Temporal order is one way to consider which variable is the IV or DV. Temporal order means which variable occurs first and which occurs second. The variable that comes first can be viewed as the IV, and the variable that comes second is the DV. An obvious example is treatment. Treatment comes first, then the outcome. So, treatment is the IV, and the outcome is the DV. Similarly, students’ study time comes first, hence it is the IV, and grades are the outcome, hence the DV.
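
The temporal-order guideline can be written as a small rule. The sketch below is an illustration under stated assumptions: the function name `roles_by_temporal_order` and the numeric time indices are hypothetical, and the rule deliberately returns no answer when two variables occur at the same time, since temporal order alone cannot decide that case.

```python
def roles_by_temporal_order(first, second):
    """Assign IV/DV to two variables by which one occurs earlier.

    Each argument is a (name, time) pair; a smaller time means the
    variable occurs earlier. Returns None when the times are equal.
    """
    (name_a, time_a), (name_b, time_b) = first, second
    if time_a == time_b:
        return None  # concurrent: temporal order cannot decide
    if time_a < time_b:
        return {"IV": name_a, "DV": name_b}
    return {"IV": name_b, "DV": name_a}


# Study time comes before grades, so study time is treated as the IV.
study_roles = roles_by_temporal_order(("grades", 2), ("study time", 1))

# Hypothetical concurrent case: both observed at the same time.
concurrent_roles = roles_by_temporal_order(("depression", 1), ("alcohol use", 1))
```

When the rule returns `None`, the decision falls back on the conceptual model rather than on timing.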

    Anything that precedes something is usually an IV. For example, personality is usually a trait that occurs before the outcome. Personality usually precedes career choice. Extroverted personalities tend to choose careers involving social interactions. In psychology, the quality of the therapy alliance usually precedes the client’s outcome.

    Associations between variables can be concurrent. Concurrent usually means happening at the same time, or occurring together. For example, depression and alcohol use might be occurring together, not necessarily in a sequence. Using a symptom model, you might be interested in how depression and alcohol use co-occur because of something else. In a grieving context, depression and alcohol use are both DVs resulting from how a person might be coping with grief. If something like this scenario occurs, it may not matter which variable is the IV or DV.

    Anything “fixed,” meaning it does not typically change on its own, is usually an IV. Demographics are usually “fixed,” meaning they occur first and then the outcome. Gender, socio-economic status, and years of education are usually demographics and are IVs because they occur first or are “fixed.” These variables are generally not the outcome of something; hence, they are usually IVs.

    Anything that interests us as a goal or an outcome of interest is usually the DV. Symptom reduction, better relationships, academic outcomes, and abstinence from substances are goals that clients want and are usually the DV. These variables are usually the outcome of something that the client is doing. What is the client doing that leads to symptom reduction, better relationships, better academic outcomes, and abstaining from substance use?

    The preceding examples are meant to be intuitive. Under a gender identity fluid framework, someone will always argue that gender is an outcome of biological and socio-cultural influences. That is true. The examples here are typical illustrations of what the IV and DV are; they are not meant to invite extensive discussion about what predictors are and what outcomes are.

    Second Decision: How do you scale the variables: categorical or continuous? If continuous, ordinal, interval, or ratio?

    This step addresses the question of how the variable varies. The variable type is otherwise known as a scale of measurement. What is the scale that we use to measure the variable’s variation? This process is essential because we need a way to sort our observations. By sorting, we mean classifying, categorizing, and separating. If we do not sort, our observations become a big mush of variation. Whether we are observing temperature, gender, height, depression, alcohol use, racial discrimination, or suicide risk, we are sorting our observations. We need a framework to guide our sorting process.

    The framework starts here: remember there are only two types of variation, by type and by level. Everything we observe varies in only those two ways. Things are different in terms of the type of “it” or the level of “it.” Type is always expressed as “this” or “that.” By level, we usually mean an amount of something, usually a frequency count, or the intensity of something, usually severity. The level is always expressed as more or less of something.

    When creating these variables, you first want to identify the pile of observations you have about something. Let us take a broad question: does therapy work? That question involves a messy pile of observations: what is the therapy, what type of therapy, how long is the therapy, who provided the therapy, what is the mental health problem, who are the clients, what is the outcome, what are the symptoms, what are the goals? First, we sort these observations into the IV and the DV. Second, we sort these observations into variables. Third, we determine how these variables vary, and whether they vary by type or by level. This process is how we determine our variables.
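
The three sorting steps can be sketched for the "does therapy work?" pile. The assignments below are one hypothetical way to sort a few of the observations, chosen only to illustrate the process; a different conceptualization could sort them differently. The names `observations` and `sort_observations` are assumptions.

```python
from collections import defaultdict

# Each tuple: (variable name, role, kind of variation).
# These role and variation labels are illustrative assumptions.
observations = [
    ("type of therapy", "IV", "type"),
    ("length of therapy", "IV", "level"),
    ("therapy provider", "IV", "type"),
    ("symptom severity", "DV", "level"),
    ("goal attained or not", "DV", "type"),
]


def sort_observations(rows):
    """Sort variables into piles keyed by (role, kind of variation)."""
    piles = defaultdict(list)
    for name, role, variation in rows:
        piles[(role, variation)].append(name)
    return dict(piles)


piles = sort_observations(observations)
for (role, variation), names in piles.items():
    print(role, "varies by", variation + ":", ", ".join(names))
```

Each variable ends up in exactly one pile, answering both whether it is an IV or a DV and whether it varies by type or by level.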


    This page titled 3.2: How to Organize Variation With Variables is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Peter Ji.