Statistics LibreTexts

12.1: Correlation


    We are often interested in the relationship between two variables. This chapter covers how to determine whether a linear relationship exists between sets of quantitative data and how to make predictions for a population. Examples include the relationship between the number of hours of study time and an exam score, or between smoking and heart disease.

    A predictor variable (also called the independent or explanatory variable; usually we use the letter \(x\)) explains or causes changes in the response variable. In an experiment, the researcher can manipulate or change the predictor variable.

    A response variable (also called the dependent variable; usually we use the letter \(y\)) measures the outcome of a study. The different outcomes for a dependent variable are measured or observed by the researcher. For instance, suppose we are interested in how much time spent studying affects the scores on an exam. In this study, study time is the predictor variable, and exam score is the response variable.
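    As a rough illustration of the study-time example, the sketch below labels study time as \(x\) and exam score as \(y\), then computes the sample correlation coefficient \(r\) between them. The six data pairs are invented for illustration only, and `pearson_r` is a hypothetical helper name, not something defined in this text.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, invented for illustration only.
study_hours = [1, 2, 3, 4, 5, 6]          # predictor variable x
exam_scores = [55, 61, 64, 72, 75, 83]    # response variable y


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    # Sum of products of deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    # Sums of squared deviations for each variable.
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


r = pearson_r(study_hours, exam_scores)
print(f"r = {r:.3f}")
```

    Swapping the two arguments gives the same value of \(r\), so the correlation by itself does not say which variable is the predictor and which is the response. That choice comes from the study design and from the question being asked.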

    In data from an experiment, it is usually much easier to tell which variable is the independent variable and which is the dependent variable. The distinction can be harder to make in observational data. Think of the dependent variable as the variable that you are trying to learn about.

    If we were observing the relationship between unemployment rate and economic growth rate, it may not be clear which variable should be \(x\) and which should be \(y\). Do we want to predict the unemployment rate or the economic growth rate? One should never jump to cause-and-effect reasoning with observational data. Just because there is a strong relationship between unemployment rate and economic growth rate does not mean that one directly causes the other to change. There may be many other factors that contribute to both of these rates changing at the same time, such as retirements or pandemics.
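    The point about other contributing factors can be sketched with a small simulation. Everything here is an assumption made for illustration: the numbers are randomly generated, not real economic data, and the names `shock`, `unemployment`, and `growth` are hypothetical. Both simulated rates depend on a shared hidden factor, and neither one is used to compute the other, yet the two end up strongly correlated.

```python
import random
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 1000

# Hypothetical hidden factor that affects both rates (simulated, not real data).
shock = [rng.gauss(0, 1) for _ in range(n)]

# Each simulated rate is the shared shock plus its own independent noise.
# Neither rate appears in the formula for the other.
unemployment = [s + 0.5 * rng.gauss(0, 1) for s in shock]
growth = [-s + 0.5 * rng.gauss(0, 1) for s in shock]

r = pearson_r(unemployment, growth)
print(f"r = {r:.2f}")  # strongly negative, with no direct causal link
```

    In this setup the strong negative correlation comes entirely from the shared factor, which is why a strong relationship in observational data is not enough to conclude that one variable causes the other.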


    This page titled 12.1: Correlation is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Rachel Webb via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.
