# 3.25: Causation and Lurking Variables (1 of 2)

Learning Objectives

• Distinguish between association and causation.
• Identify lurking variables that may explain an observed relationship.

## Introduction

A common mistake in describing the relationship between two quantitative variables is confusing association with causation. This mistake is so common that we devote this entire section to clarifying the difference.

This confusion often occurs when there is a strong relationship between the two quantitative variables. In the case of a linear relationship, people mistakenly interpret an r-value that is close to 1 or -1 as evidence that the explanatory variable causes changes in the response variable. In this case, the correct interpretation is that there is a statistical relationship between the variables, not a causal link. In other words, the explanatory variable and the response variable vary together in a predictable way. There is an association between the variables. But this should not be interpreted as a cause-and-effect relationship.

Let’s look at an example.

## Fire Damage

The scatterplot below shows the relationship between the number of firefighters sent to fires (x) and the amount of damage caused by fires (y) in a certain city.

The scatterplot shows a somewhat strong positive association with a curvilinear form. An increase in the number of firefighters is associated with an increase in the damage done by the fire.

Can we conclude that the increase in firefighters causes the increase in damage? Of course not.

A third variable is at play in the background – the seriousness of the fire – and is responsible for the observed relationship. More serious fires require more firefighters and also result in more damage.
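The role of the third variable can be illustrated with a small simulation. The sketch below is not based on the city's data: the variable names, the linear formulas, the coefficients, and the noise levels are all assumptions chosen for illustration. In the simulated world, the seriousness of the fire drives both the number of firefighters and the damage, and the number of firefighters has no effect on damage at all.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical model: seriousness of each fire on an arbitrary 1-to-10 scale.
seriousness = [rng.uniform(1, 10) for _ in range(20000)]

# Both variables depend on seriousness plus random noise.
# Damage is computed WITHOUT using the number of firefighters.
firefighters = [2 * s + rng.gauss(0, 1) for s in seriousness]
damage = [5 * s + rng.gauss(0, 3) for s in seriousness]

# Correlation across all simulated fires.
r_all = pearson_r(firefighters, damage)

# Correlation among fires of nearly equal seriousness (a narrow band).
band = [(f, d) for s, f, d in zip(seriousness, firefighters, damage)
        if 5.0 <= s <= 5.5]
r_band = pearson_r([f for f, _ in band], [d for _, d in band])

print(f"r for all fires:               {r_all:.2f}")
print(f"r for fires of similar seriousness: {r_band:.2f}")
```

Across all simulated fires, r is close to 1, even though firefighters do not influence damage in this model. Among fires of similar seriousness, the association nearly disappears, which is what we expect when a third variable is responsible for the relationship.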