# 14.2.1: Introduction to Pearson’s r

There are several different types of correlation coefficients, but we will only focus on the most common: Pearson's \(r\). \(r\) is a very popular correlation coefficient for assessing *linear* relations, and it serves as both a descriptive statistic (like \(\overline{X}\)) and as an inferential test statistic (like \(t\)). It is descriptive because it describes what is happening in the scatterplot: \(r\) has both a sign (plus for positive, minus for negative) indicating the direction of the relation and a number (from -1.00 to 1.00) indicating its magnitude (strength). As noted above, Pearson's \(r\) assumes a linear relation, so nothing about \(r\) will suggest what shape the dots tend toward; the correlation statistic only tells us what the direction and magnitude would be if the form were linear. Always make a scatterplot!
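To make the "linear only" caveat concrete, here is a small sketch (the data are made up for this illustration, not from the text) in which a perfect U-shaped relation yields an \(r\) of zero, even though the dots follow a strong pattern:

```python
# Hypothetical illustration: Pearson's r on a perfectly U-shaped relation.
# The relation is strong, but it is not linear, so r comes out as zero --
# hence "always make a scatterplot."

def pearson_r(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ssx = sum((x - mx) ** 2 for x in xs)                   # sum of squares for X
    ssy = sum((y - my) ** 2 for y in ys)                   # sum of squares for Y
    return sp / (ssx * ssy) ** 0.5

x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]            # perfect quadratic (nonlinear) relation
print(round(pearson_r(x, y), 4))   # -> 0.0: r misses the U-shape entirely
```

The scatterplot would reveal the curve immediately; the single number \(r\) cannot.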

\(r\) also works as a test statistic because the magnitude of \(r\) corresponds directly to a \(t\) value at the specific degrees of freedom, which can then be compared to a critical value. We will again have a table of critical values, this time for \(r\), so we can compare our obtained \(r\) directly to a critical value to decide whether we retain or reject the null hypothesis.
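As a sketch of that \(r\)-to-\(t\) correspondence, the conversion commonly used (not shown in this section, so treat it as an assumption here) is \(t = r\sqrt{n-2}\,/\,\sqrt{1-r^2}\), with \(n-2\) degrees of freedom; the sample values below are hypothetical:

```python
# Sketch of the r-to-t correspondence, using the standard conversion
# t = r * sqrt(n - 2) / sqrt(1 - r^2). The observed r and sample size
# below are made-up values for illustration.

import math

def t_from_r(r, n):
    df = n - 2                      # degrees of freedom for a correlation
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)

r, n = 0.50, 27                     # hypothetical observed r and sample size
print(round(t_from_r(r, n), 2))     # -> 2.89; compare to t-critical at df = 25
```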

The definitional formula for \(r\) is very simple: it is just the covariance (defined above) divided by the standard deviations of \(X\) and \(Y\):

\[r=\dfrac{\operatorname{cov}_{X Y}}{s_{X} s_{Y}} \nonumber \]

Because the covariance is \(S P /(n-1)\) and each standard deviation is the square root of its sum of squares divided by \(n-1\), the \(n-1\) terms cancel, giving the equivalent computational formula:

\[r=\dfrac{S P}{\sqrt{S S_{X} \cdot S S_{Y}}} \nonumber \]

The first formula gives a direct sense of what a correlation is: a covariance standardized onto the scale of \(X\) and \(Y\); the second formula is computationally simpler and faster. Both of these equations will give the same value, and as we saw at the beginning of the chapter, all of these values are easily computed if you use a sum of products table (which will be discussed later). When we do this calculation, we will find that our answer is always between -1.00 and 1.00 (if it's not, check the math again), which gives us a standard, interpretable metric.
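A minimal sketch, using made-up data, that computes \(r\) both as the covariance over the standard deviations and via the standard sum-of-products form \(S P/\sqrt{S S_{X}\, S S_{Y}}\), confirming that the two agree and that the result lands in the \(\pm 1.00\) range:

```python
# Check that the definitional formula (covariance over the standard
# deviations) and the sum-of-products computational formula give the
# same r. The data below are made up for illustration.

def r_both_ways(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ssx = sum((x - mx) ** 2 for x in xs)                   # sum of squares for X
    ssy = sum((y - my) ** 2 for y in ys)                   # sum of squares for Y
    cov = sp / (n - 1)                                     # covariance
    sx = (ssx / (n - 1)) ** 0.5                            # standard deviation of X
    sy = (ssy / (n - 1)) ** 0.5                            # standard deviation of Y
    r1 = cov / (sx * sy)                                   # definitional formula
    r2 = sp / (ssx * ssy) ** 0.5                           # computational formula
    return r1, r2

x = [2, 4, 5, 7, 9]
y = [1, 3, 4, 8, 9]
r1, r2 = r_both_ways(x, y)
print(round(r1, 4), round(r2, 4))  # identical values, between -1.00 and 1.00
```

The \(n-1\) in the covariance cancels against the \(n-1\) inside each standard deviation, which is why the two routes always agree.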