12.4: Pearson’s r


Foster et al.
University of Missouri-St. Louis, Rice University, & University of Houston, Downtown Campus via University of Missouri’s Affordable and Open Access Educational Resources Initiative


There are several different types of correlation coefficients, but we will focus only on the most common: Pearson’s \(r\). \(r\) is a very popular correlation coefficient for assessing linear relations, and it serves as both a descriptive statistic (like \(\overline{X}\)) and a test statistic (like \(t\)). It is descriptive because it describes what is happening in the scatterplot; \(r\) will have both a sign (+/–) for the direction and a number (between 0 and 1 in absolute value) for the magnitude. As noted above, \(r\) assumes a linear relation, so nothing about \(r\) will suggest what the form is; it will only tell us what the direction and magnitude would be if the form is linear (Remember: always make a scatterplot first!). \(r\) also works as a test statistic because the magnitude of \(r\) corresponds directly to a \(t\) value at the specific degrees of freedom, which can then be compared to a critical value. Luckily, we do not need to do this conversion by hand. Instead, we will have a table of \(r\) critical values that looks very similar to our \(t\) table, and we can compare our \(r\) directly to those.
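For readers curious about the conversion that the table handles for us, the standard relation between \(r\) and \(t\) can be sketched as follows. The symbol \(N\) (the number of \(X\), \(Y\) pairs) and the degrees of freedom \(df = N - 2\) are assumptions here, since this section has not yet defined them:

\[t=\dfrac{r \sqrt{N-2}}{\sqrt{1-r^{2}}}, \quad d f=N-2 \]

As a purely hypothetical illustration, a correlation of \(r = 0.50\) based on \(N = 27\) pairs would give

\[t=\dfrac{0.50 \sqrt{25}}{\sqrt{1-0.25}}=\dfrac{2.50}{0.866} \approx 2.89 \]

with \(df = 25\). Comparing this \(t\) to a \(t\) critical value should lead to the same decision as comparing \(r\) to the matching \(r\) critical value.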

The formula for \(r\) is very simple: it is just the covariance (defined above) divided by the product of the standard deviations of \(X\) and \(Y\):

\[r=\dfrac{\operatorname{cov}_{X Y}}{s_{X} s_{Y}}=\dfrac{S P}{\sqrt{S S X * S S Y}} \]

The first formula gives a direct sense of what a correlation is: a covariance standardized onto the scale of \(X\) and \(Y\); the second formula is computationally simpler and faster. Both of these equations will give the same value, and as we saw at the beginning of the chapter, all of these values are easily computed by using the sum of products table. When we do this calculation, we will find that our answer is always between -1.00 and 1.00 (if it’s not, check the math again), which gives us a standard, interpretable metric, similar to what \(z\)-scores did.
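To see that the two formulas agree, the calculation can be sketched in a few lines of Python. This is only an illustration: the function name `pearson_r` and the five data pairs are made up for this example, and the sketch assumes the sample covariance and standard deviations use \(N - 1\) in the denominator.

```python
import math


def pearson_r(x, y):
    """Return Pearson's r computed two ways: (covariance form, SP/SS form)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least 2 pairs")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of products and sums of squares of the deviation scores
    sp = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    ssx = sum((xi - mean_x) ** 2 for xi in x)
    ssy = sum((yi - mean_y) ** 2 for yi in y)
    if ssx == 0 or ssy == 0:
        raise ValueError("r is undefined when X or Y has no variability")

    # First formula: covariance divided by the product of the standard deviations
    cov_xy = sp / (n - 1)
    s_x = math.sqrt(ssx / (n - 1))
    s_y = math.sqrt(ssy / (n - 1))
    r_cov = cov_xy / (s_x * s_y)

    # Second formula: SP divided by the square root of SSX * SSY
    r_sp = sp / math.sqrt(ssx * ssy)

    return r_cov, r_sp


# Hypothetical data, chosen only so the arithmetic is easy to check by hand
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_cov, r_sp = pearson_r(x, y)
print(round(r_cov, 4), round(r_sp, 4))  # both round to 0.7746
```

For these made-up scores, \(SP = 6\), \(SSX = 10\), and \(SSY = 6\), so both versions reduce to \(6 / \sqrt{60}\). The \(N - 1\) terms in the first formula cancel, which is why the second formula can skip them.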

It was stated earlier that \(r\) is a descriptive statistic like \(\overline{X}\), and just like \(\overline{X}\), it corresponds to a population parameter. For correlations, the population parameter is the lowercase Greek letter \(ρ\) (“rho”); be careful not to confuse \(ρ\) with a \(p\)-value – they look quite similar. \(r\) is an estimate of \(ρ\) just like \(\overline{X}\) is an estimate of \(μ\). Thus, we will calculate our observed value of \(r\) from the data and compare it to the value of \(ρ\) specified by our null hypothesis to see if the relation between our variables is significant, as we will see in our example next.