
13.2: Pearson's r


    There are several different types of correlation coefficients, but we will only focus on Pearson’s r, the most popular correlation coefficient for assessing linear relationships, which serves as both a descriptive statistic (like M) and a test statistic (like t). It is descriptive because it describes what is happening in the scatter plot; r will have both a sign (+/−) for the direction and a number (0 to 1 in absolute value) for the magnitude. As noted above, because it assumes a linear relationship, nothing about r will suggest what the form is; it only tells us what the direction and magnitude are if the form is in fact linear. (Remember: always make a scatter plot first!) The coefficient r also works as a test statistic because the magnitude of r corresponds directly to a t value at a given number of degrees of freedom, which can then be compared to a critical value. Luckily, we do not need to do this conversion by hand. Instead, we will have a table of r critical values (in section 16.4) that looks very similar to our t table, and we can compare our r directly to those.
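    To see this link concretely, here is a minimal Python sketch of the standard conversion \(t = r\sqrt{N-2}/\sqrt{1-r^2}\); it is our own illustration rather than part of the textbook's procedure, and it assumes SciPy is available for the p value.

        from scipy import stats  # assumed available; used only for the p value

        def r_to_t(r, n):
            """Convert Pearson's r to its t value with df = n - 2."""
            df = n - 2
            return r * df ** 0.5 / (1 - r ** 2) ** 0.5

        # Hypothetical numbers: r = .70 from a sample of N = 10
        t = r_to_t(0.70, 10)          # about 2.77
        p = stats.t.sf(t, df=10 - 2)  # one-tailed p value, about .012
        print(round(t, 2), round(p, 3))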

    The formula for r is very simple: it is just the covariance (defined above) divided by the product of the standard deviations of X and Y:

    \[
    \Large
    r = \frac{\text{cov}_{XY}}{s_Xs_Y} = \frac{SP}{\sqrt{(SS_X)(SS_Y)}}
    \nonumber
    \]

    The first formula gives a direct sense of what a correlation is: a covariance standardized onto the scale of X and Y; the second formula is computationally simpler and faster. With exact arithmetic, both of these equations give the same value (small differences can creep in if you round intermediate steps), and as we saw at the beginning of the chapter, all of these values are easily computed by using the sum of products table. When we do this calculation, we will find that our answer is always between −1.00 and 1.00 (if it’s not, check the math again), which gives us a standard, interpretable metric, similar to what z scores did.
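    As a quick illustration that the two formulas agree, here is a short sketch in plain Python (the function name and toy data are our own, made up for demonstration):

        def pearson_r(x, y):
            """Compute Pearson's r two ways: cov/(sx*sy) and SP/sqrt(SSx*SSy)."""
            n = len(x)
            mx, my = sum(x) / n, sum(y) / n
            sp = sum((a - mx) * (b - my) for a, b in zip(x, y))  # sum of products
            ss_x = sum((a - mx) ** 2 for a in x)                 # sum of squares of X
            ss_y = sum((b - my) ** 2 for b in y)                 # sum of squares of Y
            cov = sp / (n - 1)
            sx = (ss_x / (n - 1)) ** 0.5
            sy = (ss_y / (n - 1)) ** 0.5
            return cov / (sx * sy), sp / (ss_x * ss_y) ** 0.5    # same value twice

        # Toy data: both versions print the same correlation
        print(pearson_r([1, 2, 3, 4, 5], [1.2, 1.9, 3.2, 3.8, 5.1]))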

    Video: How to do a Pearson Correlation by Hand

    How to do a Pearson Correlation by Hand on YouTube. Note that the equation in this video looks a little different from what is presented in the textbook; it will result in the same answer, but it combines the different components into a single equation.

    It was stated earlier that r is a descriptive statistic like M, and just like M, it corresponds to a population parameter. For correlations, the population parameter is the lowercase Greek letter \(\rho\) (“rho”); be careful not to confuse \(\rho\) with a p value—they look quite similar. The statistic r is an estimate of \(\rho\), just as M is an estimate of \(\mu\). Thus, we will test our observed value of r that we calculate from the data and compare it to a value of \(\rho\) specified by our null hypothesis to see if the relationship between our variables is significant, as we will see in the following example.

    Example: Anxiety and Depression

    Anxiety and depression are often reported to be highly linked (or “comorbid”). Our hypothesis testing procedure follows the same four-step process as before, starting with our null and alternative hypotheses. We will look for a positive relationship between our variables in a group of 10 people, because a positive relationship is what we would expect if the two conditions are indeed comorbid.

    Step 1: State the Hypotheses

    Our hypotheses for correlations start with a baseline assumption of no relationship, and our alternative will be directional if we expect to find a specific type of relationship. For this example, we expect a positive relationship:

    \[
    \begin{aligned}
    H_0&: \text{There is no relationship between anxiety and depression} \\
    H_0&: \rho = 0 \\[2.5ex]
    H_A&: \text{There is a positive relationship between anxiety and depression} \\
    H_A&: \rho > 0
    \end{aligned}
    \nonumber
    \]

    Remember that \(\rho\) (“rho”) is our population parameter for the correlation that we estimate with r, just like M and \(\mu\) for means. Remember also that if there is no relationship between variables, the magnitude will be 0, which is where we get the null and alternative hypothesis values.

    Step 2: Find the Critical Values

    The critical values for correlations come from the correlation table (a portion of which appears in Table \(\PageIndex{1}\)), which looks very similar to the t table. (The complete correlation table can be found in section 16.4.) Just like with our t table, the column of critical values is based on our significance level (\(\alpha\)) and the directionality of our test, and the row is determined by our degrees of freedom. For correlations, we have N − 2 degrees of freedom rather than N − 1 (we lose one degree of freedom for each of the two means we estimate, one per variable). For our example, we have 10 people, so our degrees of freedom = 10 − 2 = 8.

    Table \(\PageIndex{1}\): Snippet from Critical Values for Pearson’s r (Correlation Table). Each column contains two levels of significance, separated by a slash (/); these two values refer to the level of significance for a one-tailed test / the level of significance for a two-tailed test.
    df (N − 2)   .05 / .10   .025 / .05   .01 / .02   .005 / .01
     1           0.988       0.997        0.9995      0.9999
     2           0.900       0.950        0.980       0.990
     3           0.805       0.878        0.934       0.959
     4           0.729       0.811        0.882       0.917
     5           0.669       0.754        0.833       0.875
     6           0.621       0.707        0.789       0.834
     7           0.582       0.666        0.750       0.798
     8           0.549       0.632        0.715       0.765
     9           0.521       0.602        0.685       0.735
    10           0.497       0.576        0.658       0.708
    11           0.476       0.553        0.634       0.684
    12           0.458       0.532        0.612       0.661
    13           0.441       0.514        0.592       0.641
    14           0.426       0.497        0.574       0.623
    15           0.412       0.482        0.558       0.606

    We were not given any information about the level of significance at which we should test our hypothesis, so we will assume \(\alpha = .05\) as always. From our table, we can see that a one-tailed test (because we expect only a positive relationship) at the \(\alpha = .05\) level has a critical value of r* = .549. Thus, if our observed correlation is greater than .549, it will be statistically significant. This is a rather high bar (remember, the guideline for a strong relationship is r = .50); this is because we have so few people. Larger samples make it easier to find significant relationships.
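    If the table is not handy, the critical value can also be recovered from the t distribution through the identity \(r^* = t^*/\sqrt{t^{*2} + df}\). Here is a small Python sketch (the function name is ours, and it assumes SciPy is available) that reproduces the .549 entry:

        from scipy import stats

        def r_critical(alpha, df, one_tailed=True):
            """Critical r from the t distribution: r* = t* / sqrt(t*^2 + df)."""
            t_star = stats.t.ppf(1 - alpha if one_tailed else 1 - alpha / 2, df)
            return t_star / (t_star ** 2 + df) ** 0.5

        print(round(r_critical(0.05, 8), 3))  # 0.549, matching the df = 8 row above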

    Step 3: Calculate the Test Statistic and Effect Size

    We have laid out our hypotheses and the criteria we will use to assess them, so now we can move on to our test statistic. Before we do that, we must first create a scatter plot of the data to make sure that the most likely form of our relationship is in fact linear. Figure \(\PageIndex{2}\) shows our data plotted out, and it looks like they are, in fact, linearly related, so Pearson’s r is appropriate.

    [Scatter plot: depression on the x-axis and anxiety on the y-axis, both axes running from about 1 to 6, with points trending upward together.]
    Figure \(\PageIndex{2}\): Scatter plot of depression and anxiety. (“Scatter Plot Depression and Anxiety” by Judy Schmitt is licensed under CC BY-NC-SA 4.0.)

    The data we gathered from our participants are shown in Table \(\PageIndex{2}\).

    Table \(\PageIndex{2}\): Participant Scores for Depression and Anxiety
    Depression   Anxiety
    2.81         3.54
    1.96         3.05
    3.43         3.81
    3.40         3.43
    4.71         4.03
    1.80         3.59
    4.27         4.17
    3.68         3.46
    2.44         3.19
    3.13         4.12

    We will need to put these values into our sum of products table (Table \(\PageIndex{3}\)) to calculate the standard deviation and covariance of our variables. We will use X for depression and Y for anxiety to keep track of our data, but be aware that this choice is arbitrary and the math will work out the same if we make the opposite choice. Our table is thus:

    Table \(\PageIndex{3}\): Sum of Products Table
    X        X − MX    (X − MX)²   Y        Y − MY    (Y − MY)²   (X − MX)(Y − MY)
    2.81     −0.35     0.12        3.54     −0.10     0.01         0.04
    1.96     −1.20     1.44        3.05     −0.59     0.35         0.71
    3.43      0.27     0.07        3.81      0.17     0.03         0.05
    3.40      0.24     0.06        3.43     −0.21     0.04        −0.05
    4.71      1.55     2.40        4.03      0.39     0.15         0.60
    1.80     −1.36     1.85        3.59     −0.05     0.00         0.07
    4.27      1.11     1.23        4.17      0.53     0.28         0.59
    3.68      0.52     0.27        3.46     −0.18     0.03        −0.09
    2.44     −0.72     0.52        3.19     −0.45     0.20         0.32
    3.13     −0.03     0.00        4.12      0.48     0.23        −0.01
    Σ = 31.63   Σ = 0.03   Σ = 7.97   Σ = 36.39   Σ = −0.01   Σ = 1.33   Σ = 2.22

    The bottom row is the sum of each column. We can see from this that the sum of the X observations is 31.63, which makes the mean of the X variable \(M_X=3.16\). The deviation scores for X sum to 0.03, which is very close to 0, given rounding error, so everything looks right so far. The next column is the squared deviations for X, so we can see that the sum of squares for X is \(SS_X = 7.97\). The same is true of the Y columns, with an average of \(M_Y=3.64\), deviations that sum to zero within rounding error, and a sum of squares of \(SS_Y=1.33\). The final column is the product of our deviation scores (not of our squared deviations), which gives us a sum of products of \(SP=2.22\).
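    If you would rather let a machine check the table, here is a minimal Python sketch (variable names are our own) that recomputes the column sums from the raw scores:

        depression = [2.81, 1.96, 3.43, 3.40, 4.71, 1.80, 4.27, 3.68, 2.44, 3.13]
        anxiety = [3.54, 3.05, 3.81, 3.43, 4.03, 3.59, 4.17, 3.46, 3.19, 4.12]

        n = len(depression)
        mx = sum(depression) / n  # mean of X, about 3.16
        my = sum(anxiety) / n     # mean of Y, about 3.64
        ss_x = sum((x - mx) ** 2 for x in depression)  # about 7.97
        ss_y = sum((y - my) ** 2 for y in anxiety)     # about 1.33
        sp = sum((x - mx) * (y - my) for x, y in zip(depression, anxiety))  # about 2.22
        print(round(ss_x, 2), round(ss_y, 2), round(sp, 2))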

    There are now three pieces of information we need to calculate before we compute our correlation coefficient: the covariance of X and Y and the standard deviation of each.

    The covariance of two variables, remember, is the sum of products divided by N − 1. For our data:

    \[
    \Large
    \text{cov}_{XY}= \frac{SP}{N-1} = \frac{2.22}{9} = 0.25
    \nonumber
    \]

    The formulas for standard deviation are the same as before. Using subscripts X and Y to denote depression and anxiety:

    \[
    \Large
    \begin{aligned}
    s_X&=\sqrt{\frac{\sum{(X-M_X)^2}}{N-1}}=\sqrt{\frac{7.97}{9}}=0.94 \\[2.5ex]
    s_Y&=\sqrt{\frac{\sum{(Y-M_Y)^2}}{N-1}}=\sqrt{\frac{1.33}{9}}=0.38
    \end{aligned}
    \nonumber
    \]
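    These values are easy to verify by machine as well. Python's built-in statistics module gives the same numbers (this assumes the depression and anxiety lists from the earlier sketch; statistics.covariance requires Python 3.10 or newer):

        import statistics

        print(round(statistics.covariance(depression, anxiety), 2),  # 0.25
              round(statistics.stdev(depression), 2),                # 0.94
              round(statistics.stdev(anxiety), 2))                   # 0.38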

    Now we have all of the information we need to calculate r:

    \[
    \Large
    r=\frac{\text{cov}_{XY}}{s_Xs_Y}=\frac{0.25}{(0.94)(0.38)} = .70
    \nonumber
    \]

    We can verify this using our other formula, which is computationally shorter:

    \[
    \Large
    r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}} = \frac{2.22}{\sqrt{(7.97)(1.33)}} = .68
    \nonumber
    \]

    The two answers differ slightly (.70 versus .68) only because we rounded the covariance and standard deviations to two decimal places before plugging them into the first formula; with exact arithmetic, the two formulas give identical results. Software that works with the unrounded data, such as the JASP output in Figure \(\PageIndex{3}\), likewise reports r = .68.
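    As a last check on the hand calculation, SciPy reproduces the full-precision value in one call (again assuming the depression and anxiety lists from the earlier sketch; pearsonr reports a two-tailed p, which we halve for our one-tailed test since r is positive):

        from scipy import stats

        r, p_two_tailed = stats.pearsonr(depression, anxiety)
        print(round(r, 2), round(p_two_tailed / 2, 3))  # 0.68 and about .015, matching Figure 3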

    So our observed correlation between anxiety and depression is r = .70 (or .68 with less rounding error), which, based on sign and magnitude, is a strong, positive correlation. Now we need to compare it to our critical value to see if it is also statistically significant.

    Effect Size and Pearson’s r

    Pearson’s r is an incredibly flexible and useful statistic. Not only is it both descriptive and inferential, as we saw above, but because it is on a standardized metric (always between −1.00 and 1.00), it can also serve as its own effect size. In general, we use r = .10, r = .30, and r = .50 as our guidelines for small, medium, and large effects. Just like with Cohen’s d, these guidelines are not absolutes, but they do serve as useful indicators in most situations. Notice as well that these are the same guidelines we used earlier to interpret the magnitude of the relationship based on the correlation coefficient.

    In addition to r being its own effect size, there is an additional effect size we can calculate for our results. This effect size is \(r^2\), and it is exactly what it looks like: the squared value of our correlation coefficient. Just like \(\eta^2\) in ANOVA, \(r^2\) is interpreted as the amount of variance explained in the outcome variable, and the cut scores are the same as well: .01, .09, and .25 for small, medium, and large, respectively. Notice here that these are the same cutoffs we used for regular r effect sizes, but squared (\(.10^2 = .01\), \(.30^2 = .09\), \(.50^2 = .25\)), because the \(r^2\) effect size is just the squared correlation, so its interpretation should be, and is, the same. The reason we use \(r^2\) as an effect size is that our ability to explain variance is often important to us.
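    For our anxiety and depression example, squaring the observed correlation gives

    \[
    r^2 = (.70)^2 = .49
    \nonumber
    \]

    so roughly half of the variance in anxiety scores is shared with depression scores, a large effect by the guidelines above.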

    The similarities between \(\eta^2\) and r2 in interpretation and magnitude should clue you in to the fact that they are similar analyses, even if they look nothing alike. That is because, behind the scenes, they actually are! In the next chapter, we will learn a technique called linear regression, which will formally link the two analyses together.

    Step 4: Make a Decision

    Our critical value was r* = .549, and our obtained value was r = .70. Our obtained value was larger than our critical value, so we can reject the null hypothesis.

    Reject \(H_0\). Based on our sample of 10 people, there is a statistically significant, strong, positive relationship between anxiety and depression, r(8) = .70, p < .05.

    Notice in our interpretation that, because we already know the magnitude and direction of our correlation, we can interpret that directly. We also report the degrees of freedom, just like with t, and we know that \(p<\alpha\) because we rejected the null hypothesis. As we can see, even though we are dealing with a very different type of data, our process of hypothesis testing has remained unchanged. Unlike with our other statistics, we do not report an effect size for the correlation coefficient, because the reader can easily compute it by squaring r. The \(r^2\) statistic is called the coefficient of determination and is essentially an effect size for a correlation coefficient; it tells us what proportion of the variance in one variable is explained by the other.

    Figure \(\PageIndex{3}\) shows the output from JASP for this example.

    [JASP output table: Pearson’s correlation between Depression and Anxiety, r = 0.680, p = .015; all tests one-tailed, for positive correlation.]
    Figure \(\PageIndex{3}\): Output from JASP for the correlation described in the Anxiety and Depression example. (“JASP correlation” by Rupa G. Gordon/Judy Schmitt is licensed under CC BY-NC-SA 4.0.)


    This page titled 13.2: Pearson's r is shared under a not declared license and was authored, remixed, and/or curated by Chanler Hilley, Kennesaw State University via source content that was edited to the style and standards of the LibreTexts platform.