9.3: Calculating r
In the previous section, we learned what the correlation coefficient \( r \) means and how to interpret it. In this section we will dig deeper into how it is calculated.
We'll use the same study hours and exam scores dataset from Section 9.1 so that everything feels familiar.
The Formula for \( r \)
The correlation coefficient \( r \) can be calculated using the following formula:
\[ r = \frac{1}{n-1} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s_x} \right) \left( \frac{y_i - \bar{y}}{s_y} \right) \]
Where:
- \( n \) is the number of data pairs
- \( x_i \) and \( y_i \) are individual data values
- \( \bar{x} \) and \( \bar{y} \) are the means of \( x \) and \( y \)
- \( s_x \) and \( s_y \) are the standard deviations of \( x \) and \( y \)
Notice that each factor, such as \( \frac{x_i - \bar{x}}{s_x} \), is simply the \(z\)-score for that data value. We standardize each \(x_i\) and \(y_i\), multiply the two \(z\)-scores for each pair, and then average those products (dividing by \( n-1 \) rather than \( n \), just as we do for the sample standard deviation). Because \(z\)-scores put both variables on the same scale, the individual spread of the \(x\) and \(y\) data sets drops out, and we see only how the two vary together.
If \(y\) tends to be above its mean when \(x\) is above its mean (and below when \(x\) is below), the two \(z\)-scores in each pair share a sign, so most products are positive and \( r \) is positive. If \(y\) instead tends to be below its mean when \(x\) is above its mean, each product pairs a positive \(z\)-score with a negative one, so most products are negative and \( r \) is negative. The proof goes beyond the scope of this course, but standardizing guarantees that \( r \) always falls between -1 and 1.
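For readers who like to see a formula as code, here is a minimal Python sketch of the calculation described above. The function name `correlation` and the two small test lists are made up for illustration; the only library used is Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation (dividing by \( n-1 \)).

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r: sum of z-score products divided by n - 1."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    # One product of z-scores per data pair
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

# Made-up illustration data (not from the text)
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])    # points on an increasing line
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])  # points on a decreasing line
print(round(r_up, 4), round(r_down, 4))
```

Points that fall exactly on a line give the most extreme values \( r \) can take, which makes them a handy sanity check for any implementation.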
Worked Example: Study Hours and Exam Scores
Let's calculate \( r \) step by step using our 10-student dataset. The scatterplot below shows all 10 students. The dashed crosshairs mark the mean point \( (\bar{x}, \bar{y}) = (4.3,\ 67.9) \), dividing the plot into four quadrants. Points in the upper-right and lower-left quadrants will contribute positive products to \( r \); points in the upper-left and lower-right will contribute negative ones.

Figure 9.3: Exam Scores vs. Study Hours with regression line
Step 1: Recall the Data and Means
From the earlier section we already know:
- \( \bar{x} = 4.3 \) (mean study hours)
- \( \bar{y} = 67.9 \) (mean exam score)
- \( s_x = 2.31 \) (standard deviation of study hours)
- \( s_y = 11.09 \) (standard deviation of exam scores)
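If you would like to check these four values yourself, the short Python sketch below recomputes them. It assumes the ten \( (x, y) \) pairs are the ones listed in the Step 2 table; the variable names `hours` and `scores` are chosen here for illustration.

```python
from statistics import mean, stdev

# The ten (study hours, exam score) pairs from the Step 2 table
hours = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
scores = [52, 55, 59, 61, 66, 73, 71, 77, 80, 85]

x_bar, y_bar = mean(hours), mean(scores)
s_x, s_y = stdev(hours), stdev(scores)  # sample standard deviations (n - 1)

print(f"x_bar = {x_bar:.1f}, y_bar = {y_bar:.1f}")
print(f"s_x = {s_x:.2f}, s_y = {s_y:.2f}")
```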
Step 2: Compute z-scores and Their Products
For each student, we compute the standardized score for \( x \) and for \( y \), then multiply them together.
| \( x \) | \( y \) | \( x_i - \bar{x} \) | \( y_i - \bar{y} \) | \( z_x = \frac{x_i - \bar{x}}{s_x} \) | \( z_y = \frac{y_i - \bar{y}}{s_y} \) | \( z_x \cdot z_y \) |
|---|---|---|---|---|---|---|
| 1 | 52 | −3.3 | −15.9 | −1.43 | −1.43 | 2.05 |
| 2 | 55 | −2.3 | −12.9 | −0.99 | −1.16 | 1.16 |
| 2 | 59 | −2.3 | −8.9 | −0.99 | −0.80 | 0.80 |
| 3 | 61 | −1.3 | −6.9 | −0.56 | −0.62 | 0.35 |
| 4 | 66 | −0.3 | −1.9 | −0.13 | −0.17 | 0.02 |
| 5 | 73 | 0.7 | 5.1 | 0.30 | 0.46 | 0.14 |
| 5 | 71 | 0.7 | 3.1 | 0.30 | 0.28 | 0.08 |
| 6 | 77 | 1.7 | 9.1 | 0.74 | 0.82 | 0.60 |
| 7 | 80 | 2.7 | 12.1 | 1.17 | 1.09 | 1.27 |
| 8 | 85 | 3.7 | 17.1 | 1.60 | 1.54 | 2.47 |
| Sum of \( z_x \cdot z_y \) | | | | | | 8.94 |
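The table can also be rebuilt with a few lines of Python, which is a useful way to catch arithmetic slips. This is only a sketch: the variable names are illustrative, and the code works with unrounded z-scores, so an individual entry could differ from a hand calculation in the last decimal place.

```python
from statistics import mean, stdev

hours = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
scores = [52, 55, 59, 61, 66, 73, 71, 77, 80, 85]

x_bar, y_bar = mean(hours), mean(scores)
s_x, s_y = stdev(hours), stdev(scores)

total = 0.0
print(" x   y    z_x    z_y  z_x*z_y")
for x, y in zip(hours, scores):
    z_x = (x - x_bar) / s_x      # standardized study hours
    z_y = (y - y_bar) / s_y      # standardized exam score
    total += z_x * z_y           # running sum of the products
    print(f"{x:>2} {y:>3} {z_x:6.2f} {z_y:6.2f} {z_x * z_y:8.2f}")

print(f"Sum of products = {total:.2f}")
```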
Step 3: Apply the Formula
We have \( n = 10 \) data pairs and a sum of products equal to 8.94. Plugging into the formula:
\[ r = \frac{1}{n-1} \sum z_x \cdot z_y = \frac{1}{10-1} \times 8.94 = \frac{8.94}{9} \approx 0.99 \]
This confirms a very strong positive linear relationship between study hours and exam scores, consistent with what we saw in the scatterplot.
What the Formula Is Really Doing
It helps to think about the formula visually. Imagine drawing your scatterplot and then placing the mean point \( (\bar{x}, \bar{y}) \) at the center. This divides the plot into four quadrants:
| Quadrant | Location relative to the means | z-scores | Product |
|---|---|---|---|
| I | \( x > \bar{x} \) and \( y > \bar{y} \) | Both positive | Positive |
| II | \( x < \bar{x} \) and \( y > \bar{y} \) | Mixed | Negative |
| III | \( x < \bar{x} \) and \( y < \bar{y} \) | Both negative | Positive |
| IV | \( x > \bar{x} \) and \( y < \bar{y} \) | Mixed | Negative |
If most points fall in Quadrants I and III, the positive products dominate and \( r \) is positive. If most fall in Quadrants II and IV, the negative products dominate and \( r \) is negative. If the points are evenly spread across all four quadrants, the products cancel out and \( r \) is near zero.
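To see this quadrant idea at work on our study hours data, the following Python sketch counts how many students land in each quadrant. The helper function `quadrant` is hypothetical and written just for this illustration; it uses the usual numbering, with Quadrant I at the upper right and the rest following counterclockwise.

```python
from statistics import mean

hours = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
scores = [52, 55, 59, 61, 66, 73, 71, 77, 80, 85]
x_bar, y_bar = mean(hours), mean(scores)

def quadrant(x, y):
    """Quadrant of (x, y) relative to the mean point; None if it sits on a crosshair."""
    if x == x_bar or y == y_bar:
        return None  # a z-score of 0 makes the product 0
    if x > x_bar:
        return "I" if y > y_bar else "IV"
    return "II" if y > y_bar else "III"

counts = {"I": 0, "II": 0, "III": 0, "IV": 0}
for x, y in zip(hours, scores):
    q = quadrant(x, y)
    if q is not None:
        counts[q] += 1

print(counts)  # {'I': 5, 'II': 0, 'III': 5, 'IV': 0}
```

Every student falls in Quadrant I or III, so every product is positive, which is why \( r \) comes out so close to 1 for this dataset.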
Properties of \( r \) — A Summary
| Property | What it means |
|---|---|
| Always between –1 and 1 | \( -1 \leq r \leq 1 \). A value outside this range signals a calculation error. |
| Sign gives direction | Positive \( r \) = positive association; negative \( r \) = negative association. |
| Magnitude gives strength | \( |r| \) close to 1 is strong; close to 0 is weak. There is no universal cutoff, but \( |r| \geq 0.8 \) is often described as strong. |
| Symmetric | The correlation of \( x \) with \( y \) equals the correlation of \( y \) with \( x \). |
| Not affected by units | Because we use z-scores, \( r \) is the same whether height is in cm or inches. |
| Measures only linear association | A curved relationship can have \( r \approx 0 \) yet still be a strong pattern. Always plot your data. |
| Sensitive to outliers | A single extreme point can change \( r \) substantially, pulling it toward –1 or 1 or dragging it toward 0. Check your scatterplot before trusting \( r \) alone. |
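A few of these properties are easy to check numerically. The Python sketch below tests symmetry and unit-independence on our study hours data, and then shows a curved pattern with \( r \) equal to zero. The `correlation` function is an illustrative helper, and the curved dataset (\( y = x^2 \) on values of \( x \) balanced around 0) is made up for this demonstration.

```python
import math
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

hours = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
scores = [52, 55, 59, 61, 66, 73, 71, 77, 80, 85]

r_xy = correlation(hours, scores)
r_yx = correlation(scores, hours)            # swap the roles of x and y
minutes = [60 * h for h in hours]            # same data, different units
r_minutes = correlation(minutes, scores)

# Made-up curved data: a perfect parabola, yet no linear association
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curve = correlation(curve_x, curve_y)

print("Symmetric:", math.isclose(r_xy, r_yx))
print("Unit-free:", math.isclose(r_xy, r_minutes))
print("Curve has r of zero:", abs(r_curve) < 1e-9)
```

All three checks should print `True`. The last one is a reminder that \( r \) near zero does not mean "no pattern," only "no linear pattern."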
Using Technology to Calculate \( r \)
In practice, you will almost never compute \( r \) by hand from a large dataset. Calculators and statistical software handle the arithmetic for you. Here is how to find \( r \) using common tools:
| Tool | Steps |
|---|---|
| TI-84 Calculator | Enter data in L1 and L2 → STAT → CALC → 8: LinReg(a+bx) → include L1, L2 → the output shows \( r \) and \( r^2 \). (DiagnosticOn must be enabled: 2nd → CATALOG → DiagnosticOn → ENTER.) |
| Google Sheets | Enter x-values in column A and y-values in column B, then type =CORREL(A:A, B:B) in any empty cell. You can also click and select your specific data columns one at a time. |
| Excel | Same as Google Sheets: =CORREL(A:A, B:B). |
| Desmos | Enter a table of x and y values. You should see an icon on the top left of the cell with a line passing through dots. Clicking this allows you to see the regression. |
Interactive: Explore How Points Affect \( r \)
Click anywhere in the plot area below to add data points. The app will calculate and display \( r \) for your set of points in real time, so you can see how the arrangement of points changes the correlation.
Try building a tight upward cloud, then a scattered one, then add a single outlier far from the rest, and watch what happens to \( r \).
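If you prefer to experiment in code, the sketch below runs a similar outlier experiment in Python. It starts from our study hours data and adds one hypothetical student (10 hours of study, exam score of 20) who is not part of the original dataset. The `correlation` helper is again just an illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r: sum of z-score products divided by n - 1."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)]
    return sum(products) / (len(xs) - 1)

hours = [1, 2, 2, 3, 4, 5, 5, 6, 7, 8]
scores = [52, 55, 59, 61, 66, 73, 71, 77, 80, 85]

r_before = correlation(hours, scores)

# Hypothetical outlier (not in the original data): 10 hours, score of 20
r_after = correlation(hours + [10], scores + [20])

print(f"r without the outlier: {r_before:.2f}")
print(f"r with the outlier:    {r_after:.2f}")
```

One point far from the rest is enough to wipe out nearly all of the correlation, even though the other ten students still follow a tight upward pattern.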
- What would happen to \( r \) if you swapped the x and y columns? Try it using the formula.
- If you rescaled all the x values (say, converted hours to minutes), would \( r \) change? Why or why not?
- Using the interactive plot above, can you build a dataset of 5 points with \( r \) close to –1? Close to 0?
Now that we know how to calculate and interpret \( r \), the next step is to evaluate its statistical significance with hypothesis tests.

