10.2: Correlation Coefficient

    Learning Objectives
    • Define the correlation coefficient (r) and explain its purpose.
    • Describe how the correlation coefficient measures the strength and direction of a linear relationship between two variables.
    • Interpret the value of r on a scale from -1 to 1.
    • Identify that values of r near -1 or 1 indicate stronger linear relationships.

    Correlation measures how closely two variables move together. If one variable tends to increase as the other increases, they have a positive correlation. If one tends to decrease as the other increases, they have a negative correlation. If neither pattern appears, there is no correlation. The correlation coefficient measures the strength and direction of the linear relationship between two variables \(x\) and \(y\).

    Definition: Correlation Coefficient

    The correlation coefficient, denoted as \(r\), measures the strength and direction of the linear relationship between two variables. Its purpose is to provide a numerical value that quantifies how closely the variables \(x\) and \(y\) are related. The range of \(r\) is from \(-1\) to \(1\).

    Correlation and Scatter Plots

    When the correlation coefficient \(r\) is near \(-1\), it indicates a strong negative linear relationship: as the x-values on the horizontal axis increase, the y-values on the vertical axis tend to decrease. The closer \(r\) is to \(-1\), the stronger the linear relationship and the more closely the points in the scatter plot follow a downward-sloping line. An example of a scatter plot with \(r\) close to \(-1\) is presented below.

    Figure \(\PageIndex{1}\): Scatter Plot Where There is a Strong Negative Linear Relationship

    When the correlation coefficient \(r\) is near \(0\), it indicates little or no linear relationship between \(x\) and \(y\). The points in the scatter plot form an amorphous cloud with no clear upward or downward trend, as shown in the image below.

    Figure \(\PageIndex{2}\): Scatter Plot Where There is No Linear Relationship

    When the correlation coefficient \(r\) is near \(1\), it indicates a strong positive linear relationship: as the x-values on the horizontal axis increase, the y-values on the vertical axis tend to increase as well. The closer \(r\) is to \(1\), the stronger the linear relationship and the more closely the points in the scatter plot follow an upward-sloping line. An example of a scatter plot with \(r\) close to \(1\) is presented below.

    Figure \(\PageIndex{3}\): Scatter Plot Where There is a Strong Positive Linear Relationship

    Correlation Coefficient Formula

    Definition: Formula to Compute Correlation Coefficient

    \(r = \dfrac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n(\sum x^2)-(\sum x)^2][n(\sum y^2)-(\sum y)^2]}}\)

    Where,

    • \(n\) = total number of pairs.
    • \(\sum x\) = sum of all x values.
    • \(\sum y\) = sum of all y values.
    • \(\sum x^2\) = sum of all the squares of the x values.
    • \(\sum y^2\) = sum of all the squares of the y values.
    • \(\sum xy\) = sum of all the products of the corresponding x and y values.
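
    As a rough sketch of how the formula translates into code, the function below computes each sum listed above and then combines them. The function name pearson_r and the sample data are illustrative assumptions, not something defined in this section.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r from the sums-based formula."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must contain the same number of values")

    n = len(xs)                                    # number of pairs
    sum_x = sum(xs)                                # sum of x values
    sum_y = sum(ys)                                # sum of y values
    sum_x2 = sum(x * x for x in xs)                # sum of squared x values
    sum_y2 = sum(y * y for y in ys)                # sum of squared y values
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # sum of products

    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Hypothetical data lying exactly on the line y = 2x, so r should be 1.
r_line = pearson_r([1, 2, 3], [2, 4, 6])
print(r_line)
```

    The function divides by zero if all x-values or all y-values are identical, since one of the bracketed terms in the denominator is then \(0\); this sketch does not handle that case.
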
    Example \(\PageIndex{1}\)

    Professor Martinez is conducting a study to understand the relationship between the number of hours students study per week and their performance on the midterm exam in Math 400, an advanced calculus course at the university. She collects data from 8 randomly selected students in her class. The exam is out of 100 points and time is measured in hours per week. Compute the correlation coefficient for the data set below. Use the formula and round the final answer to three decimal places.

    Bivariate Data
    x: Hours Studied Per Week    y: Midterm Exam Score (out of 100 points)
    10                           51
    10                           53
    12                           64
    13                           68
    14                           71
    15                           79
    16                           84
    20                           92

    Table \(\PageIndex{1}\): Hours Studied Per Week and Midterm Exam Score

    Solution
    1. Add three columns to the table for \(xy\), \(x^2\), and \(y^2\). Fill in the columns by performing the required computations.
    Bivariate Data With Expanded Columns
    \(x\) \(y\) \(xy\) \(x^2\) \(y^2\)
    10 51 510 100 2601
    10 53 530 100 2809
    12 64 768 144 4096
    13 68 884 169 4624
    14 71 994 196 5041
    15 79 1185 225 6241
    16 84 1344 256 7056
    20 92 1840 400 8464

    Table \(\PageIndex{2}\): Added Columns Needed for Computation of the Correlation Coefficient.

    2. Find the sum of each column.
    • \(n\) = 8
    • \(\sum x\) = 110
    • \(\sum y\) = 562
    • \(\sum x^2\) = 1590
    • \(\sum y^2\) = 40932
    • \(\sum xy\) = 8055
    3. Substitute the sums into the formula and compute using the order of operations.

    \(r = \dfrac{n(\sum xy)-(\sum x)(\sum y)}{\sqrt{[n(\sum x^2)-(\sum x)^2][n(\sum y^2)-(\sum y)^2]}}\)

    \(r = \dfrac{8(8055)-(110)(562)}{\sqrt{[8(1590)-(110)^2][8(40932)-(562)^2]}}\)

    \(r = \dfrac{64440-61820}{\sqrt{[620][11612]}}\)

    \(r = \dfrac{2620}{\sqrt{7199440}}\)

    \(r \approx \dfrac{2620}{2683.17722}\)

    \(r \approx 0.976\)
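
    The three solution steps can be re-traced in code as a check on the arithmetic. This is only a sketch: the variable names are hypothetical, and the data are the eight pairs from Table 1.

```python
from math import sqrt

# Data from Table 1: hours studied per week and midterm exam score.
hours = [10, 10, 12, 13, 14, 15, 16, 20]
scores = [51, 53, 64, 68, 71, 79, 84, 92]

# Step 1: build the three extra columns xy, x^2, and y^2.
xy = [x * y for x, y in zip(hours, scores)]
x_sq = [x * x for x in hours]
y_sq = [y * y for y in scores]

# Step 2: find n and the sum of each column.
n = len(hours)
sum_x, sum_y = sum(hours), sum(scores)
sum_xy, sum_x2, sum_y2 = sum(xy), sum(x_sq), sum(y_sq)

# Step 3: substitute into the formula one piece at a time.
numerator = n * sum_xy - sum_x * sum_y   # 8(8055) - (110)(562) = 2620
x_bracket = n * sum_x2 - sum_x ** 2      # 8(1590) - 110^2 = 620
y_bracket = n * sum_y2 - sum_y ** 2      # 8(40932) - 562^2 = 11612
product = x_bracket * y_bracket          # 620 * 11612 = 7199440
r = numerator / sqrt(product)

print(sum_x, sum_y, sum_x2, sum_y2, sum_xy)
print(numerator, x_bracket, y_bracket, product)
print(round(r, 3))  # 0.976
```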

    Correlation Coefficient and Technology

    Using technology such as a scientific calculator, graphing calculator, Microsoft Excel, or other computational tools is a more efficient way to calculate the correlation coefficient. These methods minimize the risk of errors during calculations and save time by eliminating the need for extensive intermediate computations. Below is an example of how to compute the correlation coefficient using the TI-84+ calculator. All future calculations will be done using the TI-84+ calculator.

    Example \(\PageIndex{2}\)

    Using the data from Example 1, compute the correlation coefficient \(r\) using a TI-84+ calculator.

    Solution

    To turn on Stat Wizard on a TI-84+, follow these steps:

    • Press [MODE] on your calculator.
    • Scroll down using the arrow keys until you find [STAT WIZARDS].
    • Highlight [ON] using the right arrow key.
    • Press [ENTER] to confirm your selection.
    1. Press the [STAT] button, make sure that [EDIT] and [1:Edit] are selected, then press [ENTER].
    Figure \(\PageIndex{4}\): Selecting Edit Function in TI-84+
    2. Enter the x-values in List 1 [\(L_1\)] and the y-values in List 2 [\(L_2\)].
    Figure \(\PageIndex{5}\): Enter Data into List One and List Two in TI-84+
    3. Press the [STAT] button again, use the right arrow to select [CALC], use the down arrow to select [8: LinReg(a+bx)], and then press [ENTER].
    Figure \(\PageIndex{6}\): Selection of Linear Regression Function in TI-84+
    4. Make sure that Xlist has \(L_1\) and Ylist has \(L_2\). Use the down arrow to select [Calculate] and press [ENTER].
    Figure \(\PageIndex{7}\): Check Screen to Ensure the Proper Lists are Selected
    5. On the output screen, \(r\) appears on the last line. Rounded to three decimal places, \(r = 0.976\).
    Figure \(\PageIndex{8}\): Output of Correlation Coefficient r = 0.976

    Exercises

    1. Compute the correlation coefficient for the data set below. Use the formula and round the final answer to three decimal places.
    Bivariate Data
    X: Hours of Sleep on the Night Before the Exam    Y: Points (out of 100) Earned on the Exam
    8                                                 75
    6                                                 86
    3                                                 72
    4                                                 65
    2                                                 68
    0                                                 50
    12                                                90
    7                                                 98
    10                                                84
    Table \(\PageIndex{3}\): Hours of Sleep Before Exam and Points Earned on the Exam

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
    MyOpenMath is a free online learning platform designed to support math instruction through automated homework, quizzes, and assessments. You must register for MyOpenMath and sign in to view the question.

    QR code linking to the MyOpenMath version of the question above with step-by-step guided problem-solving.

    2. Compute the correlation coefficient for the data set below. Use a graphing/scientific calculator and round the final answer to three decimal places.
    Bivariate Data
    Years (Time)    Car Value ($)
    0               30,000
    1               25,000
    2               21,000
    3               18,000
    4               15,000
    5               13,000
    6               11,000
    7               9,000

    Table \(\PageIndex{4}\): Years and Car Value in $

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.

    QR code linking to the MyOpenMath version of the question above with step-by-step guided problem-solving.

    Answers

    If you are an instructor and want the solutions to all the exercise questions for each section, please email Toros Berberyan.


    This page titled 10.2: Correlation Coefficient is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Toros Berberyan, Tracy Nguyen, and Alfie Swan.