9.3.1: Interpretation of r-squared


    We now know how to calculate and interpret \( r \), the correlation coefficient. But statisticians often report a closely related number, \( r^2 \), called the coefficient of determination. In many ways, \( r^2 \) is even more useful for communicating how well a linear model fits the data.


    What Is \( r^2 \)?

    The coefficient of determination, denoted \( r^2 \), is the square of the correlation coefficient. It tells us the proportion of variability in the response variable \( y \) that is explained by the linear relationship with the explanatory variable \( x \).

    • \( r^2 \) is always between 0 and 1 (or equivalently, 0% and 100%)
    • A value of \( r^2 = 0.85 \) means that 85% of the variation in \( y \) is explained by its linear relationship with \( x \)
    • The remaining percentage (here, 15%) is variation left unexplained by the model due to other variables, randomness, or non-linearity

    Notice that because we are squaring \( r \), \( r^2 \) is never negative. Like the standard deviation, which measures the size of the spread but not its direction, \( r^2 \) tells us how much of the variation is explained but not whether the association is positive or negative. For that you still need \( r \) (or the slope of the regression line).
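As a quick check of the point about direction, here is a minimal sketch that computes \( r \) and \( r^2 \) for a small data set and for the same data with the response flipped. The numbers and the helper `correlation` are made up for illustration; they are not the Section 9.3 data.

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, made up for illustration (not the Section 9.3 data set).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 61, 70, 74, 83]
scores_reversed = [-s for s in scores]  # same spread, opposite direction

r_pos = correlation(hours, scores)
r_neg = correlation(hours, scores_reversed)

# The two r values differ only in sign, so their squares agree.
print(f"r = {r_pos:+.3f}, r^2 = {r_pos ** 2:.3f}")
print(f"r = {r_neg:+.3f}, r^2 = {r_neg ** 2:.3f}")
```

Both lines report the same \( r^2 \); only the sign of \( r \) reveals which association is positive and which is negative.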


    Where Does "Explained Variation" Come From?

    To understand what \( r^2 \) is actually measuring, it helps to think about two different ways of predicting \( y \).

    Without the regression model: If someone asked you to predict a student's exam score and you had no information about study hours, your best guess would simply be the mean: \( \bar{y} \). You'd be wrong by varying amounts for each student, and we can measure that total spread as the total variation in \( y \).

    With the regression model: Once we know a student's study hours, we can use \( \hat{y} = a + bx \) to make a better prediction. The model accounts for some of that spread. The variation that is still unexplained after using the model is captured by the residuals.

    \( r^2 \) is the fraction of the total variation that the model does explain:

    \[ r^2 = \frac{\text{variation explained by the model}}{\text{total variation in } y} = 1 - \frac{\text{unexplained variation}}{\text{total variation in } y} \]
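The formula above can be checked numerically. The sketch below assumes the usual convention that "variation" means a sum of squared deviations, fits a least-squares line to a small made-up data set, and compares both forms of the fraction with the square of \( r \). The data values are hypothetical, not taken from Section 9.3.

```python
# Hypothetical data, made up for illustration (not the Section 9.3 data set).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 61, 70, 74, 83]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Least-squares slope b and intercept a for y-hat = a + b*x.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
sxx = sum((x - x_bar) ** 2 for x in hours)
b = sxy / sxx
a = y_bar - b * x_bar
predicted = [a + b * x for x in hours]

# Total variation: squared errors when every prediction is the mean.
total = sum((y - y_bar) ** 2 for y in scores)
# Unexplained variation: squared residuals left after using the line.
unexplained = sum((y - y_hat) ** 2 for y, y_hat in zip(scores, predicted))
explained = total - unexplained

r_squared = 1 - unexplained / total
r = sxy / (sxx * total) ** 0.5  # correlation coefficient

print(f"explained / total       = {explained / total:.4f}")
print(f"1 - unexplained / total = {r_squared:.4f}")
print(f"r squared               = {r ** 2:.4f}")
```

All three printed values agree, which is why the square of the correlation coefficient can be read as a proportion of variation explained.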


    Example: Study Hours and Exam Scores

    From our worked example in Section 9.3, we found \( r \approx 0.99 \). Squaring this:

    \[ r^2 = (0.99)^2 \approx 0.98 \]

    Interpretation: About 98% of the variation in exam scores among these 10 students is explained by their linear relationship with study hours. Only about 2% of the variation in scores is left unexplained — due to factors our model doesn't account for, such as prior knowledge, test anxiety, or sleep.

    This is a very high \( r^2 \), which makes sense given how tightly the points hugged the regression line in our scatterplot.


    More Examples: Interpreting \( r^2 \) in Context

    Interpreting \( r^2 \) in Context
    | Context | \( r \) | \( r^2 \) | Plain-language interpretation |
    | --- | --- | --- | --- |
    | Study hours → exam score | 0.99 | 0.98 | 98% of variation in scores is explained by study hours. Strong model. |
    | TV hours → GPA | −0.62 | 0.38 | 38% of variation in GPA is explained by TV hours. Moderate: many other factors matter. |
    | Height → arm span | 0.97 | 0.94 | 94% of variation in arm span is explained by height. Very tight linear relationship. |
    | Shoe size → math score | 0.01 | 0.0001 | Only about 0.01% of variation in math scores is explained by shoe size. Essentially no relationship. |
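The arithmetic behind the table can be reproduced in a few lines. This is a small sketch using the \( r \) values listed above; the helper name `percent_explained` is chosen here for illustration.

```python
# (context, r) pairs copied from the table above.
examples = [
    ("Study hours -> exam score", 0.99),
    ("TV hours -> GPA", -0.62),
    ("Height -> arm span", 0.97),
    ("Shoe size -> math score", 0.01),
]

def percent_explained(r):
    """Percentage of variation in y explained by the linear model."""
    return 100 * r ** 2

for context, r in examples:
    # Direction comes from the sign of r; r^2 cannot supply it.
    direction = "positive" if r > 0 else "negative"
    print(f"{context}: r = {r:+.2f} ({direction}), "
          f"r^2 = {r ** 2:.4f}, {percent_explained(r):.2f}% explained")
```

Note that the TV hours row keeps its negative direction in \( r \) even though its \( r^2 \) is positive.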

    \( r \) vs. \( r^2 \): Which Should You Report?

    Both \( r \) and \( r^2 \) are useful, but they answer slightly different questions:

    When to Use \( r \) vs. \( r^2 \)
    | Use \( r \) when… | Use \( r^2 \) when… |
    | --- | --- |
    | You want to describe the direction and strength of the linear association | You want to describe how much of the variability in \( y \) the model accounts for |
    | You are comparing two correlations to see which is stronger | You are explaining the model to a non-technical audience ("our model explains 80% of the variation") |
    | Direction (positive or negative) matters to your conclusion | You want to assess the practical usefulness of a prediction model |

    In regression output from calculators and software, you will usually see both reported. Get comfortable reading and interpreting each one.


    A Common Mistake: Confusing \( r \) and \( r^2 \)

    Watch out: It is easy to accidentally swap \( r \) and \( r^2 \) when writing interpretations. Here are two statements about the same data — only one is correct:

    • ❌   "The correlation \( r = 0.80 \) means that 80% of the variation in \( y \) is explained by \( x \)."
    • ✅   "The correlation is \( r = 0.80 \), so \( r^2 = 0.64 \), meaning 64% of the variation in \( y \) is explained by its linear relationship with \( x \)."

    The "proportion of variation explained" language always belongs to \( r^2 \), never to \( r \) directly.


    Visualizing \( r^2 \): What Does "Explained Variation" Look Like?

    The interactive plot below shows a fixed regression line and a set of student data points from our study hours example. For any point you select, it illustrates the two components of variation:

    • Total deviation (purple): the distance from the point to \( \bar{y} \) — how far off the mean alone would have been
    • Explained deviation (green): the distance from \( \hat{y} \) to \( \bar{y} \) — how much the regression line improved the prediction
    • Residual (red): the distance from the point to \( \hat{y} \) — what's left unexplained

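    Notice that, when each deviation is signed (positive above, negative below): Total deviation = Explained deviation + Residual, that is, \( y - \bar{y} = (\hat{y} - \bar{y}) + (y - \hat{y}) \).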

    Click on any data point to see its deviation breakdown.
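For readers without access to the interactive plot, the same breakdown can be sketched in code. The example below uses a small made-up data set (not the Section 9.3 data) and signed deviations, prints the three pieces for each point, and then sums their squares to recover \( r^2 \), assuming the usual sum-of-squares meaning of "variation".

```python
# Hypothetical data, made up for illustration (not the Section 9.3 data set).
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 60, 61, 70, 74, 83]

n = len(hours)
x_bar = sum(hours) / n
y_bar = sum(scores) / n

# Least-squares line y-hat = a + b*x.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(hours, scores))
     / sum((x - x_bar) ** 2 for x in hours))
a = y_bar - b * x_bar

rows = []
for x, y in zip(hours, scores):
    y_hat = a + b * x
    total = y - y_bar          # purple: point to mean
    explained = y_hat - y_bar  # green: line to mean
    residual = y - y_hat       # red: point to line
    rows.append((total, explained, residual))
    print(f"x={x}: total {total:+.2f} = explained {explained:+.2f} "
          f"+ residual {residual:+.2f}")

# Squared versions, summed over all points, give the pieces of r^2.
sst = sum(t ** 2 for t, _, _ in rows)
ssr = sum(e ** 2 for _, e, _ in rows)
sse = sum(res ** 2 for _, _, res in rows)
print(f"r^2 = explained / total = {ssr / sst:.4f}")
```

A point whose explained deviation is much larger in size than its residual is one where the line does most of the work of moving the prediction from the mean toward the actual value.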


    How Good Is "Good Enough"?

    There is no single universal threshold for what counts as a "good" \( r^2 \). It depends entirely on the field and context:

    • In physics or engineering, where variables have precise, controllable relationships, \( r^2 \geq 0.99 \) is common and expected.
    • In education or psychology, where human behavior involves many unpredictable factors, \( r^2 \approx 0.40 \) might be considered a meaningful result.
    • In economics or social science, values anywhere from 0.20 to 0.70 are routinely reported and considered informative.

    The key question is not "Is \( r^2 \) high?" but rather "Does this model explain enough variation to be useful for our purpose?"


    Think About It:
    • Suppose one model has \( r = -0.90 \) and another has \( r = 0.90 \). Which model explains more variation in \( y \)? How do you know?
    • A researcher reports \( r^2 = 0.25 \) for a study predicting depression scores from hours of sleep. Is this model useless? What does it tell us?
    • Why can't we determine the direction of a linear relationship from \( r^2 \) alone?
    • Using the interactive visualization above, find a point whose explained deviation is larger than its residual. What does that tell you about how the model is performing for that student?

    With \( r \) and \( r^2 \) in hand, we have a complete toolkit for describing a linear relationship: its direction, its strength, and how much of the story it tells. In the next section, we turn to the full least-squares regression line — using these ideas to build a model we can use for prediction.


    9.3.1: Interpretation of r-squared is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
