Statistics LibreTexts

10.6: Coefficient of Determination and the Standard Error of the Estimate

    Learning Objectives
    • Define the coefficient of determination and explain its role in evaluating regression models.
    • Interpret the coefficient of determination as a measure of how well the regression line explains variation in the dependent variable.
    • Recognize that values closer to 1 indicate a better model fit.
    • Define the standard error of the estimate and describe its purpose in measuring prediction accuracy.
    • Interpret smaller standard error values as indicators of more accurate predictions and a better model fit.

    Two key metrics, the coefficient of determination and the standard error of the estimate, are computed to assess the quality of the line of regression. The coefficient of determination, often denoted as \(r^2\) and typically written as a percentage, gives the proportion of the variance in the dependent variable that the regression model explains. The standard error of the estimate, denoted as \(SE\), measures how much the actual data points differ from the values the regression model predicts. Roughly, it tells you how far off your predictions are, on average.

    Coefficient of Determination

    The formula below is used to calculate the coefficient of determination; however, it can also be conveniently computed using technology.

    Definition: Coefficient of Determination Formula

    It is found by squaring \(r\), the correlation coefficient, to get \(r^2\).

    The table below presents a scale of key values that assist in interpreting the quality of the line of regression based on the coefficient of determination.

    Coefficient of Determination Scale
    Value Range Interpretation
    \(r^2\) = 0% The model explains none of the variance in the dependent variable.
    0% \( < r^2 \le\) 25% The model explains very little variance; the independent variables are weakly related to the dependent variable.
    25% \( < r^2 \le\) 50% The model explains a moderate amount of variance, but there is significant unexplained variation.
    50% \( < r^2 \le\) 75% The model explains a substantial amount of variance; the predictors are relatively strong.
    75% \( < r^2<\) 100% The model explains most of the variance; it has a high level of explanatory power.
    \(r^2\) = 100% The model explains all of the variance perfectly (typically unrealistic in real-world data).

    Table \(\PageIndex{1}\): Scale for Different Coefficient of Determination Values and Their Interpretations
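The scale in the table can be sketched as a small lookup function. This is only an illustration: the function name `interpret_r_squared` and the shortened wording of each label are assumptions, and a value that falls exactly on a cutoff (25%, 50%, or 75%) is assigned to the lower band.

```python
def interpret_r_squared(r_squared_percent: float) -> str:
    """Map a coefficient of determination (in percent) to the wording of the scale."""
    if not 0 <= r_squared_percent <= 100:
        raise ValueError("r^2 must lie between 0% and 100%")
    if r_squared_percent == 0:
        return "explains none of the variance"
    if r_squared_percent == 100:
        return "explains all of the variance"
    # Assumption: a value exactly on a cutoff goes to the lower band.
    if r_squared_percent <= 25:
        return "explains very little variance"
    if r_squared_percent <= 50:
        return "explains a moderate amount of variance"
    if r_squared_percent <= 75:
        return "explains a substantial amount of variance"
    return "explains most of the variance"


for value in (10, 40, 60, 95):
    print(f"r^2 = {value}% -> {interpret_r_squared(value)}")
```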

    Examples of Coefficient of Determination

    Example \(\PageIndex{1}\)

    In examples 1 and 2 from section 10.5, the line of regression is \(y' = 12.145+4.226x\) and the correlation coefficient is \(r = 0.976\). Calculate the coefficient of determination and explain its significance.

    Solution

    Method 1) Square \(r\), round to two decimal places, and write the result as a percentage. The coefficient of determination is \(r^2 = 0.976^2 = 0.952576 \approx 0.95 = 95\)%.

    Method 2) Using a TI-84+ calculator, follow the steps in example 2 of section 10.5 to enter the data and calculate the line of regression. The output page of the calculator is presented below and displays \(r^2\) on the fourth line. Convert to a percentage and round to two decimal places to get \(r^2 = 95\)%.

    Figure \(\PageIndex{1}\): Output of Coefficient of Determination \(r^2 = 0.95\)

    Since \(r^2 = 95\)% the line of regression has a high level of explanatory power.

    Example \(\PageIndex{2}\)

    In example 3 from section 10.5, the line of regression is \(y' = 12.251 - 0.944x\) and the correlation coefficient is \(r = -0.964\). Calculate the coefficient of determination and explain its significance.

    Solution

    Method 1) Square \(r\), round to two decimal places, and write the result as a percentage. The coefficient of determination is \(r^2 = (-0.964)^2 = 0.929296 \approx 0.93 = 93\)%.

    Method 2) Using a TI-84+ calculator, follow the steps in example 2 of section 10.5 to enter the data and calculate the line of regression. The output page of the calculator is presented below and displays \(r^2\) on the fourth line. Convert to a percentage and round to two decimal places to get \(r^2 = 93\)%.

    Figure \(\PageIndex{2}\): Output of Coefficient of Determination \(r^2 = 0.93\)

    Since \(r^2 = 93\)% the line of regression has a high level of explanatory power.
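As a cross-check on Method 1, the correlation coefficient can be computed from raw data and then squared. The sketch below assumes that Examples 1 and 2 are based on the two data tables given later in this section (hours studied versus exam score, and hours worked out versus days ill); those tables reproduce the \(r\) values quoted above. The function name `correlation` is illustrative.

```python
from math import sqrt


def correlation(xs, ys):
    """Pearson correlation coefficient r, computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


# Assumed data for Example 1 (hours studied, exam score)
hours_studied = [10, 10, 12, 13, 14, 15, 16, 20]
exam_scores = [51, 53, 64, 68, 71, 79, 84, 92]

# Assumed data for Example 2 (hours worked out, days ill)
hours_worked_out = [0, 2, 4, 5, 7, 10, 12]
days_ill = [14, 10, 8, 6, 5, 3, 2]

for label, xs, ys in (
    ("Example 1", hours_studied, exam_scores),
    ("Example 2", hours_worked_out, days_ill),
):
    r = correlation(xs, ys)
    print(f"{label}: r = {r:.3f}, r^2 = {r * r:.0%}")
```

Squaring removes the sign of \(r\), so a strong negative correlation gives a high \(r^2\) just as a strong positive one does.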

    Standard Error of the Estimate

    The standard error of the estimate indicates how closely the actual data points align with the regression line. Smaller values of the standard error of the estimate reflect a closer fit between the data points and the regression line. In the ideal case, the standard error of the estimate would be zero, meaning all data points lie exactly on the regression line. The two graphs below illustrate the impact of different standard errors of the estimate, allowing for a comparison of their effects on the regression line.

    Figure \(\PageIndex{3}\): Graph of Two Scatter Plots Showing Standard Error of the Estimates

    • In Graph 1, \(SE = 19.26\), and the actual values are more spread out from the line of regression. Predictions made from this line of regression will be less accurate.
    • In Graph 2, \(SE = 1.29\), and the actual values are closer to the line of regression. Predictions made from this line of regression will be more accurate.
    Definition: Standard Error of the Estimate Formula

    \(SE = \sqrt{\dfrac{\sum y^2 - a \sum y - b \sum xy}{n-2}}\)

    Where,

    • \(n\) = number of pairs of data.
    • \(a\) = regression coefficient of the intercept.
    • \(b\) = regression coefficient of the slope.
    • \(\sum y\) = sum of the \(y\)-values.
    • \(\sum y^2\) = sum of the \(y^2\)-values.
    • \(\sum xy\) = sum of the \(xy\)-values.
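A brief note on where the numerator comes from, offered as a sketch rather than a full proof. Write \(y' = a + bx\) for the predicted value. For the least-squares line, the residuals satisfy \(\sum (y - y') = 0\) and \(\sum x(y - y') = 0\), so the sum of squared residuals simplifies as follows:

\(\sum (y - y')^2 = \sum y(y - y') - a \sum (y - y') - b \sum x(y - y') = \sum y(y - a - bx) = \sum y^2 - a \sum y - b \sum xy\)

The formula above is therefore the same as \(SE = \sqrt{\dfrac{\sum (y - y')^2}{n-2}}\), which is why a smaller \(SE\) means the data points sit closer to the line.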

    Examples of Standard Error of the Estimate

    Example \(\PageIndex{3}\)

    Professor Martinez is conducting a study to understand the relationship between the number of hours students study per week and their performance on the midterm exam in Math 400, an advanced calculus course at the university. She collects data from 8 randomly selected students in her class. The exam is scored out of 100 points, and study time is measured in hours per week. Section 10.3 showed that these data values are correlated. Compute the standard error of the estimate.

    Bivariate Data
    x: Hours Studied Per Week y: Midterm Exam Score (out of 100 points)
    10 51
    10 53
    12 64
    13 68
    14 71
    15 79
    16 84
    20 92

    Table \(\PageIndex{2}\): Bivariate Data for Hours Studied Per Week and Midterm Exam Score.

    Solution

    The TI-84+ will be used to compute the sums and regression coefficients.

    Step 1) Press the [STAT] button, make sure that [EDIT] and [1: Edit] are selected, then press [ENTER].

    Figure \(\PageIndex{4}\): Select Edit Function to Enter Data

    Step 2) Enter the x-values in List 1 [\(L_1\)] and the y-values in List 2 [\(L_2\)].

    Figure \(\PageIndex{5}\): Enter X-values in List 1 and Y-values in List 2

    Step 3) Press the [STAT] button again, use the right arrow to select [CALC], use the down arrow to select [8: LinReg(a+bx)], and then press [ENTER].

    Figure \(\PageIndex{6}\): Under Calculate Select Option #8 (Linear Regression Function)

    Step 4) Make sure that X-list has \(L_1\) and the Y-list has \(L_2\). Use the down arrow to select [Calculate] and press [ENTER].

    Figure \(\PageIndex{7}\): Check Screen to Make Sure Proper Lists are Selected

    Step 5) On the output page, the regression coefficients are on the first two lines. Rounded to three decimal places, they are \(a = 12.145\) and \(b = 4.226\).

    Figure \(\PageIndex{8}\): Output of Regression Coefficients (a = 12.145 and b = 4.226)

    Step 6) Press the [STAT] button again, use the right arrow to select [CALC], use the down arrow to select [2: 2-VAR STATS], and then press [ENTER].

    Figure \(\PageIndex{9}\): Select Two Variable Statistics to Compute Sums Needed for Calculations

    Step 7) Make sure that X-list has \(L_1\) and the Y-list has \(L_2\). Use the down arrow to select [Calculate] and press [ENTER].

    Figure \(\PageIndex{10}\): Check Screen to Make Sure Proper Lists are Selected

    Step 8) When the output page appears, scroll down using the down arrow until the following sums appear: \(\sum y\), \(\sum y^2\), and \(\sum xy\).

    Figure \(\PageIndex{11}\): Output of Sums Needed for Calculations

    Step 9) The results of steps 5 and 8 can be plugged into the formula to calculate the standard error of the estimate.

    The key values are \(n = 8\), \(a = 12.145\), \(b = 4.226\), \(\sum y = 562\), \(\sum y^2 = 40932\), and \(\sum xy = 8055\).
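The sums reported by the calculator can be reproduced directly from the data table. A minimal sketch, with illustrative variable names:

```python
# Data from the hours-studied table (x) and midterm scores (y)
hours_studied = [10, 10, 12, 13, 14, 15, 16, 20]
exam_scores = [51, 53, 64, 68, 71, 79, 84, 92]

n = len(exam_scores)                                  # number of data pairs
sum_y = sum(exam_scores)                              # sum of the y-values
sum_y2 = sum(y * y for y in exam_scores)              # sum of the squared y-values
sum_xy = sum(x * y for x, y in zip(hours_studied, exam_scores))  # sum of the products

print(n, sum_y, sum_y2, sum_xy)  # 8 562 40932 8055
```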

    \(SE = \sqrt{\dfrac{40932 - (12.145) 562 - (4.226) 8055}{8-2}}\) = \(\sqrt{\dfrac{40932 - 6825.49 - 34040.43}{6}}\) = \(\sqrt{\dfrac{ 66.08}{6}}\) = \(\sqrt{11.01333}\) \(\approx\) 3.32
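The substitution above can be checked with a few lines of code. This is a sketch of the formula as stated in the definition; the function name `standard_error_from_sums` is an assumption, and the inputs are the rounded coefficients and sums listed as key values.

```python
from math import sqrt


def standard_error_from_sums(n, a, b, sum_y, sum_y2, sum_xy):
    """Standard error of the estimate from the intercept a, slope b, and the sums."""
    if n <= 2:
        raise ValueError("at least three data pairs are needed (n - 2 must be positive)")
    sum_squared_errors = sum_y2 - a * sum_y - b * sum_xy
    # Rounded coefficients or floating-point error can push a value that
    # should be zero slightly below zero, so clamp before taking the root.
    return sqrt(max(sum_squared_errors, 0.0) / (n - 2))


se = standard_error_from_sums(
    n=8, a=12.145, b=4.226, sum_y=562, sum_y2=40932, sum_xy=8055
)
print(f"SE = {se:.2f}")
```

Because the numerator is a small difference of large numbers, the result is sensitive to how many decimal places are kept in \(a\) and \(b\).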

    This is a small value relative to the 100-point exam scale, which means the data values are close to the line of regression and the line should give good predictions.

    Example \(\PageIndex{4}\)

    This example will demonstrate how to calculate the standard error of the estimate without using the formula. The TI-84+ calculator has a built-in function that directly calculates the standard error of the estimate. The data from example 3 above will be reused for this second example.

    Solution

    Step 1) Repeat steps 1 and 2 in example 3 above. The data must be entered into the calculator.

    Step 2) Press the [STAT] key, arrow over to the [TESTS] menu, arrow down to option F [LinRegTTest], and press the [ENTER] key. The default settings are Xlist: \(L_1\), Ylist: \(L_2\), Freq: 1, and β & ρ: ≠0. Arrow down to Calculate and press the [ENTER] key.

    Figure \(\PageIndex{12}\): Make Sure the Proper Lists are Selected After Data Input

    Step 3) The output screen on the calculator returns the standard error of the estimate. It is the value \(s\) on the last line.

    Figure \(\PageIndex{13}\): The Standard Error of the Estimate is s = 3.36

    The standard error of the estimate is \(SE = 3.36\) (rounded to two decimal places). This result differs slightly from the value obtained with the formula in example 3 because the calculator carries the unrounded regression coefficients through the computation, whereas the formula used \(a\) and \(b\) rounded to three decimal places. Overall, this is a relatively small value, which means the data values are close to the line of regression and will result in good predictions.
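The calculator's value can be reproduced by fitting the line without rounding and working from the residuals. The sketch below assumes that \(s\) is the square root of the sum of squared residuals divided by \(n - 2\); the function names `fit_line` and `standard_error_of_estimate` are illustrative.

```python
from math import sqrt


def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y' = a + bx, with no rounding."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def standard_error_of_estimate(xs, ys):
    """Square root of the sum of squared residuals divided by n - 2."""
    a, b = fit_line(xs, ys)
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sqrt(sum(e * e for e in residuals) / (len(xs) - 2))


hours_studied = [10, 10, 12, 13, 14, 15, 16, 20]
exam_scores = [51, 53, 64, 68, 71, 79, 84, 92]

s = standard_error_of_estimate(hours_studied, exam_scores)
print(f"s = {s:.2f}")  # s = 3.36
```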

    Example \(\PageIndex{5}\)

    A health researcher at the Health Department at a large university is conducting a study to explore the relationship between physical activity and health outcomes among college students aged 18–25 years old. The researcher is specifically interested in determining whether there is a correlation between the number of hours students work out per week and the number of days they spend being ill in a year. The researcher collected the data provided in the table below. Section 10.3 showed that these data values are correlated. Compute the standard error of the estimate.

    Bivariate Data
    X: Hours Worked Out per Week Y: Days Spent Ill in a Year
    0 14
    2 10
    4 8
    5 6
    7 5
    10 3
    12 2
    Solution

    Following the steps from example 4 above and example 2 from section 10.4, the output of the t-test is provided below.

    Figure \(\PageIndex{14}\): The Standard Error of the Estimate is s = 1.21

    The standard error of the estimate is \(SE = 1.21\). Overall, this is a relatively small value, which means the data values are close to the line of regression and will result in good predictions.
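For comparison, the same result can be reached with the formula from the definition, using sums computed from the table and unrounded regression coefficients. This is a sketch under the assumption that \(a\) and \(b\) are obtained from the usual least-squares sum formulas; all variable names are illustrative.

```python
from math import sqrt

hours_worked_out = [0, 2, 4, 5, 7, 10, 12]
days_ill = [14, 10, 8, 6, 5, 3, 2]

# Sums of the kind reported by a two-variable statistics routine
n = len(hours_worked_out)
sum_x = sum(hours_worked_out)
sum_y = sum(days_ill)
sum_x2 = sum(x * x for x in hours_worked_out)
sum_y2 = sum(y * y for y in days_ill)
sum_xy = sum(x * y for x, y in zip(hours_worked_out, days_ill))

# Least-squares slope and intercept, kept unrounded
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Standard error of the estimate from the formula in the definition
se = sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

print(f"a = {a:.3f}, b = {b:.3f}, SE = {se:.2f}")  # a = 12.251, b = -0.944, SE = 1.21
```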


    This page titled 10.6: Coefficient of Determination and the Standard Error of the Estimate is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Toros Berberyan, Tracy Nguyen, and Alfie Swan.
