Statistics LibreTexts

10.6: Coefficient of Determination and the Standard Error of the Estimate

  • Page ID
    58311
    Learning Objectives
    • Define the coefficient of determination and explain its role in evaluating regression models.
    • Interpret the coefficient of determination as a measure of how well the regression line explains variation in the dependent variable.
    • Recognize that values closer to 1 indicate a better model fit.
    • Define the standard error of the estimate and describe its purpose in measuring prediction accuracy.
    • Interpret smaller standard error values as indicators of more accurate predictions and a better model fit.

    Two key metrics, the coefficient of determination and the standard error of the estimate, are computed to assess the quality of the line of regression. The coefficient of determination, denoted \(r^2\) and typically written as a percentage, measures the proportion of the variation in the dependent variable that is explained by the line of regression, that is, by its linear relationship with the independent variable. The standard error of the estimate, denoted \(SE\), measures how far the actual data values typically fall from the values predicted by the line of regression; roughly speaking, it tells you how far off your predictions are on average.

    Coefficient of Determination

    The coefficient of determination can be calculated with the formula below, or it can be computed conveniently using technology.

    Definition: Coefficient of Determination Formula

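    The coefficient of determination is found by squaring \(r\), the correlation coefficient, to get \(r^2\). Because squaring removes the sign, \(r^2\) lies between 0 and 1 (0% and 100%) whether the correlation is positive or negative.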
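As a minimal sketch of what technology does here, the Python below computes \(r\) with the standard computational formula and then squares it. The function names are hypothetical, and the data are the hours-studied and midterm-score pairs from Example 3 in this section.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Correlation coefficient r via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


def coefficient_of_determination(xs, ys):
    """r^2: the square of the correlation coefficient."""
    return pearson_r(xs, ys) ** 2


# Hours studied per week (x) and midterm scores (y) from Example 3.
hours = [10, 10, 12, 13, 14, 15, 16, 20]
scores = [51, 53, 64, 68, 71, 79, 84, 92]

r = pearson_r(hours, scores)
r2 = coefficient_of_determination(hours, scores)
print(f"r = {r:.3f}, r^2 = {r2:.2f} = {r2:.0%}")  # r = 0.976, r^2 = 0.95 = 95%
```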

    The table below presents a scale of key values that assist in interpreting the quality of the line of regression based on the coefficient of determination.

    Coefficient of Determination Scale
    Value Range | Interpretation
    \(r^2 = 0\%\) | The model explains none of the variance in the dependent variable.
    \(0\% < r^2 < 25\%\) | The model explains very little variance; the independent variable is weakly related to the dependent variable.
    \(25\% \le r^2 < 50\%\) | The model explains a moderate amount of variance, but much of the variation remains unexplained.
    \(50\% \le r^2 < 75\%\) | The model explains a substantial amount of variance; the predictor is relatively strong.
    \(75\% \le r^2 < 100\%\) | The model explains most of the variance; it has a high level of explanatory power.
    \(r^2 = 100\%\) | The model explains all of the variance perfectly (typically unrealistic in real-world data).

    Table \(\PageIndex{1}\): Scale for Different Coefficient of Determination Values and Their Interpretations
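The scale in Table 1 can be expressed as a simple lookup. This is an illustrative sketch only: the function name and return strings are hypothetical, \(r^2\) is taken as a proportion from 0 to 1, and each boundary value (25%, 50%, 75%) is assigned to the higher band.

```python
def interpret_r_squared(r2):
    """Map r^2 (a proportion from 0 to 1) to the bands of Table 1.

    Boundary values 0.25, 0.50 and 0.75 go to the higher band.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r^2 must lie between 0 and 1")
    if r2 == 0.0:
        return "explains none of the variance"
    if r2 < 0.25:
        return "explains very little variance"
    if r2 < 0.50:
        return "explains a moderate amount of variance"
    if r2 < 0.75:
        return "explains a substantial amount of variance"
    if r2 < 1.0:
        return "explains most of the variance"
    return "explains all of the variance"


for value in (0.0, 0.12, 0.50, 0.95, 1.0):
    print(f"{value:.0%}: {interpret_r_squared(value)}")
```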

    Examples of Coefficient of Determination

    Example \(\PageIndex{1}\)

    In examples 1 and 2 from section 10.5, the line of regression is \(y' = 12.145+4.226x\) and the correlation coefficient is \(r = 0.976\). Calculate the coefficient of determination and explain its significance.

    Solution

    Method 1) Square \(r\), round the result to two decimal places, and write it as a percentage. The coefficient of determination is \(r^2 = 0.976^2 \approx 0.95258 \approx 0.95 = 95\%\).

    Method 2) Using a TI-84+ calculator, follow the steps in example 2 of section 10.5 to enter the data and calculate the line of regression. The output screen of the calculator, shown below, displays \(r^2\) on the fourth line. Round to two decimal places and convert to a percentage to get \(r^2 = 95\%\).

    Figure \(\PageIndex{1}\): Output of Coefficient of Determination \(r^2 = 0.95\)

    Since \(r^2 = 95\%\), about 95% of the variation in \(y\) is explained by the line of regression, so the line has a high level of explanatory power.

    Example \(\PageIndex{2}\)

    In example 3 from section 10.5, the line of regression is \(y' = 12.251 - 0.944x\) and the correlation coefficient is \(r = -0.964\). Calculate the coefficient of determination and explain its significance.

    Solution

    Method 1) Square \(r\), round the result to two decimal places, and write it as a percentage. The coefficient of determination is \(r^2 = (-0.964)^2 \approx 0.92930 \approx 0.93 = 93\%\).

    Method 2) Using a TI-84+ calculator, follow the steps in example 2 of section 10.5 to enter the data and calculate the line of regression. The output screen of the calculator, shown below, displays \(r^2\) on the fourth line. Round to two decimal places and convert to a percentage to get \(r^2 = 93\%\).

    Figure \(\PageIndex{2}\): Output of Coefficient of Determination \(r^2 = 0.93\)

    Since \(r^2 = 93\%\), about 93% of the variation in \(y\) is explained by the line of regression, so the line has a high level of explanatory power.
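The arithmetic in Method 1 of both examples can be checked with a short sketch. The helper name is hypothetical; it squares \(r\) and rounds the percentage to a whole number, which matches rounding \(r^2\) to two decimal places first. Note that the negative correlation in Example 2 still gives a positive \(r^2\).

```python
def r_squared_percent(r):
    """Square r and express r^2 as a whole-number percentage."""
    return round(r ** 2 * 100)


# Correlation coefficients from Examples 1 and 2.
for name, r in [("Example 1", 0.976), ("Example 2", -0.964)]:
    # Squaring removes the sign, so a negative r still gives a positive r^2.
    print(f"{name}: r = {r}, r^2 = {r ** 2:.5f}, about {r_squared_percent(r)}%")
```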

    Standard Error of the Estimate

    The standard error of the estimate indicates how closely the actual data points align with the regression line. Smaller values of the standard error of the estimate reflect a closer fit between the data points and the regression line. In the ideal case, the standard error of the estimate would be zero, meaning all data points lie exactly on the regression line. The two graphs below compare data sets with different standard errors of the estimate.

    Figure \(\PageIndex{3}\): Graph of Two Scatter Plots Showing Standard Error of the Estimates

    • In Graph 1, SE = 19.26, and the actual values are spread farther from the line of regression, so predictions made from the line will be less accurate.
    • In Graph 2, SE = 1.29, and the actual values lie closer to the line of regression, so predictions made from the line will be more accurate.
    Definition: Standard Error of the Estimate Formula

    \(SE = \sqrt{\dfrac{\sum y^2 - a \sum y - b \sum xy}{n-2}}\)

    Where,

    • \(n\) = number of pairs of data.
    • \(a\) = \(y\)-intercept of the line of regression.
    • \(b\) = slope of the line of regression.
    • \(\sum y\) = sum of the \(y\)-values.
    • \(\sum y^2\) = sum of the squared \(y\)-values.
    • \(\sum xy\) = sum of the products of each \(x\)-value with its paired \(y\)-value.
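The formula above is a computational shortcut for \(SE = \sqrt{\sum (y - y')^2 / (n-2)}\), the square root of the sum of squared residuals divided by \(n - 2\); the two agree when \(a\) and \(b\) are the exact least-squares coefficients. The sketch below, with hypothetical function names, computes both versions on the data from Example 3 and checks that they match.

```python
from math import sqrt


def least_squares(xs, ys):
    """Least-squares intercept a and slope b for y' = a + bx."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    a = (sum_y - b * sum_x) / n
    return a, b


def se_shortcut(xs, ys, a, b):
    """SE = sqrt((sum y^2 - a*sum y - b*sum xy) / (n - 2))."""
    n = len(xs)
    sum_y = sum(ys)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


def se_residuals(xs, ys, a, b):
    """SE = sqrt(sum of squared residuals (y - y')^2 / (n - 2))."""
    n = len(xs)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return sqrt(sse / (n - 2))


hours = [10, 10, 12, 13, 14, 15, 16, 20]
scores = [51, 53, 64, 68, 71, 79, 84, 92]
a, b = least_squares(hours, scores)
print(f"shortcut:  {se_shortcut(hours, scores, a, b):.4f}")
print(f"residuals: {se_residuals(hours, scores, a, b):.4f}")
```

The shortcut needs only the sums, which is why it suits hand or calculator work; the residual form makes the meaning of \(SE\) as a typical prediction error more visible.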

    Examples of Standard Error of the Estimate

    Example \(\PageIndex{3}\)

    Professor Martinez is conducting a study to understand the relationship between the number of hours students study per week and their performance on the midterm exam in Math 400, an advanced calculus course at the university. She collects data from 8 randomly selected students in her class. The exam is out of 100 points and time is measured in hours per week. It was shown in section 10.3 that the data values are correlated. Compute the standard error of the estimate.

    Bivariate Data
    x: Hours Studied Per Week | y: Midterm Exam Score (out of 100 points)
    10 | 51
    10 | 53
    12 | 64
    13 | 68
    14 | 71
    15 | 79
    16 | 84
    20 | 92

    Table \(\PageIndex{2}\): Bivariate Data for Hours Studied Per Week and Midterm Exam Score.

    Solution

    The TI-84+ will be used to compute the sums and regression coefficients.

    Step 1) Press the [STAT] button, make sure that [EDIT] and [1:Edit] are selected, then press [ENTER].

    Figure \(\PageIndex{4}\): Select Edit Function to Enter Data

    Step 2) Enter the x-values in List 1 [\(L_1\)] and the y-values in List 2 [\(L_2\)].

    Figure \(\PageIndex{5}\): Enter X-values in List 1 and Y-values in List 2

    Step 3) Press the [STAT] button again, use the right arrow to select [CALC], use the down arrow to select [8: LinReg(a+bx)], and then press [ENTER].

    Figure \(\PageIndex{6}\): Under Calculate Select Option #8 (Linear Regression Function)

    Step 4) Make sure that X-list has \(L_1\) and the Y-list has \(L_2\). Use the down arrow to select [Calculate] and press [ENTER].

    Figure \(\PageIndex{7}\): Check Screen to Make Sure Proper Lists are Selected

    Step 5) On the output screen, the regression coefficients appear on the first two lines. After rounding to three decimal places, they are \(a = 12.145\) and \(b = 4.226\).

    Figure \(\PageIndex{8}\): Output of Regression Coefficients (a = 12.145 and b = 4.226)

    Step 6) Press the [STAT] button again, use the right arrow to select [CALC], use the down arrow to select [2: 2-VAR STATS], and then press [ENTER].

    Figure \(\PageIndex{9}\): Select Two Variable Statistics to Compute Sums Needed for Calculations

    Step 7) Make sure that X-list has \(L_1\) and the Y-list has \(L_2\). Use the down arrow to select [Calculate] and press [ENTER].

    Figure \(\PageIndex{10}\): Check Screen to Make Sure Proper Lists are Selected

    Step 8) When the output screen appears, scroll down using the down arrow until the following sums appear: \(\sum y\), \(\sum y^2\), and \(\sum xy\).

    Figure \(\PageIndex{11}\): Output of Sums Needed for Calculations

    Step 9) Substitute the results of steps 5 and 8 into the formula to calculate the standard error of the estimate.

    The key values are \(n = 8\), \(a = 12.145\), \(b = 4.226\), \(\sum y = 562\), \(\sum y^2 = 40932\), and \(\sum xy = 8055\).

    \(SE = \sqrt{\dfrac{40932 - (12.145)(562) - (4.226)(8055)}{8-2}} = \sqrt{\dfrac{40932 - 6825.49 - 34040.43}{6}} = \sqrt{\dfrac{66.08}{6}} = \sqrt{11.01333} \approx 3.32\)

    This is a relatively small value compared with the exam scores, which range from 51 to 92 points: the data values lie close to the line of regression, so the line should produce good predictions.
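The substitution above can be reproduced line by line. This sketch simply hard-codes the key values listed above, with \(a\) and \(b\) rounded to three decimal places as in the example.

```python
from math import sqrt

# Key values from Example 3 (a and b rounded to three decimal places).
n = 8
a, b = 12.145, 4.226
sum_y, sum_y2, sum_xy = 562, 40932, 8055

numerator = sum_y2 - a * sum_y - b * sum_xy  # 40932 - 6825.49 - 34040.43
variance = numerator / (n - 2)               # divide by n - 2 = 6
se = sqrt(variance)

print(f"numerator = {numerator:.2f}")  # 66.08
print(f"SE^2 = {variance:.5f}")        # 11.01333
print(f"SE = {se:.2f}")                # 3.32
```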

    Example \(\PageIndex{4}\)

    This example demonstrates how to find the standard error of the estimate without using the formula. The TI-84+ calculator has a built-in function that calculates the standard error of the estimate directly. The data from Example 3 above are reused here.

    Solution

    Step 1) Repeat steps 1 and 2 of Example 3 above to enter the data into the calculator.

    Step 2) Press the [STAT] key, arrow over to the [TESTS] menu, arrow down to option F [LinRegTTest], and press the [ENTER] key. The default settings are Xlist: \(L_1\), Ylist: \(L_2\), Freq: 1, and β and ρ: ≠0. Arrow down to Calculate and press the [ENTER] key.

    Figure \(\PageIndex{12}\): Make Sure the Proper Lists are Selected After Data Input

    Step 3) The output screen on the calculator returns the standard error of the estimate. It is the value \(s\) on the last line.

    Figure \(\PageIndex{13}\): The Standard Error of the Estimate is s = 3.36

    The standard error of the estimate is \(SE = 3.36\) (rounded to two decimal places). This result differs slightly from the value obtained with the formula in Example 3 because the calculator does not round \(a\) and \(b\) during the intermediate steps of the computation. Overall, this is a relatively small value, which means the data values are close to the line of regression and should produce good predictions.
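To see why the calculator's value differs, this sketch recomputes \(a\) and \(b\) from the raw data of Example 3 without rounding and then evaluates the same formula with both the unrounded and the rounded coefficients. Because \(\sum y^2\), \(a\sum y\), and \(b\sum xy\) are large numbers that nearly cancel, small rounding in \(a\) and \(b\) noticeably changes the result. The function name `se_from` is hypothetical.

```python
from math import sqrt

hours = [10, 10, 12, 13, 14, 15, 16, 20]
scores = [51, 53, 64, 68, 71, 79, 84, 92]
n = len(hours)

# Sums needed for the least-squares coefficients and the SE formula.
sum_x, sum_y = sum(hours), sum(scores)
sum_xy = sum(x * y for x, y in zip(hours, scores))
sum_x2 = sum(x * x for x in hours)
sum_y2 = sum(y * y for y in scores)

# Unrounded least-squares coefficients.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n


def se_from(a, b):
    """Standard error of the estimate via the shortcut formula."""
    return sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))


print(f"unrounded: a = {a:.6f}, b = {b:.6f}, SE = {se_from(a, b):.2f}")  # SE = 3.36
print(f"rounded:   a = 12.145, b = 4.226, SE = {se_from(12.145, 4.226):.2f}")  # SE = 3.32
```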

    Example \(\PageIndex{5}\)

    A health researcher at the Health Department at a large university is conducting a study to explore the relationship between physical activity and health outcomes among college students aged 18–25 years old. The researcher is specifically interested in determining whether there is a correlation between the number of hours students work out per week and the number of days they spend being ill in a year. The researcher collected the data provided in the table below. It was shown in section 10.3 that the data values are correlated. Compute the standard error of the estimate.

    Bivariate Data
    X: Hours Worked Out per Week | Y: Days Spent Ill in a Year
    0 | 14
    2 | 10
    4 | 8
    5 | 6
    7 | 5
    10 | 3
    12 | 2

    Solution

    Following the steps in Example 4 above (and in example 2 of section 10.4), the output of the LinRegTTest is provided below.

    Figure \(\PageIndex{14}\): The Standard Error of the Estimate is s = 1.21

    The standard error of the estimate is \(SE = 1.21\). This is a relatively small value compared with the number of days spent ill, which ranges from 2 to 14: the data values lie close to the line of regression, so the line should produce good predictions.
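A sketch of the kind of computation the calculator performs for this example: fit the least-squares line, then take the square root of the sum of squared residuals divided by \(n - 2\). The function name is hypothetical. These data give the line \(y' = 12.251 - 0.944x\), the same line used in Example 2.

```python
from math import sqrt


def regression_se(xs, ys):
    """Fit y' = a + bx by least squares and return (a, b, SE)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Sum of squared residuals, divided by n - 2 degrees of freedom.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, sqrt(sse / (n - 2))


workout_hours = [0, 2, 4, 5, 7, 10, 12]
days_ill = [14, 10, 8, 6, 5, 3, 2]
a, b, se = regression_se(workout_hours, days_ill)
print(f"y' = {a:.3f} + ({b:.3f})x, SE = {se:.2f}")
```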

    Exercises

    1. A café owner wants to determine if there is a significant correlation between the daily temperature and the number of iced coffee drinks sold. The owner records the daily temperature and the number of iced coffee drinks sold for five randomly selected days listed below.
      1. Test for correlation with \( \alpha = 0.05 \) using r and Pearson's Correlation Matrix (PMC). Please click on the PMC table to access the table in the book.
      2. If there is enough evidence of a linear relationship, determine the line of regression.
      3. Determine the coefficient of determination.
      4. Determine the standard error of the estimate.
    Bivariate Data
    Temperature (°F) | # of Iced Coffees Sold
    72 | 35
    78 | 42
    85 | 53
    88 | 56
    91 | 60

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
    MyOpenMath is a free online learning platform designed to support math instruction through automated homework, quizzes, and assessments. You must register for MyOpenMath and sign in to view the question.


    1. A café owner wants to determine if there is a significant correlation between the daily temperature and the number of iced coffee drinks sold. The owner records the daily temperature and the number of iced coffee drinks sold for five randomly selected days listed below.
      1. Test for correlation with \( \alpha = 0.05 \). Use the traditional method. Click on this link for the t-distribution table to locate the critical values.
      2. If there is enough evidence of a linear relationship, determine the line of regression.
      3. Determine the coefficient of determination.
      4. Determine the standard error of the estimate.
    Bivariate Data
    Temperature (°F) | # of Iced Coffees Sold
    72 | 35
    78 | 42
    85 | 53
    88 | 56
    91 | 60

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.

    1. A café owner wants to determine if there is a significant correlation between the daily temperature and the number of iced coffee drinks sold. The owner records the daily temperature and the number of iced coffee drinks sold for five randomly selected days listed below.
      1. Test for correlation with \( \alpha = 0.05 \). Use the p-value method.
      2. If there is enough evidence of a linear relationship, determine the line of regression.
      3. Determine the coefficient of determination.
      4. Determine the standard error of the estimate.
    Bivariate Data
    Temperature (°F) | # of Iced Coffees Sold
    72 | 35
    78 | 42
    85 | 53
    88 | 56
    91 | 60

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.

    1. A researcher wants to investigate whether there is a significant linear relationship between gas prices and average household income in different cities. The data below shows average gas prices and corresponding household income (in thousands of dollars) for the seven cities listed below.
      1. Test for correlation with \( \alpha = 0.05 \) using r and Pearson's Correlation Matrix (PMC). Please click on the PMC table to access the table in the book.
      2. If there is enough evidence of a linear relationship, determine the line of regression.
      3. Determine the coefficient of determination.
      4. Determine the standard error of the estimate.
    Bivariate Data
    Gas Price ($) | Household Income (in $1,000s)
    3.10 | 45
    3.25 | 52
    3.40 | 60
    3.55 | 66
    3.70 | 72
    3.85 | 78
    4.00 | 85

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.

    1. A researcher wants to investigate whether there is a significant linear relationship between gas prices and average household income in different cities. The data below shows average gas prices and corresponding household income (in thousands of dollars) for the seven cities listed below.
      1. Test for correlation with \( \alpha = 0.05 \). Use the traditional method. Click on this link for the t-distribution table to locate the critical values.
      2. If there is enough evidence of a linear relationship, determine the line of regression.
      3. Determine the coefficient of determination.
      4. Determine the standard error of the estimate.
    Bivariate Data
    Gas Price ($) | Household Income (in $1,000s)
    3.10 | 45
    3.25 | 52
    3.40 | 60
    3.55 | 66
    3.70 | 72
    3.85 | 78
    4.00 | 85

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.

    1. A researcher wants to investigate whether there is a significant linear relationship between gas prices and average household income in different cities. The data below shows average gas prices and corresponding household income (in thousands of dollars) for the seven cities listed below.
      1. Test for correlation with \( \alpha = 0.05 \). Use the p-value method.
      2. If there is enough evidence of a linear relationship, determine the line of regression.
      3. Determine the coefficient of determination.
      4. Determine the standard error of the estimate.
    Bivariate Data
    Gas Price ($) | Household Income (in $1,000s)
    3.10 | 45
    3.25 | 52
    3.40 | 60
    3.55 | 66
    3.70 | 72
    3.85 | 78
    4.00 | 85

    Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.

    Answers

    If you are an instructor and want the solutions to all the exercise questions for each section, please email Toros Berberyan.


    This page titled 10.6: Coefficient of Determination and the Standard Error of the Estimate is shared under a CC BY 4.0 license and was authored, remixed, and/or curated by Toros Berberyan, Tracy Nguyen, and Alfie Swan.
