10.6: Coefficient of Determination and the Standard Error of the Estimate
Learning Objectives
- Define the coefficient of determination and explain its role in evaluating regression models.
- Interpret the coefficient of determination as a measure of how well the regression line explains variation in the dependent variable.
- Recognize that values closer to 1 indicate a better model fit.
- Define the standard error of the estimate and describe its purpose in measuring prediction accuracy.
- Interpret smaller standard error values as indicators of more accurate predictions and a better model fit.
Two key metrics, the coefficient of determination and the standard error of the estimate, are computed to assess the quality of the line of regression. The coefficient of determination, often denoted as \(r^2\) and typically written as a percentage, is a statistical measure that helps assess the quality of a regression model. It provides insights into how well the independent variables explain the variance in the dependent variable. The standard error of the estimate, denoted as \(SE\), measures how much the actual data points in a regression model differ from the predicted values. It simply tells you how far off your predictions are, on average.
Coefficient of Determination
The coefficient of determination is found by squaring the correlation coefficient \(r\) to get \(r^2\); it can also be conveniently computed using technology.
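The squaring step can be sketched quickly in Python. The snippet below is an illustrative check, not part of the course method: it computes \(r\) from the hours-studied data used later in this section and then squares it. Squaring the full-precision \(r\) gives about 95.35%, slightly more than the 95.26% obtained by squaring the rounded value \(r = 0.976\); both round to 95%.

```python
import math

# Illustrative data: hours studied (x) and midterm scores (y),
# taken from the example data later in this section
x = [10, 10, 12, 13, 14, 15, 16, 20]
y = [51, 53, 64, 68, 71, 79, 84, 92]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx2 = sum(xi**2 for xi in x)
sy2 = sum(yi**2 for yi in y)

# Pearson correlation coefficient
r = (n * sxy - sx * sy) / math.sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))
r2 = r**2  # coefficient of determination

print(round(r, 3))         # 0.976
print(round(r2 * 100, 2))  # 95.35 (squaring the full-precision r)
```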
The table below presents a scale of key values that assist in interpreting the quality of the line of regression based on the coefficient of determination.
| Value Range | Interpretation |
|---|---|
| \(r^2\) = 0% | The model explains none of the variance in the dependent variable. |
| 0% \( < r^2 \le\) 25% | The model explains very little variance; the independent variables are weakly related to the dependent variable. |
| 25% \( < r^2 \le\) 50% | The model explains a moderate amount of variance, but there is significant unexplained variation. |
| 50% \( < r^2 \le\) 75% | The model explains a substantial amount of variance; the predictors are relatively strong. |
| 75% \( < r^2 <\) 100% | The model explains most of the variance; it has a high level of explanatory power. |
| \(r^2\) = 100% | The model explains all of the variance perfectly (typically unrealistic in real-world data). |
Table \(\PageIndex{1}\): Scale for Different Coefficient of Determination Values and Their Interpretations
Examples of Coefficient of Determination
In examples 1 and 2 from section 10.5, the line of regression is \(y' = 12.145+4.226x\) and the correlation coefficient is \(r = 0.976\). Calculate the coefficient of determination and explain its significance.
Solution
Method 1) Square \(r\) and write the result as a percentage rounded to the nearest whole percent. The coefficient of determination is \(r^2 = 0.976^2 = 0.952576 \approx 95\)%.
Method 2) Using a TI-84+ calculator, follow the steps in example 2 of section 10.5 to enter the data and calculate the line of regression. The output page of the calculator is presented below and displays \(r^2\) on the fourth line. Convert to a percentage and round to the nearest whole percent to get \(r^2 \approx 95\)%.
Since \(r^2 \approx 95\)%, the line of regression has a high level of explanatory power.
In example 3 from section 10.5, the line of regression is \(y' = 12.251 - 0.944x\) and the correlation coefficient is \(r = -0.964\). Calculate the coefficient of determination and explain its significance.
Solution
Method 1) Square \(r\) and write the result as a percentage rounded to the nearest whole percent. The coefficient of determination is \(r^2 = (-0.964)^2 = 0.929296 \approx 93\)%.
Method 2) Using a TI-84+ calculator, follow the steps in example 2 of section 10.5 to enter the data and calculate the line of regression. The output page of the calculator is presented below and displays \(r^2\) on the fourth line. Convert to a percentage and round to the nearest whole percent to get \(r^2 \approx 93\)%.
Since \(r^2 \approx 93\)%, the line of regression has a high level of explanatory power.
Standard Error of the Estimate
The standard error of the estimate indicates how closely the actual data points align with the regression line. Smaller values of the standard error of the estimate reflect a closer fit between the data points and the regression line. In the ideal case, the standard error of the estimate would be zero, meaning all data points lie exactly on the regression line. The two graphs below illustrate the impact of different standard errors of the estimate, allowing for a comparison of their effects on the regression line.

Figure \(\PageIndex{3}\): Graph of Two Scatter Plots Showing Standard Error of the Estimates
- In Graph 1, SE = 19.26, and the actual values are more spread out from the line of regression. This will result in less accurate predictions for the line of regression.
- In Graph 2, SE = 1.29, and the actual values are closer to the line of regression. This will result in more accurate predictions for the line of regression.
\(SE = \sqrt{\dfrac{\sum y^2 - a \sum y - b \sum xy}{n-2}}\)
Where,
- \(n\) = number of pairs of data.
- \(a\) = regression coefficient of the intercept.
- \(b\) = regression coefficient of the slope.
- \(\sum y\) = sum of the \(y\)-values.
- \(\sum y^2\) = sum of the \(y^2\)-values.
- \(\sum xy\) = sum of the \(xy\)-values.
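The formula can be wrapped in a small Python helper as a sketch; the function name and argument names here are illustrative, not a standard library API. The sample call uses the summary values from the worked example later in this section.

```python
import math

def standard_error(n, a, b, sum_y, sum_y2, sum_xy):
    """Standard error of the estimate from the summary sums:
    SE = sqrt((sum_y2 - a*sum_y - b*sum_xy) / (n - 2))."""
    return math.sqrt((sum_y2 - a * sum_y - b * sum_xy) / (n - 2))

# Summary values from the hours-studied example in this section
se = standard_error(n=8, a=12.145, b=4.226,
                    sum_y=562, sum_y2=40932, sum_xy=8055)
print(round(se, 2))  # 3.32
```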
Examples of Standard Error of the Estimate
Professor Martinez is conducting a study to understand the relationship between the number of hours students study per week and their performance on the midterm exam in Math 400, an advanced calculus course at the university. She collects data from 8 randomly selected students in her class. The exam is out of 100 points, and time is measured in hours per week. It was shown in Section 10.3 that the data values are correlated. Compute the standard error of the estimate.
| x: Hours Studied Per Week | y: Midterm Exam Score (out of 100 points) |
|---|---|
| 10 | 51 |
| 10 | 53 |
| 12 | 64 |
| 13 | 68 |
| 14 | 71 |
| 15 | 79 |
| 16 | 84 |
| 20 | 92 |
Table \(\PageIndex{2}\): Bivariate Data for Hours Studied Per Week and Midterm Exam Score.
Solution
The TI-84+ will be used to compute the sums and regression coefficients.
Step 1) Press the [STAT] button, make sure that [Edit and 1:EDIT] are selected, then press [ENTER].
Step 2) Enter the x-values in List 1 [\(L_1\)] and the y-values in List 2 [\(L_2\)].
Step 3) Press the [STAT] button again, use the right arrow to select [CALC], use the down arrow to select [8: LinReg(a+bx)], and then press [ENTER].
Step 4) Make sure that X-list has \(L_1\) and the Y-list has \(L_2\). Use the down arrow to select [Calculate] and press [ENTER].
Step 5) On the output page, the regression coefficients will be on the first two lines. After rounding to three decimal places, they are \(a = 12.145\) and \(b = 4.226.\)
Step 6) Press the [STAT] button again, use the right arrow to select [CALC], use the down arrow to select [2: 2-Var Stats], and then press [ENTER].
Step 7) Make sure that X-list has \(L_1\) and the Y-list has \(L_2\). Use the down arrow to select [Calculate] and press [ENTER].
Step 8) When the output page appears, scroll down using the down arrow until the following sums appear: \(\sum y\), \(\sum y^2\), and \(\sum xy\).
Step 9) Plug the results of steps 5 and 8 into the formula to calculate the standard error of the estimate.
The key values are \(n = 8\), \(a = 12.145\), \(b = 4.226\), \(\sum y = 562\), \(\sum y^2 = 40932\), and \(\sum xy = 8055\).
\(SE = \sqrt{\dfrac{40932 - (12.145)(562) - (4.226)(8055)}{8-2}} = \sqrt{\dfrac{40932 - 6825.49 - 34040.43}{6}} = \sqrt{\dfrac{66.08}{6}} = \sqrt{11.01333} \approx 3.32\)
This is a relatively small value, which means the data values are close to the line of regression and will result in good predictions.
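As an optional cross-check, the coefficients and the standard error can be computed from the raw data in one pass. Keeping full precision throughout gives a slightly larger value than the hand computation, since no intermediate rounding occurs. This is an illustrative sketch, not the calculator procedure.

```python
import math

# Raw data from the table above: hours studied (x), exam score (y)
x = [10, 10, 12, 13, 14, 15, 16, 20]
y = [51, 53, 64, 68, 71, 79, 84, 92]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx2 = sum(xi**2 for xi in x)
sy2 = sum(yi**2 for yi in y)

# Least-squares coefficients, kept at full precision
b = (n * sxy - sx * sy) / (n * sx2 - sx**2)  # slope, about 4.226
a = (sy - b * sx) / n                        # intercept, about 12.145

se = math.sqrt((sy2 - a * sy - b * sxy) / (n - 2))
print(round(se, 2))  # 3.36
```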
This example demonstrates how to calculate the standard error of the estimate without using the formula. The TI-84+ calculator has a built-in function that directly calculates the standard error of the estimate. The data from the previous example will be reused.
Solution
Step 1) Repeat steps 1 and 2 in example 1 above. The data must be entered into the calculator.
Step 2) Press the [STAT] key, arrow over to the [TESTS] menu, arrow down to option F [LinRegTTest], and press the [ENTER] key. The defaults are Xlist: \(L_1\), Ylist: \(L_2\), Freq: 1, and β & ρ: ≠0. Arrow down to Calculate and press the [ENTER] key.
Step 3) The output screen on the calculator returns the standard error of the estimate. It is the value \(s\) on the last line.
The standard error of the estimate is \(SE = 3.36\) (rounded to two decimal places). This result differs slightly from the hand computation above because no rounding occurred during the intermediate steps. Overall, this is a relatively small value, which means the data values are close to the line of regression and will result in good predictions.
A health researcher at the Health Department at a large university is conducting a study to explore the relationship between physical activity and health outcomes among college students aged 18–25. The researcher is specifically interested in determining whether there is a correlation between the number of hours students work out per week and the number of days they spend being ill in a year. The researcher collected the data provided in the table below. It was shown in Section 10.3 that the data values are correlated. Compute the standard error of the estimate.
| X: Hours Worked Out per Week | Y: Days Spent Ill in a Year |
|---|---|
| 0 | 14 |
| 2 | 10 |
| 4 | 8 |
| 5 | 6 |
| 7 | 5 |
| 10 | 3 |
| 12 | 2 |
Table \(\PageIndex{3}\): Bivariate Data for Hours Worked Out per Week and Days Spent Ill in a Year.
Solution
Following the steps from the previous example, the output of the LinRegTTest is provided below.
The standard error of the estimate is \(SE = 1.21\). Overall, this is a relatively small value, which means the data values are close to the line of regression and will result in good predictions.
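As an optional cross-check, the result can be reproduced in Python from the raw data (an illustrative sketch, not the calculator procedure):

```python
import math

# Raw data from the table above: hours worked out (x), days ill (y)
x = [0, 2, 4, 5, 7, 10, 12]
y = [14, 10, 8, 6, 5, 3, 2]
n = len(x)

sx, sy = sum(x), sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))
sx2 = sum(xi**2 for xi in x)
sy2 = sum(yi**2 for yi in y)

# Least-squares coefficients, kept at full precision
b = (n * sxy - sx * sy) / (n * sx2 - sx**2)  # slope, about -0.944
a = (sy - b * sx) / n                        # intercept, about 12.251

se = math.sqrt((sy2 - a * sy - b * sxy) / (n - 2))
print(round(se, 2))  # 1.21
```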
Author
"10.6: Coefficient of Determination and the Standard Error of the Estimate" by Alfie Swan is licensed under CC BY 4.0


