The coefficient of determination \(R^{2}\) (or \(r^{2}\)) is the fraction (or percent) of the variation in the values of \(y\) that is explained by the least-squares regression of \(y\) on \(x\). \(R^{2}\) is a measure of how well the values of \(y\) are explained by \(x\).
For example, there is variability in the values of the dependent variable, such as exam grade. Some of the variation in students' grades is due to hours studied and some is due to other factors. How much of the variation in a student's grade is due to hours studied?
When considering this question, look at how much of the variation in a student's grade is explained by the number of hours they studied and how much is explained by other variables. Some of the changes in grades are due to other factors: two students may study the same number of hours, yet one may earn a higher grade. Some variability is explained by the model and some variability is not; together, these give the total variability.
The proportion of the variation that is explained by the model is \[R^{2} = \frac{\text{Explained Variation}}{\text{Total Variation}} = \frac{SSR}{SST} \nonumber\]
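As a minimal sketch of this formula, the following NumPy code fits a least-squares line to hypothetical hours-studied and exam-grade data (the values are made up for illustration, not taken from the text) and computes \(R^{2}\) directly as SSR/SST:

```python
import numpy as np

# Hypothetical data: hours studied (x) and exam grades (y).
x = np.array([1, 2, 3, 4, 5, 6], dtype=float)
y = np.array([62, 68, 75, 81, 79, 92], dtype=float)

# Fit the least-squares line y-hat = b0 + b1*x.
b1, b0 = np.polyfit(x, y, 1)
y_hat = b0 + b1 * x

sst = np.sum((y - y.mean()) ** 2)      # SST: total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # SSR: explained variation
r_squared = ssr / sst
print(round(float(r_squared), 4))
```

For simple linear regression, this ratio agrees with the square of the correlation coefficient, which is the shortcut used in the next example.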
Find and interpret the coefficient of determination for the hours studied and exam grade data.
The coefficient of determination is the correlation coefficient squared. Note that when \(r\) is negative, squaring \(r\) makes the answer positive. For the hours studied and exam grade data, \(r = 0.825358\), so \(r^{2} = R^{2} = 0.825358^{2} = 0.6812\).
Approximately 68% of the variation in a student's exam grade is explained by the least-squares regression equation and the number of hours a student studied.
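The squaring step above can be checked in a couple of lines, which also shows why a negative correlation still produces a positive \(R^{2}\):

```python
# The book's value: r = 0.825358 for hours studied vs. exam grade.
r = 0.825358
print(round(r ** 2, 4))  # 0.6812

# Squaring removes a negative sign, so a negative correlation
# of the same strength yields the same R^2.
print(round((-r) ** 2, 4))  # 0.6812
```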
TI-84: Press the [STAT] key, arrow over to the [TESTS] menu, arrow down to the option [LinRegTTest] and press the [ENTER] key. The default is Xlist:L1, Ylist:L2, Freq:1, \(\beta\) and \(\rho: \neq 0\). Arrow down to Calculate and press the [ENTER] key. The calculator returns the \(y\)-intercept = \(a = b_{0}\), slope = \(b = b_{1}\), the standard error of estimate = \(s\), the coefficient of determination = \(r^{2} = R^{2}\), and the correlation coefficient = \(r\).
TI-89: In the Stats/List Editor select F6 for the Tests menu. Use the cursor keys to select A:LinRegTTest and press [Enter]. In the “X List” space type in the name of your list with the \(x\) variable without spaces: for our example, “list1.” In the “Y List” space type in the name of your list with the \(y\) variable without spaces: for our example, “list2.” Under the “Alternate Hyp” menu, select the sign that matches the problem’s alternative hypothesis statement (here \(\neq\)), then press the [ENTER] key, arrow down to [Calculate] and press the [ENTER] key. The calculator returns the \(y\)-intercept of the regression line = \(a = b_{0}\), the slope of the regression line = \(b = b_{1}\), the correlation = \(r\), and the coefficient of determination = \(r^{2} = R^{2}\).
The coefficient of determination can take on any value between 0 and 1, or 0% to 100%. The closer \(R^{2}\) is to 100%, the better the regression equation models the data. Unlike \(r\), which can only be used for simple linear regression, we can use \(R^{2}\) for different types of regression. In more advanced courses, if we were to do non-linear or multiple linear regression we could compare different models and pick the one that has the highest \(R^{2}\).
For instance, if we ran a linear regression on a scatterplot that showed an obvious curve pattern, we would get a regression equation with a slope of zero and \(R^{2} = 0\). See Figure 12-16.
If we were to fit a parabola through the data, we would get a perfect fit and \(R^{2} = 1\). See Figure 12-17.
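The two extreme cases in Figures 12-16 and 12-17 can be reproduced with a short sketch. The data below are hypothetical points lying exactly on a symmetric parabola: a straight-line fit has slope zero by symmetry, so SSR = 0 and \(R^{2} = 0\), while a degree-2 (parabolic) fit is perfect, so \(R^{2} = 1\):

```python
import numpy as np

def r_squared(y, y_hat):
    """R^2 = explained variation / total variation = SSR / SST."""
    sst = np.sum((y - y.mean()) ** 2)
    ssr = np.sum((y_hat - y.mean()) ** 2)
    return ssr / sst

# Hypothetical points on a symmetric parabola.
x = np.array([-2, -1, 0, 1, 2], dtype=float)
y = x ** 2

# Linear fit: the slope is 0 by symmetry, so SSR = 0 and R^2 = 0.
lin = np.polyval(np.polyfit(x, y, 1), x)
print(round(float(r_squared(y, lin)), 4))

# Quadratic fit: the parabola passes through every point, so R^2 = 1.
quad = np.polyval(np.polyfit(x, y, 2), x)
print(round(float(r_squared(y, quad)), 4))
```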