13.6: Visualizing Linear Regression
Regression builds on the same graphing foundation as correlation. Thus, bivariate regression can be graphed using a scatterplot and a regression line (see Chapter 12 for a review of scatterplots and fit lines). The regression line summarizes the change in \(Y\) that is associated with a change in \(X\). The closer the dots are to the regression line, the stronger the relationship and, thus, the more accurate the predictions based on the regression line will be. When there is a relationship between the variables, the regression line will slope either up or down when read from left to right. The angle of the line (i.e., the slope) indicates the direction of the correlation (either positive or negative).
Let’s take a look at how regression works using Data Set 12.1 from Chapter 12. In Chapter 12, we supposed a researcher collected data from 10 college students to test the hypothesis that hours of sleep would positively relate to quiz scores. For regression we would hypothesize that hours of sleep would be useful in predicting quiz scores. Data Set 12.1 and the corresponding scatterplot for those data are included below for reference.
Data Set 12.1 Hours of Sleep and Quiz Scores

| Participant Number | Sleep Hours | Quiz Score |
|---|---|---|
| 1 | 7 | 92 |
| 2 | 8 | 88 |
| 3 | 9 | 96 |
| 4 | 6 | 70 |
| 5 | 6 | 79 |
| 6 | 4 | 64 |
| 7 | 5 | 75 |
| 8 | 10 | 98 |
| 9 | 3 | 53 |
| 10 | 7 | 85 |
Graph 12.2 Hours of Sleep and Quiz Scores with a Regression Line
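This chapter relies on SPSS for graphs like the one above, but for readers who want to reproduce a similar figure programmatically, the following is a minimal sketch in Python (assuming the numpy and matplotlib packages are available). It estimates the slope and \(y\)-intercept with numpy's built-in least-squares fit, which is used here only to draw the line; it is an illustration of the idea, not the SPSS procedure described in this chapter.

```python
import numpy as np
import matplotlib.pyplot as plt

# Data Set 12.1: hours of sleep (X) and quiz scores (Y) for the 10 students
sleep_hours = np.array([7, 8, 9, 6, 6, 4, 5, 10, 3, 7])
quiz_scores = np.array([92, 88, 96, 70, 79, 64, 75, 98, 53, 85])

# Fit a straight line (degree-1 polynomial) by least squares;
# np.polyfit returns the slope first, then the intercept
b1, b0 = np.polyfit(sleep_hours, quiz_scores, 1)

# Scatterplot of the bivariate data points
plt.scatter(sleep_hours, quiz_scores, label="Observed data")

# Regression line drawn across the observed range of X
x_line = np.linspace(sleep_hours.min(), sleep_hours.max(), 100)
plt.plot(x_line, b0 + b1 * x_line, color="red", label="Regression line")

plt.xlabel("Hours of Sleep")
plt.ylabel("Quiz Score")
plt.title("Hours of Sleep and Quiz Scores with a Regression Line")
plt.legend()
plt.show()
```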
Predictions using Regression Lines
The stronger the correlation, the more accurate the predictions, because the predictions are made using the regression line. Recall that a regression line is balanced to approximate the location of the dots with as little error as possible. Keep in mind that each dot is a bivariate data point. When we look at the data points in Graph 12.2, we can see that they are fairly close to the positively sloping regression line. Visually, the line does a good job of estimating the location of the data and, thus, will likely be useful in estimating (predicting) \(Y\)-values from \(X\)-values.
The Regression Equation
When regression is used, the regression line serves to estimate scores on the \(Y\)-variable. Thus, the equation of the line is the formula used to predict (or estimate) \(Y\)-values. A regression line is summarized with the following linear equation:
\[\hat{Y}=b_{0}+b_{1}X \nonumber \]
In this version of the linear equation, \(\hat{Y}\) stands for a predicted \(Y\)-value, \(b_0\) stands for the \(y\)-intercept of the line, and \(b_1\) stands for the slope of the line. The \(y\)-intercept is where the line crosses the \(y\)-axis; it represents what \(Y\) equals when \(X\) is zero. Notice that this is very similar to the structure some of us have seen before in math classes when we learned the linear equation; the version used in math classes is as follows: \(Y = mx + b\). The symbols for the slope and \(y\)-intercept are different in statistics and math, but they represent the same things. Specifically, in math the slope is called "\(m\)" and the \(y\)-intercept is called "\(b\)," while in statistics the slope is called "\(b_1\)" and the \(y\)-intercept is called "\(b_0\)." In statistics, the \(y\)-intercept, \(b_0\), is also often referred to as the constant because it is a single, fixed value on the graph. In addition, in statistics we often state the \(y\)-intercept first, followed by the slope times \(X\), whereas in math we may see these reversed (with the slope times \(X\) stated first and the \(y\)-intercept then added). These are just two different ways of writing the same formula; the two are computationally identical even though the symbols look different.
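Written side by side, the two versions make the correspondence explicit:

\[\hat{Y}=b_{0}+b_{1}X \quad \text{(statistics)} \qquad \longleftrightarrow \qquad Y=mx+b \quad \text{(math)}, \qquad \text{with } b_{1}=m \text{ and } b_{0}=b. \nonumber \]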
The regression equation is used both to summarize the regression line and to make predictions. When the slope and \(y\)-intercept are known, an \(X\)-value can be plugged into the linear equation to predict a \(Y\)-value. The slope and \(y\)-intercept are rarely calculated by hand in regression; instead, they are generally calculated using software such as SPSS. Therefore, the process for hand-calculating the slope and \(y\)-intercept is not included in this chapter; instead, we will focus on how to read and use these values from SPSS output as we progress through this chapter.
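As a concrete illustration of plugging an \(X\)-value into the regression equation, here is a short, hypothetical Python sketch. The slope and \(y\)-intercept are obtained with numpy's least-squares fit rather than SPSS, so the specific values it prints are only illustrative of how \(\hat{Y}=b_{0}+b_{1}X\) is applied.

```python
import numpy as np

# Data Set 12.1: hours of sleep (X) and quiz scores (Y)
sleep_hours = np.array([7, 8, 9, 6, 6, 4, 5, 10, 3, 7])
quiz_scores = np.array([92, 88, 96, 70, 79, 64, 75, 98, 53, 85])

# Least-squares estimates of the slope (b1) and y-intercept (b0)
b1, b0 = np.polyfit(sleep_hours, quiz_scores, 1)

def predict_quiz_score(hours_of_sleep):
    """Apply the regression equation: Y-hat = b0 + b1 * X."""
    return b0 + b1 * hours_of_sleep

# Example: predicted quiz score for a student who sleeps 8 hours
print(f"y-intercept b0 = {b0:.2f}, slope b1 = {b1:.2f}")
print(f"Predicted quiz score for 8 hours of sleep: {predict_quiz_score(8):.1f}")
```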