10.5: Linear Regression
- Explain the purpose of linear regression in modeling the relationship between two variables.
- Identify the components of a linear regression equation: the y-intercept (a) and the slope (b).
- Describe how the y-intercept and slope define the position and steepness of the regression line.
- Use a linear regression model to make predictions based on the trend between variables.
A line of regression is a straight line that best represents the relationship between two correlated variables in a dataset. In regression analysis, it is used to predict the value of a dependent variable from the value of an independent variable. The line is determined by the least squares method, which minimizes the sum of the squared differences between the observed \(y\)-values and the values predicted by the line. In other words, the line of regression is the line that best fits the data points: no other straight line has a smaller sum of squared vertical distances to the points. Below is a graph of the line of regression with the corresponding equation for the line.
\(y'=a+bx\)
where
- \(y'\) is the estimated (predicted) value of \(y\) for a given \(x\). Note that \(y'\) does not represent the derivative of \(y\), as it would in a calculus class.
- \(x\) is the value of the independent variable, and \(n\) is the number of \((x, y)\) data pairs.
- The intercept of the line is \(a = \dfrac{(\sum y)(\sum x^2)-(\sum x)(\sum xy)}{n(\sum x^2)-(\sum x)^2}\)
- The slope of the line is \(b = \dfrac{n(\sum xy)-(\sum x)(\sum y)}{n(\sum x^2)-(\sum x)^2}\)
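The two formulas can be checked with a short program. The following is a minimal Python sketch; the function names `regression_coefficients` and `sum_squared_errors` are hypothetical, and the four data points are made up for illustration. It computes \(a\) and \(b\) from the column sums and then confirms that nudging either coefficient increases the sum of squared errors, which is what "least squares" means.

```python
def regression_coefficients(xs, ys):
    """Return (a, b) for the line y' = a + b*x using the summation formulas."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs")
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # The shared denominator is zero only when every x-value is the same.
    denominator = n * sum_x2 - sum_x ** 2
    if denominator == 0:
        raise ValueError("all x-values are equal; the line is undefined")
    a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
    b = (n * sum_xy - sum_x * sum_y) / denominator
    return a, b


def sum_squared_errors(xs, ys, a, b):
    """Sum of the squared differences between each observed y and y' = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, used only to illustrate the formulas.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 8]
a, b = regression_coefficients(xs, ys)
print(a, b)  # 0.0 1.9

# Shifting the line in each of these four directions makes the squared error larger.
best = sum_squared_errors(xs, ys, a, b)
for da, db in [(0.1, 0), (-0.1, 0), (0, 0.1), (0, -0.1)]:
    assert sum_squared_errors(xs, ys, a + da, b + db) > best
```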
Examples
Professor Martinez is conducting a study to understand the relationship between the number of hours students study per week and their performance on the midterm exam in Math 400, an advanced calculus course at the university. She collects data from 8 randomly selected students in her class. The exam is out of 100 points, and study time is measured in hours per week. It was shown in Section 10.3 that the data values are positively correlated. Find the line of regression and estimate the exam score of a student who studies 8 hours per week. Round the results to three decimal places.
| x: Hours Studied Per Week | y: Midterm Exam Score (out of 100 points) |
|---|---|
| 10 | 51 |
| 10 | 53 |
| 12 | 64 |
| 13 | 68 |
| 14 | 71 |
| 15 | 79 |
| 16 | 84 |
| 20 | 92 |
Table \(\PageIndex{1}\) Hours Studied Per Week and Midterm Scores out of 100 Points.
Solution
In this example, the regression coefficients are computed by hand using the formulas above.
Step 1) Add three columns to the table for \(xy\), \(x^2\), and \(y^2\). Fill in the columns by performing the required computations.
| \(x\) | \(y\) | \(xy\) | \(x^2\) | \(y^2\) |
|---|---|---|---|---|
| 10 | 51 | 510 | 100 | 2601 |
| 10 | 53 | 530 | 100 | 2809 |
| 12 | 64 | 768 | 144 | 4096 |
| 13 | 68 | 884 | 169 | 4624 |
| 14 | 71 | 994 | 196 | 5041 |
| 15 | 79 | 1185 | 225 | 6241 |
| 16 | 84 | 1344 | 256 | 7056 |
| 20 | 92 | 1840 | 400 | 8464 |
Table \(\PageIndex{2}\) Key Columns Needed to Compute the Regression Coefficients \(a\) and \(b\)
Step 2) Find the sum of each column.
- \(n\) = 8
- \(\sum x\) = 110
- \(\sum y\) = 562
- \(\sum x^2\) = 1590
- \(\sum y^2\) = 40932
- \(\sum xy\) = 8055
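Steps 1 and 2 are pure arithmetic, so they can be checked with a few lines of Python. This sketch rebuilds the \(xy\), \(x^2\), and \(y^2\) columns from Table 1 and sums them; the variable names are illustrative.

```python
# Data from Table 1: hours studied per week (x) and midterm score (y).
hours = [10, 10, 12, 13, 14, 15, 16, 20]
scores = [51, 53, 64, 68, 71, 79, 84, 92]

# Step 1: the extra columns xy, x^2 and y^2 (Table 2).
xy = [x * y for x, y in zip(hours, scores)]
x_sq = [x * x for x in hours]
y_sq = [y * y for y in scores]

# Step 2: the column sums.
sums = {
    "n": len(hours),
    "sum_x": sum(hours),
    "sum_y": sum(scores),
    "sum_x2": sum(x_sq),
    "sum_y2": sum(y_sq),
    "sum_xy": sum(xy),
}
print(sums)
# {'n': 8, 'sum_x': 110, 'sum_y': 562, 'sum_x2': 1590, 'sum_y2': 40932, 'sum_xy': 8055}
```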
Step 3) Plug the information into the formulas for the regression coefficients and compute them using the order of operations.
\(a = \dfrac{(\sum y)(\sum x^2)-(\sum x)(\sum xy)}{n(\sum x^2)-(\sum x)^2} = \dfrac{(562)(1590)-(110)(8055)}{8(1590)-(110)^2} = \dfrac{893{,}580 - 886{,}050}{12{,}720-12{,}100} = \dfrac{7{,}530}{620} \approx 12.145\)
\(b = \dfrac{n(\sum xy)-(\sum x)(\sum y)}{n(\sum x^2)-(\sum x)^2} = \dfrac{8(8055)-(110)(562)}{8(1590)-(110)^2} = \dfrac{64{,}440 - 61{,}820}{12{,}720-12{,}100} = \dfrac{2{,}620}{620} \approx 4.226\)
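The arithmetic in Step 3 can be reproduced as follows. This sketch uses Python's `fractions.Fraction` to keep the quotients exact until the final rounding; the variable names are illustrative.

```python
from fractions import Fraction

# Column sums from Step 2.
n, sum_x, sum_y, sum_x2, sum_xy = 8, 110, 562, 1590, 8055

denominator = n * sum_x2 - sum_x ** 2    # 12,720 - 12,100 = 620
a_num = sum_y * sum_x2 - sum_x * sum_xy  # 893,580 - 886,050 = 7,530
b_num = n * sum_xy - sum_x * sum_y       # 64,440 - 61,820 = 2,620

a = Fraction(a_num, denominator)  # exact value 753/62
b = Fraction(b_num, denominator)  # exact value 131/31
print(round(float(a), 3), round(float(b), 3))  # 12.145 4.226
```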
Step 4) Write out the equation for the line of regression using the computed regression coefficients.
\(y' = 12.145+4.226x\)
Step 5) Use the line of regression to compute \(y'\) (estimated Midterm Exam score) when x = 8 (hours per week). Plug x = 8 into the equation and compute the estimated value.
\(y' = 12.145+4.226x = 12.145+4.226(8) = 12.145 + 33.808 = 45.953\)
Thus, if a student studies 8 hours per week, the estimated Midterm Exam score is around 46 points.
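Rounding the coefficients before substituting has a small effect on the estimate. The sketch below (the helper `predict` is hypothetical) compares the estimate from the rounded coefficients with the one from the unrounded quotients \(7{,}530/620\) and \(2{,}620/620\).

```python
def predict(a, b, x):
    """Estimated value y' = a + b*x from the line of regression."""
    return a + b * x


# Coefficients rounded to three decimal places, as in Step 4.
y_rounded = predict(12.145, 4.226, 8)
# Unrounded coefficients 7530/620 and 2620/620 from Step 3.
y_exact = predict(7530 / 620, 2620 / 620, 8)

print(round(y_rounded, 3), round(y_exact, 3))  # 45.953 45.952
```

The two estimates differ only in the third decimal place, so either way the estimated score is about 46 points.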
It is more efficient to work out a linear regression problem using technology. The next example reworks the example above using the TI-84+ calculator.
Professor Martinez is conducting a study to understand the relationship between the number of hours students study per week and their performance on the midterm exam in Math 400, an advanced calculus course at the university. She collects data from 8 randomly selected students in her class. The exam is out of 100 points, and study time is measured in hours per week. It was shown in Section 10.3 that the data values are positively correlated. Find the line of regression and estimate the exam score of a student who studies 8 hours per week. Use the TI-84+ calculator. Round the results to three decimal places.
| x: Hours Studied Per Week | y: Midterm Exam Score (out of 100 points) |
|---|---|
| 10 | 51 |
| 10 | 53 |
| 12 | 64 |
| 13 | 68 |
| 14 | 71 |
| 15 | 79 |
| 16 | 84 |
| 20 | 92 |
Table \(\PageIndex{3}\) Hours Studied Per Week and Midterm Scores out of 100 Points.
Solution
Step 1) Press the [STAT] button, make sure that [EDIT] and [1:Edit] are selected, and then press [ENTER].
Step 2) Enter the x-values in List 1 [\(L_1\)] and the y-values in List 2 [\(L_2\)].
Step 3) Press the [STAT] button again, use the right arrow to select [CALC], use the down arrow to select [8: LinReg(a+bx)], and then press [ENTER].
Step 4) Make sure that Xlist is set to \(L_1\) and Ylist is set to \(L_2\). Use the down arrow to select [Calculate] and press [ENTER].
Step 5) The output screen lists the values of \(a\) and \(b\). After rounding to three decimal places, they are \(a = 12.145\) and \(b = 4.226\).
Step 6) Use the line of regression to compute \(y'\) (estimated Midterm Exam score) when x = 8 (hours per week). Plug x = 8 into the equation and compute the estimated value.
\(y' = 12.145+4.226x = 12.145+4.226(8) = 12.145 + 33.808 = 45.953\)
Thus, if a student studies 8 hours per week, the estimated Midterm Exam score is around 46 points.
A health researcher in the Health Department at a large university is conducting a study to explore the relationship between physical activity and health outcomes among college students aged 18–25 years. The researcher is specifically interested in determining whether there is a correlation between the number of hours students work out per week and the number of days they spend ill in a year. The collected data are provided in the table below. It was shown in Section 10.3 that the data values are negatively correlated. Find the line of regression and estimate the number of days a student spends ill in a year if the student works out 9 hours per week. Use the TI-84+ calculator. Round the results to three decimal places.
| X: Hours Worked Out per Week | Y: Days Spent Ill in a Year |
|---|---|
| 0 | 14 |
| 2 | 10 |
| 4 | 8 |
| 5 | 6 |
| 7 | 5 |
| 10 | 3 |
| 12 | 2 |
Table \(\PageIndex{4}\) Hours Worked Out Per Week and Days Ill Per Year.
Solution
In this example, the TI-84+ calculator will be used to compute the regression coefficients. Follow the steps in Example 2 to compute the regression coefficients. The output is provided in the image below.
Therefore, the line of regression is \(y' = 12.251 - 0.944x\).
The estimated value is \(y' = 12.251 - 0.944(9) = 12.251 -8.496 = 3.755\).
Thus, according to the linear model, a student who works out 9 hours per week is estimated to be ill for about 4 days during the year.
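The calculator output can be cross-checked with the formulas from the beginning of this section. This Python sketch (variable names are illustrative) recomputes \(a\) and \(b\) from Table 4 and then the estimate for 9 hours per week. The negative slope is consistent with the negative correlation between the variables.

```python
# Data from Table 4: hours worked out per week (x) and days ill per year (y).
workout_hours = [0, 2, 4, 5, 7, 10, 12]
days_ill = [14, 10, 8, 6, 5, 3, 2]

n = len(workout_hours)
sum_x = sum(workout_hours)
sum_y = sum(days_ill)
sum_xy = sum(x * y for x, y in zip(workout_hours, days_ill))
sum_x2 = sum(x * x for x in workout_hours)

denominator = n * sum_x2 - sum_x ** 2  # 7(338) - 40^2 = 766
a = (sum_y * sum_x2 - sum_x * sum_xy) / denominator
b = (n * sum_xy - sum_x * sum_y) / denominator
print(round(a, 3), round(b, 3))  # 12.251 -0.944

# Estimate with the rounded coefficients, as in the solution above.
estimate = round(12.251 - 0.944 * 9, 3)
print(estimate)  # 3.755
```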
A researcher is exploring whether there is any correlation between the amount of money students spend on lunch and their GPA in a college setting. In other words, the researcher is testing whether students who spend more money on lunch tend to have higher or lower GPAs.
The researcher collected 10 pairs of data representing the amount of money students spend on lunch and their corresponding GPAs, shown in the table below. It was shown in Section 10.3 that the data values are not significantly correlated. Find the line of regression and estimate the GPA of a student who spends $13.00 on lunch. Use the TI-84+ calculator. Round the results to three decimal places.
| Amount Spent on Lunch ($) | GPA |
|---|---|
| $ 10.00 | 1.95 |
| $ 7.50 | 3.20 |
| $ 4.00 | 3.60 |
| $ 8.45 | 2.80 |
| $ 6.95 | 3.40 |
| $ 9.00 | 2.70 |
| $ 8.90 | 2.56 |
| $ 12.50 | 3.30 |
| $ 19.80 | 3.00 |
| $ 6.90 | 3.49 |
Table \(\PageIndex{5}\) Amount Spent on Lunch in Dollars and Grade Point Average (GPA).
Solution
Since there is no significant linear correlation, the line of regression is not valid for making predictions, so it is not computed.
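For reference, the correlation coefficient for Table 5 can be computed from the same kind of column sums used in the regression formulas. This Python sketch uses the computational formula for Pearson's \(r\) (variable names are illustrative). Deciding whether \(r\) is significant still requires the test from Section 10.3, which this code does not perform.

```python
import math

# Data from Table 5: amount spent on lunch (x, dollars) and GPA (y).
lunch = [10.00, 7.50, 4.00, 8.45, 6.95, 9.00, 8.90, 12.50, 19.80, 6.90]
gpa = [1.95, 3.20, 3.60, 2.80, 3.40, 2.70, 2.56, 3.30, 3.00, 3.49]

n = len(lunch)
sum_x, sum_y = sum(lunch), sum(gpa)
sum_xy = sum(x * y for x, y in zip(lunch, gpa))
sum_x2 = sum(x * x for x in lunch)
sum_y2 = sum(y * y for y in gpa)

# Pearson's r from the column sums:
# r = [n(sum xy) - (sum x)(sum y)] / sqrt([n(sum x^2) - (sum x)^2][n(sum y^2) - (sum y)^2])
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 3))
```

The resulting \(r\) is weak (roughly \(-0.26\)), in line with the conclusion that the line of regression should not be used for prediction here.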
Authors
"10.5: Linear Regression" by Toros Berberyan, Tracy Nguyen, and Alfie Swan is licensed under CC BY-SA 4.0
Attributions
"10.1: Regression" by Kathryn Kozak is licensed under CC BY-SA 4.0
Exercises
- A café owner wants to determine if there is a significant correlation between the daily temperature and the number of iced coffee drinks sold. The owner records the daily temperature and the number of iced coffee drinks sold for five randomly selected days listed below. Test for correlation with \( \alpha = 0.05 \) using r and Pearson's Correlation Matrix (PMC). Please click on the PMC table to access the table in the book. If there is enough evidence of a linear relationship, determine the line of regression and make a prediction when x = 50 degrees.
| Temperature (°F) | # of Iced Coffees Sold |
|---|---|
| 72 | 35 |
| 78 | 42 |
| 85 | 53 |
| 88 | 56 |
| 91 | 60 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
MyOpenMath is a free online learning platform designed to support math instruction through automated homework, quizzes, and assessments. You must register for MyOpenMath and sign in to view the question.
- A café owner wants to determine if there is a significant correlation between the daily temperature and the number of iced coffee drinks sold. The owner records the daily temperature and the number of iced coffee drinks sold for five randomly selected days listed below. Test for correlation with \( \alpha = 0.05 \). Use the traditional method. Click on this link for the t-distribution table to locate the critical values. If there is enough evidence of a linear relationship, determine the line of regression and make a prediction when x = 50 degrees.
| Temperature (°F) | # of Iced Coffees Sold |
|---|---|
| 72 | 35 |
| 78 | 42 |
| 85 | 53 |
| 88 | 56 |
| 91 | 60 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A café owner wants to determine if there is a significant correlation between the daily temperature and the number of iced coffee drinks sold. The owner records the daily temperature and the number of iced coffee drinks sold for five randomly selected days listed below. Test for correlation with \( \alpha = 0.05 \). Use the p-value method. If there is enough evidence of a linear relationship, determine the line of regression and make a prediction when x = 50 degrees.
| Temperature (°F) | # of Iced Coffees Sold |
|---|---|
| 72 | 35 |
| 78 | 42 |
| 85 | 53 |
| 88 | 56 |
| 91 | 60 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A researcher wants to investigate whether there is a significant linear relationship between gas prices and average household income in different cities. The data below shows average gas prices and corresponding household income (in thousands of dollars) for the seven cities listed below. Test for correlation with \( \alpha = 0.01 \) using r and Pearson's Correlation Matrix (PMC). Please click on the PMC table to access the table in the book. If there is enough evidence of a linear relationship, determine the line of regression and make a prediction when x = 2.75.
| Gas Price ($) | Household Income (in $1,000s) |
|---|---|
| 3.10 | 45 |
| 3.25 | 52 |
| 3.40 | 60 |
| 3.55 | 66 |
| 3.70 | 72 |
| 3.85 | 78 |
| 4.00 | 85 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A researcher wants to investigate whether there is a significant linear relationship between gas prices and average household income in different cities. The data below shows average gas prices and corresponding household income (in thousands of dollars) for the seven cities listed below. Test for correlation with \( \alpha = 0.01 \). Use the traditional method. Click on this link for the t-distribution table to locate the critical values. If there is enough evidence of a linear relationship, determine the line of regression and make a prediction when x = 2.75.
| Gas Price ($) | Household Income (in $1,000s) |
|---|---|
| 3.10 | 45 |
| 3.25 | 52 |
| 3.40 | 60 |
| 3.55 | 66 |
| 3.70 | 72 |
| 3.85 | 78 |
| 4.00 | 85 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A researcher wants to investigate whether there is a significant linear relationship between gas prices and average household income in different cities. The data below shows average gas prices and corresponding household income (in thousands of dollars) for the seven cities listed below. Test for correlation with \( \alpha = 0.01 \). Use the p-value method. If there is enough evidence of a linear relationship, determine the line of regression and make a prediction when x = 2.75.
| Gas Price ($) | Household Income (in $1,000s) |
|---|---|
| 3.10 | 45 |
| 3.25 | 52 |
| 3.40 | 60 |
| 3.55 | 66 |
| 3.70 | 72 |
| 3.85 | 78 |
| 4.00 | 85 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A researcher believes that students who study more hours per week might experience lower levels of stress. To test this, she surveys 6 college students and records how many hours they study per week and their self-reported stress level on a scale of 1 to 10 (10 = highest stress). Test for correlation with \( \alpha = 0.05 \) using r and Pearson's Correlation Matrix (PMC). Please click on the PMC table to access the table in the book.
| Study Hours | Stress Level |
|---|---|
| 4 | 10 |
| 6 | 8 |
| 8 | 9 |
| 10 | 10 |
| 12 | 4 |
| 14 | 3 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A researcher believes that students who study more hours per week might experience lower levels of stress. To test this, she surveys 6 college students and records how many hours they study per week and their self-reported stress level on a scale of 1 to 10 (10 = highest stress). Test for correlation with \( \alpha = 0.05 \). Use the traditional method. Click on this link for the t-distribution table to locate the critical values.
| Study Hours | Stress Level |
|---|---|
| 4 | 10 |
| 6 | 8 |
| 8 | 9 |
| 10 | 10 |
| 12 | 4 |
| 14 | 3 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- A researcher believes that students who study more hours per week might experience lower levels of stress. To test this, she surveys 6 college students and records how many hours they study per week and their self-reported stress level on a scale of 1 to 10 (10 = highest stress). Test for correlation with \( \alpha = 0.05 \). Use the p-value method.
| Study Hours | Stress Level |
|---|---|
| 4 | 10 |
| 6 | 8 |
| 8 | 9 |
| 10 | 10 |
| 12 | 4 |
| 14 | 3 |
Scan the QR code or click on it to open the MyOpenMath version of the above question with step-by-step guidance.
- Answers
If you are an instructor and want the solutions to all the exercise questions for each section, please email Toros Berberyan.











