9.3: The Line of Best Fit

    In a previous section, we saw that an r value close to 1 or -1 may indicate a strong linear relationship between an explanatory variable and a response variable. In such situations, we may choose to use the equation of a line to summarize the relationship. We can then use this linear model to make predictions about the values of the response variable.

    1. Imagine a line that fits the data best. Which of the purple points, A, B, C, D, or E, does the line intersect? Sketch the line below.

      [Figure: scatterplot of the data with purple points A, B, C, D, and E]

    2. Imagine the line in your head continuing out to x=100. Estimate the y-coordinate of the point on the line where x is 100. Explain how you made your guess.
    3. Select the line below that corresponds to the choice you made in question 1. Use its equation to predict the y-coordinate of the point on the line where x is 100.

      Line A. \(y=0.3x+1.5\)

      Line B. \(y=0.3x+0.5\)

      Line C. \(y=0.2x+1\)

      Line D. \(y=0.15x+0.75\)

      Line E. \(y=0.05x+1.25\)

    4. Why do you think this line fits the data best?
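The prediction in question 3 comes down to substituting x = 100 into a line's equation. A minimal Python sketch of that substitution for the five candidate lines follows; the names `lines` and `predict` are illustrative and not part of the original activity.

```python
# Candidate lines from question 3, stored as (slope, intercept).
lines = {
    "A": (0.3, 1.5),
    "B": (0.3, 0.5),
    "C": (0.2, 1.0),
    "D": (0.15, 0.75),
    "E": (0.05, 1.25),
}

def predict(slope, intercept, x):
    """Return the y-coordinate on the line y = slope * x + intercept."""
    return slope * x + intercept

# Evaluate every candidate line at x = 100.
predictions = {name: predict(m, b, 100) for name, (m, b) in lines.items()}
for name, y in predictions.items():
    print(f"Line {name}: y = {y:.2f} when x = 100")
```

Comparing the printed value for your chosen line with your estimate from question 2 shows how far a small difference in slope carries once the line is extended well beyond the data.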

    LSR Line

    1. The least squares regression line (LSR line) is a linear model that runs through the middle of your scatterplot data, typically leaving some points above the line and some points below it. This line is also called the line of best fit because it is the line from which the points in the scatterplot deviate the least. What do the values -2, 2, and 0 on the graph below represent?

      [Figure: graph with the values -2, 2, and 0 marked]

    2. What do the values 4, 4, and 0 on the graph represent?

      [Figure: graph with the values 4, 4, and 0 marked]

    3. Now move the line by dragging the two red dots in this Desmos graph. You can use the QR code below to access the Desmos graph.

      [QR code linking to the Desmos graph]

      1. What is the smallest you can make the sum of the squared errors about the line?
      2. What do you think is true about the line with the smallest sum of squared errors?

    For this bivariate set of data, the smallest sum of squared errors about the line is 6. The equation of the line of best fit is \(\hat{y}=0.5 x+1.5\).

    Residuals

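    The residual is the difference between the ACTUAL value \((y)\) of a data point for an \(x\)-value and the value that your line PREDICTED \((\hat{y})\) for that \(x\)-value, that is, \(y-\hat{y}\). Its size is the vertical distance between the point and the line, and its sign tells you whether the point lies above the line (positive) or below it (negative).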
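As a small illustration of this definition, the sketch below computes a residual as actual minus predicted. The function name `residual` and the two sample points are assumptions made for illustration; the line \(\hat{y}=0.5x+1.5\) is the one stated above.

```python
def residual(x, y, slope, intercept):
    """Return y minus y-hat, where y-hat = slope * x + intercept."""
    y_hat = slope * x + intercept
    return y - y_hat

# Line from the text: y-hat = 0.5x + 1.5.
# The two points below are hypothetical, chosen only for illustration.
above = residual(4, 5, slope=0.5, intercept=1.5)  # point above the line
below = residual(4, 2, slope=0.5, intercept=1.5)  # point below the line

print(above)  # 1.5
print(below)  # -1.5
```

Both points are the same vertical distance from the line, but the residuals differ in sign because one point lies above the line and the other below it.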

    1. Let’s return to the original example. The line of best fit is approximately \(\hat{y}=0.2 x+1\) (the line that passes through point C). Let’s look at how much the data deviates from the linear model. The residual for the point (1,1) has been computed and entered in the table below. Complete the table in the same manner.

      | x | y | \(\hat{y}\) | \(y-\hat{y}\) |
      |---|---|---|---|
      | 1 | 1 | \(\hat{y}=0.2(1)+1=1.2\) | \(1-1.2=-0.2\) |
      | 2 | 3 |  |  |
      | 5 | 1 |  |  |
      | 6 | 2 |  |  |
      | 9 | 5 |  |  |
      | 10 | 2 |  |  |

    2. Here's one way to measure how well a line fits a data set:
      1. Square all the residuals.
      2. Add up those squares.

    The smaller the result, the better the fit. Calculate the square of each residual and enter it in the table. The first squared residual has been computed in the table. Complete the table.

    | x | Residual: \(y-\hat{y}\) | Squared Residual |
    |---|---|---|
    | 1 | \(-0.2\) | \((-0.2)^2=0.04\) |
    | 2 |  |  |
    | 5 |  |  |
    | 6 |  |  |
    | 9 |  |  |
    | 10 |  |  |

    The sum of the squared residuals is _______.
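The two-step measure described above (square the residuals, then add them up) can be sketched in Python as a way to check a completed table. The six (x, y) points are taken from the residual table in this section; the function name `sum_squared_residuals` is an illustrative choice, and Line A from the opening activity is included only for comparison.

```python
# The six (x, y) data points from the residual table.
xs = [1, 2, 5, 6, 9, 10]
ys = [1, 3, 1, 2, 5, 2]

def sum_squared_residuals(xs, ys, slope, intercept):
    """Square each residual y - y-hat and add up the squares."""
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    return sum(r ** 2 for r in residuals)

# Line C (y-hat = 0.2x + 1) versus Line A (y-hat = 0.3x + 1.5).
sse_c = sum_squared_residuals(xs, ys, 0.2, 1.0)
sse_a = sum_squared_residuals(xs, ys, 0.3, 1.5)

print(f"Line C: {sse_c:.2f}")
print(f"Line A: {sse_a:.2f}")
```

The line with the smaller result fits these six points better by this measure.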

    Calculating and Interpreting Values in the Equation of the LSR Line

    1. Use the following steps to calculate the equation of the line of best fit:
      1. Open this file that contains randomly collected data comparing the number of cricket chirps and the temperature. Copy the data from the file by highlighting the data and clicking copy.
        1. Alternatively, you can access the data set using this QR code:

          [QR code linking to the data set]

      2. Open https://www.desmos.com/calculator and paste the data into the first line by clicking paste.
      3. In the second line, calculate the linear correlation coefficient, \(r\), by typing in \(\operatorname{corr}\left(x_1, y_1\right)\).

        r= __________

      4. Compute the sample means, \(\bar{x}=\operatorname{mean}\left(x_1\right)\) and \(\bar{y}=\operatorname{mean}\left(y_1\right)\), and the sample standard deviations, \(s_x=\operatorname{stdev}\left(x_1\right)\) and \(s_y=\operatorname{stdev}\left(y_1\right)\).

        \(\bar{x}=\) __________

        \(\bar{y}=\) __________

        \(s_x=\) __________

        \(s_y=\) __________

      5. Calculate the slope, \(m=\frac{r \cdot s_y}{s_x}\), of the line of best fit.
      6. The slope of the line is the predicted change in y for every one-unit increase in x. Interpret the slope of the equation of the line of best fit in context (the units of x are number of chirps, and the units of y are degrees Fahrenheit).
      7. Calculate the y-intercept, \(b=\bar{y}-m \cdot \bar{x}\), for the line of best fit.
      8. The y-intercept of the line of best fit is the predicted y-value when the x-value is 0. Interpret the y-intercept in context.
      9. Write the equation you found. Use the line of best fit to predict the temperature near a cricket that chirps 24 times.

        \(\hat{y}=\) ______________.

    2. One of the crickets chirped 16 times and the temperature near it was 71.6 degrees Fahrenheit. Calculate the residual for this cricket.

    3. In Desmos, type in \(y_1 \sim m x_1+b\) to check your work.

      \(\hat{y}=\) ______________
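The slope and intercept formulas in the steps above, \(m=\frac{r \cdot s_y}{s_x}\) and \(b=\bar{y}-m \cdot \bar{x}\), can also be sketched in Python. The cricket data file is not reproduced here, so this sketch assumes the six points from the residual table as stand-in data; the function names `correlation` and `lsr_line` are illustrative.

```python
from statistics import mean, stdev

# Stand-in data: the six (x, y) points from the residual table,
# not the cricket data set, which is not reproduced in this section.
xs = [1, 2, 5, 6, 9, 10]
ys = [1, 3, 1, 2, 5, 2]

def correlation(xs, ys):
    """Sample linear correlation coefficient r."""
    x_bar, y_bar = mean(xs), mean(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * stdev(xs) * stdev(ys))

def lsr_line(xs, ys):
    """Slope m = r * s_y / s_x and intercept b = y_bar - m * x_bar."""
    r = correlation(xs, ys)
    m = r * stdev(ys) / stdev(xs)
    b = mean(ys) - m * mean(xs)
    return m, b

m, b = lsr_line(xs, ys)
print(f"y-hat = {m:.3f}x + {b:.3f}")
```

For these six points the result is roughly \(\hat{y}=0.18x+1.33\), which is consistent with the approximation \(\hat{y}=0.2x+1\) used in the Residuals section. Swapping in the cricket data for `xs` and `ys` gives one more way to check the hand calculation.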


    This page titled 9.3: The Line of Best Fit is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Hannah Seidler-Wright.
