Statistics LibreTexts

Ch 12.3 and Ch 12.1 Linear regression

    Ch 12.1 Linear Equations

    A linear equation has the form \( y = a + bx \); its graph is a straight line.

     y is the dependent variable, also called the response variable.

     x is the independent variable, also called the explanatory variable or predictor.

    The goal is to use x to predict y.

    [Figure: three graphs of lines: (a) positive slope, (b) horizontal line, (c) negative slope]

    When b > 0, the line has a positive slope, as in graph (a).

    When b < 0, the line has a negative slope, as in graph (c).

    When b = 0, the line is horizontal, as in graph (b).

    The value of a is called the y-intercept; it is the value of y when x = 0.

     

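    Ex1. The cost of ordering x items includes a fixed shipping cost of $4.99 plus $3.20 per item. Write an equation for the cost y in terms of the number of items x. Interpret the slope and y-intercept of the equation.

    Ans: The equation is \( y = 4.99 + 3.20x \).

    The slope is 3.20: each additional item adds $3.20 to the cost.

    The y-intercept is 4.99, which is the cost when 0 items are ordered. Because an order of 0 items does not occur in practice, this value is not meaningful in real life as a cost, although it does correspond to the fixed shipping charge.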
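
    As a quick check of the equation above, the following minimal Python sketch evaluates the cost for a few order sizes. The function name `order_cost` and the order sizes tried are illustrative choices, not part of the original exercise.

```python
def order_cost(items: int) -> float:
    """Total cost in dollars: y = 4.99 + 3.20 * x."""
    shipping = 4.99   # y-intercept a: fixed shipping cost
    per_item = 3.20   # slope b: cost added by each item
    return shipping + per_item * items


# Illustrative order sizes (assumed for this sketch)
for n in (1, 2, 10):
    print(n, round(order_cost(n), 2))

# The difference between consecutive order sizes equals the slope.
print(round(order_cost(2) - order_cost(1), 2))
```

    Each extra item raises the total by the slope, $3.20, regardless of how many items are already in the order.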

     

    Ch 12.3 The regression equation

    A matched-pairs sample can be used to find the equation of the “best fit line”, also known as the “linear regression line” or “least-squares line”.

    The line of best fit is used to predict y given a known value of x. (Note: the prediction is a point estimate.)

    Terms: Given matched pair data (x, y):

    x – explanatory variable, predictor variable, independent variable.

    y – response variable, dependent variable.

    The line of best fit is \( \hat{y} = b_0 + b_1 x \)

    where \( \hat{y} \) is the predicted value of y (y is the observed value in the data),

    \( b_0 \) is the y-intercept (the predicted y value when x = 0), and

    \( b_1 \) is the slope (the change in predicted y for each one-unit increase in x).

     

    Coefficient of determination (\( r^2 \))

     \( r^2 \) is the proportion of the variation in y that can be predicted from the linear relationship with x. It tells how well the line predicts y.

    How to determine the line of best fit?

    The criterion for deciding which line is better than all others is based on the vertical distances between the original data points and the regression line. These signed vertical distances are called residuals.

    Residual (ε) = observed y – predicted y = \( y - \hat{y} \).
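
    To make the definition concrete, here is a small Python sketch that computes the residuals of a candidate line. The data points and the line \( \hat{y} = 1 + 1.5x \) are made up for illustration; this line is not claimed to be the best fit.

```python
# Hypothetical matched pair data (assumed for this sketch)
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]

# Hypothetical candidate line y_hat = b0 + b1 * x
b0, b1 = 1.0, 1.5

predicted = [b0 + b1 * xi for xi in x]
# residual = observed y - predicted y
residuals = [yi - yh for yi, yh in zip(y, predicted)]
# sum of the squared residuals for this candidate line
sse = sum(e ** 2 for e in residuals)

print(residuals)  # [-0.5, 0.0, -0.5, 1.0]
print(sse)        # 1.5
```

    A negative residual means the point lies below the line; a positive residual means it lies above.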

    [Figure: line of best fit]

    The best fit line is the line that satisfies the “least-squares property”: the sum of the squares of its residuals (SSE) is the smallest sum possible. (Calculus is used to derive it.)

     

    As a consequence, the point \( (\bar{x}, \bar{y}) \) always lies on the line of best fit.
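
    A sketch of the calculus step, writing the data as \( (x_i, y_i) \) for i = 1, ..., n (this index notation is an assumption added here). The quantity to minimize is

    \[ SSE(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2 \]

    Setting the partial derivative with respect to \( b_0 \) equal to zero gives

    \[ \frac{\partial SSE}{\partial b_0} = -2 \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0 \quad \Rightarrow \quad \bar{y} = b_0 + b_1 \bar{x} \]

    which is exactly the statement that \( (\bar{x}, \bar{y}) \) is on the line. Setting the partial derivative with respect to \( b_1 \) equal to zero and substituting \( b_0 = \bar{y} - b_1 \bar{x} \) gives

    \[ b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \]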

     

    Find equation of line of best fit:

    Method 1: (use Statdisk)

    - Enter the matched data into two columns of Statdisk, then use Analysis/Correlation and Regression/

     Enter the significance level, select the data columns, and evaluate.

    Output: \( b_0 \) and \( b_1 \) for the equation \( \boxed{\hat{y} = b_0 + b_1 x} \)

              x is the independent variable, \( \hat{y} \) is the predicted value of y.

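    Method 2: use the formulas \( b_1 = r \frac{s_y}{s_x} , \; b_0 = \bar{y} - b_1 \bar{x} \), where r is the linear correlation coefficient and \( s_x \), \( s_y \) are the sample standard deviations of x and y.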

    Note: The slope has the same sign as r.

                 Use x to predict or estimate y.

                 The line of best fit is different if x and y are switched.
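
    The formulas in Method 2 can be sketched in Python as follows. The function name `best_fit` and the data are hypothetical, chosen only so the arithmetic is easy to follow; r is computed from its usual sample definition.

```python
from statistics import mean, stdev


def best_fit(x, y):
    """Return (b0, b1) using b1 = r * s_y / s_x and b0 = y_bar - b1 * x_bar."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations
    # sample linear correlation coefficient r
    r = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / ((n - 1) * s_x * s_y)
    b1 = r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical matched pair data (assumed for this sketch)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = best_fit(x, y)   # use x to predict y
c0, c1 = best_fit(y, x)   # roles switched: use y to predict x

print(round(b0, 3), round(b1, 3))  # 2.2 0.6
print(round(c0, 3), round(c1, 3))  # -1.0 1.0
```

    For this data, predicting y from x gives \( \hat{y} = 2.2 + 0.6x \), while predicting x from y gives \( \hat{x} = -1 + y \). Solving the first equation for x would give a slope of about 1.667, not 1, so the two lines are different.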

     

    Ex1. Given the following matched pair data:

    [Table: matched pair data, shoe size and math score]

    a) Find the line of best fit. Interpret the slope.

    Enter the shoe sizes and scores into Statdisk. Click Analysis/Correlation and Regression/

    Enter the significance level, select the data columns, and evaluate.

    Output: \( b_0 = 3.861 \), \( b_1 = 8.474 \) (rounded to 3 decimal places), \( r^2 = 0.767 \) or 76.7%. \( \hat{y} = 3.861 + 8.474x \) is the linear regression line.

    The predicted score increases by 8.474 points for each one-unit increase in the child's shoe size.
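
    The regression line can then be used for point estimates. The sketch below plugs in the coefficients from the Statdisk output; the function name `predict_score` and the shoe sizes 5 and 6 are hypothetical inputs chosen for illustration, and the prediction is only sensible for shoe sizes within the range of the original data.

```python
B0 = 3.861   # y-intercept from the Statdisk output
B1 = 8.474   # slope from the Statdisk output


def predict_score(shoe_size: float) -> float:
    """Point estimate of the math score: y_hat = 3.861 + 8.474 * x."""
    return B0 + B1 * shoe_size


# Hypothetical shoe sizes (assumed for this sketch)
print(round(predict_score(5), 3))                      # 46.231
print(round(predict_score(6) - predict_score(5), 3))   # 8.474
```

    The second line of output shows the slope interpretation directly: one more shoe size adds 8.474 points to the predicted score.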

    b) Find the coefficient of determination. Interpret it in the context of this problem.

    Ans:

    \( r^2 = 0.767 \), or 76.7%, means that 76.7% of the variation in math scores can be predicted by the variation in shoe size.
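
    To see where a value like this comes from, the following sketch computes \( r^2 \) as one minus the ratio of the unexplained variation (SSE) to the total variation of y about its mean. It uses made-up data, not the shoe-size data, and the function name `r_squared` is hypothetical.

```python
def r_squared(x, y):
    """Proportion of the variation in y explained by the least-squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # algebraically equal to r * s_y / s_x
    b0 = y_bar - b1 * x_bar
    # unexplained variation: sum of squared residuals
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    # total variation of y about its mean
    total = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - sse / total


# Hypothetical matched pair data (assumed for this sketch)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 3))  # 0.6
```

    Here 60% of the variation in the made-up y values is accounted for by the line, and the remaining 40% is left in the residuals.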

     
