Statistics LibreTexts

12.3: Linear Regression

    • Contributed by Paul Pfeiffer
    • Professor emeritus (Computational and Applied Mathematics) at Rice University

    Linear Regression

    Suppose that a pair \(\{X, Y\}\) of random variables has a joint distribution. A value \(X(\omega)\) is observed. It is desired to estimate the corresponding value \(Y(\omega)\). Obviously there is no rule for determining \(Y(\omega)\) unless \(Y\) is a function of \(X\). The best that can be hoped for is some estimate based on an average of the errors, or on the average of some function of the errors.

    Suppose \(X(\omega)\) is observed, and by some rule an estimate \(\widehat{Y} (\omega)\) is returned. The error of the estimate is \(Y(\omega) - \widehat{Y} (\omega)\). The most common measure of error is the mean of the square of the error

    \(E[(Y - \widehat{Y})^2]\)

    The choice of the mean square has two important properties: it treats positive and negative errors alike, and it weights large errors more heavily than smaller ones. In general, we seek a rule (function) \(r\) such that the estimate \(\widehat{Y} (\omega)\) is \(r(X(\omega))\). That is, we seek a function \(r\) such that

    \(E[(Y - r(X))^2]\) is a minimum.

    The problem of determining such a function is known as the regression problem. In the unit on Regression, we show that this problem is solved by the conditional expectation of \(Y\), given \(X\). At this point, we seek an important partial solution.

    The regression line of \(Y\) on \(X\)

    We seek the best straight line function for minimizing the mean squared error. That is, we seek a function \(r\) of the form \(u = r(t) = at + b\). The problem is to determine the coefficients \(a, b\) such that

    \(E[(Y - aX - b)^2]\) is a minimum

    We write the error in a special form, then square and take the expectation.

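    \(\text{Error} = Y - aX - b = (Y - \mu_Y) - a(X - \mu_X) + \mu_Y - a\mu_X - b = (Y - \mu_Y) - a(X - \mu_X) - \beta\), where \(\beta = b - \mu_Y + a\mu_X\)

    \(\text{Error squared} = (Y - \mu_Y)^2 + a^2 (X - \mu_X)^2 + \beta^2 - 2\beta (Y - \mu_Y) + 2 a \beta (X - \mu_X) - 2a(Y - \mu_Y) (X - \mu_X)\)

    Since \(E[Y - \mu_Y] = E[X - \mu_X] = 0\), the two terms that are linear in \(\beta\) have zero expectation, so that

    \(E[(Y - aX - b)^2] = \sigma_Y^2 + a^2 \sigma_X^2 + \beta^2 - 2a \text{Cov} [X, Y]\)

    The term \(\beta^2\) is smallest when \(\beta = 0\). Standard procedures for determining a minimum with respect to \(a\) (set the derivative \(2a \sigma_X^2 - 2 \text{Cov} [X, Y]\) equal to zero) then show that the minimum occurs for

    \(a = \dfrac{\text{Cov} [X,Y]}{\text{Var}[X]}\), \(b = \mu_Y - a \mu_X\)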

    Thus the optimum line, called the regression line of \(Y\) on \(X\), is

    \(u = \dfrac{\text{Cov} [X,Y]}{\text{Var}[X]} (t - \mu_X) + \mu_Y = \rho \dfrac{\sigma_Y}{\sigma_X} (t - \mu_X) + \mu_Y = \alpha(t)\)

    The second form is commonly used to define the regression line. For certain theoretical purposes, this is the preferred form. But for calculation, the first form is usually the more convenient. Only the covariance (which requires both means) and the variance of \(X\) are needed. There is no need to determine \(\text{Var} [Y]\) or \(\rho\).
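    The calculation pattern just described (two means, the variance of \(X\), and the covariance) can be sketched in Python for a discrete pair. This is only an illustrative sketch: the function name `regression_line` and the small joint distribution `pmf` are assumptions made for the example and do not come from the text.

```python
from fractions import Fraction as F

def regression_line(pmf):
    """Return (a, b) for the regression line u = a*t + b of Y on X.

    pmf maps (t, u) value pairs to probabilities P(X = t, Y = u).
    """
    ex = sum(p * t for (t, u), p in pmf.items())        # E[X]
    ey = sum(p * u for (t, u), p in pmf.items())        # E[Y]
    ex2 = sum(p * t * t for (t, u), p in pmf.items())   # E[X^2]
    exy = sum(p * t * u for (t, u), p in pmf.items())   # E[XY]
    var_x = ex2 - ex**2                                 # Var[X]
    cov = exy - ex * ey                                 # Cov[X, Y]
    a = cov / var_x
    b = ey - a * ex
    return a, b

# Hypothetical joint distribution (not from the text), exact fractions
pmf = {(0, 0): F(2, 10), (0, 1): F(1, 10), (1, 0): F(1, 10),
       (1, 1): F(2, 10), (2, 1): F(4, 10)}

a, b = regression_line(pmf)
print(a, b)  # 1/3 1/3
```

    Note that \(\text{Var} [Y]\) and \(\rho\) never appear in the function, in line with the remark above.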

    Example \(\PageIndex{1}\) The simple pair of Example 3 from "Variance"

    Enter JOINT PROBABILITIES (as on the plane)  P
    Enter row matrix of VALUES of X  X
    Enter row matrix of VALUES of Y  Y
     Use array operations on matrices X, Y, PX, PY, t, u, and P
    EX = total(t.*P)
    EX =   0.6420
    EY = total(u.*P)
    EY =   0.0783
    VX = total(t.^2.*P) - EX^2
    VX =   3.3016
    CV = total(t.*u.*P) - EX*EY
    CV =  -0.1633
    a = CV/VX
    a  =  -0.0495
    b = EY - a*EX
    b  =   0.1100           % The regression line is u = -0.0495t + 0.11

    Example \(\PageIndex{2}\) The pair in Example 6 from "Variance"

    Suppose the pair \(\{X, Y\}\) has joint density \(f_{XY}(t, u) = 3u\) on the triangular region bounded by \(u = 0\), \(u = 1 + t\), \(u = 1- t\). Determine the regression line of \(Y\) on \(X\).

    Analytic Solution

    By symmetry, \(E[X] = E[XY] = 0\), so \(\text{Cov} [X, Y] = 0\). Hence \(a = 0\) and \(b = E[Y]\), so the regression line is the horizontal line

    \(u = E[Y] = 3\int_0^1 u^2 \int_{u - 1}^{1 - u} \ dt du = 6 \int_{0}^{1} u^2 (1 - u)\ du = 1/2\)

    Note that the pair is uncorrelated, but by the rectangle test is not independent. With zero values of \(E[X]\) and \(E[XY]\), the approximation procedure is not very satisfactory unless a very large number of approximation points are employed.

    Example \(\PageIndex{3}\) Distribution of Example 5 from "Random Vectors and MATLAB" and Example 12 from "Function of Random Vectors"

    The pair \(\{X, Y\}\) has joint density \(f_{XY} (t, u) = \dfrac{6}{37} (t + 2u)\) on the region \(0 \le t \le 2\), \(0 \le u \le \text{max} \{1, t\}\) (see Figure 12.3.1). Determine the regression line of \(Y\) on \(X\). If the value \(X(\omega) = 1.7\) is observed, what is the best mean-square linear estimate of \(Y(\omega)\)?

    Figure one contains two lines in the first quadrant of a cartesian graph. The horizontal axis is labeled t, and the vertical axis is labeled u. The title caption reads f_xy (t, u) = (6/37)(t + 2u). The first line crosses the vertical axis one quarter of the way up the graph. It has a positive slope, and is labeled u = 0.3382t + 0.4011. It continues as a linear plot from one side of the graph to the other. The second line begins horizontally as one segment from the left to point (1, 1). The segment is labeled u = 1. After point (1, 1), the line moves upward with a positive, constant slope to point (2, 2). This segment is labeled u = t. At (2, 2) there is a vertical line continuing downward to point (2, 0).
    Figure 12.3.1. Regression line for Example 12.3.3

    Analytic Solution

    \(E[X] = \dfrac{6}{37} \int_{0}^{1} \int_{0}^{1} (t^2 + 2tu)\ dudt + \dfrac{6}{37} \int_{1}^{2} \int_{0}^{t} (t^2 + 2tu)\ dudt = 50/37\)

    The other quantities involve integrals over the same regions with appropriate integrands, as follows:

    Quantity       Integrand             Value
    \(E[X^2]\)     \(t^3 + 2t^2 u\)      779/370
    \(E[Y]\)       \(tu + 2u^2\)         127/148
    \(E[XY]\)      \(t^2u + 2tu^2\)      232/185


    \(\text{Var} [X] = \dfrac{779}{370} - (\dfrac{50}{37})^2 = \dfrac{3823}{13690}\), \(\text{Cov}[X, Y] =\dfrac{232}{185} - \dfrac{50}{37} \cdot \dfrac{127}{148} = \dfrac{1293}{13690}\)


    \(a = \text{Cov}[X, Y]/\text{Var}[X] = \dfrac{1293}{3823} \approx 0.3382\), \(b = E[Y] - aE[X] = \dfrac{6133}{15292} \approx 0.4011\)

    The regression line is \(u = at + b\). If \(X(\omega) = 1.7\), the best linear estimate (in the mean square sense) is \(\widehat{Y} (\omega) = 1.7a + b = 0.9760\) (see Figure 12.3.1 for an approximate plot).
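    The fractions in the analytic solution can be checked with exact rational arithmetic. The sketch below is in Python and is not the method of the text. The helper names `region_integral` and `moment` are assumptions introduced for the illustration. It relies on the closed forms \(\int_0^1 \int_0^1 t^p u^q\ dudt = \dfrac{1}{(p+1)(q+1)}\) and \(\int_1^2 \int_0^t t^p u^q\ dudt = \dfrac{2^{p+q+2} - 1}{(q+1)(p+q+2)}\).

```python
from fractions import Fraction as F

def region_integral(p, q):
    """Exact integral of t^p * u^q over 0 <= t <= 2, 0 <= u <= max(1, t)."""
    # Part 1: unit square 0 <= t <= 1, 0 <= u <= 1
    part1 = F(1, (p + 1) * (q + 1))
    # Part 2: 1 <= t <= 2, 0 <= u <= t (inner integral is t^(q+1)/(q+1))
    n = p + q + 2
    part2 = F(2**n - 1, (q + 1) * n)
    return part1 + part2

def moment(p, q):
    """E[X^p Y^q] for the joint density (6/37)(t + 2u) on the region."""
    return F(6, 37) * (region_integral(p + 1, q) + 2 * region_integral(p, q + 1))

ex, ey = moment(1, 0), moment(0, 1)      # E[X], E[Y]
var_x = moment(2, 0) - ex**2             # Var[X]
cov = moment(1, 1) - ex * ey             # Cov[X, Y]
a = cov / var_x
b = ey - a * ex
estimate = F(17, 10) * a + b             # best linear estimate at X = 1.7

print(ex, ey, var_x, cov)  # 50/37 127/148 3823/13690 1293/13690
print(a, b)                # 1293/3823 6133/15292
print(float(estimate))
```

    As a sanity check on the helper, `moment(0, 0)` returns 1, the total probability.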


    Enter matrix [a b] of X-range endpoints  [0 2]
    Enter matrix [c d] of Y-range endpoints  [0 2]
    Enter number of X approximation points  400
    Enter number of Y approximation points  400
    Enter expression for joint density  (6/37)*(t+2*u).*(u<=max(t,1))
    Use array operations on X, Y, PX, PY, t, u, and P
    EX = total(t.*P)
    EX =  1.3517                   % Theoretical = 1.3514
    EY = total(u.*P)
    EY =  0.8594                   % Theoretical = 0.8581
    VX = total(t.^2.*P) - EX^2
    VX =  0.2790                   % Theoretical = 0.2793
    CV = total(t.*u.*P) - EX*EY
    CV =  0.0947                   % Theoretical = 0.0944
    a = CV/VX
    a  =  0.3394                   % Theoretical = 0.3382
    b = EY - a*EX
    b  =  0.4006                   % Theoretical = 0.4011
    y = 1.7*a + b
    y  =  0.9776                   % Theoretical = 0.9760
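    The idea behind the numerical approximation can be imitated without the MATLAB procedure used above. The following Python sketch is an assumption-laden stand-in, not a port of that procedure: it places probability mass at the midpoints of a rectangular grid, normalizes, and applies the same formulas. The function name `grid_regression` and the choice of 200 points per axis are illustrative only, and its results need not match the transcript digit for digit.

```python
def grid_regression(density, t_range, u_range, n):
    """Approximate regression coefficients (a, b) on an n-by-n midpoint grid."""
    (t0, t1), (u0, u1) = t_range, u_range
    dt, du = (t1 - t0) / n, (u1 - u0) / n
    total = ex = ey = ex2 = exy = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * dt
        for j in range(n):
            u = u0 + (j + 0.5) * du
            w = density(t, u)          # unnormalized weight of this cell
            total += w
            ex += w * t
            ey += w * u
            ex2 += w * t * t
            exy += w * t * u
    # Normalize so that the cell weights act as probabilities
    ex, ey, ex2, exy = ex / total, ey / total, ex2 / total, exy / total
    a = (exy - ex * ey) / (ex2 - ex**2)
    b = ey - a * ex
    return a, b

def density(t, u):
    """Joint density of the example; zero outside u <= max(t, 1)."""
    return (6 / 37) * (t + 2 * u) if u <= max(t, 1) else 0.0

a, b = grid_regression(density, (0, 2), (0, 2), 200)
print(round(a, 3), round(b, 3))
print(round(1.7 * a + b, 3))   # approximate estimate at X = 1.7
```

    Grid cells that straddle the boundary \(u = t\) are counted whole or not at all, so the results should be close to the theoretical values but not equal to them.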

    An interpretation of \(\rho^2\)

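    The analysis above shows that, in terms of the standardized variables \(X^* = (X - \mu_X)/\sigma_X\) and \(Y^* = (Y - \mu_Y)/\sigma_Y\), for which \(E[(X^*)^2] = E[(Y^*)^2] = 1\) and \(E[X^* Y^*] = \rho\), the minimum mean squared error is given by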

    \(E[(Y - \widehat{Y})^2] = E[(Y - \rho \dfrac{\sigma_Y}{\sigma_X} (X - \mu_X) - \mu_Y)^2] = \sigma_Y^2 E[(Y^* - \rho X^*)^2]\)

    \(= \sigma_Y^2 E[(Y^*)^2 - 2\rho X^* Y^* + \rho^2(X^*)^2] = \sigma_Y^2 (1 - 2\rho^2 + \rho^2) = \sigma_Y^2 (1 - \rho^2)\)

    If \(\rho = 0\), then \(E[(Y - \widehat{Y})^2] = \sigma_Y^2\), the mean squared error in the case of zero linear correlation. Thus \(\rho^2\) may be interpreted as the fraction of the uncertainty in \(Y\) (as measured by \(\sigma_Y^2\)) that is removed by the linear rule and \(X\). This interpretation should not be pushed too far, but it is common in discussions of observations or experimental results.
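    The identity \(E[(Y - \widehat{Y})^2] = \sigma_Y^2 (1 - \rho^2)\) can be checked directly on a small discrete pair. The Python sketch below uses a hypothetical joint distribution `pmf` and a helper `E`, both introduced only for this illustration.

```python
from fractions import Fraction as F

# Hypothetical joint distribution (not from the text), exact fractions
pmf = {(0, 0): F(2, 10), (0, 1): F(1, 10), (1, 0): F(1, 10),
       (1, 1): F(2, 10), (2, 1): F(4, 10)}

def E(g):
    """Expectation of g(X, Y) under pmf."""
    return sum(p * g(t, u) for (t, u), p in pmf.items())

mx, my = E(lambda t, u: t), E(lambda t, u: u)
var_x = E(lambda t, u: (t - mx)**2)
var_y = E(lambda t, u: (u - my)**2)
cov = E(lambda t, u: (t - mx) * (u - my))

a = cov / var_x                                # regression slope
b = my - a * mx                                # regression intercept
mse = E(lambda t, u: (u - a * t - b)**2)       # mean squared error of the line
rho2 = cov**2 / (var_x * var_y)                # squared correlation coefficient

print(mse, var_y * (1 - rho2))  # 2/15 2/15
print(rho2)                     # 23/63
```

    For this distribution the linear rule removes the fraction 23/63 of \(\sigma_Y^2 = 21/100\), leaving a mean squared error of 2/15.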

    More general linear regression

    Consider a jointly distributed class \(\{Y, X_1, X_2, \cdot\cdot\cdot, X_n\}\). We wish to determine a function \(U\) of the form

    \(U = \sum_{i = 0}^{n} a_i X_i\), with \(X_0 = 1\), such that \(E[(Y - U)^2]\) is a minimum

    If \(U\) satisfies this minimum condition, then \(E[(Y - U)V] = 0\), or, equivalently

    \(E[YV] = E[UV]\) for all \(V\) of the form \(V = \sum_{i = 0}^{n} c_i X_i\)

    To see this, set \(W = Y - U\) and let \(d^2 = E[W^2]\). Now, for any \(\alpha\)

    \(d^2 \le E[(W + \alpha V)^2] = d^2 + 2\alpha E[WV] + \alpha^2 E[V^2]\)

    If we select the special value

    \(\alpha = -\dfrac{E[WV]}{E[V^2]}\), then \(0 \le -\dfrac{2E[WV]^2}{E[V^2]} + \dfrac{E[WV]^2}{E[V^2]^2} E[V^2] = -\dfrac{E[WV]^2}{E[V^2]}\)

    This implies \(E[WV]^2 \le 0\), which can only be satisfied by \(E[WV] =0\), so that

    \(E[YV] = E[UV]\)

    On the other hand, if \(E[(Y - U)V] = 0\) for all \(V\) of the form above, then \(E[(Y- U)^2]\) is a minimum. Consider

    \(E[(Y - V)^2] = E[(Y - U + U - V)^2] = E[(Y - U)^2] + E[(U - V)^2] + 2E[(Y - U) (U - V)]\)

    Since \(U - V\) is of the same form as \(V\), the last term is zero. The first term is fixed. The second term is nonnegative, with zero value iff \(U - V = 0\) a.s. Hence, \(E[(Y - V)^2]\) is a minimum when \(V = U\).
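    For \(n = 1\) the orthogonality condition and the decomposition used in this argument can be verified on a concrete case. The Python sketch below again uses a hypothetical joint distribution `pmf`. The competing linear rule \(V = X/2 + 1/4\) is an arbitrary choice made for the illustration.

```python
from fractions import Fraction as F

# Hypothetical joint distribution (not from the text), exact fractions
pmf = {(0, 0): F(2, 10), (0, 1): F(1, 10), (1, 0): F(1, 10),
       (1, 1): F(2, 10), (2, 1): F(4, 10)}

def E(g):
    """Expectation of g(X, Y) under pmf."""
    return sum(p * g(t, u) for (t, u), p in pmf.items())

mx, my = E(lambda t, u: t), E(lambda t, u: u)
a = E(lambda t, u: (t - mx) * (u - my)) / E(lambda t, u: (t - mx)**2)
b = my - a * mx

def U(t):
    """Optimal linear rule U = a X + b."""
    return a * t + b

def V(t):
    """An arbitrary competing linear rule (illustrative choice)."""
    return F(1, 2) * t + F(1, 4)

# Orthogonality: E[(Y - U) V] = 0 for V = 1 and V = X
orth_one = E(lambda t, u: (u - U(t)) * 1)
orth_x = E(lambda t, u: (u - U(t)) * t)

# Decomposition: E[(Y - V)^2] = E[(Y - U)^2] + E[(U - V)^2]
mse_u = E(lambda t, u: (u - U(t))**2)
mse_v = E(lambda t, u: (u - V(t))**2)
gap = E(lambda t, u: (U(t) - V(t))**2)

print(orth_one, orth_x)   # 0 0
print(mse_u, gap, mse_v)  # 2/15 7/240 13/80
```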

    If we take \(V\) to be 1, \(X_1, X_2, \cdot\cdot\cdot, X_n\), successively, we obtain \(n + 1\) linear equations in the \(n + 1\) unknowns \(a_0, a_1, \cdot\cdot\cdot, a_n\), as follows.

    (1) \(E[Y] = a_0 + a_1 E[X_1] + \cdot\cdot\cdot + a_n E[X_n]\)
    (2) \(E[YX_i] = a_0 E[X_i] + a_1 E[X_1X_i] + \cdot\cdot\cdot + a_n E[X_n X_i]\) for \(1 \le i \le n\)

    For each \(i = 1, 2, \cdot\cdot\cdot, n\), we take (2) - \(E[X_i] \cdot (1)\) and use the calculating expressions for variance and covariance to get

    \(\text{Cov} [Y, X_i] = a_1 \text{Cov} [X_1, X_i] + a_2 \text{Cov} [X_2, X_i] + \cdot\cdot\cdot + a_n \text{Cov} [X_n, X_i]\)

    These \(n\) equations plus equation (1) may be solved algebraically for the \(a_i\).

    In the important special case that the \(X_i\) are uncorrelated (i.e. \(\text{Cov}[X_i, X_j] = 0\) for \(i \ne j\)), we have

    \(a_i = \dfrac{\text{Cov}[Y, X_i]}{\text{Var} [X_i]}\) for \(1 \le i \le n\)


    \(a_0 = E[Y] - a_1 E[X_1] - a_2 E[X_2] - \cdot\cdot\cdot - a_n E[X_n]\)

    In particular, this condition holds if the class \(\{X_i : 1 \le i \le n\}\) is iid as in the case of a simple random sample (see the section on "Simple Random Samples and Statistics").

    Examination shows that for \(n = 1\), with \(X_1 = X\), \(a_0 = b\), and \(a_1 = a\), the result agrees with that obtained in the treatment of the regression line, above.

    Example \(\PageIndex{4}\) Linear regression with two variables.

    Suppose \(E[Y] = 3\), \(E[X_1] = 2\), \(E[X_2] = 3\), \(\text{Var}[X_1] = 3\), \(\text{Var}[X_2] = 8\), \(\text{Cov}[Y, X_1] = 5\), \(\text{Cov} [Y, X_2] = 7\), and \(\text{Cov} [X_1, X_2] = 1\). Then the three equations are

    \(a_0 + 2a_1 + 3a_2 = 3\)

    \(0 + 3a_1 + 1 a_2 = 5\)

    \(0 + 1a_1 + 8a_2 = 7\)

    Solution of these simultaneous linear equations with MATLAB gives the results

    \(a_0 = - 1.9565\), \(a_1 = 1.4348\), and \(a_2 = 0.6957\).
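    The text obtains these values with MATLAB. As a cross-check, the same three equations can be solved exactly in Python. The sketch below is not the author's procedure. The helper `solve_linear` is a hypothetical name for a small Gauss-Jordan elimination written for this illustration, and it assumes the system has a unique solution.

```python
from fractions import Fraction as F

def solve_linear(A, y):
    """Solve A x = y by Gauss-Jordan elimination with exact fractions.

    Assumes A is square and nonsingular (raises StopIteration otherwise).
    """
    n = len(A)
    # Augmented matrix [A | y] with exact rational entries
    M = [[F(v) for v in row] + [F(rhs)] for row, rhs in zip(A, y)]
    for col in range(n):
        # Find a row with a nonzero pivot and swap it into place
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot equals 1
        M[col] = [v / M[col][col] for v in M[col]]
        # Eliminate this column from every other row
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[-1] for row in M]

# Coefficients of a_0, a_1, a_2 in the three equations of the example
A = [[1, 2, 3],
     [0, 3, 1],
     [0, 1, 8]]
y = [3, 5, 7]

a0, a1, a2 = solve_linear(A, y)
print(a0, a1, a2)                                  # -45/23 33/23 16/23
print([round(float(v), 4) for v in (a0, a1, a2)])  # [-1.9565, 1.4348, 0.6957]
```

    The exact solution is \(a_0 = -45/23\), \(a_1 = 33/23\), \(a_2 = 16/23\), which rounds to the decimal values given above.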
