
10.2: Multiple Linear Regression


A multiple linear regression model describes how two or more predictor variables affect the response variable \(y\). For the population, the equation relating \(p\) independent variables to \(y\) has the form \(y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{p} x_{p} + \varepsilon\), where \(\beta_{1}, \beta_{2}, \ldots, \beta_{p}\) are the slopes, \(\beta_{0}\) is the \(y\)-intercept, and \(\varepsilon\) is called the error term.

We use sample data to estimate this equation; the predicted value of \(y\) is written \(\hat{y}\), and the regression equation (also called the line of best fit or least squares regression equation) is: \[\hat{y} = b_{0} + b_{1} x_{1} + b_{2} x_{2} + \cdots + b_{p} x_{p} \nonumber\]

where \(b_{1}, b_{2}, \ldots, b_{p}\) are the slopes and \(b_{0}\) is the \(y\)-intercept.

For example, if we had two independent variables, we would have a three-dimensional space as in Figure 12-25, where the red dots represent the sample data points and the fitted equation would be a plane in that space, represented by \(\hat{y} = b_{0} + b_{1} x_{1} + b_{2} x_{2}\).

[Figure: a three-dimensional coordinate system with \(x_1\) and \(x_2\) forming the "floor" and \(y\) running vertically; red data points lie above and below the best-fit plane, with a vertical line connecting each point to the plane.]
Figure 12-25: Multiple linear regression with two independent variables. This photo by unknown author is licensed under CC BY-SA-NC.

    The calculations use matrix algebra, which is not a prerequisite for this course. We will instead rely on a computer to calculate the multiple regression model.
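For instance, a fitted model can be obtained from standard software. Below is a minimal sketch in Python using NumPy's least-squares routine; the data values and the education/experience/wage labels are hypothetical and only illustrate the mechanics.

```python
# A minimal sketch of fitting y-hat = b0 + b1*x1 + b2*x2 by least squares.
# The data below are hypothetical and purely illustrative.
import numpy as np

x1 = np.array([12, 16, 14, 18, 12, 16, 20, 14], dtype=float)  # e.g. years of education
x2 = np.array([10,  2,  8,  5, 20,  9,  3, 15], dtype=float)  # e.g. years of experience
y  = np.array([18, 22, 21, 28, 26, 27, 30, 24], dtype=float)  # e.g. hourly wage

# Design matrix with a leading column of 1s so that b0 is the intercept.
X = np.column_stack([np.ones_like(x1), x1, x2])

# Least-squares solution of X b ~ y (the matrix algebra is done for us).
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b
print(f"y-hat = {b0:.2f} + {b1:.2f} x1 + {b2:.2f} x2")
```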

If all the population slopes were equal to zero, the model \(y = \beta_{0} + \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{p} x_{p} + \varepsilon\) would not be significant and should not be used for prediction. If one or more of the population slopes are not equal to zero, then the model is significant, meaning there is a significant relationship between the independent variables and the dependent variable, and we may want to use this model for prediction. There are other statistics to examine when deciding whether this is the best model to use; those methods are discussed in more advanced courses.

The null hypothesis will always contain an equal sign.

The hypotheses are:

    \(H_{0}: \beta_{1} = \beta_{2} = \cdots = \beta_{p} = 0\)
    \(H_{1}:\) At least one slope is not zero.

Note that the alternative hypothesis is not written as \(H_{1}: \beta_{1} \neq \beta_{2} \neq \cdots \neq \beta_{p} \neq 0\). This is because we only need one or more of the slopes to be significantly different from zero, not necessarily all of them.

Use the F-distribution with degrees of freedom for regression = \(df_{R} = p\), where \(p\) = the number of independent variables (predictors), and degrees of freedom for error = \(df_{E} = n - p - 1\), where \(n\) is the number of observations. This is always a right-tailed ANOVA test, since we are testing whether the variation explained by the regression model is larger than the variation due to error.

The test statistic and p-value are the last two values on the right in the ANOVA table. The p-value rule is easiest to use since the p-value is part of the output, but a critical value can be found using the invF program on your calculator or in Excel with =F.INV.RT(\(\alpha, df_{R}, df_{E}\)).
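The same right-tail calculation can also be done in Python with SciPy (assuming it is installed); the \(\alpha\), sample size, and test statistic below are hypothetical values used only to show the steps.

```python
# A sketch of the critical-value and p-value calculations for the F test.
# alpha, n, p, and the F statistic below are hypothetical values.
from scipy import stats

alpha, p, n = 0.05, 2, 30       # significance level, predictors, observations
df_R = p                        # regression degrees of freedom
df_E = n - p - 1                # error degrees of freedom

f_crit = stats.f.isf(alpha, df_R, df_E)   # right-tail critical value, like =F.INV.RT(alpha, df_R, df_E)
F_stat = 7.4                              # would be read from the ANOVA table
p_value = stats.f.sf(F_stat, df_R, df_E)  # right-tail area beyond the test statistic

print(f"critical value = {f_crit:.3f}, p-value = {p_value:.4f}")
```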

We can also single out one independent variable at a time and use a t-test to see whether that variable is significant by itself in predicting \(y\). This test has hypotheses:

    \(H_{0}: \beta_{i} = 0\)
    \(H_{1}: \beta_{i} \neq 0\)
    where \(i\) is a placeholder for whichever independent variable is being tested.

    This t-test is found in the same row as the coefficient that you are testing.
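As a sketch of where these values appear in computer output, the statsmodels package in Python (assuming it is installed) reports a t statistic and a p-value in the row for each coefficient; the data reuse the hypothetical values from the earlier sketch.

```python
# A sketch of the per-coefficient t-tests (H0: beta_i = 0) in software output.
# The data are the same hypothetical values used above.
import numpy as np
import statsmodels.api as sm

x1 = np.array([12, 16, 14, 18, 12, 16, 20, 14], dtype=float)
x2 = np.array([10,  2,  8,  5, 20,  9,  3, 15], dtype=float)
y  = np.array([18, 22, 21, 28, 26, 27, 30, 24], dtype=float)

X = sm.add_constant(np.column_stack([x1, x2]))  # intercept column plus predictors
fit = sm.OLS(y, X).fit()

# Each coefficient row carries its own t statistic and two-sided p-value.
print(fit.params)    # b0, b1, b2
print(fit.tvalues)   # t statistic for each coefficient
print(fit.pvalues)   # p-value in the same row as the coefficient
```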

    Example in Context

Suppose we want to estimate the wage (\(y\)) of workers based on their years of education (\(x_1\)) and years of work experience (\(x_2\)).

    The MLR model could be:

    \[
    \text{wage}_i = \beta_0 + \beta_1 \,\text{education}_i + \beta_2 \,\text{experience}_i + \varepsilon_i
    \]

The coefficients have the following interpretations:

• \(\beta_1\): the change in wage from one additional year of education, holding experience constant.
• \(\beta_2\): the change in wage from one additional year of experience, holding education constant.

    This "holding other variables constant" interpretation is one of the key differences between SLR and MLR.

Adjusted \(R^2\)

The \(R^2\) statistic tells us the proportion of variance in \(y\) explained by the model. In MLR, adding more explanatory variables will never decrease \(R^2\), even if the variables are irrelevant.

    To adjust for this, we use:

\[
\bar{R}^2 = 1 - \frac{\text{SSE}/(n-p-1)}{\text{SST}/(n-1)}
\]

where

\(n\) = the sample size,
\(p\) = the number of explanatory variables,
\(\text{SSE} = \sum_{i=1}^n \hat{\varepsilon}_i^2\) (the sum of squared residuals), and
\(\text{SST} = \sum_{i=1}^n (y_i - \bar{y})^2\) (the total sum of squares).
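As a quick numeric illustration (all values hypothetical), the formula can be evaluated directly:

```python
# A minimal numeric check of the adjusted R-squared formula above.
# SSE, SST, n, and p are hypothetical values, not from the text.
SSE, SST = 120.0, 400.0
n, p = 30, 2                                     # sample size, explanatory variables

r2     = 1 - SSE / SST                           # ordinary R-squared
r2_adj = 1 - (SSE / (n - p - 1)) / (SST / (n - 1))

print(f"R^2 = {r2:.3f}, adjusted R^2 = {r2_adj:.3f}")
# Adjusted R-squared is never larger than R-squared and penalizes extra predictors.
```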

    Multicollinearity

Multicollinearity occurs when two or more explanatory variables in a regression model are highly correlated. This makes it difficult for the model to separate out the individual effect of each variable on the dependent variable. For example, suppose we model worker wages using both years of education (\(x_1\)) and highest degree earned (\(x_2\), coded numerically). These two variables are strongly related: people with more education typically have higher degrees. This correlation means that the regression model will struggle to determine how much of the wage difference is due to extra years of schooling versus holding a higher degree.

    Consequences of Multicollinearity:

    1. Large Standard Errors for Coefficients: High correlation among explanatory variables inflates the variance of estimated coefficients, making them less precise.
    2. Unstable Coefficient Estimates: Small changes in the data can lead to large swings in coefficient values.
    3. Reduced Statistical Significance: Even variables that have a real effect might appear insignificant because of inflated standard errors.
    4. Misleading Signs and Magnitudes: Coefficients can have unexpected signs (positive instead of negative, or vice versa) if variables are highly collinear.

    While multicollinearity does not bias coefficient estimates, it makes them less reliable for inference. In applied economics, the choice of variables should balance explanatory power with minimizing redundancy among predictors.
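One common way to screen for this problem is the variance inflation factor (VIF). Below is a sketch using statsmodels in Python (assuming it is installed) with hypothetical education and degree data that are deliberately close to collinear.

```python
# A sketch of computing variance inflation factors (VIF) to screen for
# multicollinearity; the education/degree data are hypothetical.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

educ   = np.array([12, 14, 16, 16, 18, 20, 12, 16], dtype=float)  # years of education
degree = np.array([ 1,  2,  3,  2,  4,  5,  1,  3], dtype=float)  # numeric degree code

X = sm.add_constant(np.column_stack([educ, degree]))

# VIF for each explanatory variable (column 0 is the intercept, so skip it);
# values well above roughly 5-10 are a common warning sign of multicollinearity.
for j, name in enumerate(["education", "degree"], start=1):
    print(name, round(variance_inflation_factor(X, j), 1))
```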

    Review Questions

1) A state labor department is building a model to explain variation in county-level unemployment rates. They add “number of public libraries per capita” as a predictor, and \(R^2\) rises slightly, but adjusted \(R^2\) falls.

    What does this tell us about the value of adding public library availability to the model?

    2) An economist is studying the effect of education and work experience on annual earnings to inform wage growth policy. Why might education and experience be positively correlated, and how could that affect the precision of coefficient estimates in an earnings model that has wage as the dependent variable?


3) A city council wants to understand how a new certification requirement affects hourly wages for skilled trades. A regression model estimates \(\beta_3 = -2.5\) for the binary variable “No certification” (1 if the worker lacks certification, 0 if certified). How should we interpret this coefficient in the context of hourly wages?


    This page titled 10.2: Multiple Linear Regression is shared under a CC BY-SA 4.0 license and was authored, remixed, and/or curated by Rachel Webb via source content that was edited to the style and standards of the LibreTexts platform.