
10.3: Generalized Least Squares


Both ordinary least squares and weighted least squares require the errors to be independent. Reality does not always meet this requirement. If the dependent variable consists of repeated measures on one unit over time, as in modeling stock prices, the residuals will quite likely be correlated. Similarly, if the dependent variable is measured on a geographic structure, such as states in a country or trees in a forest, the errors of nearby units are likely correlated.

In such examples, the covariance matrix of \(\mathbf{E}\) will not be diagonal. Thankfully, it is a covariance matrix, and therefore positive definite under the usual assumption of no multicollinearity (The Appendix of Matrices, Section 20.4: Other Terms and Operations). Since it is positive definite, it is invertible. Thus, we may be able to use a trick not unlike the one we used for weighted least squares.

    ✦•················• ✦ •··················•✦

    The Mathematics

    For a reminder, here are the model equations for ordinary, weighted, and general least squares:

Ordinary Least Squares: \( \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} \)
Weighted Least Squares: \( \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} \)
General Least Squares: \( \mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E} \)

They sure do look similar. That's because this is still the classical linear model (CLM). The requirements on the residuals differ, however:

Ordinary Least Squares: \( \mathbf{E} \sim \mathcal{N}\left(\mathbf{0};\ \sigma^2\,\mathbf{I}\right) \)
Weighted Least Squares: \( \mathbf{E} \sim \mathcal{N}\left(\mathbf{0};\ \sigma^2\,\mathbf{D}\right) \)
General Least Squares: \( \mathbf{E} \sim \mathcal{N}\left(\mathbf{0};\ \Sigma\right) \)

    For ordinary least squares, the covariance matrix of the residuals is a constant multiple of the identity matrix, \(\mathbf{I}\). This indicates the residuals are independent and have the same variance (uncertainty). For weighted least squares, the covariance matrix of the residuals is a constant multiple of a diagonal matrix, \(\mathbf{D}\). This indicates the residuals are independent, but possibly with unequal variances.

    For generalized least squares (GLS), the covariance matrix is (a constant multiple of) a symmetric, positive definite matrix, \(\Sigma\). This indicates the residuals are possibly correlated and with possibly unequal variances.
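
For reference, the estimator this setup produces is the standard generalized least squares result (stated here without derivation; the square-root transformation at the end of this section shows where it comes from):

\begin{equation}
\hat{\mathbf{B}}\ =\ \left( \mathbf{X}^{\mathsf{T}}\, \Sigma^{-1}\, \mathbf{X} \right)^{-1} \mathbf{X}^{\mathsf{T}}\, \Sigma^{-1}\, \mathbf{Y}
\end{equation}

Note that setting \(\Sigma = \sigma^2\,\mathbf{I}\) recovers the ordinary least squares estimator.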

As with weighted least squares, you do need to know the structure of the covariance matrix. This requirement is sometimes met by the structure of the problem. The following two examples show how one can determine the \(\Sigma\) matrix. A full understanding of these examples is not needed; they are here only to illustrate that there are times when \(\Sigma\) can be determined from the problem itself.

GLS for Time Series Issues

When data are collected on a single unit over time, the measurements will tend to be correlated. This is called "temporal correlation" (or "serial correlation" or "autocorrelation"). For instance, the unemployment rate in Ruritania over the past 20 years is 11.35, 11.41, 11.12, 11.08, 10.93, 10.86, 10.96, 11.05, 11.10, 10.87, 10.79, 10.76, 10.94, 10.94, 10.92, 11.01, 11.04, 11.16, 11.13, and 11.14.

    Modeling.
    Let us fit this using ordinary least squares regression, then examine the residuals for autocorrelation (correlation between subsequent values).

# Annual unemployment rate in Ruritania for the past 20 years
unemp = c(11.35, 11.41, 11.12, 11.08, 10.93, 10.86, 10.96,
    11.05, 11.10, 10.87, 10.79, 10.76, 10.94, 10.94, 10.92,
    11.01, 11.04, 11.16, 11.13, 11.14)
year = 1:20

# Fit by ordinary least squares and extract the residuals
mod = lm(unemp ~ year)
E = residuals(mod)

# Test the residuals for autocorrelation
autocor.test(E)
    

    Note the sample autocorrelation is 0.719 with a p-value of 0.0005 and a 95% confidence interval from 0.393 to 0.884. The p-value indicates the autocorrelation is not 0. The confidence interval indicates that the residuals are moderately-to-highly correlated. In other words, adjacent observations are not independent, as both ordinary and weighted least squares require. Really, this makes sense because next year's unemployment rate will be heavily influenced by this year's rate.
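
As an aside, autocor.test() is not part of base R; it appears to come from this text's supplementary materials. If it is unavailable, a minimal base-R sketch of the same check, using only cor() and the standard Box.test() from the stats package, would be (E is the residual vector from above):

n = length(E)
cor(E[-1], E[-n])                          # lag-1 sample autocorrelation
Box.test(E, lag = 1, type = "Ljung-Box")   # test for serial correlation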

    Example \(\PageIndex{1}\): Time Series Example

    Given that the correlation between adjacent measurements is 0.500, determine the covariance matrix.

    Solution.
There are many ways of modeling such a situation. One is called "Autoregressive-1", or AR(1), or ARIMA(1,0,0). This model assumes that the direct correlation is only between adjacent years, so the correlation between measurements \(k\) years apart is \(\rho^k\). The covariance matrix, \(\Sigma\), would have this structure if the correlation between adjacent years is \(\rho = 0.500\):

    \begin{equation}
    \Sigma\ =\ \sigma^2\ \left[ \begin{matrix}
    1 & 0.5 & 0.25 & 0.125 & \cdots & \\
    0.5 & 1 & 0.5 & 0.25 & \cdots & \\
    0.25 & 0.5 & 1 & 0.5 & \cdots & \\
    0.125 & 0.25 & 0.5 & 1 & & \\
    \vdots & \vdots & \vdots & \vdots & \ddots & \\
    & & & & & 1 \\
    \end{matrix}\right]
    \end{equation}

    You can get this particular matrix using this R code:

# diag(20) just creates a 20x20 matrix of the correct size;
# the next line overwrites each entry (i,j) with 0.5^|i-j|
Sigma = diag(20)
Sigma = 0.5^abs(row(Sigma) - col(Sigma))
    


Note that the matrix has 1s along the diagonal and higher powers of 0.5 farther from the diagonal. The near-zero entries arise from the fact that the matrix is \(20 \times 20\); for example, the entry in cell \((1,20)\) is \(0.50^{19} \approx 0\).
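
You can check that far-corner entry directly:

Sigma[1, 20]    # 0.5^19, approximately 1.9e-06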

    \(\blacksquare\)

    Note

Again, this was just an example to show that the structure of \(\Sigma\) can be determined from some problems. There is an entire sub-discipline of statistics, called time series analysis, that examines such serial correlation.

GLS for Geographic Issues

    When data are collected from geographical units, such as neighborhoods, counties, or states, the residuals may be spatially correlated. This is a violation of the independence assumption of ordinary least squares. How that geographic correlation is modeled is up to the expert (researcher). The subject of spatial modeling is extensive and quite interesting... and important. It can, with appropriate matrices, be extended to modeling three-dimensional spatial correlation over time. If you have the opportunity, I suggest studying this topic (Bivand, Pebesma, and Gómez-Rubio 2013, Blangiardo and Cameletti 2013, Sen 2016). If nothing else, it leads to fun maps!

Figure \(\PageIndex{1}\): A map showing the administrative divisions (kraj) of the Kingdom of Ruritania. For this example, note that no kraj abuts all other kraj. Yes, I know that this is a map of San Marino. I am working on putting together a map of Ruritania.

Figure \(\PageIndex{1}\) is a map of Ruritania showing the nine kraj. Note that each kraj abuts some of the others but not all. For instance, region CS does not touch region CC. If we are trying to model the spread of something (disease, unemployment, wealth), we may decide to take into consideration the fact that some units neighbor others. Thus, from the map above, we know there is first-level transmission between CS and CD but not between CS and CF.

    Example \(\PageIndex{2}\): Geographic Adjacency Example

    Let us determine a matrix describing the adjacencies for the nine kraj.

    Solution.
Check that the following is the adjacency matrix for Ruritania:

    \begin{equation}
    \Sigma\ =\ \left[ \begin{matrix}
    1 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ % C-Aquaviva
    1 & 1 & 0 & 1 & 1 & 1 & 0 & 1 & 1 \\ % C-Borgo
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ % C-Chiesanuova
    0 & 1 & 0 & 1 & 1 & 0 & 0 & 1 & 0 \\ % C-Domagnano
    0 & 1 & 0 & 1 & 1 & 1 & 1 & 0 & 0 \\ % C-Faetano
    0 & 1 & 1 & 0 & 1 & 1 & 1 & 0 & 1 \\ % C-Fiorentino
    0 & 0 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \\ % C-Montegiardino
    0 & 1 & 0 & 1 & 0 & 0 & 0 & 1 & 0 \\ % C-Serravalle
    1 & 1 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \\ % C-San Marino
    \end{matrix} \right]
    \end{equation}

    It is important to ensure the kraj order for the columns is the same as for the rows. The kraj ordering is: CA, CB, CC, CD, CF, CI, CM, CS, CSM.

    \(\blacksquare\)

Note that the adjacency matrix is symmetric. Why will this matrix always be symmetric, regardless of the map? Now that we know \(\Sigma\) is symmetric, we can use the discussion following this Lemma to conclude that \(\Sigma\) is positive definite, which indicates it is allowable as a covariance matrix.
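
If you would like to verify the symmetry (and experiment with this matrix), here is one way to enter it in R. This is a sketch; the kraj abbreviations and row/column ordering are those given in the example.

kraj = c("CA", "CB", "CC", "CD", "CF", "CI", "CM", "CS", "CSM")
Sigma = matrix(c(
    1,1,0,0,0,0,0,0,1,   # CA
    1,1,0,1,1,1,0,1,1,   # CB
    0,0,1,0,0,1,0,0,1,   # CC
    0,1,0,1,1,0,0,1,0,   # CD
    0,1,0,1,1,1,1,0,0,   # CF
    0,1,1,0,1,1,1,0,1,   # CI
    0,0,0,0,1,1,1,0,0,   # CM
    0,1,0,1,0,0,0,1,0,   # CS
    1,1,1,0,0,1,0,0,1),  # CSM
    nrow = 9, byrow = TRUE, dimnames = list(kraj, kraj))

isSymmetric(Sigma)    # TRUE: adjacency is a mutual relationship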

    Note on Adjacency

    One area of geographical analysis tries to decide what adjacency rules are appropriate for a given research question. This example uses a simple 0-1 scheme. Other schemes include distances (measured in some manner).

We do not know the constant multiplier, \(\sigma^2\). No problem. We only need to know the structure of the covariance matrix; we then use the data to estimate the constant multiplier \(\sigma^2\).
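
Specifically, if we write \(\Sigma = \sigma^2\,\Sigma_0\), where \(\Sigma_0\) denotes the known structure (notation introduced here for clarity), the usual estimator of the multiplier is the standard one, with \(n\) the number of observations and \(p\) the number of estimated coefficients:

\begin{equation}
\hat{\sigma}^2\ =\ \frac{ \left( \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}} \right)^{\mathsf{T}}\, \Sigma_0^{-1}\, \left( \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}} \right) }{ n - p }
\end{equation}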

Also, note that the analysis based on this covariance structure is only as good as our assumption that the contagion spreads through touch. If it spreads based on some sort of distance, then this \(\Sigma\) is not correct, and we will need to create an appropriate covariance matrix given our scientific understanding... if such exists.

    Finally, let me reiterate a point I made above. The purpose of this example is only to illustrate that these covariance structures can be determined from the problem without resorting to estimating them from the data.

    Caution

However, there is a lesson for all of us here. If we do not know the correct structure of the correlation matrix, then we should try several and see how sensitive our estimates, confidence intervals, and p-values are to that choice. The results may be very sensitive, which is not a good position to be in, especially if we do not know the right mode of transmission.

    If the estimates, etc. are not sensitive to our choice of covariance matrix, then we need not be as concerned.

    The rule is to explore all models that make sense and see how important our assumptions are to our results.
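
To make this rule concrete, here is one way to run such a sensitivity check in R, reusing the Ruritania unemployment data from above. This is only a sketch: it relies on the nlme package, which this text does not otherwise use, and the particular correlation structures compared are illustrative choices, not recommendations.

library(nlme)

# unemp and year are the vectors defined earlier in this section
dat = data.frame(unemp = unemp, year = year)

# Fit the same mean model under three assumed correlation structures
fits = list(
    independent = gls(unemp ~ year, data = dat),
    ar1         = gls(unemp ~ year, data = dat,
                      correlation = corAR1(form = ~ year)),
    compsymm    = gls(unemp ~ year, data = dat,
                      correlation = corCompSymm(form = ~ 1))
)

# Compare the slope estimate, its standard error, and its p-value.
# If these change little across the fits, the conclusions are not
# sensitive to the assumed covariance structure.
lapply(fits, function(f) summary(f)$tTable["year", ])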

    One Last Note

    Above, we have only focused on being able to determine the structure of the covariance matrix (at least to a scalar multiple). There is one more thing that we need to pay attention to:

    What is the square root of \(\Sigma\)?

    When we introduced \(\mathbf{D}^{1/2}\) in the previous section, we knew we could calculate it. After all, one square root for a diagonal matrix would be

    \begin{equation}
    \left[\mathbf{D}^{1/2}\right]_{i,j}\ =\ d_{i,j}^{1/2}
    \end{equation}

That is, the elements of the square root matrix are the square roots of the entries of the matrix. This shortcut works because \(\mathbf{D}\) is diagonal.
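
The reason we care follows the same pattern as the previous section: if a symmetric, invertible \(\Sigma^{1/2}\) exists, pre-multiplying the model by \(\Sigma^{-1/2}\) produces residuals that satisfy the ordinary least squares requirements:

\begin{equation}
\Sigma^{-1/2}\,\mathbf{Y}\ =\ \Sigma^{-1/2}\,\mathbf{X}\mathbf{B}\ +\ \Sigma^{-1/2}\,\mathbf{E},
\qquad
\text{Var}\left[ \Sigma^{-1/2}\,\mathbf{E} \right]\ =\ \Sigma^{-1/2}\,\Sigma\,\Sigma^{-1/2}\ =\ \mathbf{I}
\end{equation}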

    Sadness Abounds

In general, there is no entrywise shortcut for \(\Sigma^{1/2}\): taking the square root of each entry of \(\Sigma\) does not produce a square root of the matrix. It is not even obvious how to calculate \(\Sigma^{1/2}\), or whether a usable one exists. Without \(\Sigma^{1/2}\), calculating the GLS estimates is not possible using this method.

    Sadness abounds. 😿


    This page titled 10.3: Generalized Least Squares is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
