
5.4: The Working-Hotelling Bands


    Building upon our exploration of confidence intervals for individual predictions, one last confidence interval we may be interested in is a confidence interval for the regression line itself. This concept extends beyond predicting a single future observation to instead defining a confidence region for the entire conditional mean function — the true average response for all possible values of the predictor. To address this, we employ Working-Hotelling (1929) confidence bands, which construct a set of simultaneous confidence intervals for the expected value of the response across the entire range of X. Unlike pointwise intervals that guarantee coverage at a single x-value, this method creates a hyperbolic band around the fitted line that ensures, with a specified confidence level (e.g., 95%), that the entire true regression line lies within the band simultaneously. This broader, more conservative approach is essential when making inferences about the underlying linear relationship as a whole.

    ✦•················• ✦ •··················•✦

Note that all of the confidence intervals in this chapter (except for the one for \(\sigma^2\), which is not centered on its point estimate) have been of the form

    \begin{equation}
    \text{point estimate}\ \pm\ K\ \times\ se
    \end{equation}

    That is because they were confidence intervals for a measure of center. The Working-Hotelling (1929) confidence band for the regression line follows this format. It is

    \begin{equation}\label{eq:lm3-workinghotelling}
\left( b_0 + b_1 x \right)\ \pm\ \sqrt{2\, F_{2, n-2}}\ \times\ \sqrt{\ \mathrm{MSE}\ \left( \frac{1}{n} + \frac{(x-\bar{x})^2}{S_{xx}} \right) }
    \end{equation}
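In R, the band in Equation \(\ref{eq:lm3-workinghotelling}\) can be computed directly from a fitted model. Here is a minimal sketch, assuming a simple linear regression has already been fit as mod = lm(y ~ x); the names mod, x, and y are placeholders, not objects from this section:

    n    = length(x)
    mse  = sum( residuals(mod)^2 )/(n-2)            # mean squared error
    Sxx  = sum( (x - mean(x))^2 )
    W    = sqrt( 2*qf(0.95, 2, n-2) )               # Working-Hotelling multiplier

    xg    = seq( min(x), max(x), length.out=200 )   # grid of x values
    yhat  = coef(mod)[1] + coef(mod)[2]*xg          # fitted line on the grid
    se    = sqrt( mse*( 1/n + (xg - mean(x))^2/Sxx ) )

    lower = yhat - W*se                             # the 95% band
    upper = yhat + W*se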

    The proof is very interesting. The key is to focus on the joint distribution of \(b_0\) and \(b_1\). This joint distribution is bivariate Normal. Thus, "confidence intervals" take the form of confidence ellipses with the same meaning and interpretation. However, as is common for confidence regions, the distribution of interest is the Chi-squared, instead of the Normal. Why?

    Ans: Think about the formula for an ellipse.

    Finally, the problem is transformed from the \(\beta_0\)-\(\beta_1\) plane to the \(x\)-\(y\) plane.

    Believe it or not, the hardest part of the proof is the algebra.

    Question: So, where does the F distribution come from in the formula?

    Ans: The same place as the Student's t distribution in the univariate case: the fact that we do not know the population variances involved.
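In fact, the t and F distributions are directly linked: the square of a \(t_\nu\) random variable is an \(F_{1,\nu}\) random variable, so the corresponding quantiles match. A quick check in R:

    qt(0.975, 10)^2    # 4.9646
    qf(0.95, 1, 10)    # 4.9646, the same value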

    Note

Technically, Working and Hotelling handled only the case of known population variances \(\sigma_0^2\) and \(\sigma_1^2\), which led to a Chi-square distribution in the formula. This is because the F distribution had not been invented (or discovered) yet. It was not until George W. Snedecor in 1934 that we were able to take that final step.


    The rest of this section proves this result. It progresses in three phases.

    The Proof of Working-Hotelling (1929)*

    This extra proof follows this structure.

1. First, it determines the joint distribution of the OLS estimators \(\mathbf{b}\).
2. Second, it constructs a confidence region in that coordinate system.
3. Finally, it transforms that distribution of \(\mathbf{b}\) into a distribution of \(\boldsymbol{\ell}\), the lines.

    That is all. The first part is rather straightforward and should remind us of what we have already done in this chapter. The second is a simple exercise in algebra. The third is intense algebra and geometry.

The reason I am providing this proof is that this is probably the first time you have seen a problem attacked by solving a similar problem and then transforming the solution. Thus, the most important thing about this proof is not the actual proof, but the logic behind it, which you can take with you to other problems.

    Part 1: The Joint Distribution of \(b_0\), \(b_1\)

    The first thing to note is that the line of best fit depends on both \(b_0\) and \(b_1\), which are random variables. Thus, to speak of a confidence region for the regression line, we can first look at the joint confidence region of \(b_0\) and \(b_1\). From previous theorems, we determined that the OLS estimator of \(\mathbf{B}\) has this distribution:

    \begin{equation}
\mathbf{b} \sim N\left( \mathbf{B};\, \sigma^2\left(\mathbf{X}^\prime\mathbf{X}\right)^{-1} \right)
    \end{equation}

    That is, their joint distribution is a bivariate Normal distribution. Bivariate Normal distributions have this form

    \begin{equation}\label{eq:lm3-bivariateNormal}
    f(\mathbf{b}) = \frac{1}{2\pi} (\det \mathbf{\Sigma})^{-1/2} e^{-\frac{1}{2} (\mathbf{b}-\mathbf{B})^\prime \mathbf{\Sigma}^{-1}(\mathbf{b}-\mathbf{B})}
    \end{equation}

    In Equation \(\ref{eq:lm3-bivariateNormal}\), \(\mathbf{\Sigma}\) is the covariance matrix of \(\mathbf{b}\), which we already calculated to be

    \begin{equation}
\sigma^2(\mathbf{X}^\prime\mathbf{X})^{-1}
    \end{equation}
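By the way, the estimated version of this matrix, with \(\sigma^2\) replaced by the MSE, is exactly what R's vcov() function returns for a fitted model. A quick check, again assuming a model fit as mod = lm(y ~ x):

    vcov(mod)    # estimated covariance matrix of b: MSE * (X'X)^{-1}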

Question: What does this distribution look like?

Figure \(\PageIndex{1}\): Two representations of a Bivariate Normal distribution. The left panel is a simple scatter plot with ellipses of equal density marked. The right panel is an attempt at a three-dimensional density mesh plot. You can think of the red ellipses on the left as being horizontal cuts through the graphic on the right.

    Figure \(\PageIndex{1}\), above, is a plot of the \(b_0\) and \(b_1\) values generated from the distribution of \(\mathbf{b}\). The red curves in the left panel are ellipses of equal density. The smallest contains 50% of the distribution. The outermost contains 99% of the distribution. Thus, a 99% confidence region for \(\mathbf{B}\) is inside that outer ellipse.
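A figure along these lines is easy to simulate. Here is a sketch in R; the center B and covariance matrix Sigma below are made-up values for illustration, not quantities from a fitted model:

    set.seed(42)
    B     = c(2, 0.8)                               # assumed (beta0, beta1)
    Sigma = matrix( c(1.0,-0.6, -0.6,0.5), 2,2 )    # assumed covariance of b
    L     = t( chol(Sigma) )                        # Sigma = L %*% t(L)

    b = t( L %*% matrix( rnorm(2*2000), nrow=2 ) + B )   # each row ~ N(B, Sigma)
    plot(b, pch=16, col="grey", xlab="b0", ylab="b1")

    theta = seq(0, 2*pi, length.out=200)
    circ  = rbind( cos(theta), sin(theta) )
    for( lev in c(0.50, 0.99) ) {                   # ellipses of equal density
      r = sqrt( qchisq(lev, df=2) )                 # radius in standardized space
      lines( t( L %*% (r*circ) + B ), col="red", lwd=2 )
    }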

    This point is rather important. We now have a confidence region for the estimators \(b_0\) and \(b_1\). This means, with a little algebra, we have a confidence region for the lines of best fit.

    Part 2: The Confidence Region

    Let us return to Equation \(\ref{eq:lm3-bivariateNormal}\). The only place that the \(\mathbf{b}\) estimator occurs is in the exponent. So, it would behoove us to look closely at the exponent \((\mathbf{b}-\mathbf{B})^\prime \mathbf{\Sigma}^{-1}(\mathbf{b}-\mathbf{B})\). If we can get a distribution for it, then we have simplified the problem a bit.

    Lemma \(\PageIndex{1}\)

    The quantity \((\mathbf{b}-\mathbf{B})^\prime \mathbf{\Sigma}^{-1}(\mathbf{b}-\mathbf{B}) \sim \chi^2_{2}\).

    Proof.
Recall that the \(\chi^2_\nu\) distribution is defined as the sum of \(\nu\) independent squared standard Normal random variables. Yeppers, that is the definition. Only later did we find a probability density function for it.

    So, we have two things to do for this part of the proof:

1. First, show that the quantity is a sum of squared standard Normal random variables.
2. Second, determine how many of them are independent.

    Both parts are trivial.

    First, the distribution of \(\mathbf{b}\) is \(N\left( \mathbf{B};\, \mathbf{\Sigma} \right)\). Thus,

    \begin{align}
    \mathbf{b}-\mathbf{B} &\sim N\left(\mathbf{0};\, \mathbf{\Sigma} \right) \\[1em]
    \mathbf{\Sigma}^{-1/2}(\mathbf{b}-\mathbf{B}) &\sim N\left(\mathbf{0};\, \mathbf{I}_2 \right) \\[1em]
    \left( \mathbf{\Sigma}^{-1/2}(\mathbf{b}-\mathbf{B})\right)^\prime\left( \mathbf{\Sigma}^{-1/2}(\mathbf{b}-\mathbf{B}) \right) &\sim \chi^2_{\nu=2} \\[1em]
    (\mathbf{b}-\mathbf{B})^\prime \mathbf{\Sigma}^{-1}(\mathbf{b}-\mathbf{B}) &\sim \chi^2_{\nu=2} \label{eq:whellipse}
    \end{align}

The second part is determining the number of degrees of freedom, \(\nu\). In general, the number of degrees of freedom is the rank of the matrix in the quadratic form. Here, the rank of \(\mathbf{\Sigma}^{-1}\) is \(2\), the number of parameters in \(\mathbf{b}\).

    Thus, we have shown that \((\mathbf{b}-\mathbf{B})^\prime \mathbf{\Sigma}^{-1}(\mathbf{b}-\mathbf{B}) \sim \chi^2_{2}\).

    \(\blacksquare\)
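Lemma \(\PageIndex{1}\) is also easy to check by simulation. A sketch in R, using a made-up \(\mathbf{B}\) and \(\mathbf{\Sigma}\) (any valid covariance matrix will do):

    set.seed(370)
    B     = c(1, 3)                                # made-up parameter vector
    Sigma = matrix( c(2.0,0.8, 0.8,1.0), 2,2 )     # made-up covariance matrix
    L     = t( chol(Sigma) )

    q = replicate(10000, {
      b = B + drop( L %*% rnorm(2) )               # b ~ N(B, Sigma)
      d = b - B
      drop( t(d) %*% solve(Sigma) %*% d )          # the quadratic form
    })

    mean(q)               # should be near 2, the mean of a chi-square(2)
    quantile(q, 0.95)     # should be near qchisq(0.95, 2), about 5.99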

By the way, in the simple linear regression case, what is \((\mathbf{b}-\mathbf{B})^\prime \mathbf{\Sigma}^{-1}(\mathbf{b}-\mathbf{B})\)? Writing the model in the centered form \(y = \beta_0 + \beta_1(x - \bar{x})\), so that \(b_0\) and \(b_1\) are uncorrelated and \(\mathbf{\Sigma}\) is diagonal, it is

    \begin{equation}
    \frac{\big(b_0 - \beta_0\big)^2}{\sigma_0^2} + \frac{\big(b_1 - \beta_1\big)^2}{\sigma_1^2} \sim \chi^2_{\nu=2}
    \end{equation}

    So, we have determined a confidence region for \(\mathbf{b} = \left[ b_0, b_1 \right]^{\prime}\). All we have to do is transform from \(\mathbf{b}\) coordinates to (\(x, y\)) coordinates. That is done in the next section.

    Part 3: The Envelope for \(\ell\)

    The previous section gave us a distribution for the quantity \((\mathbf{b}-\mathbf{B})^\prime \mathbf{\Sigma}^{-1}(\mathbf{b}-\mathbf{B})\). That is, we now have a confidence region for \(b_0\) and \(b_1\) jointly (see left panel of Figure \(\PageIndex{2}\), below). This final part of the proof finds the equivalent region in terms of \(x\) and \(y\) (see right panel of Figure \(\PageIndex{2}\), below). In other words, given that the envelope for \(\mathbf{b}\) is elliptical, what is the envelope for the equivalent lines?

Figure \(\PageIndex{2}\): A graphic showing the distribution of the \(\mathbf{b}\) matrix on the left and the corresponding lines on the right. The orange dot identifies the orange line. On the left, the shape is clearly elliptical. What is the shape on the right? Hyperbolic.

    It turns out that the envelope for the lines is a hyperbola! This leads to Equation \(\ref{eq:lm3-workinghotelling}\).

    Since the confidence region in \(b_0\)-\(b_1\)-space is an ellipse (see equation \(\ref{eq:whellipse}\)), we have this

    \begin{equation}
    \frac{(b_0-\beta_0)^2}{\sigma_0^2} + \frac{(b_1-\beta_1)^2}{\sigma_1^2} = \chi^2
    \end{equation}

    Here, "\(\chi^2\)" is the appropriate quantile for the confidence region.

Note that there are two parameters in this equation, \(b_0\) and \(b_1\). As usual, since we are trying to get a function in terms of \(x\), \(y\), \(\beta_0\), and \(\beta_1\), we try to eliminate them. First, we solve \(y = b_0 + b_1(x - \bar{x})\) for \(b_0\), substitute into the equation above, and simplify.

    This gives

    \begin{equation}
\frac{ \big(y - \beta_0 - b_1(x-\bar{x}) \big)^2}{\sigma_0^2} + \frac{\big(b_1-\beta_1\big)^2}{\sigma_1^2} = \chi^2 \label{eq:whStep2}
    \end{equation}

Note that this is a family of lines, parametrized by \(b_1\). Each line is determined by the parameters \(\beta_0\), \(\beta_1\), \(\sigma_0^2\), and \(\sigma_1^2\), along with the pairs \((x, y)\).

Because we have set the boundary of the ellipse to be constant (we set the confidence level), the boundary of the band is the envelope of this family of lines. Following the standard envelope technique, we take the derivative of the above with respect to \(b_1\) and set it to zero to get

    \begin{equation}
\frac{-(x - \bar{x}) \left( y - \beta_0- b_1(x-\bar{x})\right)}{\sigma_0^2} + \frac{b_1-\beta_1}{\sigma_1^2} = 0
    \end{equation}
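The elided algebra is not deep, just tedious. A sketch, writing \(d = x - \bar{x}\) and \(u = y - \beta_0\) to keep the expressions short: solving the derivative equation for \(b_1\) gives

    \begin{equation}
    b_1 = \frac{\beta_1\, \sigma_0^2 + \sigma_1^2\, d\, u}{\sigma_0^2 + \sigma_1^2\, d^2}
    \end{equation}

from which both pieces of Equation \(\ref{eq:whStep2}\) collapse into multiples of \(u - \beta_1 d\):

    \begin{equation}
    u - b_1 d = \frac{\sigma_0^2\, (u - \beta_1 d)}{\sigma_0^2 + \sigma_1^2 d^2}
    \qquad\text{and}\qquad
    b_1 - \beta_1 = \frac{\sigma_1^2\, d\, (u - \beta_1 d)}{\sigma_0^2 + \sigma_1^2 d^2}
    \end{equation}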

Substituting these into Equation \(\ref{eq:whStep2}\) and simplifying gives us the answer

    \begin{equation}
    y = \beta_0 + \beta_1(x-\bar{x}) \pm \sqrt{ \chi^2_2\ \big(\sigma_0^2 + (x-\bar{x})^2 \sigma_1^2 \big)}
    \end{equation}

    Note

    This is as far as Working and Hotelling got. They made the assumption that we knew the population variances, \(\sigma_0^2\) and \(\sigma_1^2\). In reality, we never know these, and we use \(s_0^2\) and \(s_1^2\) in their stead.

This additional source of uncertainty (variability) is important... and easily handled. In the univariate case, it changes our distribution from a Normal to a Student's t distribution. In this bivariate case, it changes the \(\chi^2\) into an F distribution. This is what Snedecor (1934) offered to this problem.
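As a quick sanity check on this correspondence: as \(n\) grows, the estimated variances converge to the population values, and the \(2F_{2,n-2}\) quantile converges to the \(\chi^2_2\) quantile. In R:

    qchisq(0.95, 2)          # 5.9915
    2 * qf(0.95, 2, Inf)     # 5.9915, the same value in the limit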

    Because of this, we have our final result:

    \begin{equation}
    y = \beta_0 + \beta_1(x-\bar{x}) \pm \sqrt{2\ F_{2,n-2} \left( \mathrm{MSE}\ \left( \frac{1}{n} + \frac{(x-\bar{x})^2}{S_{xx}} \right) \right)}
    \end{equation}
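It is worth seeing how much wider this simultaneous band is than a pointwise confidence interval for the mean response. Compare the two multipliers in R (n = 50 is used purely for illustration):

    n = 50                          # an illustrative sample size
    sqrt( 2*qf(0.95, 2, n-2) )      # Working-Hotelling multiplier: about 2.53
    qt( 0.975, n-2 )                # pointwise t multiplier: about 2.01

The band is wider at every value of \(x\); that is the price of simultaneous coverage.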

    Exploring the Results

    Alright. So, the confidence region for the regression line is a hyperbola. What do we know about hyperbolas? A couple of things. First, they have a center.

    Corollary \(\PageIndex{2}\): Center of Gravity

The center of gravity for the confidence region for the regression line is \(y = b_0 + b_1(x - \bar{x})\).

    Second, they have asymptotes.

    Corollary \(\PageIndex{3}\): Asymptotes

The asymptotes for the confidence region are \(y = \beta_0 + \left(\beta_1 \pm \chi\,\sigma_1\right)(x-\bar{x})\), where \(\chi\) is the square root of the \(\chi^2_2\) quantile used for the band.

    Using the Bounds

Figure \(\PageIndex{3}\): Graphic showing the relationship between the property crime rate in the United States in 1990 and the violent crime rate in 2000. The thick line is the line of best fit showing the relationship between these variables. The two hyperbolas are the 95% Working-Hotelling bounds for the line.

    This graphic illustrates the Working-Hotelling bounds. We are 95% confident that the true relationship between the property crime rate in 1990 and the violent crime rate in 2000 fits between the hyperbolas.

Here is the code to get the graphic. The predictWH() function comes from the KnoxStats package and returns the lower and upper Working-Hotelling bounds (lcb and ucb):

# The KnoxStats package provides predictWH(), which computes the
# Working-Hotelling bounds for a fitted linear model
library(KnoxStats)

dt = read.csv("http://rfs.kvasaheim.com/data/crime.csv")
summary(dt)
attach(dt)    # make the columns available by name

# Fit the simple linear regression and examine it
mod = lm(vcrime00 ~ pcrime90)
summary(mod)

# A grid of x values, the Working-Hotelling bounds on that grid,
# and the fitted line on that grid
newX    = seq( min(pcrime90), max(pcrime90) )
ybounds = predictWH(mod, newdata=data.frame(pcrime90=newX))
yest    = predict(mod, newdata=data.frame(pcrime90=newX))

# Set up the plotting region
par(mar=c(4,4,1,2), las=1)
par(font.lab=2, cex.lab=1, cex.axis=0.9)
par(xaxs="i", yaxs="i")

plot.new()
plot.window(xlim=c(0,9000), ylim=c(0,1600))

axis(1); axis(2)
title(xlab="Property Crime Rate (1990)", line=2.5)
title(ylab="Violent Crime Rate (2000)", line=2.75)

# The data, the two bounds, and the line of best fit
points(pcrime90, vcrime00, pch=21, bg="pink")
lines(newX, ybounds$lcb)    # lower confidence bound
lines(newX, ybounds$ucb)    # upper confidence bound
lines(newX, yest, lwd=2)
    

    This page titled 5.4: The Working-Hotelling Bands is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
