
6.3: Homoskedasticity


    Of the core assumptions underpinning classical linear regression, the final one we will examine is the requirement that the model's residuals exhibit homoskedasticity — that is, they possess a constant variance, denoted \(\sigma^2\), across all levels of the independent variables. This assumption was foundational to the derivations in Section 3.3: Our First Assumptions. In every instance where the term \(\sigma^2\) appeared, we implicitly relied on it being a single, stable parameter, not a variable \(\sigma^2_i\) that changes with each observation. When this assumption is violated — a condition known as heteroskedasticity, in which the error variance systematically changes with the predictors — the standard machinery of inference breaks down. The ordinary least squares estimators, while still unbiased, no longer have the minimum variance among linear unbiased estimators.

    More critically, the estimated standard errors of the coefficients become biased. Consequently, the test statistics (such as t-statistics and F-statistics) lose their validity, and the confidence intervals constructed using these flawed standard errors will be incorrect, leading to potentially misleading conclusions about the significance of the predictors. Therefore, diagnosing and addressing heteroskedasticity is not a mere technicality but a crucial step in ensuring the reliability of hypothesis tests and the accuracy of the uncertainty estimates we place on our model parameters.

    ✦•················• ✦ •··················•✦

    A note on vocabulary. If the variance of the residuals is constant, we say the residuals are "homoskedastic." Otherwise, they are "heteroskedastic." Recall that "homo" is an ancient Greek combining form meaning "same." Similarly, "hetero" means "different," and "skedastic" means "scatter."

    Graphical Check

    The graphical test is very similar to the one for checking constant expected value. For that assumption, a residual plot was created; the middle of the vertical spread was traced out and checked to see if it stayed near the zero line. For testing constant variance, a residual plot can also be used: trace out the vertical spread of the data and check that it does not vary too much. Note that the "vertical spread" is not the range, but the middle portion that contains about two-thirds of the data.

    There are three stereotypical shapes that suggest heteroskedasticity. These three shapes, along with a shape suggesting homoskedasticity, are presented in the figure below.

    Figure \(\PageIndex{1}\): Residual plots illustrating homoskedasticity (top-left) and three typical types of heteroskedasticity: trumpet (top-right), funnel (bottom-left), and bulge (bottom-right).
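
    The check can be carried out directly in R. Here is a minimal sketch (using hypothetical simulated data) that plots the residuals against the fitted values so the vertical spread can be inspected:

    set.seed(1)                                    # hypothetical data for illustration
    x = seq(0, 100)
    y = 3 + 2*x + rnorm(length(x), mean=0, sd=1)   # homoskedastic errors
    mod = lm(y ~ x)
    
    plot(fitted(mod), resid(mod), pch=21, bg="dodgerblue",
         xlab="Fitted values", ylab="Residuals")
    abline(h=0, lty=2)    # the vertical spread should stay roughly constant

    If the traced band of points stays roughly constant in height across the plot, the constant-variance assumption looks reasonable; any of the three shapes above suggests otherwise.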

    Numeric Test

    In addition to what your eyes may tell you, it may be better to also perform a numeric test of homoskedasticity.

    Note

    I recommend performing both, when possible. If the p-value from the numeric test is too low, the graphical test either gives you clues about where the problem is or shows that the problem is practically minor and can be ignored.

    There are a couple of tests that can be used. However, the usual test in regression settings is the Breusch-Pagan test.

    The Breusch-Pagan test works by regressing the squared residuals from the fitted model on the independent variables and comparing the resulting test statistic to a chi-squared distribution. Since the null hypothesis is homoskedasticity (constant variance), a small test statistic (large p-value) indicates that the independent variables explain little of the variation in the squared residuals, so the model passes this test. A sufficiently small p-value indicates the presence of heteroskedasticity (the residual variance does change with the predictors), which means the model needs improvement.
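
    In symbols, for the studentized version of the test (the default that R reports), the statistic is

    \begin{equation*}
    \text{BP} = n\, R^2_{\text{aux}} \sim \chi^2_{k},
    \end{equation*}

    where \(R^2_{\text{aux}}\) comes from the auxiliary regression of the squared residuals \(\hat{\epsilon}_i^2\) on the \(k\) independent variables.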

    So, to examine the Breusch-Pagan test, let us generate our data, first according to our assumptions, then in violation of them. The Breusch-Pagan test is contained in the lmtest package as the function bptest.

    set.seed(30)
    
    x = seq(0, 100)                   # independent variable
    n = length(x)
    e = rnorm(n, mean=0, sd=1)        # homoskedastic errors: constant standard deviation
    y = 3 + 2*x + e                   # true model: beta_0 = 3, beta_1 = 2
    
    plot(x, y, pch=21, bg="dodgerblue")
    

    The scatter plot of this data does not seem to suggest a changing variation in the residuals. Let us perform the Breusch-Pagan test to see if this numeric test also fails to detect a problem with the constant-variance assumption.

    mod = lm(y~x)
    bptest(mod)
    

    If you get

    Error: could not find function "bptest"

    then you need to load (or install and load) the lmtest package.


    Once you do, your output should look exactly like this

            studentized Breusch-Pagan test
    
    data:  mod
    BP = 0.23429, df = 1, p-value = 0.6284
    

    Because the p-value is greater than \(\alpha\), we cannot reject the null hypothesis of homoskedasticity. The model passes the test. ✅
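
    As a check on the description above, the studentized Breusch-Pagan statistic can be computed by hand from the auxiliary regression. This is a minimal sketch, not a replacement for bptest; it should reproduce the BP = 0.23429 and p-value above up to rounding:

    ehat2 = resid(mod)^2                  # squared OLS residuals
    aux   = lm(ehat2 ~ x)                 # auxiliary regression on the predictor
    BP    = n * summary(aux)$r.squared    # studentized BP statistic: n times R-squared
    pchisq(BP, df=1, lower.tail=FALSE)    # p-value from the chi-squared distribution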

    Let us look at what happens when the residuals are heteroskedastic.

    set.seed(30)
    x = seq(0, 100)
    n = length(x)
    e = rnorm(n, mean=0, sd=sqrt(x))   # standard deviation grows with x: a trumpet shape
    y = 3 + 2*x + e
    
    plot(x, y, pch=21, bg="dodgerblue")
    

    The scatter plot of this data definitely suggests increasing variation in the residuals, a trumpet shape that indicates heteroskedasticity. Let us perform the Breusch-Pagan test to see if this numeric test also detects a problem with the constant-variance assumption.

    mod = lm(y~x)
    bptest(mod)
    

    Your output should look like this

            studentized Breusch-Pagan test
    
    data:  mod
    BP = 8.0992, df = 1, p-value = 0.004428
    

    Because the p-value is less than \(\alpha\), we reject the null hypothesis of homoskedasticity. The model does not pass the test. ❌

    Note

    Recall from your previous statistics course that a true null hypothesis will be rejected \(\alpha\) proportion of the time, and a false null hypothesis will (wrongly) fail to be rejected \(\beta\) proportion of the time, where \(\alpha\) is the Type I error rate and \(\beta\) is the Type II error rate.

    Neither of these numbers can be zero without making the other 1. So, be aware that you may be wrong.

    Exploration of the Effects of Non-Constant Variance (coverage)

    Instead of looking at proofs or at the distribution of p-values, let us generate data and look at the confidence intervals estimated using OLS. While proofs make it easy to see that there is an effect, simulation may make it easier to see how serious the effect is.

    The steps to estimate coverage are as follows:

    1. generate heteroskedastic data
    2. calculate the 95% confidence intervals
    3. check if the true population parameters \(\beta_0\) and \(\beta_1\) are in the intervals
    4. repeat this many, many, many times

    The first three steps check the coverage of a single random sample. It is step 4 that allows you to better estimate the coverage rate.

    If the coverage of the estimated intervals is close to the claimed 95%, then the effect of heteroskedasticity is minor.

    Here is the entire code

    set.seed(30)
    
    b0covered = logical()   # TRUE/FALSE: did the interval cover beta_0?
    b1covered = logical()   # TRUE/FALSE: did the interval cover beta_1?
    
    for(i in 1:1e4) {
      x = seq(0, 10, 0.5)
      n = length(x)
      e = rnorm(n, mean=0, sd=sqrt(x))   # trumpet-shaped heteroskedasticity
      y = 3 + 2*x + e                    # true beta_0 = 3, beta_1 = 2
      mod = lm(y ~ x)
      ci = confint(mod)                  # 95% confidence intervals
      b0covered[i] = (3>ci[1,1]) && (3<ci[1,2])
      b1covered[i] = (2>ci[2,1]) && (2<ci[2,2])
    }
    

    Running this may take anywhere from a few seconds to a minute or more, depending on your computer. At the end, you will have two variables, b0covered and b1covered, each containing 10,000 Boolean values. A TRUE indicates the population parameter was covered by the confidence interval for that random sample; a FALSE indicates it was not. The proportion of TRUE values can quickly be calculated using mean(b0covered) and mean(b1covered).

    In this particular example, the proportion of times the \(\beta_0\) parameter was covered by the confidence interval was \(0.996\). Thus, the estimated coverage for \(\beta_0\) when the data are heteroskedastic in this manner is \(99.6\%\). The coverage rate for \(\beta_1\) is about \(95.1\%\). Both should be close to \(1-\alpha\), that is, \(95\%\).

    From these results, we know a couple of things.

    1. First, this amount of heteroskedasticity did not (from a practical standpoint) significantly affect the confidence intervals for the slope parameter, \(\beta_1\).
    2. Second, it did (from a practical standpoint) significantly affect the confidence intervals for the intercept parameter, \(\beta_0\).

    In fact, because the reported confidence interval is much wider than it should have been, a researcher will reject too infrequently. This means the power of the test is too low.


    Let us try a greater level of heteroskedasticity. And, let us make it a funnel shape instead of the trumpet shape above:

    set.seed(30)
    b0covered = b1covered = logical()
    
    for(i in 1:1e4) {
      x = seq(0, 10, 0.5)
      n = length(x)
      e = rnorm(n, mean=0, sd=15-x)   # standard deviation falls from 15 to 5: a funnel shape
      y = 3 + 2*x + e
      mod = lm(y ~ x)
      ci = confint(mod)
      b0covered[i] = (3>ci[1,1]) && (3<ci[1,2])
      b1covered[i] = (2>ci[2,1]) && (2<ci[2,2])
    }
    

    Note what changed: the standard deviation of the residuals. It starts high (15 at \(x = 0\)) and gets smaller (5 at \(x = 10\)) — a funnel shape.

    This scheme produces a coverage estimate of 88.8% for the intercept parameter and 93.9% for the slope parameter. Thus, the confidence interval for the intercept is again affected more than that of the slope parameter. In fact, the coverage for the slope parameter is relatively unaffected.

    Let us try once more. This time, let us look at bulge heteroskedasticity.

    set.seed(30)
    b0covered = b1covered = logical()
    
    for(i in 1:1e4) {
      x = seq(0, 10, 0.5)
      n = length(x)
      e = rnorm(n, mean=0, sd=11-2*abs(x-5))   # sd rises from 1 to 11, then falls back to 1
      y = 3 + 2*x + e
      mod = lm(y ~ x)
      ci = confint(mod)
      b0covered[i] = (3>ci[1,1]) && (3<ci[1,2])
      b1covered[i] = (2>ci[2,1]) && (2<ci[2,2])
    }
    

    Again, note the common code and the slight changes. Here, the standard deviation starts low (1 at \(x = 0\)), rises to 11 (at \(x = 5\)), then decreases back to 1 — bulge heteroskedasticity.

    The coverage for both parameters is far from 95%. For \(\beta_0\), the coverage is 99.3%; for \(\beta_1\), it is 99.8%. Both indicate that the researcher will reject the hypotheses far too infrequently.

    Type I and Type II Trade-off

    The test statistics for \(\beta_0\) and \(\beta_1\) are calculated using the MSE. That test statistic, if all of the assumptions are true, follows the Student's t distribution with \(n-p\) degrees of freedom. However, if the variance is a function of the independent variable, then the distribution of the test statistic is also a function of the independent variable. This is a problem, because the "correct" p-values and confidence intervals would also be functions of \(x\).

    The analysis of the effect of heteroskedasticity can be repeated for the distribution of the test statistics. Or, we can recall that we reject the null hypothesis when the confidence interval does not contain the hypothesized value. That is, the analysis for the confidence interval is sufficient for our understanding of the p-value:

    1. If the confidence interval is too wide, then the p-values will be too high (reject the null hypothesis too infrequently).
    2. If the confidence intervals are too narrow, then the p-values are too low (reject too frequently).

    Of the two possibilities, the first (which produces a lower Type I error rate) may be better from the researcher's point of view in some cases: while the researcher would end up rejecting too infrequently, the rejections that do occur are more sure, because the true p-value is less than the observed p-value. One may prefer this when it is worse to commit a Type I error (rejecting a true null hypothesis).

    On the other hand, the second may be better in certain circumstances, because it has a lower Type II error rate. If it is more important to protect against a Type II error, then the second scenario is the better one.
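
    To make this trade-off concrete with the bulge simulation above: a 95% confidence-interval test rejects a true null exactly when the interval misses the true value, so the empirical Type I error rate for testing \(H_0: \beta_1 = 2\) is one minus the estimated coverage.

    # empirical Type I error rate for the (true) null beta_1 = 2,
    # using b1covered from the bulge simulation above
    1 - mean(b1covered)   # about 1 - 0.998 = 0.002, far below the nominal 0.05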

    Note

    These findings regarding the hypothesis tests about \(\boldsymbol{\beta}\) strictly relate only to these three particular types of heteroskedasticity at these particular levels. If you find that your heteroskedasticity is much more severe than in these examples, you should run a coverage analysis similar to these to see how badly the heteroskedasticity really affects the confidence intervals.

    The above analysis examined the effects of heteroskedasticity on the parameter estimates. It did not examine its effect on the prediction of y-values. Those effects are much more pronounced and are also a function of the variance at the x-value.

    To see this in one example, let us generate trumpet-shaped heteroskedastic residuals, predict the value of \(y\) at three points along the x-axis, and determine how frequently those points are covered by the calculated confidence intervals.

    Before we do this, let us see if we can determine what to expect.

    • For low values of x, the true variation is much smaller than the estimated (averaged) variation, so the calculated intervals are wider than necessary. Thus, we would expect the true mean value to fall in the calculated confidence interval very frequently.
    • For middling values of x, the true and estimated variances are close to each other. Thus, we would expect coverage to be close to the nominal 95%.
    • For large values of x, the true variation is much higher than the estimated variation, so the intervals are too narrow. Thus, a lot of the true mean values should fall outside the confidence intervals. We should expect the coverage to be relatively low.

    Let's see if these expectations are met by reality.

    Here is the entire code

    set.seed(30)
    
    y1covered = logical()   # coverage indicators at x = 0  (true mean:  3)
    y2covered = logical()   # coverage indicators at x = 5  (true mean: 13)
    y3covered = logical()   # coverage indicators at x = 10 (true mean: 23)
    
    for(i in 1:1e4) {
      x = seq(0, 10, 0.5)
      n = length(x)
      e = rnorm(n, mean=0, sd=sqrt(x))   # trumpet-shaped heteroskedasticity
      y = 3 + 2*x + e
      
      mod = lm(y ~ x)
      
      # confidence intervals for the mean response at x = 0, 5, and 10
      ypr = predict(mod, newdata=data.frame(x=c(0,5,10)), interval="confidence")
      
      y1covered[i] = ( 3>ypr[1,2]) && ( 3<ypr[1,3])
      y2covered[i] = (13>ypr[2,2]) && (13<ypr[2,3])
      y3covered[i] = (23>ypr[3,2]) && (23<ypr[3,3])
    }
    
    mean(y1covered)
    mean(y2covered)
    mean(y3covered)
    

    The coverage for \(y\) when \(x\) is very low is 99.6%, which is very high. The coverage for \(y\) when \(x\) is in the middle is 94.5%, which is about what we want. The coverage for \(y\) when \(x\) is very high is 88.7%, which is relatively low.

    These results are what we expected.

    Note

    Realize again the connection between p-values and confidence intervals. To ensure that our p-values are "protected" — that the true p-value is no larger than the one we report — we need to limit ourselves to the regions where the true spread is no larger than the average spread.

    Huber-White Heteroskedastic Estimators

    Because heteroskedasticity is common in several areas of study, it would be helpful to find a method for reducing its effect. This was the task that Huber (1967) and White (1980) set for themselves. They created the so-called "sandwich estimators" to reduce (not eliminate) the effect of heteroskedasticity on the estimates. Ultimately, their solution uses matrix multiplication, together with the observed squared residuals, to consistently estimate the covariance matrix of the coefficient estimates.

    Remember that in classical linear regression under the Gauss-Markov assumptions, the ordinary least squares (OLS) estimator \( \mathbf{b} = \mathbf{(X^{\prime} X)^{-1} X^{\prime} Y}\) has a covariance matrix estimated in the usual manner, \(V[\mathbf{b}] = \mathrm{MSE}\ \mathbf{(X^{\prime} X)^{-1}}\), where \( \mathrm{MSE} \) is the mean squared error, our estimate of the residual variance. This formula (as with most we've covered so far) depends on the assumption of homoskedasticity — that the variance of the residuals is constant across all observations.

    When errors are heteroskedastic, this covariance estimator is biased. This leads to incorrect standard errors, invalid hypothesis tests, and unreliable confidence intervals. The Huber-White sandwich estimator (also called the heteroskedasticity-consistent covariance matrix estimator, or HCME) provides a consistent estimate of the covariance matrix of \(\mathbf{b}\) even in the presence of heteroskedasticity of unknown form.

    Derivation and Matrix Form

    Under our usual conditions, the asymptotic covariance matrix of \(\mathbf{b}\) is:

    \begin{equation}
    \text{A.Cov}(\mathbf{b}) = \mathbf{(X^{\prime} X)^{-1} \left( X^{\prime} \Sigma X \right) (X^{\prime} X)^{-1}} \label{eq:rawSandwich}
    \end{equation}

    where \(\Sigma = \text{diag}(\sigma_1^2, \sigma_2^2, \dots, \sigma_n^2)\) is the \(n \times n\) covariance matrix of the errors. The "sandwich" form is evident: \( \mathbf{(X^{\prime} X)^{-1}}\) forms the "bread" and \( \mathbf{X^{\prime} \Sigma X}\) is the "meat." Or, as I learned it, the \(\mathbf{X}\) is the bread and the \(\Sigma\) is the meat. But, this way, there is a whole lotta bread on that sandwich.

    Time for a quick sanity check:

    If there is no heteroskedasticity, then \(\Sigma = \sigma^2 \mathbf{I}\). Substituting this into equation \(\ref{eq:rawSandwich}\) returns us to our usual estimator of \(V[\mathbf{b}]\).
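
    Explicitly,

    \begin{equation*}
    \mathbf{(X^{\prime} X)^{-1} \left( X^{\prime} \sigma^2 I X \right) (X^{\prime} X)^{-1}} = \sigma^2 \mathbf{(X^{\prime} X)^{-1} (X^{\prime} X) (X^{\prime} X)^{-1}} = \sigma^2 \mathbf{(X^{\prime} X)^{-1}},
    \end{equation*}

    with \(\sigma^2\) estimated by the MSE.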

    It is always a good idea to check that we are merely extending what we already know.

    Since \(\Sigma\) is unknown (it is the population covariance matrix), White (1980) proposed replacing \(\sigma_i^2\) with the squared OLS residuals \(\hat{\epsilon}_i^2 = (y_i - \mathbf{x}_i^{\prime}\mathbf{b})^2\). This gives the "HC0" estimator:

    \begin{equation}
    \widehat{\text{A.Cov}}_{\text{HC0}}(\mathbf{b}) = (X^{\prime} X)^{-1} \left( X^{\prime} \text{diag}(\hat{\epsilon}_i^2) X \right) (X^{\prime} X)^{-1}
    \end{equation}

    In full matrix notation, if we define an \(n \times n\) diagonal matrix \(\hat{\Sigma} = \text{diag}(\hat{\epsilon}_1^2, \dots, \hat{\epsilon}_n^2)\), then:

    \begin{equation}
    \widehat{\text{A.Cov}}_{\text{HC0}}(\mathbf{b}) = (X^{\prime} X)^{-1} (X^{\prime} \hat{\Sigma} X) (X^{\prime} X)^{-1}.
    \end{equation}
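
    To connect the matrix form to computation, here is a hand-rolled HC0 estimator in R for a fitted model mod (a sketch of the formula above; in practice, the vcovHC function in the sandwich package computes this, along with its refinements):

    X     = model.matrix(mod)             # design matrix (with the intercept column)
    ehat  = resid(mod)                    # OLS residuals
    bread = solve(t(X) %*% X)             # (X'X)^{-1}, the "bread"
    meat  = t(X) %*% diag(ehat^2) %*% X   # X' diag(ehat_i^2) X, the "meat"
    vHC0  = bread %*% meat %*% bread      # the sandwich
    sqrt(diag(vHC0))                      # HC0 standard errors for b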

    Small-Sample Corrections

    For a first attempt, this was fantastic! However, we later discovered that the HC0 estimator can be biased in small samples. Several refinements adjust the residuals to improve finite-sample performance. These are (thankfully) beyond the scope of this course.

    BTW: The function in KnoxStats uses HC0 for adjustments.

    Properties and Usage

    • Consistency:
      The sandwich estimator is consistent for \(\text{A.Cov}(\textbf{b})\) under heteroskedasticity, provided \(E[\varepsilon_i | x_i] = 0\) and the other usual conditions.
    • Robustness:
      It is "robust" in the sense of providing valid inference without specifying the form of heteroskedasticity.
    • Limitation:
      It does not change the point estimates \(\mathbf{b}\); it only adjusts (corrects?) the estimated covariance matrix. For efficiency gains, weighted least squares or other methods may be needed, as sketched below.
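
    As a small illustration of that last point: if the error variance is known (or modeled) to be proportional to \(x\), weighted least squares with precision weights \(1/x\) recovers efficiency. Here is a minimal sketch with hypothetical simulated trumpet data (note that \(x\) starts at 1 so the weights stay finite):

    set.seed(30)
    x = seq(1, 100)                                       # start at 1 so 1/x is finite
    y = 3 + 2*x + rnorm(length(x), mean=0, sd=sqrt(x))    # variance proportional to x
    wmod = lm(y ~ x, weights=1/x)                         # weights = 1/variance
    summary(wmod)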
    Example \(\PageIndex{1}\): Violent and Property Crime

    Is there a significant relationship in the United States between the violent crime rate and the property crime rate a decade earlier?

    Solution.
    To test this, let us use the years 2000 and 1990 and the crime dataset.

    The usual initial analysis is done.

    library(KnoxStats)
    library(lmtest)   # for bptest(), in case KnoxStats does not already load it
    
    dt = read.csv("http://rfs.kvasaheim.com/data/crime.csv")
    attach(dt)
    
    mod = lm(vcrime00 ~ pcrime90)
    bptest(mod)

    The Breusch-Pagan test indicates strong evidence of heteroskedasticity (p-value = 0.0008917). As such, at this point in your understanding, you should use summaryHCE instead of summary to obtain proper estimates of the standard errors. Here it is:

    summaryHCE(mod)
    

    with output

                    Estimate HC Std. Error HC t value HC Pr(>|t|)
    (Intercept) -203.8202734  153.08525598  -1.331417    0.183052
    pcrime90       0.1363034    0.03457255   3.942531    0.000081
    

    According to this analysis, there is a significant relationship between the property crime rate in 1990 and the violent crime rate in 2000. For every one-unit increase in the property crime rate in 1990, the violent crime rate in 2000 increases by 0.1363, on average.

    If we did not use the sandwich correction, we would have seen this table:

                  Estimate Std. Error t value Pr(>|t|)    
    (Intercept) -203.82027  104.24087  -1.955   0.0563 .  
    pcrime90       0.13630    0.02136   6.381 6.04e-08 ***
    

    Note that the estimated effects do not change. Only the standard error (and what depends on it) changes between the two models. This is because the sandwich correction acts only on \(V[\mathbf{b}]\), not on the parameter estimates, \(\mathbf{b}\).
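
    If summaryHCE is not available, the same HC0-corrected table can be reproduced with standard tools (a sketch assuming the sandwich and lmtest packages; the numbers should match the summaryHCE output up to rounding):

    library(sandwich)
    library(lmtest)
    
    # coefficient table built from the HC0 sandwich covariance matrix
    coeftest(mod, vcov = vcovHC(mod, type = "HC0"))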


    This page titled 6.3: Homoskedasticity is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
