6.1: Normality
Let us tackle the important requirement of normality in our linear models. As established in Chapter 5: Improved! Now with Probabilities, the assumption of normally distributed errors is what grants us the ability to derive exact sampling distributions for our test statistics. This foundation allows for the precise calculation of p-values and the construction of reliable confidence intervals, forming the backbone of statistical inference. While the Central Limit Theorem often provides a safety net with large samples, rigorously checking this assumption is essential, especially with smaller datasets or when making precise probabilistic statements.
In this section, we will move from theory to practice, exploring both graphical diagnostics (like Q-Q plots and histograms) and formal statistical tests (Shapiro-Wilk) to assess whether your model's residuals meet this key assumption, ensuring the validity of your conclusions.
Graphical Checks
From your elementary statistics course, you were most likely introduced to a pair of graphical tests of Normality: Q-Q plots and histograms. In this section, we examine each when the residuals are generated from a Normal process and when they are not. To do this, let's do some experiments using R.
To begin, let us generate 100 residuals from a Normal process, with mean \(\mu=0\) and standard deviation \(\sigma=2\):
e = rnorm(100, m=0, s=2)
The rnorm function generates n random values from a Normal distribution with mean m and standard deviation s. Running this line only stores those 100 random values in the e variable.
Be aware that R, like most statistics programs, parameterizes the Normal distribution using the standard deviation instead of the variance.
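As a quick check on that parameterization (a minimal sketch, not part of the original example), the sample variance of a large draw with sd = 2 should land near \(2^2 = 4\):

var( rnorm(1e6, mean=0, sd=2) )   # close to 4 (= sd^2), not 2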
With that one line, we have some "residuals" to play with that are generated "under the null hypothesis" (i.e., they are generated from the assumed Normal distribution). This will allow us to better understand what the Normal distribution looks like.
The Q-Q Plot
So, let us look at how to generate a Normal-based quantile-quantile (Q-Q) plot using R.

qqnorm(e)
That's it. After running that one line, R creates the usual Q-Q plot for the Normal quantiles. It is a default graphic, so it does not look awesome, but it gets the point across to the statistician. One shortcoming is that R does not draw the diagonal reference line by default. You can add it by also running the command:
qqline(e)
Figure \(\PageIndex{1}\), top panel, is the graphic produced by running these lines. Note that most of the circles cluster around the diagonal target line. Let us compare that graphic to a Q-Q plot from a non-Normal distribution. The non-Normal (and highly skewed) distribution will be the Exponential(\(\lambda=0.10\)) distribution.
enon = rexp(100, rate=1/10)
qqnorm(enon)
qqline(enon)
Compare the two graphics in Figure \(\PageIndex{1}\). Note that the shape of the Q-Q plot from the Normal residuals (top) is very different from the one generated from the skewed distribution (bottom). This particular shape indicates that the distribution of the residuals is positively skewed (right skewed).
Those who are particularly skilled at reading Q-Q plots can also determine whether the distribution is kurtotic.
The above looks at only a single random sample from a Normal distribution, and it is dangerous to draw conclusions from just a single instance. One sample cannot show you how much inherent randomness a Q-Q plot of truly Normal data has. To see that, you need to generate many, many, many Normal samples and plot them. That will show how precise (or imprecise) the Q-Q plot is.
We do this next.
Just how variable is the Q-Q plot?
Solution.
To figure this out, let us generate multiple samples from the same Normal distribution, then plot all of the samples on a single Q-Q plot. This will show us the expected spread.
set.seed(370)
x = rnorm(1e5, m=0, s=1)
y = matrix(x, ncol=100)
qqnorm(y[1,], xlim=c(-4,4), ylim=c(-4,4), pch=16, col=rgb(0,0,0,0.1))
for(i in 2:dim(y)[1]) {
qq = qqnorm(y[i,], plot=FALSE)
points(qq$x,qq$y, pch=16, col=rgb(0,0,0,0.1))
}
abline(a=0,b=1, col="white", lwd=3)
abline(a=0,b=1, col="red", lwd=1)
Note that I used an "rgb" color designation in the code. This rgb designation has four parameters. The first is the amount of red in the color; the second, green; the third, blue. The last is the level of opacity. Specifying an opacity of 0.1 means that it takes about 10 dots stacked on top of each other to appear black. Why would I do this? It helps show the density of the dots. Darker areas have a higher number of dots. It helps to see where "most" of the dots are, as opposed to "any" dots.
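Here is a tiny sketch (not part of the Q-Q experiment above) showing the idea: ten semi-transparent dots stacked at the same spot look much darker than a single dot.

plot( rep(1,10), rep(0,10), pch=16, cex=4, col=rgb(0,0,0,0.1),
      xlim=c(0,3), ylim=c(-1,1), xlab="", ylab="" )   # ten stacked translucent dots
points( 2, 0, pch=16, cex=4, col=rgb(0,0,0,0.1) )     # a single translucent dot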
Run the above code to see all of the Q-Q plots (well, the dots defining the Q-Q plot). Note the amount of variability in the dots. Remember that all of the dots come from a Normal distribution... all of them.
What lesson can you take away from this experiment?
Histogram
In the previous section, we examined quantile-quantile plots. We saw that a Q-Q plot with the dots aligning closely to the diagonal target line suggests Normality. In this section, we will use histograms to obtain a better view of the distribution of the data.

Again, let us generate Normally-distributed residuals.
e = rnorm(100, m=0, s=2)
The function to create a basic histogram is just
hist(e)
or
overlay(e)
The former is just a basic, utilitarian histogram. The latter, from the KnoxStats package, overlays the histogram with a Normal curve.
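If you do not have the KnoxStats package available, a rough base-R approximation of that overlay (a sketch, not the package's actual function) is:

hist(e, freq=FALSE)                                         # histogram on the density scale
curve( dnorm(x, mean=mean(e), sd=sd(e)), add=TRUE, lwd=2 )  # overlay a fitted Normal curve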
The basic histogram produced is shown in the top panel of Figure \(\PageIndex{2}\). Note that it has the stereotypical "bell shape" to it, thus suggesting the data come from a Normal distribution. This is the shape you are seeking when using a histogram to explore the distribution of the residuals.
What does the histogram look like when the data come from a highly skewed distribution like the Exponential(\(\lambda=0.10\)) distribution above? See the bottom panel of Figure \(\PageIndex{2}\). Note the lack of a bell shape in that histogram. In fact, one can easily see that the residuals are positively skewed. Recall that the direction of the skew is the direction of the long tail.
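For completeness, here is a sketch of how that bottom panel could be produced (the textbook figure may have used a different seed):

enon = rexp(100, rate=1/10)   # highly skewed "residuals" from an Exponential(0.10)
hist(enon)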
I find using the histogram much easier than using the Q-Q plot. I can "see" how the residuals are distributed in the histogram. In the Q-Q plot, I have to interpret much more, remembering what the different shapes indicate.
With that said, however, a Q-Q plot is much more useful than a histogram when there are few data points. So, if your sample size is small, you will want to use a Q-Q plot. Otherwise, a histogram may be best.
Numeric Tests
Because of the importance of the Normal distribution in Statistics, there are several Normality tests available, with more being created yearly. I mean, we gotta award those PhD degrees for something, right? The reality is that it matters little which Normality test you use. As you will see below, the Normality assumption in OLS reflects the Normality assumption in t-tests and the like. The Central Limit Theorem ensures that a sufficiently large sample size makes the distribution of the data largely irrelevant; the distribution of sample sums is "close" to Normal.
So, for me, I default to using the Shapiro-Wilk test as my Normality test. In base R, this test is implemented as the function shapiro.test. In the KnoxStats package, it is implemented as shapiroTest. I recommend the latter, because it adds some helpful functionality.
So, let us generate some Normally-distributed residuals and see how we can use the Shapiro-Wilk test.
Let us first set the random number seed. Why? Technically, there is no such thing as a truly random number when they are generated by a computer. This is because these pseudo-random numbers are just functions of a number called a seed. Think of a seed as being the name of an entire list of "random-looking" numbers. Thus, if we use the same seed, then our "random numbers" will be the same.
To see this, run the following three lines:
set.seed(370)
e = rnorm(100, m=0, s=2)
head(e, 5)
The numbers you get are
-0.5566842 -1.7264618 1.3320962 -0.5392740 0.3477423
Setting the random number seed ensured that your values are the same as mine. Even more interesting: if you rerun those three lines, you will get the same "random" numbers.
In research, why would one want to set the random number seed?
The head function returns the first six values in the variable, by default. A similar function is the tail function, which returns the last six values in the variable. I included the number 5 to have R only return five numbers.
To perform the Shapiro-Wilk test, just run the following line:
shapiroTest(e)
The output you get will be
Shapiro-Wilk Normality test
data: e
W = 0.98554, p-value = 0.3472
The p-value is interpreted in the usual way: If the p-value is greater than your pre-determined value of \(\alpha\), then you fail to reject the null hypothesis. If the p-value is less than \(\alpha\), then you reject the null hypothesis. The null hypothesis in this case is that the data are generated from a Normal process (that the data come from a Normal distribution).
The Normal distribution is identical to the Gaussian distribution. The only difference is the discipline. In much of statistics, it is known as the Normal distribution. However, when we get to generalized linear models in Chapter 14: Generalized Linear Models, this same distribution will be called the Gaussian distribution. In the Francophone world, this distribution is routinely called the Laplace-Gauss distribution.
Since the p-value is greater than \(\alpha=0.05\), we fail to reject the null hypothesis. We do not have sufficient evidence that the data are non-Normal.
We did not conclude that the data do come from a Normal distribution. We only concluded that there is not enough evidence to the contrary.
This point is very important. We first assume that the residuals are from a Normal distribution, then we test to see how reasonable that assumption is, given our data.
To drive home this point, let us generate our data from a non-Normal distribution and perform the Shapiro-Wilk test:
set.seed(370)
e = rt(100, df=100)
shapiroTest(e)
By the way that we generated the data, we know that the residuals do not come from a Normal distribution; they come from a Student's t distribution with 100 degrees of freedom (\(\nu=100\)). Here is the output:
Shapiro-Wilk Normality test
data: e
W = 0.98979, p-value = 0.6474
Note that the p-value is greater than our \(\alpha\) value of 0.05. As such, we again fail to reject the null hypothesis. There is not sufficient evidence that the residuals do not come from a Normal distribution.
However, we absolutely know they don't.
Never "accept" the null hypothesis. The p-value depends on how reasonable the null hypothesis is, as well as how good the test is and how large the sample is.
If we are good statisticians, then there are two possible reasons that we "fail to reject" the null hypothesis.
- The first is that the null hypothesis is actually true.
- The second is that there is not enough power in our test (the sample size is too small).
With our given set of data, there is no way to determine which of the two conditions is true.
Remember that when making decisions, there are really three usual decisions: Yes, No, and Maybe. In statistics, since we are just testing how reasonable the null hypothesis is, we only have two possible decisions: Reject and Don't Reject the null hypothesis. These correspond to "the null hypothesis is wrong" and "the null hypothesis is right or we don't know."
Let us now generate severely non-Normal residuals and use the Shapiro-Wilk test to test if they do come from a Normal distribution:
set.seed(370)
e = rexp(100, rate=1/10)
shapiroTest(e)
Here are the results:
Shapiro-Wilk Normality test
data: e
W = 0.68002, p-value = 1.891e-13
Note that the p-value of \(1.891 \times 10^{-13} = 0.000\ 000\ 000\ 000\ 189\ 1\) is much less than \(\alpha = 0.05\). That strongly indicates that the residuals were not generated from a Normal process.
Remember the Central Limit Theorem and the effect of sample size on the Normality of the sample sums. In the next section, we will explore the effects of non-Normality on our estimators and their distributions.
Why should I invoke the holy CLT in this context? Recall that the CLT speaks to the distribution of sums (and means) of independent random variables. As the sample size increases, the distribution of that sum approaches Normality.
Look at the formulas for \(b_0\) and \(b_1\). They both include the sum of \(y_i\). Thus, it is the distribution of \(\sum y_i\) that we really care about. If the \(y_i\) are from a Normal distribution, then this requirement is met. However, if the sample size is large enough, this condition is also met, as long as the data are from a distribution with a finite variance. In other words, the CLT rules the world.
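For reference, here is one standard way of writing those estimators (the notation may differ slightly from Chapter 5); note that both are weighted sums of the \(y_i\):
\begin{equation}
b_1 = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} = \sum_{i=1}^n \left( \frac{x_i - \bar{x}}{\sum_{j=1}^n (x_j - \bar{x})^2} \right) y_i, \qquad b_0 = \bar{y} - b_1 \bar{x}
\end{equation}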
How "large" is "large enough"?
Exploration of the Effects of Non-Normality
Mathematically speaking, if the Normality assumption does not hold, then nothing in Chapter 5: Improved! Now with Probabilities is absolutely true for the model. However, the Central Limit Theorem tells us that a large sample size usually mitigates the effects of non-Normality in the residuals.
Let us explore that here...
An Appropriate Test?
Recall from your previous statistics course the two main types of errors: Type I and Type II. A Type I Error happens when the null hypothesis is correct, but the test tells you to reject it. For a statistical test to be appropriate, the first requirement is that the Type I Error rate is equal to (or sufficiently close to) the claimed \(\alpha\) level. Thus, to determine if OLS is an appropriate test, let us examine the Type I Error rate to check that it is (close enough to) \(\alpha\).
This means we generate our data from the null hypothesis while meeting all of the requirements of the test.
Here is the script to generate the data once:
set.seed(30)
beta0 = 3
beta1 = 0
x = 1:20
e = rnorm(20, m=0, s=1)
y = beta0 + beta1*x + e
This tests the null hypothesis that \(\beta_1 = 0\):
model = lm(y~x)
summary(model)
The p-value corresponding to the test of our hypothesis is \(0.3784\). We can have R just give that number to us:
summary(model)$coefficients[2,4]
That is one p-value.
If the test is entirely appropriate (for all values of \(\alpha\)), we would reject our true null hypothesis a proportion \(\alpha\) of the time. This statement is equivalent to the statement that the p-values follow a standard Uniform distribution (see the section on multiple comparisons).
\begin{equation}P \sim Unif(0,1)\end{equation}
Why?
If the test is appropriate, and if we reject when the p-value is less than \(\alpha\), what is the probability of rejecting? It should be \(\alpha\). Why?
Why should the probability of committing a Type I error be \(\alpha\)?
In statistical symbols, this is
\begin{equation}
P[P \le \alpha] = \alpha
\end{equation}
This is just the cumulative distribution function for the standard Uniform distribution. Thus, \(P \sim Unif(0,1)\).
The code above gave us just one p-value. To investigate the distribution of p-values, you need to obtain thousands (or millions) of p-values. The easiest way to do this is to loop through the above steps and save the p-value from each test. That is what the following code does:
set.seed(30)
pval = numeric()
beta0 = 3
beta1 = 0
x = 1:20
for(i in 1:1e4) {
e = rnorm(20, m=0, s=1)
y = beta0 + beta1*x + e
model = lm(y~x)
pval[i] = summary(model)$coefficients[2,4]
}
At the end of running this code, the variable pval contains 10,000 (\(\texttt{1e4} = 1 \times 10^4\)) observed p-values.
To determine if these p-values follow a standard Uniform distribution, we can create a histogram. When you do so, you note that it appears to closely follow a standard Uniform distribution, although not exactly. We would not expect an exact match; the p-values are computed from random samples, so they are themselves random.
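A minimal sketch of that histogram check (the number of breaks is just a choice):

hist(pval, breaks=20, freq=FALSE)   # roughly flat bars suggest a standard Uniform
abline(h=1, lty=2)                  # the Unif(0,1) density equals 1 everywhere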
Understanding the meaning of the p-value (above), we can check if the test is appropriate for \(\alpha = 0.05\) by checking that the rejection rate is sufficiently close to \(0.05\). In other words, we can test if the proportion of p-values less than \(\alpha\) is close enough to \(\alpha\). The Binomial test can accomplish this.
The Binomial test is the exact test for checking that the rejection rate is equal to the claimed rate, 0.05. In your introductory course, you may have learned either the proportions test or the Wald test. Both are approximate tests that rely on the Normal approximation to the Binomial distribution.
Here is the code to run the Binomial test:
binom.test( x=sum(pval<0.05), n=length(pval), p=0.05 )
The null hypothesis is that the proportion of rejections is equal to \(0.05\). The p-value of \(0.9817\) indicates that the claim that the Type I Error rate is 0.05 is reasonable; that is, the test seems reasonable. In other words, the test appears to be fine for \(\alpha=0.05\).
To determine if the test is appropriate for all possible \(\alpha\)-values, we could repeat the above for every possible value of \(\alpha\). That would only take \(\infty\) years. On the other hand, we can test all possible \(\alpha\) values at once by recognizing that the p-values should follow a standard Uniform distribution and testing if they do. We can perform a statistical test to determine if the distribution of the observed p-values is sufficiently non-Uniform:
ks.test(pval, "punif")
This line performs the Kolmogorov-Smirnov test (Massey 1951). Its null hypothesis is that the observed values follow the distribution stated. Because the p-value is \(0.4361\), we fail to reject the null hypothesis that the p-values follow a standard Uniform distribution.
In other words, the test of the null hypothesis \(\beta_1 = 0\) appears to be universally appropriate: it seems to be appropriate for any value of \(\alpha\) you select, not just our usual \(\alpha = 0.05\).
It is good to know that OLS works when all of the assumptions are met. What about when they are not met?
When running tests, you will tend to just look for the p-value and draw conclusions based on that one number. If you do, be very clear about what that p-value measures. In the previous example, there were 10,002 different p-values. The first 10,000 were p-values for the test that \(\beta_1 = 0\), one for each of 10,000 different sets of data. The 10,001\(^{\text{st}}\) determined whether the proportion of those p-values below \(\alpha=0.05\) was \(0.05\). The last determined whether the distribution of those p-values was standard Uniform.
There were 10,002 different tests in this example.
Non-Normal Residuals I
Now, what about if the Normality assumption is not met? What if the residuals follow an Exponential distribution? The following code generates 10,000 p-values where the residuals follow an Exponential distribution with \(\lambda=1\). Compare it to the previous code listing and find the one difference.
set.seed(30)
pval = numeric()
beta0 = 3
beta1 = 0
x = 1:20
for(i in 1:1e4) {
e = rexp(20, rate=1)
y = beta0 + beta1*x + e
model = lm(y~x)
pval[i] = summary(model)$coefficients[2,4]
}
binom.test( x=sum(pval<0.05), n=length(pval), p=0.05 )
ks.test(pval, "punif")
The Binomial test returns a p-value of \(0.422\), which suggests to us that the OLS test is appropriate for \(\alpha=0.05\). Furthermore, the Kolmogorov-Smirnov test returns a p-value of \(0.2867\). Thus, even if the residuals are skewed this much (\(\gamma_1 = 2\)), the tests arising from the ordinary least squares estimation method appear universally appropriate.
The sample size in these tests is \(n=20\). This is much lower than the usual "rule of thumb" of \(n=30\).
Also note that what we actually discovered is that OLS is very robust to violations of some of its requirements.
Not all of them, just some.
Exploration helps us determine which requirements those are, and how severely they must be violated, before the tests give useless results.
Non-Normal Residuals II
Now, let's use a distribution for the residuals that is even more skewed: a Chi-square(\(\nu=1\)) distribution (\(\gamma_1 = \sqrt{8} \approx 2.8\)). Again, compare this to the previous code listing and find the one difference.
set.seed(30)
pval = numeric()
beta0 = 3
beta1 = 0
x = 1:20
for(i in 1:1e4) {
e = rchisq(20, df=1)
y = beta0 + beta1*x + e
model = lm(y~x)
pval[i] = summary(model)$coefficients[2,4]
}
The Kolmogorov-Smirnov test returns a p-value of \(0.2667\). Thus, even if the residuals are this skewed, the tests arising from OLS appear universally appropriate. The Binomial test returns a p-value of \(0.0939\), which tells us that the OLS test appears to be appropriate for \(\alpha=0.05\). Thus, even when the residuals follow this heavily skewed distribution, the conclusions based on our OLS tests seem to be appropriate for a sample size of \(n=20\).
Non-Normal Residuals III
Now, for fun, let's use a symmetric distribution for the residuals, but one where the CLT does not apply: the Cauchy distribution. Again, compare this to the previous code listing and find the one difference.
set.seed(30)
pval = numeric()
beta0 = 3
beta1 = 0
x = 1:20
for(i in 1:1e4) {
e = rcauchy(20)
y = beta0 + beta1*x + e
model = lm(y~x)
pval[i] = summary(model)$coefficients[2,4]
}
The Kolmogorov-Smirnov test returns a p-value of essentially \(0\). The Binomial test also returns a p-value of essentially \(0\). This tells us that the OLS test is not appropriate everywhere or even at \(\alpha=0.05\) when the residuals come from a Cauchy distribution.
Increasing the sample size, which usually fixes such issues, will not help here, either:
set.seed(30)
pval = numeric()
beta0 = 3
beta1 = 0
x = 1:5000
for(i in 1:1e4) {
e = rcauchy(5000)
y = beta0 + beta1*x + e
model = lm(y~x)
pval[i] = summary(model)$coefficients[2,4]
}
In this code, the sample size is \(5000\). The conclusions are the same.
To see what is happening, create the histogram of the p-values and examine it. You should see that the first bar is much smaller than the others. This means the OLS tests reject at a lower rate than \(0.05\).
When the underlying distribution does not have a finite variance, such as the Cauchy distribution, the Central Limit Theorem does not apply. That means increasing the sample size has absolutely no effect on the Normality of the sums.
Such sample sums are never Normally distributed.
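To see this for yourself, here is a small sketch (not from the text): the spread of the sample mean of Cauchy draws does not shrink as the sample size grows, because the mean of Cauchy data is itself Cauchy distributed.

set.seed(370)
means20   = replicate(1e4, mean(rcauchy(20)))     # means of samples of size 20
means5000 = replicate(1e4, mean(rcauchy(5000)))   # means of samples of size 5000
quantile(means20,   c(0.025, 0.975))              # the two intervals are comparably wide;
quantile(means5000, c(0.025, 0.975))              # a CLT effect would shrink the second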


