7.1: Violent Crime
To help settle all of this, let's see a simple extended example of modeling the violent crime rate in 2000 using just the violent crime rate in 1990.
Determine the relationship between the violent crime rate in 1990 and in 2000.
If you find an appropriate model, estimate the expected 2000 violent crime rate for a state with a 1990 violent crime rate of 100. Finally, predict the 2000 violent crime rate for a state with a 1990 violent crime rate of 100.
The Preamble
The preamble is the part of the code that imports the extra functions, loads the data, and gives us an overview of it. This is a typical preamble:
### Preamble
library(KnoxStats)   # course package: overlay(), shapiroTest(), ...
library(lawstat)     # provides runs.test()
library(lmtest)      # provides bptest()

dt = read.csv("http://rur.kvasaheim.com/data/crime.csv", stringsAsFactors = TRUE)
attach(dt)
summary(dt)
Note that there are many variables in this particular data set. Since we are modeling the violent crime rate in 2000 using the rate in 1990, we will only use the variables vcrime00 and vcrime90.
Until version 4.0.0 (released in 2020), R read strings in data as factors by default. Since then, the default is the opposite. So, if you are loading categorical variables, and using them in your analysis, you will need to include stringsAsFactors = TRUE in your read.csv call. Just be aware of this.
The Model
To fit the model and estimate the parameters using ordinary least squares, run this line:
crimeMod = lm(vcrime00 ~ vcrime90)
This line prints no output; if you typed it correctly, R simply returns to the prompt. However, a lot has happened behind the scenes: the model was fit using ordinary least squares (via matrix algebra), and the parameters were estimated.
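What lm() does internally can be sketched with the closed-form OLS solution for one predictor: \(b_1 = S_{xy}/S_{xx}\) and \(b_0 = \bar{y} - b_1\bar{x}\). The sketch below is in Python (just arithmetic, so the language does not matter) and uses made-up noiseless data, not the crime data, so the recovered coefficients are known exactly:

```python
# Closed-form OLS for simple regression: b1 = Sxy / Sxx, b0 = ybar - b1 * xbar.
# Illustrative sketch with made-up noiseless data (y = 2 + 3x), NOT the crime data.

def ols_simple(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # estimated slope
    b0 = ybar - b1 * xbar     # estimated intercept
    return b0, b1

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0 + 3.0 * xi for xi in x]   # exact line, so OLS must recover it
b0, b1 = ols_simple(x, y)
print(b0, b1)   # → 2.0 3.0
```

With real, noisy data the recovered line is the one minimizing the sum of squared residuals, which is exactly what lm() computes for crimeMod.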
Test Requirements
The next step is to check that the model does not violate any of the assumptions/requirements.
Normality of the Residuals
The first we will check is the Normality of the residuals:
e = residuals(crimeMod)

### Normal Residuals?
overlay(e)
shapiroTest(e)
The histogram overlaid with the Normal curve suggests the residuals are slightly skewed to the right. The Shapiro-Wilk test strongly indicates a lack of Normality (p-value = 0.004424). The sample size of \(n=51\), however, is large enough that sums of the residuals closely follow a Normal distribution (this is the actual requirement, by the way).
If you would like to check this, run the following code (first think what it does and why it answers this problem):
et = numeric()
for(i in 1:1e3) {
  x = sample(e, replace=TRUE)   # bootstrap resample of the residuals
  et[i] = sum( x )              # store the sum of each resample
}
shapiroTest(et)                  # are the sums Normal?
Since the reported p-value is much greater than \(\alpha\), we do not detect a departure from Normality in the sample sums. And it is the sample sums that affect the distribution of \(b_0\) and \(b_1\).
Thus, the model passes the normality requirement. ✅
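The bootstrap logic above can also be sketched outside of R. Here is a Python version with made-up skewed "residuals" (not the actual crime-model residuals), showing the Central Limit Theorem effect the argument relies on: sums of resamples are much closer to symmetric than the skewed values themselves.

```python
# Sketch of why sums of bootstrap resamples look Normal even when the
# residuals themselves are skewed (Central Limit Theorem).
# Uses made-up right-skewed "residuals", NOT the crime-model residuals.
import math
import random

def skewness(v):
    n = len(v)
    m = sum(v) / n
    s = math.sqrt(sum((x - m) ** 2 for x in v) / n)
    return sum(((x - m) / s) ** 3 for x in v) / n

rng = random.Random(7)
e = [rng.expovariate(1.0) - 1.0 for _ in range(51)]   # right-skewed, mean near 0

sums = []
for _ in range(1000):
    resample = [rng.choice(e) for _ in range(51)]      # bootstrap resample
    sums.append(sum(resample))                         # sum of each resample

print(skewness(e), skewness(sums))   # the sums are much less skewed
```

The skewness of the sums shrinks by roughly a factor of \(\sqrt{n}\), which is why \(n = 51\) is enough here despite the skewed residuals.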
Constant Expected Value (Functional Form)
The second assumption to test is constant expected value (proper model form):
# Constant Expected Value of Residuals
plot(vcrime90, e)
runs.test(e, order=vcrime90)
The residuals plot seems a bit inconclusive to me. This is mainly due to the single point far to the right (the District of Columbia). The runs test, however, indicates that there is no significant evidence the residuals follow anything other than a horizontal line (p-value = 0.6732).
Thus, the model does not violate the second assumption. ✅
Homoskedasticity (Constant Variance)
The third assumption is that the variance of the residuals is constant:
# Constant Variance of Residuals
plot(vcrime90, e)
bptest(crimeMod)
For me, the graphic is inconclusive because of DC. The Breusch-Pagan test did not detect significant heteroskedasticity (p-value = 0.1041 > 0.05 = \(\alpha\)). Thus, the model passes the third requirement. ✅
Multicollinearity
The last thing to check is multicollinearity. There is no need to check it for this model because there is just the one independent variable that we already see has variation. So, it passes this test, too. ✅
The Results of the Model
Since none of the requirements are violated, the model seems appropriate. With that, we can see the estimates by running summary(crimeMod):
Call:
lm(formula = vcrime00 ~ vcrime90)

Residuals:
    Min      1Q  Median      3Q     Max
-241.32  -42.84  -18.04   40.97  208.41

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 109.52716   21.42679   5.112 5.27e-06 ***
vcrime90      0.58065    0.03107  18.689  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 85.55 on 49 degrees of freedom
Multiple R-squared:  0.877,    Adjusted R-squared:  0.8745
F-statistic: 349.3 on 1 and 49 DF,  p-value: < 2.2e-16
The first section of the output repeats the model you specified. Use it to double-check that you typed things in correctly, or to remind yourself what the model is examining. The second part gives the five-number summary of the residuals. Since the mean (\(0\), by construction) is greater than the median (\(-18.04\)), there is evidence of a positive skew in the residuals. We discovered this above when looking at the histogram of the residuals.
The third section is the "regression table." Each row corresponds to a different independent variable (or the intercept). The columns are the estimates, the standard errors, the test statistic (estimate divided by standard error), and the p-value.
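The "estimate divided by standard error" relationship can be checked directly from the table above (Python here, but it is just division):

```python
# The t value in the regression table is estimate / standard error.
# Values copied from the summary output above.
t_intercept = 109.52716 / 21.42679
t_slope = 0.58065 / 0.03107
print(t_intercept, t_slope)   # approximately 5.112 and 18.689, as in the table
```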
In this example, there is very strong evidence that the relationship between the violent crime rate in 1990 and in 2000 is positive, with estimated effect \(b_1 = 0.58065\). If State A had a higher violent crime rate in 1990 than State B, then it also tended to have a higher violent crime rate in 2000.
The intercept, \(b_0 = 109.52716\), indicates that for a state with a violent crime rate of \(0\) in 1990, the expected violent crime rate in 2000 is \(109.52716\) crimes per 100,000 people. However, since no state came close to having a 1990 violent crime rate of \(0\), this interpretation does not make statistical sense.
Remember that we should only use our models to predict and estimate for values of the 1990 violent crime rate within the domain of the vcrime90 variable in our data. Interpolation is appropriate; extrapolation, rarely so.
The fourth part contains the statistics that describe the model as a whole. The estimate for \(\sigma\) is \(85.55\). [Remember \(\varepsilon_i \stackrel{\text{iid}}{\sim} N(0, \sigma)\).] The model explains 87.45% of the variance contained in the null model. The model does give some information about the dependent variable (\(F = 349.3\); \(\nu_1 = 1\), \(\nu_2 = 49\); p-value \(\ll 0.0001\)).
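The multiple and adjusted R-squared values are related by a degrees-of-freedom correction, \(R^2_{\text{adj}} = 1 - (1 - R^2)\frac{n-1}{n-k-1}\), which you can verify from the output (Python, arithmetic only):

```python
# Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1),
# with n = 51 states and k = 1 predictor. R^2 taken from the output above.
r2 = 0.877
n, k = 51, 1
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(adj_r2)   # approximately 0.8745, matching the output
```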
The Graphic
The following code produces a graphic like that of Figure \(\PageIndex{1}\):
plot.new()
plot.window( xlim=c(0,2500), ylim=c(0,2500) )
axis(1); axis(2)
title(xlab="Violent Crime Rate (1990)", line=2.75)
title(ylab="Violent Crime Rate (2000)", line=3.25)

xx = seq(0,2500)
yy = predict(crimeMod, newdata=data.frame(vcrime90=xx))
lines(xx,yy, col="steelblue", lwd=2)

points(vcrime90,vcrime00, pch=21, bg=1+(vcrime00>vcrime90))
Note that the graphic also indicates which states had their violent crime rate increase. It comes from this line:
points(vcrime90,vcrime00, pch=21, bg=1+(vcrime00>vcrime90))
The first two slots are the x- and y-values. The third slot specifies the plotting character. A pch of 21 is a dot with its insides colorable. The fourth slot, bg, specifies the color to fill the inside of the dots (bg = "background").
The part (vcrime00 > vcrime90) is a logical (Boolean) vector that takes the value 1 where the statement is true (the violent crime rate increased) and 0 otherwise. Adding 1 ensures that the two colors are 1 and 2, which are black and red in R's default palette.
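The same trick works in most languages, because a true comparison coerces to 1 and a false one to 0. A tiny Python sketch of the color assignment, with made-up (x, y) pairs rather than the crime data:

```python
# 1 + (y > x): color 2 (red) where the rate increased, 1 (black) otherwise.
# Made-up (x, y) pairs for illustration, NOT the crime data.
pairs = [(100, 150), (200, 180), (300, 300), (50, 90)]
colors = [1 + (y > x) for x, y in pairs]
print(colors)   # → [2, 1, 1, 2]
```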
Confidence Interval for β1
In addition to calculating the point estimates of the slope and intercept, we can also calculate confidence intervals:
confint(crimeMod)
From this output, we are 95% confident that the effect of the 1990 violent crime rate on the 2000 rate is between \(0.518\) and \(0.643\).
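That interval is just \(b_1 \pm t^{*} \times SE(b_1)\), with \(t^{*}\) the 97.5th percentile of a t distribution with 49 degrees of freedom. The check below is in Python with \(t^{*} \approx 2.0096\) hardcoded from a t table, to keep the sketch dependency-free:

```python
# 95% CI for the slope: b1 +/- t_crit * SE(b1), values from the summary above.
b1, se = 0.58065, 0.03107
t_crit = 2.0096          # qt(0.975, df = 49), taken from a t table
lo = b1 - t_crit * se
hi = b1 + t_crit * se
print(lo, hi)   # approximately 0.518 to 0.643, matching confint()
```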
Confidence Interval for Y
We can calculate the expected value of \(Y\) for a given value of \(x\):
predict(crimeMod, newdata=data.frame(vcrime90=100), interval="confidence")
We are 95% confident that the expected value of \(Y\) when \(x=100\) is between \(129.5\) and \(205.6\), with a point estimate of \(167.6\).
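The point estimate comes straight from the fitted line, \(\hat{y} = b_0 + b_1 x\):

```python
# Fitted value at x = 100, using the coefficients from the summary above.
b0, b1 = 109.52716, 0.58065
y_hat = b0 + b1 * 100
print(y_hat)   # approximately 167.6
```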
Prediction Interval for Y
Finally, we can predict the value of \(Y\) for a new value of \(x\):
predict(crimeMod, newdata=data.frame(vcrime90=100), interval="prediction")
We are 95% confident that the next observed violent crime rate in 2000 for a state with a 1990 violent crime rate of 100 is between \(-8.5\) and \(343.7\), with a point prediction of \(167.6\). Note that the lower bound is negative, which is impossible for a rate; this is an artifact of the Normal error model, not a claim about actual crime rates.
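As a back-of-the-envelope check (Python, with the t critical value hardcoded from a table), a prediction interval is roughly \(\hat{y} \pm t^{*} s\), where \(s = 85.55\) is the residual standard error. This approximation ignores the coefficient-estimation terms, so it only lands within a few units of R's exact interval, but it shows why prediction intervals are so much wider than confidence intervals for the expected value:

```python
# Rough prediction interval: y_hat +/- t_crit * s. This approximation ignores
# the (here, small) extra uncertainty from the estimated coefficients, so it is
# close to, but not exactly, R's exact interval of (-8.5, 343.7).
y_hat = 167.6
s = 85.55                 # residual standard error from the summary
t_crit = 2.0096           # qt(0.975, df = 49), from a t table
lo = y_hat - t_crit * s
hi = y_hat + t_crit * s
print(lo, hi)
```

The extra width relative to the confidence interval comes from the \(s\) term for a single new observation, which does not shrink as \(n\) grows.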