
7.1: Violent Crime


    To help settle all of this, let's work through a simple extended example: modeling the violent crime rate in 2000 using just the violent crime rate in 1990.

    The Example

    Determine the relationship between the violent crime rate in 1990 and in 2000.

    If you find an appropriate model, estimate the expected 2000 violent crime rate for a state with a 1990 violent crime rate of 100. Finally, predict the 2000 violent crime rate for a state with a 1990 violent crime rate of 100.

    The Preamble

    The preamble is the part of the code that imports the extra functions, loads the data, and gives us an overview of it. This is a typical preamble:

    ### Preamble
    library(KnoxStats)   # the course package: provides overlay() and shapiroTest()
    library(lawstat)     # provides runs.test()
    library(lmtest)      # provides bptest()
    
    dt = read.csv("http://rur.kvasaheim.com/data/crime.csv", stringsAsFactors = TRUE)
    attach(dt)           # lets us refer to vcrime90 and vcrime00 directly
    
    summary(dt)
    

    Note that there are many variables in this particular data set. Since we are modeling the violent crime rate in 2000 using the rate in 1990, we will only use the variables vcrime00 and vcrime90.

    R Feature

    In its early years, R read strings in data files as factors by default. As of R 4.0.0 (released in 2020), however, the default changed: strings are now read in as character vectors. So, if you are loading categorical variables, and using them in your analysis, you will need to include stringsAsFactors = TRUE in your read.csv call. Just be aware of this.
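
    If you would like to see the difference for yourself, here is a minimal sketch using a toy data frame (not the crime data):

    # Toy illustration: since R 4.0.0, strings stay character by default
    x = data.frame(s = c("low", "high"))
    class(x$s)   # "character"
    
    y = data.frame(s = c("low", "high"), stringsAsFactors = TRUE)
    class(y$s)   # "factor"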

    The Model

    To fit the model and estimate the parameters using ordinary least squares, run this line:

    crimeMod = lm(vcrime00 ~ vcrime90)
    

    This line produces no output; if you typed it correctly, R simply returns to the prompt. A lot has happened behind the scenes, however: R fit the model using ordinary least squares (via matrix algebra) and estimated the parameters.
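
    If you are curious what "using matrices" means here, the following sketch reproduces the two estimates by hand using the usual formula \( \mathbf{b} = (\mathbf{X}^\top\mathbf{X})^{-1}\mathbf{X}^\top\mathbf{y} \). It is for illustration only; lm() already does this for us:

    # Reproduce the OLS estimates by hand
    X = cbind(1, vcrime90)                      # design matrix: intercept column, then predictor
    b = solve( t(X) %*% X, t(X) %*% vcrime00 )  # solve the normal equations
    b                                           # matches coef(crimeMod)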

    Test Requirements

    The next step is to check that the model does not violate any of the assumptions/requirements.

    Normality of the Residuals

    The first requirement we will check is the Normality of the residuals:

    e = residuals(crimeMod)
    
    # Normal Residuals?
    overlay(e)
    shapiroTest(e)
    

    The histogram overlaid with the Normal curve suggests the residuals are slightly skewed to the right. The Shapiro-Wilk test strongly indicates a lack of Normality (p-value = 0.004424). The sample size of \(n=51\), however, seems large enough to ensure that the sums of the residuals closely follow a Normal distribution (this is the actual requirement, by the way).

    If you would like to check this, run the following code (first, think about what it does and why it answers this question):

    et = numeric()
    for(i in 1:1e3) {
      x = sample(e, replace=TRUE)   # a bootstrap resample of the residuals
      et[i] = sum( x )              # store the sum of that resample
    }
    shapiroTest(et)                 # are the sums sufficiently Normal?
    

    Since the reported p-value is much greater than \(\alpha\), we have no evidence that the sample sums deviate from Normality. And it is the sample sums that drive the distributions of \(b_0\) and \(b_1\), since each estimator is a weighted sum of the observations.

    Thus, the model passes the Normality requirement. ✅

    Constant Expected Value (Functional Form)

    The second assumption to test is constant expected value (proper model form):

    # Constant Expected Value of Residuals
    plot(vcrime90,e)
    runs.test(e, order=vcrime90)
    

    The residuals plot seems a bit inconclusive to me. This is mainly due to the single point far to the right (the District of Columbia). The runs test, however, indicates that there is no significant evidence the residuals follow anything other than a horizontal line (p-value = 0.6732).

    Thus, the model does not violate the second assumption. ✅


    Homoskedasticity (Constant Variance)

    The third assumption is that the variance of the residuals is constant:

    # Constant Variance of Residuals
    plot(vcrime90,e)
    bptest(crimeMod)
    

    For me, the graphic is inconclusive because of DC. The Breusch-Pagan test did not detect significant heteroskedasticity (p-value = 0.1041 > 0.05 = \(\alpha\)). Thus, the model passes the third requirement. ✅

    Multicollinearity

    The last thing to check is multicollinearity. There is no need to check it for this model because there is only one independent variable, and we have already seen that it has variation. So, it passes this test, too. ✅

    The Results of the Model

    Since none of the requirements are violated, the model seems appropriate. With that, we can examine the estimates with summary(crimeMod):

    Call:
    lm(formula = vcrime00 ~ vcrime90)
    
    Residuals:
        Min      1Q  Median      3Q     Max
    -241.32  -42.84  -18.04   40.97  208.41
    
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept) 109.52716   21.42679   5.112 5.27e-06 ***
    vcrime90      0.58065    0.03107  18.689  < 2e-16 ***
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    
    Residual standard error: 85.55 on 49 degrees of freedom
    Multiple R-squared:  0.877, Adjusted R-squared:  0.8745
    F-statistic: 349.3 on 1 and 49 DF,  p-value: < 2.2e-16
    

    The first section of the output repeats the model you specified. Use it to double-check that you typed things in correctly, or to remind yourself what the model is examining. The second part gives the five-number summary of the residuals. Since the mean (\(0\)) is greater than the median (\(-18.04\)), there is evidence of a positive skew in the residuals. We discovered this above when looking at the histogram of the residuals.
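
    If you would like to check these two values directly (the mean of OLS residuals is always \(0\) when the model includes an intercept), run:

    c( mean = mean(e), median = median(e) )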

    The third section is the "regression table." Each row corresponds to a different independent variable (or the intercept). The columns are the estimates, the standard errors, the test statistic (estimate divided by standard error), and the p-value.

    In this example, there is very strong evidence that the relationship between the violent crime rate in 1990 and in 2000 is positive, with the effect \(b_1 = 0.58065\). If State A had a higher violent crime rate in 1990 than State B, then it also tended to have a higher violent crime rate in 2000.

    The intercept, \(b_0 = 109.52716\), indicates that for a state with a violent crime rate of \(0\) in 1990, the expected violent crime rate in 2000 is \(109.52716\) crimes per 100,000 people. However, since no state came close to having a 1990 violent crime rate of \(0\), this interpretation does not make statistical sense.

    Remember that we should only use our models to predict and estimate for values of the 1990 violent crime rate within the domain of the vcrime90 variable in our data. Interpolation is appropriate; extrapolation, rarely so.

    The fourth part contains the statistics that describe the model as a whole. The estimate for \(\sigma\) is \(85.55\). [Remember \(\varepsilon_i \stackrel{\text{iid}}{\sim} N(0, \sigma)\).] The model explains 87.45% of the variance contained in the null model. The model does give some information about the dependent variable (\(F = 349.3\); \(\nu_1 = 1\), \(\nu_2 = 49\); p-value \(\ll 0.0001\)).
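
    As a quick check, recall that the multiple R-squared compares the residual variation of the model to that of the null model (the intercept-only model). This sketch recomputes it from the residuals:

    # Multiple R-squared, computed from the residuals by hand
    SSE = sum( e^2 )                            # residual sum of squares
    SST = sum( (vcrime00 - mean(vcrime00))^2 )  # total sum of squares (null model)
    1 - SSE/SST                                 # should match the reported 0.877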

    The Graphic

    Figure \(\PageIndex{1}\): Plot of the violent crime rate in 2000 against that in 1990. The ordinary least squares line of best fit is included. Red-colored states are those whose violent crime rate increased from 1990 to 2000.

    The following code produces a graphic like that of Figure \(\PageIndex{1}\):

    plot.new()
    plot.window( xlim=c(0,2500), ylim=c(0,2500) )
    
    axis(1); axis(2)
    
    title(xlab="Violent Crime Rate (1990)", line=2.75)
    title(ylab="Violent Crime Rate (2000)", line=3.25)
    
    xx = seq(0,2500)                                         # grid of 1990 rates
    yy = predict(crimeMod, newdata=data.frame(vcrime90=xx))  # fitted values along the grid
    lines(xx,yy, col="steelblue", lwd=2)                     # the OLS line of best fit
    
    points(vcrime90,vcrime00, pch=21, bg=1+(vcrime00>vcrime90))  # red if the rate increased
    

    Note that the graphic also indicates which states had their violent crime rate increase. It comes from this line:

    points(vcrime90,vcrime00, pch=21, bg=1+(vcrime00>vcrime90))
    

    The first two slots are the x- and y-values. The third slot specifies the plotting character; a pch of 21 is a circle whose interior can be colored. The fourth slot, bg, specifies the color used to fill the inside of the dots (bg is short for "background").

    The part (vcrime00>vcrime90) is a logical (Boolean) vector that is treated as 1 where the statement is true (the violent crime rate increased) and 0 otherwise. Adding 1 ensures that the two colors are 1 and 2, which are black and red in R's default palette.
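
    Here is a tiny sketch of that Boolean arithmetic on made-up numbers:

    1 + ( c(10, 5) > c(3, 8) )   # returns 2 1: the first pair increased, the second did not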

    Confidence Interval for \(\beta_1\)

    In addition to calculating the point estimates of the slope and intercept, we can also calculate confidence intervals:

    confint(crimeMod)
    

    From this output, we are 95% confident that the effect of the 1990 violent crime rate on the 2000 rate is between \(0.518\) and \(0.643\).
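
    As a check, this interval is just the point estimate plus or minus the appropriate t critical value times the standard error, both taken from the regression table. A sketch:

    # 95% confidence interval for the slope, by hand
    b1  = 0.58065                             # estimate from the regression table
    se1 = 0.03107                             # its standard error
    b1 + c(-1, 1) * qt(0.975, df=49) * se1    # matches confint(crimeMod)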

    Confidence Interval for \(Y\)

    We can calculate the expected value of \(Y\) for a given value of \(x\):

    predict(crimeMod, newdata=data.frame(vcrime90=100), interval="confidence")
    

    We are 95% confident that the expected value of \(Y\) when \(x=100\) is between \(129.5\) and \(205.6\), with a point estimate of \(167.6\).
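
    For the record, this interval comes from the usual formula for the expected response at a given \(x_0\) (stated here without derivation): \( \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{\dfrac{1}{n} + \dfrac{(x_0-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}} \), where \(s = 85.55\) is the residual standard error from the model output.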

    Prediction Interval for \(Y\)

    Finally, we can predict the value of \(Y\) for a new value of \(x\):

    predict(crimeMod, newdata=data.frame(vcrime90=100), interval="prediction")
    

    We are 95% sure that the next observation of the violent crime rate in 2000 for a state with a violent crime rate in 1990 of 100 is between \(-8.5\) and \(343.7\), with a point prediction of \(167.6\).
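
    The prediction interval is much wider than the confidence interval because it must account for the variability of a single new observation, not just the uncertainty in the estimated mean. Its formula adds a \(1\) under the square root: \( \hat{y}_0 \pm t_{\alpha/2,\,n-2}\; s \sqrt{1 + \dfrac{1}{n} + \dfrac{(x_0-\bar{x})^2}{\sum_i (x_i-\bar{x})^2}} \).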


    This page titled 7.1: Violent Crime is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
