Now we will look at the hate crime data from the fivethirtyeight package. First we need to prepare the data by removing NA values and creating abbreviations for the states. To do the latter, we use the state.name and state.abb variables that come with R, along with the match() function, which finds the position of each state name from the hate_crimes data frame within state.name so that we can look up the corresponding entry in state.abb.
library(tidyverse)
library(fivethirtyeight)

# look up each state's abbreviation and drop rows with missing values
hateCrimes <- hate_crimes %>%
  mutate(state_abb = state.abb[match(state, state.name)]) %>%
  drop_na(avg_hatecrimes_per_100k_fbi, gini_index)

# manually fix the DC abbreviation
hateCrimes$state_abb[hateCrimes$state == "District of Columbia"] <- 'DC'
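To see why the manual fix for DC is needed, the lookup can be tried on a few names in isolation. This is a small sketch using only base R; the vector names (states, idx, abbs, abbs_fixed) are illustrative and do not appear in the analysis above.

```r
# Three example names; "District of Columbia" is not one of the 50 entries in state.name
states <- c("Alabama", "District of Columbia", "Wyoming")

# match() returns the position of each name in state.name, or NA when there is no match
idx <- match(states, state.name)

# Indexing state.abb with those positions gives the abbreviations; an NA index yields NA
abbs <- state.abb[idx]

# Fill in the missing abbreviation by hand, as in the analysis code
abbs_fixed <- abbs
abbs_fixed[states == "District of Columbia"] <- "DC"
```

Here idx is 1, NA, 50, so abbs is "AL", NA, "WY" until the DC entry is filled in.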
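We then test whether the correlation between the hate crime rate and the Gini index is greater than zero, storing the result in corr_results:

corr_results <- cor.test(hateCrimes$avg_hatecrimes_per_100k_fbi,
                         hateCrimes$gini_index,
                         alternative = 'greater')
corr_results

##
##  Pearson's product-moment correlation
##
## data:  hateCrimes$avg_hatecrimes_per_100k_fbi and hateCrimes$gini_index
## t = 3, df = 48, p-value = 0.001
## alternative hypothesis: true correlation is greater than 0
## 95 percent confidence interval:
##  0.21 1.00
## sample estimates:
##  cor
## 0.42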
Remember that we can also compute the p-value using randomization. To do this, we shuffle the order of one of the variables, which breaks the link between the X and Y variables and effectively makes the null hypothesis (that the correlation is less than or equal to zero) true. Here we will first create a function that takes in two variables, shuffles the order of one of them (without replacement), and then returns the correlation between that shuffled variable and the original copy of the second variable.
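A minimal sketch of such a function is given here. The names shuffleCorr, x, and y are illustrative, simulated data stand in for the hate crime variables so that the block runs on its own, and the choice of 5000 shuffles is arbitrary.

```r
set.seed(123)  # make the shuffles reproducible

# Simulated stand-ins for the two variables (an assumption, not the real data)
x <- rnorm(50)
y <- 0.4 * x + rnorm(50)

# Shuffle x without replacement, then correlate it with the untouched y
shuffleCorr <- function(x, y) {
  xShuffled <- sample(x)  # sample() permutes without replacement by default
  cor(xShuffled, y)
}

# Repeat the shuffle many times to build a null distribution of correlations
shuffleDist <- data.frame(cor = replicate(5000, shuffleCorr(x, y)))
```

With the real data, one would pass hateCrimes$avg_hatecrimes_per_100k_fbi and hateCrimes$gini_index in place of x and y. Because shuffling destroys any pairing between the two variables, the values in shuffleDist$cor should be centered near zero.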
Now we take the distribution of correlations obtained after shuffling and compare it to our observed correlation, in order to obtain the empirical probability of seeing a correlation larger than the observed one under the null hypothesis.
mean(shuffleDist$cor > corr_results$estimate)
## [1] 0.0066
This value is fairly close (though a bit larger) to the one obtained using