Categorical analysis can also be applied to contingency tables where there are more than two categories for each variable.
For example, let’s look at the NHANES data and consider the variable Depressed, which records the “self-reported number of days where participant felt down, depressed or hopeless”. This variable is coded as None, Several, or Most. Let’s test whether this variable is related to the SleepTrouble variable, which records whether the individual has reported sleeping problems to a doctor.
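As a rough sketch of how such a contingency table could be built in R, the snippet below cross-tabulates the two columns after dropping rows with missing values. The helper name `make_depressed_sleep_table` and the tiny data frame are hypothetical and exist only for illustration; the level names follow the coding described above, and the preprocessing used for the actual analysis may differ.

```r
# Cross-tabulate Depressed against SleepTrouble, dropping incomplete rows.
# `make_depressed_sleep_table` is a hypothetical helper name.
make_depressed_sleep_table <- function(df) {
  complete <- df[!is.na(df$Depressed) & !is.na(df$SleepTrouble), ]
  table(Depressed = complete$Depressed,
        SleepTrouble = complete$SleepTrouble)
}

# Tiny made-up data frame (NOT NHANES values), just to show the layout.
toy <- data.frame(
  Depressed = factor(c("None", "None", "Several", "Most", NA, "Most"),
                     levels = c("None", "Several", "Most")),
  SleepTrouble = factor(c("No", "Yes", "No", "Yes", "No", NA),
                        levels = c("No", "Yes"))
)
toy_table <- make_depressed_sleep_table(toy)
print(toy_table)

# With the NHANES package installed, the same helper could be applied as:
# library(NHANES)
# depressedSleepTroubleTable <- make_depressed_sleep_table(NHANES)
```

The toy table only illustrates the 3 x 2 layout; the discussion that follows refers to the actual NHANES counts.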
Simply by looking at these data, we can tell that there is likely a relationship between the two variables. Notably, while the total number of people with sleep trouble is much smaller than the number without, among people who report being depressed most days the number with sleep problems is greater than the number without. We can quantify this directly using the chi-squared test; if our data frame contains only two variables, then we can simply provide the data frame as the argument to the chisq.test() function:
## 
##  Pearson's Chi-squared test
## 
## data:  depressedSleepTroubleTable
## X-squared = 191, df = 2, p-value <2e-16
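To make the reported statistic and degrees of freedom less of a black box, here is a minimal sketch that computes the Pearson chi-squared statistic by hand for a 3 x 2 table and checks it against `chisq.test()`. The counts are made up for illustration and are not the NHANES values; the variable names are likewise assumptions.

```r
# Hypothetical 3 x 2 table of counts (NOT the NHANES values).
# matrix() fills column by column: first the "No" column, then "Yes".
observed <- matrix(c(50, 30, 10,
                     20, 25, 15),
                   ncol = 2,
                   dimnames = list(Depressed = c("None", "Several", "Most"),
                                   SleepTrouble = c("No", "Yes")))

# Expected counts under independence: (row total * column total) / n.
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected.
x2 <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1), which is 2 for a 3 x 2 table.
df <- (nrow(observed) - 1) * (ncol(observed) - 1)

# Upper-tail probability of the chi-squared distribution.
p <- pchisq(x2, df = df, lower.tail = FALSE)

# The built-in test should agree with the manual calculation.
result <- chisq.test(observed)
print(c(manual = x2, builtin = unname(result$statistic)))
```

The degrees of freedom depend only on the table's dimensions, which is why a three-level variable crossed with a two-level variable gives df = 2.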
This test shows strong evidence of a relationship between depression and sleep trouble. We can also compute the Bayes factor to quantify the strength of the evidence in favor of the alternative hypothesis:
## Bayes factor analysis
## --------------
##  Non-indep. (a=1) : 1.8e+35 ±0%
## 
## Against denominator:
##   Null, independence, a = 1 
## ---
## Bayes factor type: BFcontingencyTable, joint multinomial
Here we see that the Bayes factor is very large, showing that the evidence in favor of a relation between depression and sleep problems is very strong.
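A Bayes factor of this type can be computed with `contingencyTableBF()` from the BayesFactor package. The sketch below assumes that package is installed and uses made-up counts rather than the NHANES table; the joint multinomial sampling scheme and prior concentration of 1 are chosen to match the "joint multinomial" and "a = 1" labels in the output above.

```r
# Hypothetical 3 x 2 table of counts (NOT the NHANES values).
observed <- matrix(c(50, 30, 10,
                     20, 25, 15),
                   ncol = 2,
                   dimnames = list(Depressed = c("None", "Several", "Most"),
                                   SleepTrouble = c("No", "Yes")))

bf_value <- NA  # stays NA if the BayesFactor package is not available

if (requireNamespace("BayesFactor", quietly = TRUE)) {
  # Compare non-independence against independence under joint
  # multinomial sampling, with prior concentration a = 1.
  bf <- BayesFactor::contingencyTableBF(observed,
                                        sampleType = "jointMulti",
                                        priorConcentration = 1)
  print(bf)

  # Pull out the numeric Bayes factor (alternative over null).
  bf_value <- BayesFactor::extractBF(bf)$bf
}
```

Values well above 1 favor a relationship between the two variables, while values well below 1 favor independence.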