
2.13: Practice problems



    2.1. Overtake Distance Analysis The tests for the overtake distance data were performed with two-sided alternatives, so two-sided areas were used to find the p-values. Suppose instead that the researchers expected the average passing distance to be smaller (closer) for the commute clothing group than for the casual clothing group. Obtain the permutation-based p-value for this one-sided test using either the full or the smaller sample data set. Hint: your p-value should be about half of what it was before, in the direction of the alternative.
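
    To get you started, here is a minimal sketch of the one-sided version, assuming the objects from the chapter's earlier permutation code are available (the smaller sample in dsamp with variables Distance and Condition; those names are carried over from that code and are assumptions here). The shuffle function is from the mosaic package.

    library(mosaic)  # for shuffle()
    # Observed difference in means; with lm's default alphabetical coding,
    # the second coefficient is commute minus casual
    Tobs <- coef(lm(Distance ~ Condition, data = dsamp))[2]
    B <- 1000
    Tstar <- matrix(NA, nrow = B)
    for (b in 1:B) {
      # Re-fit the mean-difference model after shuffling the group labels
      Tstar[b] <- coef(lm(Distance ~ shuffle(Condition), data = dsamp))[2]
    }
    # One-sided p-value: proportion of permuted differences at or below the
    # observed one (lower tail, since the alternative is commute < casual)
    pvalue <- mean(Tstar <= Tobs)
    pvalue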

    2.2. HELP Study Data Analysis Load the HELPrct data set from the mosaicData package (Pruim, Kaplan, and Horton 2021b) (you need to install the mosaicData package once before you can load it). The HELP study was a clinical trial for adult inpatients recruited from a detoxification unit. Patients with no primary care physician were randomly assigned to receive either a multidisciplinary assessment and a brief motivational intervention or usual care, and various outcomes were observed. Two of the variables in the data set are sex, a factor with levels female and male, and daysanysub, the time (in days) to first use of any substance post-detox. We are interested in the difference in mean number of days to first use of any substance post-detox between males and females. There are some missing responses; the following code first produces favstats with the missing values present and then creates a data set with those observations removed by applying the drop_na() function to the piped data set.

    library(mosaic)      # for favstats()
    library(dplyr)       # for select() and the %>% pipe
    library(tidyr)       # for drop_na()
    library(mosaicData)
    data(HELPrct)
    # Just focus on the two variables of interest
    HELPrct2 <- HELPrct %>% select(daysanysub, sex)
    # Remove subjects (complete rows) with any missing values
    HELPrct3 <- HELPrct2 %>% drop_na()
    # Summaries with and then without the missing responses
    favstats(daysanysub ~ sex, data = HELPrct2)
    favstats(daysanysub ~ sex, data = HELPrct3)

    2.2.1. Based on the results provided, how many observations were missing for males and for females? Missing values here likely mean that the subjects didn't use any substances post-detox during the study period but might have at a later date; the study just didn't run long enough. This is called censoring. What is the problem with the numerical summaries here if the missing responses were all actually something larger than the largest observed value?

    2.2.2. Make a pirate-plot and a boxplot of daysanysub ~ sex using the HELPrct3 data set created above. Compare the distributions, recommending parametric or nonparametric inferences.
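
    As a starting sketch for the plots (pirateplot comes from the yarrr package used for pirate-plots in the chapter; the axis label is just a suggestion):

    library(yarrr)  # for pirateplot()
    # Pirate-plot and boxplot of days to first substance use by sex
    pirateplot(daysanysub ~ sex, data = HELPrct3, inf.method = "ci")
    boxplot(daysanysub ~ sex, data = HELPrct3,
            ylab = "Days to first substance use post-detox")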

    2.2.3. Generate the permutation results and write out the 6+ steps of the hypothesis test.
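
    A minimal sketch of the permutation code, following the same pattern as the overtake distance analysis (shuffle is again from mosaic; the two-sided direction matches the chapter's default):

    library(mosaic)  # for shuffle()
    # Observed difference in mean days (male minus female, given lm's
    # default alphabetical coding of sex)
    Tobs <- coef(lm(daysanysub ~ sex, data = HELPrct3))[2]
    B <- 1000
    Tstar <- matrix(NA, nrow = B)
    for (b in 1:B) {
      # Shuffle the sex labels and re-estimate the difference in means
      Tstar[b] <- coef(lm(daysanysub ~ shuffle(sex), data = HELPrct3))[2]
    }
    # Two-sided p-value: proportion of shuffled results as or more extreme
    pvalue <- mean(abs(Tstar) >= abs(Tobs))
    pvalue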

    2.2.4. Interpret the p-value for these results.

    2.2.5. Generate the parametric test results using lm, reporting the test statistic and its distribution under the null hypothesis, and compare the p-value to the one observed using the permutation approach.
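
    A sketch of the parametric version; in the summary output, the sexmale row contains the test statistic, which under the null hypothesis follows a t-distribution with n minus 2 degrees of freedom:

    # Equal-variance two-sample test via the linear model
    m1 <- lm(daysanysub ~ sex, data = HELPrct3)
    summary(m1)
    confint(m1)  # t-based interval for comparison with the bootstrap below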

    2.2.6. Make and interpret a 95% bootstrap confidence interval for the difference in the means.
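
    A sketch of the percentile bootstrap, assuming resample from the mosaic package (it resamples rows of the data set with replacement):

    library(mosaic)  # for resample()
    Tobs <- coef(lm(daysanysub ~ sex, data = HELPrct3))[2]
    B <- 1000
    Tstar <- matrix(NA, nrow = B)
    for (b in 1:B) {
      # Re-estimate the difference in means from a resampled data set
      Tstar[b] <- coef(lm(daysanysub ~ sex, data = resample(HELPrct3)))[2]
    }
    quantile(Tstar, c(0.025, 0.975))  # 95% percentile bootstrap interval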

    References

    Bland, J Martin, and Douglas G Altman. 1995. “Multiple Significance Tests: The Bonferroni Method.” BMJ 310 (6973): 170. https://doi.org/10.1136/bmj.310.6973.170.
    Kampstra, Peter. 2008. “Beanplot: A Boxplot Alternative for Visual Comparison of Distributions.” Journal of Statistical Software, Code Snippets 28 (1): 1–9. http://www.jstatsoft.org/v28/c01/.
    Phillips, Nathaniel. 2017. yarrr: A Companion to the e-Book "YaRrr! The Pirate’s Guide to R". https://www.thepiratesguidetor.com.
    Pruim, Randall, Daniel T. Kaplan, and Nicholas J. Horton. 2021a. Mosaic: Project MOSAIC Statistics and Mathematics Teaching Utilities. https://CRAN.R-project.org/package=mosaic.
    Pruim, Randall, Daniel Kaplan, and Nicholas Horton. 2021b. mosaicData: Project MOSAIC Data Sets. https://github.com/ProjectMOSAIC/mosaicData.
    Schneck, Andreas. 2017. “Examining Publication Bias—a Simulation-Based Evaluation of Statistical Tests on Publication Bias.” PeerJ 5 (November): e4115. https://doi.org/10.7717/peerj.4115.
    Smith, Michael L. 2014. “Honey Bee Sting Pain Index by Body Location.” PeerJ 2 (April): e338. https://doi.org/10.7717/peerj.338.
    Walker, Ian, Ian Garrard, and Felicity Jowitt. 2014. “The Influence of a Bicycle Commuter’s Appearance on Drivers’ Overtaking Proximities: An on-Road Test of Bicyclist Stereotypes, High-Visibility Clothing and Safety Aids in the United Kingdom.” Accident Analysis & Prevention 64: 69–77. https://doi.org/10.1016/j.aap.2013.11.007.
    Wasserstein, Ronald L., and Nicole A. Lazar. 2016. “The ASA Statement on p-Values: Context, Process, and Purpose.” The American Statistician 70 (2): 129–33. https://doi.org/10.1080/00031305.2016.1154108.

    1. You will more typically hear “data is”, but that usage more often refers to information, sometimes even statistical summaries of data sets, than to observations made on subjects as part of a study, which suggests how confused usage of this term is among the general public. We will explore a data set in Chapter 5, collected by researchers at http://fivethirtyeight.com/, related to perceptions of this issue.↩︎
    2. Either try to remember “data is a plural word” or replace “data” with “things” or, as one former student suggested that helped her with this, replace “data” with “puppies” or “penguins” in your sentence and consider whether it sounds right.↩︎
    3. Of particular interest to the bicycle rider might be the “close” passes and we will revisit this as a categorical response with “close” and “not close” as its two categories later.↩︎
    4. Thanks to Ian Walker for allowing me to use and post these data.↩︎
    5. As noted previously, we reserve the term “effect” for situations where random assignment allows us to consider causality as the reason for the differences in the response variable among levels of the explanatory variable.↩︎
    6. Some might call this data manipulation or transformation, but those terms can have other meanings and we want a term to capture organizing, preparing, and possibly modifying the data to prepare for analysis and doing it reproducibly in what we like to call “data wrangling”.↩︎
    7. If you’ve taken calculus, you will know that the curve is being constructed so that the integral from \(-\infty\) to \(\infty\) is 1. If you don’t know calculus, think of a rectangle with area of 1 based on its height and width. These cover the same area but the top of the region wiggles.↩︎
    8. I admit that there are parts of the logic of using ggplot that are confusing to me and this is one of them – but I learned to plot in R before ggplot2 and have been growing fonder and fonder of this way of working. Now instead of searching the internet, I will just get to search my book for the code to make this version of the plot.↩︎
    9. If you want to type this character in R Markdown, try $\sim$ outside of code chunks.↩︎
    10. Remember the bell-shaped curve you encountered in introductory statistics? If not, you can see some at https://en.wikipedia.org/wiki/Normal_distribution.↩︎
    11. The package and function are intentionally amusingly titled but are based on ideas in the beanplot in Kampstra (2008) and provide what they call an RDI graphic (Raw data, Descriptive, and Inferential statistics in the same display).↩︎
    12. The default version seems to get mis-interpreted as the box from a boxplot too easily. This display choice also matches the display style for later plots for confidence intervals in term-plots.↩︎
    13. The hypothesis of no difference that is typically generated in the hopes of being rejected in favor of the alternative hypothesis, which contains the sort of difference that is of interest in the application.↩︎
    14. The null model is the statistical model that is implied by the chosen null hypothesis. Here, a null hypothesis of no difference translates to having a model with the same mean for both groups.↩︎
    15. Later we will shuffle other types of explanatory variables.↩︎
    16. While not required, we often set our random number seed using the set.seed function so that when we re-run code with randomization in it we get the same results. ↩︎
    17. We’ll see the shuffle function in a more common usage below; here we are creating a new variable using mutate to show the permuted results that are stored in Perm1.↩︎
    18. This is a bit like getting a new convertible sports car and driving it to the grocery store – there might be better ways to get groceries, but we probably would want to drive our new car as soon as we got it.↩︎
    19. This will be formalized and explained more in the next chapter when we encounter more than two groups in these same models. For now, it is recommended to start with the sample means from favstats for the two groups and then use that to sort out which direction the differencing was done in the lm output.↩︎
    20. P-values are the probability of obtaining a result as extreme as or more extreme than we observed given that the null hypothesis is true.↩︎
    21. In statistics, vectors are one dimensional lists of numeric elements – basically a column from a matrix or our tibble.↩︎
    22. We often say “under” in statistics and we mean “given that the following is true”.↩︎
    23. This is another place where the code is a bit cryptic when you are starting – just copy this entire chunk of code – you only ever need to modify the lm line in this code!↩︎
    24. This is a fancy way of saying “in advance”, here in advance of seeing the observations.↩︎
    25. Statistically, a conservative method is one that provides less chance of rejecting the null hypothesis in comparison to some other method or less than some pre-defined standard. A liberal method provides higher rates of false rejections.↩︎
    26. Both approaches are reasonable. By using both tails of the distribution we can incorporate potential differences in shape in both tails of the permutation distribution.↩︎
    27. P-values of 1 are the only result that provide no evidence against the null hypothesis but this still doesn’t prove that the null hypothesis is true.↩︎
    28. We’ll leave the discussion of the CLT to your previous statistics coursework or an internet search. For this material, just remember that it has something to do with distributions of statistics looking more normal as the sample size increases.↩︎
    29. The t.test function with the var.equal = T option is the more direct route to calculating this statistic (here that would be t.test(Distance ~ Condition, data = dsamp, var.equal = T)), but since we can get the result of interest by fitting a linear model, we will use that approach.↩︎
    30. On exams, you might be asked to describe the area of interest, sketch a picture of the area of interest, and/or note the distribution you would use. Make sure you think about what you are trying to do here as much as learning the mechanics of how to get p-values from R.↩︎
    31. In some studies, the same subject is measured in both conditions and this violates the assumptions of this procedure.↩︎
    32. At this level, it is critical to learn the tools and learn where they might provide inaccurate inferences. If you explore more advanced statistical resources, you will encounter methods that can allow you to obtain valid inferences in even more scenarios.↩︎
    33. Only male and female were provided as options on the survey. These data were collected as part of a project to study learning of material using online versus paper versions of this book but we focus just on the gender differences in GPA here.↩︎
    34. The data are provided and briefly discussed in the Practice Problems for Chapter 3.↩︎
    35. Researchers often measure multiple related response variables on the same subjects while they are conducting a study, so these would not meet the “independent studies” assumption that is used here, but we can start with the assumption of independent results across these responses as the math is easier and the results are conservative. You can consult a statistician for other related approaches that incorporate the dependency of the different responses.↩︎
    36. You can correctly call octothorpes number symbols or, in the Twitterverse, hashtags. For more on this symbol, see http://blog.dictionary.com/octothorpe/. Even after reading this, I call them number symbols.↩︎
    37. An unbiased estimator is a statistic that is on average equal to the population parameter.↩︎
    38. Some perform bootstrap sampling in this situation by re-sampling within each of the groups. We will discuss using this technique in situations without clearly defined groups, so we prefer to sample with replacement from the entire data set. It also directly corresponds to situations where the data came from one large sample and then the grouping variable of interest was measured on the \(n\) subjects.↩︎
    39. The as.numeric function is also used here. It really isn’t important but makes sure the output of table is sorted by observation number by first converting the orig.id variable into a numeric vector.↩︎
    40. In any bootstrap sample, about 1/3 of the observations are not used at all.↩︎
    41. There are actually many ways to use this information to make a confidence interval. We are using the simplest method that is called the “percentile” method.↩︎
    42. When hypothesis tests “work well” they have high power to detect differences while having Type I error rates that are close to what we choose a priori. When confidence intervals “work well”, they contain the true parameter value in repeated random samples at around the selected confidence level, which is called the coverage rate. ↩︎
    43. We will often use this term to indicate performing a calculation using the favstats results – not that you need to go back to the data set and calculate the means and standard deviations yourself.↩︎
    44. Note that this modifier is added to indicate less certainty than when we encounter strong evidence against the null. Also note that someone else might decide that this is more like weak evidence against the null and might choose to interpret it as in the “weak” case. In cases that are near boundaries for evidence levels, it becomes difficult to find a universal answer; it is best to report that the evidence is neither strong nor weak but somewhere in between and let the reader decide what they think it means. This is complicated by often needing to make decisions about next steps based on p-values, where we might choose to focus on the model with a difference or the one without it.↩︎
