# 7.10: Practice problems

7.1. **Treadmill data analysis** We will continue with the treadmill data set introduced in Chapter 1 and the SLR fit in the practice problems in Chapter 6. The following code will get you back to where we stopped at the end of Chapter 6:

```
library(tidyverse)
treadmill <- read_csv("http://www.math.montana.edu/courses/s217/documents/treadmill.csv")
treadmill %>% ggplot(mapping = aes(x = RunTime, y = TreadMillOx)) +
  geom_point(aes(color = Age)) +
  geom_smooth(method = "lm") +
  geom_smooth(se = F, lty = 2, col = "red") +
  theme_bw()
tm1 <- lm(TreadMillOx ~ RunTime, data = treadmill)
summary(tm1)
```

7.1.1. Use the output to test for a linear relationship between treadmill oxygen and run time, writing out all 6+ steps of the hypothesis test. Make sure to address scope of inference and interpret the p-value.

7.1.2. Form and interpret a 95% confidence interval for the slope coefficient “by hand” using the provided multiplier:

`qt(0.975, df = 29)`

`## [1] 2.04523`
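As a sketch of the "by hand" calculation (assuming the fitted model object is named `tm1`, as in the setup code), the interval is the estimate plus or minus the t-multiplier times the slope's standard error:

```
# 95% CI for the slope: estimate +/- t-multiplier * SE
slope <- coef(summary(tm1))["RunTime", "Estimate"]
SEslope <- coef(summary(tm1))["RunTime", "Std. Error"]
slope + c(-1, 1) * qt(0.975, df = 29) * SEslope
```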

7.1.3. Use the `confint` function to find a similar confidence interval, checking your previous calculation.
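One way to set this up (assuming the model is stored as `tm1`):

```
# CIs for all model coefficients; level = 0.95 is the default
confint(tm1, level = 0.95)
```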

7.1.4. Use the `predict` function to find fitted values, 95% confidence, and 95% prediction intervals for run times of 11 and 16 minutes.
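A sketch of the setup (assuming the model is stored as `tm1`; the first column of each result contains the fitted values):

```
# fitted values and intervals at RunTime = 11 and 16 minutes
newruns <- data.frame(RunTime = c(11, 16))
predict(tm1, newdata = newruns, interval = "confidence")  # 95% CIs
predict(tm1, newdata = newruns, interval = "prediction")  # 95% PIs
```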

7.1.5. Interpret the CI and PI for the 11 minute run time.

7.1.6. Compare the widths of the CIs and PIs – why are they different? For the two different predictions, why are the intervals wider for 16 minutes than for 11 minutes?

7.1.7. The Residuals vs Fitted plot considered in Chapter 6 should have suggested slight non-constant variance and maybe a little missed nonlinearity. Perform a log-transformation of the treadmill oxygen response variable and re-fit the SLR model. Remake the diagnostic plots and discuss whether the transformation changed any of them.
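One possible setup for the re-fit and diagnostics (the model name `tm2` is just a suggestion):

```
# re-fit the SLR with a log-transformed response and remake the diagnostic plots
tm2 <- lm(log(TreadMillOx) ~ RunTime, data = treadmill)
par(mfrow = c(2, 2))
plot(tm2, pch = 16)
```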

7.1.8. Summarize the \(\log(y) \sim x\) model and interpret the slope coefficient on the transformed and original scales, regardless of the answer to the previous question.
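A sketch of the pieces needed for the original-scale interpretation (assuming the log-response model is stored as `tm2`; exponentiating the slope gives the multiplicative change in the median response):

```
# fit (or re-fit) the log-response model and back-transform the slope
tm2 <- lm(log(TreadMillOx) ~ RunTime, data = treadmill)
summary(tm2)
exp(coef(tm2)["RunTime"])  # multiplicative change in median oxygen per 1-minute increase in RunTime
```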

### References

*Stats: Data and Models, 3rd Edition*. Pearson.

*Arctic, Antarctic, and Alpine Research* 42 (4): 396–405. doi.org/10.1657/1938-4246-42.4.396.

*Journal of Statistical Software* 8 (15): 1–27. http://www.jstatsoft.org/v08/i15/.

*carData: Companion to Applied Regression Data Sets*. https://CRAN.R-project.org/package=carData.

*Handbook of the Philosophy of Science, Vol. 7: Statistics*, edited by P. S. Bandyopadhyay and M. Forster, 1117–42. Elsevier.

*Computing Science and Statistics* 34: 452–60.

*Geophysical Research Letters* 34 (16). doi.org/10.1029/2007GL031022.

*The Statistical Sleuth: A Course in Methods of Data Analysis*. Cengage Learning. https://books.google.com/books?id=eSlLjA9TwkUC.

*Global Change Biology* 24 (5): 2182–97. doi.org/10.1111/gcb.14042.

- We can also write this as \(E(y_i|x_i) = \mu\{y_i|x_i\} = \beta_0 + \beta_1x_i\), which is the notation you will see in books like the *Statistical Sleuth* (Ramsey and Schafer 2012). We will use notation that is consistent with how we originally introduced the methods.↩︎
- There is an area of statistical research on how to optimally choose \(x\)-values to get the most precise estimate of a slope coefficient. In observational studies we have to deal with whatever pattern of \(x\text{'s}\) we ended up with. If you can choose, generate an even spread of \(x\text{'s}\) over some range of interest, similar to what was used in the *Beers* vs *BAC* study, to provide the best distribution of values to discover the relationship across the selected range of \(x\)-values.↩︎
- See http://fivethirtyeight.com/features/which-city-has-the-most-unpredictable-weather/ for an interesting discussion of weather variability where Great Falls, MT had a very high rating on “unpredictability”.↩︎
- It is actually pretty amazing that there are hundreds of locations in the U.S. with nearly complete daily records for over 100 years.↩︎
- All joking aside, if researchers can find evidence of climate change using *conservative* methods (methods that reject the null hypothesis when it is true less often than stated), then their results are even harder to ignore.↩︎
- It took many permutations to get competitor plots this close to the real data set and they really aren’t that close.↩︎
- If the removal is of a point that is extreme in \(x\)-values, then it is appropriate to note that the results only apply to the restricted range of \(x\)-values that were actually analyzed in the scope of inference discussion. Our results only ever apply to the range of \(x\)-values we had available so this is a relatively minor change.↩︎
- Note that `exp(x)` is the same as \(e^{(x)}\) but easier to read in-line, and `exp()` is the R function name to execute this calculation.↩︎
- You can read my dissertation if you want my take on modeling U and V-shaped valley elevation profiles that included some discussion of these models, some of which was also in M. C. Greenwood and Humphrey (2002).↩︎
- This transformation could not be applied directly to the education growth score data in Chapter 5 because there were negative “growth” scores.↩︎
- This silly nomenclature was inspired by De Veaux, Velleman, and Bock’s (2011) *Stats: Data and Models* text. If you find this too cheesy, you can just call it x-vee.↩︎
- The `geom_ribbon` has been used inside the `geom_smooth` function we have used before, but this is the first time we are drawing these intervals ourselves.↩︎
- I have really enjoyed writing this book and enjoy updating it yearly, but hope someone else gets to do the work of checking the level of inaccuracy of this model in another 30 years.↩︎