
7.10: Practice problems


    7.1. Treadmill data analysis. We will continue with the treadmill data set introduced in Chapter 1 and the SLR fit from the practice problems in Chapter 6. The following code will get you back to where we stopped at the end of Chapter 6:

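    library(tidyverse)
    # The file path/URL is missing from the source; supply the location of the treadmill data
    treadmill <- read_csv("")
    # Scatterplot with the regression line (solid) and a smoothing line (dashed red)
    treadmill %>% ggplot(mapping = aes(x = RunTime, y = TreadMillOx)) +
      geom_point(aes(color = Age)) +
      geom_smooth(method = "lm") +
      geom_smooth(se = FALSE, lty = 2, col = "red")
    # Fit the SLR model from Chapter 6
    tm <- lm(TreadMillOx ~ RunTime, data = treadmill)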

    7.1.1. Use the output to test for a linear relationship between treadmill oxygen and run time, writing out all 6+ steps of the hypothesis test. Make sure to address scope of inference and interpret the p-value.

    7.1.2. Form and interpret a 95% confidence interval for the slope coefficient “by hand” using the provided multiplier:

    qt(0.975, df = 29)
    ## [1] 2.04523

    7.1.3. Use the confint function to find a similar confidence interval, checking your previous calculation.
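
As a rough illustration of how the "by hand" interval in 7.1.2 and the confint check in 7.1.3 fit together, the sketch below uses simulated data in place of the treadmill data. The data frame `sim`, its generating values, and the model name `tm_sim` are all hypothetical; only the variable names RunTime and TreadMillOx and the sample size implied by df = 29 come from the problem.

```r
# Simulated stand-in for the treadmill data (hypothetical values; n = 31 so df = 29)
set.seed(123)
sim <- data.frame(RunTime = runif(31, min = 8, max = 14))
sim$TreadMillOx <- 83 - 3.3 * sim$RunTime + rnorm(31, sd = 2.7)

tm_sim <- lm(TreadMillOx ~ RunTime, data = sim)

# Pull the slope estimate and its standard error from the coefficient table
coefs <- summary(tm_sim)$coefficients
b1 <- coefs["RunTime", "Estimate"]
se_b1 <- coefs["RunTime", "Std. Error"]

# "By hand": estimate -/+ multiplier * SE, with the multiplier from a t(n - 2) distribution
mult <- qt(0.975, df = df.residual(tm_sim))
by_hand <- b1 + c(-1, 1) * mult * se_b1

# The same interval from confint()
from_confint <- confint(tm_sim, "RunTime", level = 0.95)

by_hand
from_confint
```

The two results should agree up to rounding, since confint applies the same estimate, standard error, and t multiplier for an lm fit.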

    7.1.4. Use the predict function to find fitted values, 95% confidence, and 95% prediction intervals for run times of 11 and 16 minutes.
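
One possible way to set up the predict calls is sketched below, again on simulated data rather than the real treadmill data. The objects `sim`, `tm_sim`, `new_times`, `ci_out`, and `pi_out` are hypothetical names, and the simulated run times are assumed to fall between 8 and 14 minutes.

```r
# Simulated stand-in for the treadmill data (hypothetical values)
set.seed(123)
sim <- data.frame(RunTime = runif(31, min = 8, max = 14))
sim$TreadMillOx <- 83 - 3.3 * sim$RunTime + rnorm(31, sd = 2.7)
tm_sim <- lm(TreadMillOx ~ RunTime, data = sim)

# The new x-values must be in a data frame whose column name matches the predictor
new_times <- data.frame(RunTime = c(11, 16))

# Fitted values with 95% confidence intervals for the mean response
ci_out <- predict(tm_sim, newdata = new_times, interval = "confidence", level = 0.95)

# Fitted values with 95% prediction intervals for a new observation
pi_out <- predict(tm_sim, newdata = new_times, interval = "prediction", level = 0.95)

ci_out
pi_out
```

Each result is a matrix with columns fit, lwr, and upr, with one row per requested run time. In this simulated data, 16 minutes lies outside the observed range of run times, so that row is an extrapolation.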

    7.1.5. Interpret the CI and PI for the 11 minute run time.

    7.1.6. Compare the width of either set of CIs and PIs – why are they different? For the two different predictions, why are the intervals wider for 16 minutes than for 11 minutes?
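
When comparing the widths, it may help to have the standard SLR standard error formulas in view. Writing the new \(x\)-value as \(x_\nu\), the estimated residual standard deviation as \(\hat{\sigma}\), and the standard error of the slope as \(SE_{b_1}\) (the notation here is an assumption chosen to match the chapter), the standard errors for the estimated mean and for a new observation are:

\[ SE_{\hat{\mu}_\nu} = \sqrt{SE_{b_1}^2 (x_\nu - \bar{x})^2 + \frac{\hat{\sigma}^2}{n}}, \qquad SE_{\hat{y}_\nu} = \sqrt{SE_{b_1}^2 (x_\nu - \bar{x})^2 + \frac{\hat{\sigma}^2}{n} + \hat{\sigma}^2} \]

Both intervals use the same \(t\) multiplier, so any difference in width comes from these two standard errors: the extra \(\hat{\sigma}^2\) term appears only in the prediction version, and the \((x_\nu - \bar{x})^2\) term grows as \(x_\nu\) moves away from the mean of the observed \(x\)-values.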

    7.1.7. The Residuals vs Fitted plot considered in Chapter 6 should have suggested slight non-constant variance and maybe a little missed nonlinearity. Perform a log-transformation of the treadmill oxygen response variable and re-fit the SLR model. Remake the diagnostic plots and discuss whether the transformation changed any of them.
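
A minimal sketch of the refit and the diagnostic plots is shown below, using simulated data in place of the treadmill data. The names `sim`, `tm_sim`, and `tm_log` are hypothetical, and the simulated response is generated on the original scale, so the plots here will not necessarily show the same issues as the real data.

```r
# Simulated stand-in for the treadmill data (hypothetical values)
set.seed(123)
sim <- data.frame(RunTime = runif(31, min = 8, max = 14))
sim$TreadMillOx <- 83 - 3.3 * sim$RunTime + rnorm(31, sd = 2.7)

# Original-scale model and the model with a log-transformed response
tm_sim <- lm(TreadMillOx ~ RunTime, data = sim)
tm_log <- lm(log(TreadMillOx) ~ RunTime, data = sim)

# Four diagnostic plots per model, each set arranged in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(tm_sim, pch = 16)
plot(tm_log, pch = 16)
```

Putting the log() call inside the model formula avoids creating a new column, and viewing the two sets of plots one after the other makes it easier to judge whether the transformation changed anything.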

    7.1.8. Summarize the \(\log(y) \sim x\) model and interpret the slope coefficient on the transformed and original scales, regardless of the answer to the previous question.
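
The back-transformation needed for the original-scale interpretation can be sketched as follows. This uses simulated data generated on the log scale; the names `sim`, `tm_log`, `b1_log`, `mult_change`, and `pct_change` and all generating values are hypothetical.

```r
# Simulated stand-in for the treadmill data, generated on the log scale (hypothetical values)
set.seed(123)
sim <- data.frame(RunTime = runif(31, min = 8, max = 14))
sim$TreadMillOx <- exp(4.5 - 0.07 * sim$RunTime + rnorm(31, sd = 0.05))

tm_log <- lm(log(TreadMillOx) ~ RunTime, data = sim)
summary(tm_log)

# Slope on the log scale: estimated change in the mean of log(TreadMillOx)
# for a 1 minute increase in RunTime
b1_log <- coef(tm_log)["RunTime"]

# Original scale: exp(slope) is the estimated multiplicative change in the
# median of TreadMillOx for a 1 minute increase in RunTime
mult_change <- exp(b1_log)

# The same change expressed as a percentage
pct_change <- 100 * (mult_change - 1)

b1_log
mult_change
pct_change
```

With a negative slope, the multiplier falls between 0 and 1, which corresponds to a percentage decrease in the median response per additional minute of run time.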


    De Veaux, Richard D., Paul F. Velleman, and David E. Bock. 2011. Stats: Data and Models, 3rd Edition. Pearson.
    Dieser, Markus, Mark C. Greenwood, and Christine M. Foreman. 2010. “Carotenoid Pigmentation in Antarctic Heterotrophic Bacteria as a Strategy to Withstand Environmental Stresses.” Arctic, Antarctic, and Alpine Research 42 (4): 396–405.
    Fox, John. 2003. “Effect Displays in R for Generalised Linear Models.” Journal of Statistical Software 8 (15): 1–27.
    ———. 2022b. carData: Companion to Applied Regression Data Sets.
    Greenwood, Mark C., Joel Harper, and Johnnie Moore. 2011. “An Application of Statistics in Climate Change: Detection of Nonlinear Changes in a Streamflow Timing Measure in the Columbia and Missouri Headwaters.” In Handbook of the Philosophy of Science, Vol. 7: Statistics, edited by P. S. Bandyopadhyay and M. Forster, 1117–42. Elsevier.
    Greenwood, Mark C., and N. F. Humphrey. 2002. “Glaciated Valley Profiles: An Application of Nonlinear Regression.” Computing Science and Statistics 34: 452–60.
    Moore, Johnnie N., Joel T. Harper, and Mark C. Greenwood. 2007. “Significance of Trends Toward Earlier Snowmelt Runoff, Columbia and Missouri Basin Headwaters, Western United States.” Geophysical Research Letters 34 (16).
    Ramsey, Fred, and Daniel Schafer. 2012. The Statistical Sleuth: A Course in Methods of Data Analysis. Cengage Learning.
    Santibáñez, Pamela A., Olivia J. Maselli, Mark C. Greenwood, Mackenzie M. Grieman, Eric S. Saltzman, Joseph R. McConnell, and John C. Priscu. 2018. “Prokaryotes in the WAIS Divide Ice Core Reflect Source and Transport Changes Between Last Glacial Maximum and the Early Holocene.” Global Change Biology 24 (5): 2182–97.

    1. We can also write this as \(E(y_i|x_i) = \mu\{y_i|x_i\} = \beta_0 + \beta_1x_i\), which is the notation you will see in books like the Statistical Sleuth (Ramsey and Schafer 2012). We will use notation that is consistent with how we originally introduced the methods.↩︎
    2. There is an area of statistical research on how to optimally choose \(x\)-values to get the most precise estimate of a slope coefficient. In observational studies we have to deal with whatever pattern of \(x\text{'s}\) we ended up with. If you can choose, generate an even spread of \(x\text{'s}\) over some range of interest similar to what was used in the Beers vs BAC study to provide the best distribution of values to discover the relationship across the selected range of \(x\)-values.↩︎
    3. See for an interesting discussion of weather variability where Great Falls, MT had a very high rating on “unpredictability”.↩︎
    4. It is actually pretty amazing that there are hundreds of locations in the U.S. with nearly complete daily records for over 100 years.↩︎
    5. All joking aside, if researchers can find evidence of climate change using conservative methods (methods that reject the null hypothesis when it is true less often than stated), then their results are even harder to ignore.↩︎
    6. It took many permutations to get competitor plots this close to the real data set and they really aren’t that close.↩︎
    7. If the removal is of a point that is extreme in \(x\)-values, then it is appropriate to note that the results only apply to the restricted range of \(x\)-values that were actually analyzed in the scope of inference discussion. Our results only ever apply to the range of \(x\)-values we had available so this is a relatively minor change.↩︎
    8. Note exp(x) is the same as \(e^{(x)}\) but easier to read in-line and exp() is the R function name to execute this calculation.↩︎
    9. You can read my dissertation if you want my take on modeling U and V-shaped valley elevation profiles that included some discussion of these models, some of which was also in M. C. Greenwood and Humphrey (2002).↩︎
    10. This transformation could not be applied directly to the education growth score data in Chapter 5 because there were negative “growth” scores.↩︎
    11. This silly nomenclature was inspired by the De Veaux, Velleman, and Bock (2011) Stats: Data and Models text. If you find this too cheesy, you can just call it x-vee.↩︎
    12. The geom_ribbon has been used inside the geom_smooth function we have used before, but this is the first time we are drawing these intervals ourselves.↩︎
    13. I have really enjoyed writing this book and enjoy updating it yearly, but hope someone else gets to do the work of checking the level of inaccuracy of this model in another 30 years.↩︎

    This page titled 7.10: Practice problems is shared under a CC BY-NC 4.0 license and was authored, remixed, and/or curated by via source content that was edited to the style and standards of the LibreTexts platform; a detailed edit history is available upon request.