4.0: Prelude to Foundations for Inference
Statistical inference is concerned primarily with understanding the quality of parameter estimates. For example, a classic inferential question is, "How sure are we that the estimated mean, \( \bar {x}\), is near the true population mean, \(\mu\)?" While the equations and details change depending on the setting, the foundations for inference are the same throughout all of statistics. We introduce these common themes in Sections 4.1-4.4 by discussing inference about the population mean, \(\mu\), and set the stage for other parameters and scenarios in Section 4.5. Some advanced considerations are discussed in Section 4.6. Understanding this chapter will make the rest of this book, and indeed the rest of statistics, seem much more familiar.
Throughout the next few sections we consider a data set called run10, which represents all 16,924 runners who finished the 2012 Cherry Blossom 10 mile run in Washington, DC.^{1} Part of this data set is shown in Table 4.1, and the variables are described in Table 4.2.
ID | time | age | gender | state |
---|---|---|---|---|
1 | 92.25 | 38.00 | M | MD |
2 | 106.35 | 33.00 | M | DC |
3 | 89.33 | 55.00 | F | VA |
4 | 113.50 | 24.00 | F | VA |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
16923 | 122.87 | 37.00 | F | VA |
16924 | 93.30 | 27.00 | F | DC |

Table 4.1: Part of the run10 data set.
variable | description |
---|---|
time | Ten mile run time, in minutes |
age | Age, in years |
gender | Gender (M for male, F for female) |
state | Home state (or country if not from the US) |

Table 4.2: Variables in the run10 data set and their descriptions.
^{1}http://www.cherryblossom.org
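ID | time | age | gender | state |
---|---|---|---|---|
1983 | 88.31 | 59 | M | MD |
8192 | 100.67 | 32 | M | VA |
11020 | 109.52 | 33 | F | VA |
\(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) | \(\vdots\) |
1287 | 89.49 | 26 | M | DC |

Table 4.3: Part of the run10Samp data set, a simple random sample from run10.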
Figure 4.4: Histograms of time and age for the sample Cherry Blossom Run data. The average time is in the mid-90s, and the average age is in the mid-to-upper 30s. The age distribution is moderately skewed to the right.
These data are special because they include the results for the entire population of runners who finished the 2012 Cherry Blossom Run. We took a simple random sample of this population, which is represented in Table 4.3. We will use this sample, which we refer to as the run10Samp data set, to draw conclusions about the entire population. This is the practice of statistical inference in the broadest sense. Two histograms summarizing the time and age variables in the run10Samp data set are shown in Figure 4.4.
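To make the idea of drawing a simple random sample concrete, here is a minimal Python sketch. It does not use the real run10 data: the population below is synthetic, generated only for illustration, and the center of 95 minutes, the spread of 15 minutes, the sample size of 100, and the names `population_times` and `simple_random_sample` are all assumptions introduced for this example. The sketch draws a sample without replacement and compares the sample mean \( \bar {x}\) with the population mean \(\mu\), which is possible here only because the entire population is available.

```python
import random
import statistics


def simple_random_sample(population, n, seed=None):
    """Draw n cases without replacement; every subset of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical stand-in for the run10 times (in minutes).
# These are synthetic values, NOT the real Cherry Blossom results.
rng = random.Random(0)
population_times = [rng.gauss(95, 15) for _ in range(16924)]

# Assumed sample size of 100, playing the role of run10Samp.
sample_times = simple_random_sample(population_times, 100, seed=1)

mu = statistics.mean(population_times)   # population mean (normally unknown)
x_bar = statistics.mean(sample_times)    # sample mean, our point estimate of mu

print(f"population mean mu = {mu:.2f}")
print(f"sample mean x_bar  = {x_bar:.2f}")
print(f"estimation error   = {x_bar - mu:.2f}")
```

Changing the `seed` passed to `simple_random_sample` gives a different sample and a slightly different \( \bar {x}\). That sample-to-sample variation is what the inferential question about the quality of an estimate is concerned with.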