4.1: Prelude to Foundations for Inference

Statistical inference is concerned primarily with understanding the quality of parameter estimates. For example, a classic inferential question is, "How sure are we that the estimated mean, \( \bar {x}\), is near the true population mean, \(\mu\)?" While the equations and details change depending on the setting, the foundations for inference are the same throughout all of statistics. We introduce these common themes in Sections 4.1-4.4 by discussing inference about the population mean, \(\mu\), and set the stage for other parameters and scenarios in Section 4.5. Some advanced considerations are discussed in Section 4.6. Understanding this chapter will make the rest of this book, and indeed the rest of statistics, seem much more familiar.

Throughout the next few sections we consider a data set called run10, which represents all 16,924 runners who finished the 2012 Cherry Blossom 10 mile run in Washington, DC.¹ Part of this data set is shown in Table 4.1, and the variables are described in Table 4.2.

Table 4.1: Six observations from the run10 data set.
ID	time	age	gender	state
1	92.25	38.00	M	MD
2	106.35	33.00	M	DC
3	89.33	55.00	F	VA
4	113.50	24.00	F	VA
\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)
16923	122.87	37.00	F	VA
16924	93.30	27.00	F	DC

Table 4.2: Variables and their descriptions for the run10 data set.
variable	description
time	Ten mile run time, in minutes
age	Age, in years
gender	Gender (M for male, F for female)
state	Home state (or country if not from the US)

¹http://www.cherryblossom.org

Table 4.3: Four observations for the run10Samp data set, which represents a simple random sample of 100 runners from the 2012 Cherry Blossom Run.
ID	time	age	gender	state
1983	88.31	59	M	MD
8192	100.67	32	M	VA
11020	109.52	33	F	VA
\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)	\(\vdots\)
1287	89.49	26	M	DC

Figure 4.4: Histograms of time and age for the sample Cherry Blossom Run data. The average time is in the mid-90s, and the average age is in the mid-to-upper 30s. The age distribution is moderately skewed to the right.

These data are special because they include the results for the entire population of runners who finished the 2012 Cherry Blossom Run. We took a simple random sample of this population, which is represented in Table 4.3. We will use this sample, which we refer to as the run10Samp data set, to draw conclusions about the entire population. This is the practice of statistical inference in the broadest sense. Two histograms summarizing the time and age variables in the run10Samp data set are shown in Figure 4.4.
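
To make the sampling step concrete, here is a minimal Python sketch of how a simple random sample of 100 runner IDs could be drawn from a population of 16,924. It is an illustration under stated assumptions, not the procedure actually used to build run10Samp. The function name `draw_sample_ids`, the seeds, and the finishing times are all hypothetical. The times are synthetic placeholders drawn from a normal distribution with an assumed mean of 95 minutes and an assumed standard deviation of 15 minutes, and they are not the real race results.

```python
import random
import statistics

POPULATION_SIZE = 16924  # number of finishers, as stated in the text
SAMPLE_SIZE = 100        # size of the run10Samp sample, as stated in the text


def draw_sample_ids(population_size, sample_size, seed):
    """Pick sample_size distinct IDs from 1..population_size without replacement.

    Every subset of the given size is equally likely, which is what makes
    this a simple random sample. (Hypothetical helper for illustration.)
    """
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), sample_size)


# ASSUMPTION: synthetic stand-in for the run10 times, NOT the real data.
# The mean (95) and standard deviation (15) are illustrative choices only.
data_rng = random.Random(0)
population_time = {
    runner_id: data_rng.gauss(95, 15)
    for runner_id in range(1, POPULATION_SIZE + 1)
}

# Draw the sample of IDs, then look up each sampled runner's time.
sample_ids = draw_sample_ids(POPULATION_SIZE, SAMPLE_SIZE, seed=1)
sample_time = [population_time[runner_id] for runner_id in sample_ids]

# With the whole population in hand we can compute mu directly and compare
# it with the sample mean x_bar, which is the estimate inference works with.
mu = statistics.mean(population_time.values())
x_bar = statistics.mean(sample_time)
print(f"population mean (mu):  {mu:.2f}")
print(f"sample mean (x_bar):   {x_bar:.2f}")
```

Running the sketch with different seeds in `draw_sample_ids` gives a different sample mean each time while the population mean stays fixed. That sample-to-sample variation is what the following sections quantify.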