# 4.0: Prelude to Foundations for Inference

Statistical inference is concerned primarily with understanding the quality of parameter estimates. For example, a classic inferential question is, "How sure are we that the estimated mean, $$\bar{x}$$, is near the true population mean, $$\mu$$?" While the equations and details change depending on the setting, the foundations for inference are the same throughout all of statistics. We introduce these common themes in Sections 4.1-4.4 by discussing inference about the population mean, $$\mu$$, and set the stage for other parameters and scenarios in Section 4.5. Some advanced considerations are discussed in Section 4.6. Understanding this chapter will make the rest of this book, and indeed the rest of statistics, seem much more familiar.

Throughout the next few sections we consider a data set called run10, which represents all 16,924 runners who finished the 2012 Cherry Blossom 10-mile run in Washington, DC.¹ Part of this data set is shown in Table 4.1, and the variables are described in Table 4.2.

Table 4.1: Six observations from the run10 data set.
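
| ID | time | age | gender | state |
|---:|---:|---:|:---:|:---:|
| 1 | 92.25 | 38.00 | M | MD |
| 2 | 106.35 | 33.00 | M | DC |
| 3 | 89.33 | 55.00 | F | VA |
| 4 | 113.50 | 24.00 | F | VA |
| $$\vdots$$ | $$\vdots$$ | $$\vdots$$ | $$\vdots$$ | $$\vdots$$ |
| 16923 | 122.87 | 37.00 | F | VA |
| 16924 | 93.30 | 27.00 | F | DC |
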

Table 4.2: Variables and their descriptions for the run10 data set.

| variable | description |
|:---|:---|
| time | Ten-mile run time, in minutes |
| age | Age, in years |
| gender | Gender (M for male, F for female) |
| state | Home state (or country if not from the US) |


¹ http://www.cherryblossom.org

Table 4.3: Four observations for the run10Samp data set, which represents a simple random sample of 100 runners from the 2012 Cherry Blossom Run.
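
| ID | time | age | gender | state |
|---:|---:|---:|:---:|:---:|
| 1983 | 88.31 | 59 | M | MD |
| 8192 | 100.67 | 32 | M | VA |
| 11020 | 109.52 | 33 | F | VA |
| $$\vdots$$ | $$\vdots$$ | $$\vdots$$ | $$\vdots$$ | $$\vdots$$ |
| 1287 | 89.49 | 26 | M | DC |
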

Figure 4.4: Histograms of time and age for the sample Cherry Blossom Run data. The average time is in the mid-90s, and the average age is in the mid-to-upper 30s. The age distribution is moderately skewed to the right.

These data are special because they include the results for the entire population of runners who finished the 2012 Cherry Blossom Run. We took a simple random sample of this population, which is represented in Table 4.3. We will use this sample, which we refer to as the run10Samp data set, to draw conclusions about the entire population. This is the practice of statistical inference in the broadest sense. Two histograms summarizing the time and age variables in the run10Samp data set are shown in Figure 4.4.
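The sampling step described above can be sketched in a few lines of Python. This is an illustrative sketch, not the procedure used to build run10Samp: the helper `srs_mean` is hypothetical, and because the real run10 times are not reproduced here, the "population" is a synthetic list of 16,924 values drawn from a normal distribution with arbitrary parameters. The point is only to show a simple random sample of 100 drawn without replacement, its sample mean $$\bar{x}$$, and how far that estimate lands from the population mean $$\mu$$.

```python
import random
import statistics


def srs_mean(population, n, seed=None):
    """Hypothetical helper: draw a simple random sample of size n
    (without replacement) from `population`; return (sample, sample mean)."""
    rng = random.Random(seed)
    # random.Random.sample makes every size-n subset equally likely,
    # which is exactly what a simple random sample requires.
    sample = rng.sample(population, n)
    return sample, statistics.mean(sample)


# Synthetic stand-in for the 16,924 run times (minutes).
# These are NOT the real run10 data; the normal parameters are arbitrary.
gen = random.Random(2012)
population_times = [gen.gauss(95, 15) for _ in range(16_924)]

# mu is computable here only because we built the whole population;
# in a typical study it is unknown and must be inferred from the sample.
mu = statistics.mean(population_times)
sample, x_bar = srs_mean(population_times, 100, seed=1)

print(f"mu = {mu:.2f}, x_bar = {x_bar:.2f}, x_bar - mu = {x_bar - mu:.2f}")
```

Rerunning with a different `seed` gives a different sample and a slightly different $$\bar{x}$$; quantifying how much $$\bar{x}$$ varies from sample to sample, and therefore how close it is likely to be to $$\mu$$, is the central question of this chapter.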