# 2.E: Descriptive Statistics (Exercises)

These are homework exercises to accompany the Textmap created for "Introductory Statistics" by Shafer and Zhang.

## 2.1: Three Popular Data Displays

### Basic

1. Describe one difference between a frequency histogram and a relative frequency histogram.
2. Describe one advantage of a stem and leaf diagram over a frequency histogram.
3. Construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram for the following data set. For the histograms use classes $$51-60$$, $$61-70$$, and so on. $\begin{array}{ccccc}69 & 92 & 68 & 77 & 80 \\ 70 & 85 & 88 & 85 & 96 \\ 93 & 75 & 76 & 82 & 100 \\ 53 & 70 & 70 & 82 & 85\end{array}$
4. Construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram for the following data set. For the histograms use classes $$6.0-6.9$$, $$7.0-7.9$$, and so on. $\begin{array}{ccccc}8.5 & 8.2 & 7.0 & 7.0 & 4.9 \\ 6.5 & 8.2 & 7.6 & 1.5 & 9.3 \\ 9.6 & 8.5 & 8.8 & 8.5 & 8.7 \\ 8.0 & 7.7 & 2.9 & 9.2 & 6.9\end{array}$
5. A data set contains $$n = 10$$ observations. The values $$x$$ and their frequencies $$f$$ are summarized in the following frequency table. $\begin{array}{c|cccc}x & -1 & 0 & 1 & 2 \\ \hline f & 3 & 4 & 2 & 1\end{array}$ Construct a frequency histogram and a relative frequency histogram for the data set.
6. A data set contains $$n=20$$ observations. The values $$x$$ and their frequencies $$f$$ are summarized in the following frequency table. $\begin{array}{c|cccc}x & -1 & 0 & 1 & 2 \\ \hline f & 3 & a & 2 & 1\end{array}$ The frequency of the value $$0$$ is missing. Find $$a$$ and then sketch a frequency histogram and a relative frequency histogram for the data set.
7. A data set has the following frequency distribution table: $\begin{array}{c|cccc}x & 1 & 2 & 3 & 4 \\ \hline f & 3 & a & 2 & 1\end{array}$ The number $$a$$ is unknown. Can you construct a frequency histogram? If so, construct it. If not, say why not.
8. A table of some of the relative frequencies computed from a data set is $\begin{array}{c|cccc}x & 1 & 2 & 3 & 4 \\ \hline f/n & 0.3 & p & 0.2 & 0.1\end{array}$ The number $$p$$ is yet to be computed. Finish the table and construct the relative frequency histogram for the data set.
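
The step from a frequency table to a relative frequency table is just division by $$n$$. As a minimal sketch, assuming the table from Exercise 5 and using a plain dictionary (the variable names `freq` and `rel_freq` are illustrative, not from the text), the computation could look like this:

```python
from fractions import Fraction

# Frequency table from Exercise 5: value x -> frequency f
freq = {-1: 3, 0: 4, 1: 2, 2: 1}

# Total number of observations n is the sum of the frequencies
n = sum(freq.values())

# Relative frequency of each value is f / n (kept exact with Fraction)
rel_freq = {x: Fraction(f, n) for x, f in freq.items()}

for x, r in rel_freq.items():
    print(f"x = {x:2d}: f/n = {float(r):.1f}")

# Relative frequencies always add up to 1
print("total:", sum(rel_freq.values()))
```

The same check (relative frequencies summing to $$1$$) is what determines an unknown entry such as $$p$$ in a partially completed relative frequency table.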

### Applications

1. The IQ scores of ten students randomly selected from an elementary school are given. $\begin{array}{ccccc}108 & 100 & 99 & 125 & 87 \\ 105 & 107 & 105 & 119 & 118\end{array}$ Grouping the measures in the $$80s$$, the $$90s$$, and so on, construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram.
2. The IQ scores of ten students randomly selected from an elementary school for academically gifted students are given. $\begin{array}{ccccc}133 & 140 & 152 & 142 & 137 \\ 145 & 160 & 138 & 139 & 138\end{array}$ Grouping the measures by their common hundreds and tens digits, construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram.
3. During a one-day blood drive $$300$$ people donated blood at a mobile donation center. The blood types of these $$300$$ donors are summarized in the table. $\begin{array}{c|cccc}\text{Blood Type} & O & A & B & AB \\ \hline \text{Frequency} & 136 & 120 & 32 & 12\end{array}$ Construct a relative frequency histogram for the data set.
4. In a particular kitchen appliance store an electric automatic rice cooker is a popular item. The weekly sales for the last $$20$$ weeks are shown. $\begin{array}{ccccc}20 & 15 & 14 & 14 & 18 \\ 15 & 17 & 16 & 16 & 18 \\ 15 & 19 & 12 & 13 & 9 \\ 19 & 15 & 15 & 16 & 15\end{array}$ Construct a relative frequency histogram with classes $$6-10$$, $$11-15$$, and $$16-20$$.
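
A stem and leaf diagram for integer data can be built by splitting each value into its last digit (the leaf) and the remaining digits (the stem). The following is a minimal sketch, assuming non-negative integers and using the IQ scores from Exercise 1; the function name `stem_and_leaf` is hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group non-negative integers by stem (all digits but the last), leaves sorted."""
    groups = defaultdict(list)
    for value in sorted(data):
        stem, leaf = divmod(value, 10)  # e.g. 108 -> stem 10, leaf 8
        groups[stem].append(leaf)
    return dict(groups)

# IQ scores from Exercise 1
iq_scores = [108, 100, 99, 125, 87, 105, 107, 105, 119, 118]
diagram = stem_and_leaf(iq_scores)

# Print every stem in range, including stems that have no leaves
for stem in range(min(diagram), max(diagram) + 1):
    leaves = " ".join(str(leaf) for leaf in diagram.get(stem, []))
    print(f"{stem:>2} | {leaves}")
```

The number of leaves on each stem is the class frequency, so the same grouping also yields the frequency histogram for classes of width ten.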

### Additional Exercises

1. Random samples, each of size $$n = 10$$, were taken of the lengths in centimeters of three kinds of commercial fish, with the following results: $\begin{array}{lcccccccccc} \text{Sample 1:} & 108 & 100 & 99 & 125 & 87 & 105 & 107 & 105 & 119 & 118 \\ \text{Sample 2:} & 133 & 140 & 152 & 142 & 137 & 145 & 160 & 138 & 139 & 138 \\ \text{Sample 3:} & 82 & 60 & 83 & 82 & 82 & 74 & 79 & 82 & 80 & 80\end{array}$ Grouping the measures by their common hundreds and tens digits, construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram for each of the samples. Compare the histograms and describe any patterns they exhibit.
2. During a one-day blood drive $$300$$ people donated blood at a mobile donation center. The blood types of these $$300$$ donors are summarized below. $\begin{array}{c|cccc}\text{Blood Type} & O & A & B & AB \\ \hline \text{Frequency} & 136 & 120 & 32 & 12\end{array}$ Identify the blood type that has the highest relative frequency for these $$300$$ people. Can you conclude that the blood type you identified is also most common for all people in the population at large? Explain.
3. In a particular kitchen appliance store, the weekly sales of an electric automatic rice cooker for the last $$20$$ weeks are as follows. $\begin{array}{ccccc}20 & 15 & 14 & 14 & 18 \\ 15 & 17 & 16 & 16 & 18 \\ 15 & 19 & 12 & 13 & 9 \\ 19 & 15 & 15 & 16 & 15\end{array}$ In retail sales, too large an inventory ties up capital, while too small an inventory leads to lost sales and reduced customer satisfaction. Using the relative frequency histogram for these data, find approximately how many rice cookers must be in stock at the beginning of each week if
   1. the store is not to run out of stock by the end of a week for more than $$15\%$$ of the weeks; and
   2. the store is not to run out of stock by the end of a week for more than $$5\%$$ of the weeks.

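
The rice cooker question can also be checked against the raw weekly sales rather than read off the histogram. The sketch below is one possible reading, not the textbook's method: it assumes that a week counts as a stock-out whenever sales reach or exceed the starting stock, and the helper name `min_stock` is hypothetical.

```python
# Weekly rice cooker sales for the last 20 weeks (from the exercise)
weekly_sales = [20, 15, 14, 14, 18, 15, 17, 16, 16, 18,
                15, 19, 12, 13, 9, 19, 15, 15, 16, 15]

def min_stock(sales, max_stockout_percent):
    """Smallest stock level whose stock-out rate is at most the given percent.

    Assumption: a week is a stock-out when sales >= stock.
    """
    n = len(sales)
    for stock in range(min(sales), max(sales) + 2):
        stockouts = sum(1 for x in sales if x >= stock)
        # Integer comparison of stockouts / n <= percent / 100
        if stockouts * 100 <= max_stockout_percent * n:
            return stock

print(min_stock(weekly_sales, 15))
print(min_stock(weekly_sales, 5))
```

Under that assumption the two thresholds give $$19$$ and $$20$$, which agrees with the values listed in the answers.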

### Answers

1. The vertical scale of a frequency histogram shows the frequencies, while that of a relative frequency histogram shows the relative frequencies.
2. $\begin{array}{r|ccccccc}5 & 3 & & & & & & \\ 6 & 8 & 9 & & & & & \\ 7 & 0 & 0 & 0 & 5 & 6 & 7 & \\ 8 & 0 & 2 & 2 & 5 & 5 & 5 & 8 \\ 9 & 2 & 3 & 6 & & & & \\ 10 & 0 & & & & & &\end{array}$
3. Noting that $$n = 10$$, the relative frequency table is: $\begin{array}{c|cccc}x & -1 & 0 & 1 & 2 \\ \hline f/n & 0.3 & 0.4 & 0.2 & 0.1\end{array}$
4. Since $$n$$ is unknown, $$a$$ is unknown, so the histogram cannot be constructed.
5. $\begin{array}{r|ccccc}8 & 7 & & & & \\ 9 & 9 & & & & \\ 10 & 0 & 5 & 5 & 7 & 8 \\ 11 & 8 & 9 & & & \\ 12 & 5 & & & &\end{array}$ Frequency and relative frequency histograms are similarly generated.
6. Noting that $$n = 300$$, the relative frequency table is: $\begin{array}{c|cccc}\text{Blood Type} & O & A & B & AB \\ \hline f/n & 0.4533 & 0.4 & 0.1067 & 0.04\end{array}$ A relative frequency histogram is then generated.
7. The stem and leaf diagrams for Samples $$1$$, $$2$$, and $$3$$, in that order, are: $\begin{array}{c|ccccc}6 & & & & & \\ 7 & & & & & \\ 8 & 7 & & & & \\ 9 & 9 & & & & \\ 10 & 0 & 5 & 5 & 7 & 8 \\ 11 & 8 & 9 & & & \\ 12 & 5 & & & & \\ 13 & & & & & \\ 14 & & & & & \\ 15 & & & & & \\ 16 & & & & &\end{array}$

$\begin{array}{c|ccccc}6 & & & & & \\ 7 & & & & & \\ 8 & & & & & \\ 9 & & & & & \\ 10 & & & & & \\ 11 & & & & & \\ 12 & & & & & \\ 13 & 3 & 7 & 8 & 8 & 9 \\ 14 & 0 & 2 & 5 & & \\ 15 & 2 & & & & \\ 16 & 0 & & & &\end{array}$

$\begin{array}{c|ccccccc}6 & 0 & & & & \\ 7 & 4 & 9 & & & \\ 8 & 0 & 0 & 2 & 2 & 2 & 2 & 3 \\ 9 & & & & & \\ 10 & & & & & \\ 11 & & & & & \\ 12 & & & & & \\ 13 & & & & & \\ 14 & & & & & \\ 15 & & & & & \\ 16 & & & & &\end{array}$

The frequency tables are given below in the same order:

$\begin{array}{c|ccccc}\text{Length} & 80 \sim 89 & 90 \sim 99 & 100 \sim 109 & 110 \sim 119 & 120 \sim 129 \\ \hline f & 1 & 1 & 5 & 2 & 1\end{array}$

$\begin{array}{c|cccc}\text{Length} & 130 \sim 139 & 140 \sim 149 & 150 \sim 159 & 160 \sim 169 \\ \hline f & 5 & 3 & 1 & 1\end{array}$

$\begin{array}{c|ccc}\text{Length} & 60 \sim 69 & 70 \sim 79 & 80 \sim 89 \\ \hline f & 1 & 2 & 7\end{array}$

The relative frequency tables are also given below in the same order:

$\begin{array}{c|ccccc}\text{Length} & 80 \sim 89 & 90 \sim 99 & 100 \sim 109 & 110 \sim 119 & 120 \sim 129 \\ \hline f/n & 0.1 & 0.1 & 0.5 & 0.2 & 0.1\end{array}$

$\begin{array}{c|cccc}\text{Length} & 130 \sim 139 & 140 \sim 149 & 150 \sim 159 & 160 \sim 169 \\ \hline f/n & 0.5 & 0.3 & 0.1 & 0.1\end{array}$

$\begin{array}{c|ccc}\text{Length} & 60 \sim 69 & 70 \sim 79 & 80 \sim 89 \\ \hline f/n & 0.1 & 0.2 & 0.7\end{array}$

8. 1. 19
   2. 20

## 2.2: Measures of Central Location

### Basic

1. For the sample data set $$\{1,2,6\}$$ find
   1. $$\sum x$$
   2. $$\sum x^2$$
   3. $$\sum (x-3)$$
   4. $$\sum (x-3)^2$$
2. For the sample data set $$\{-1,0,1,4\}$$ find
   1. $$\sum x$$
   2. $$\sum x^2$$
   3. $$\sum (x-1)$$
   4. $$\sum (x-1)^2$$
3. Find the mean, the median, and the mode for the sample $1\; 2\; 3\; 4$
4. Find the mean, the median, and the mode for the sample $3\; 3\; 4\; 4$
5. Find the mean, the median, and the mode for the sample $2\; 1\; 2\; 7$
6. Find the mean, the median, and the mode for the sample $-1\; 0\; 1\; 4\; 1\; 1$
7. Find the mean, the median, and the mode for the sample data represented by the table $\begin{array}{c|c c c}x & 1 & 2 & 7 \\ \hline f & 1 & 2 & 1\\ \end{array}$
8. Find the mean, the median, and the mode for the sample data represented by the table $\begin{array}{c|c c c c}x & -1 & 0 & 1 & 4 \\ \hline f & 1 & 1 & 3 & 1\\ \end{array}$
9. Create a sample data set of size $$n=3$$ for which the mean $$\bar{x}$$ is greater than the median $$\tilde{x}$$.
10. Create a sample data set of size $$n=3$$ for which the mean $$\bar{x}$$ is less than the median $$\tilde{x}$$.
11. Create a sample data set of size $$n=4$$ for which the mean $$\bar{x}$$, the median $$\tilde{x}$$, and the mode are all identical.
12. Create a sample data set of size $$n=4$$ for which the median $$\tilde{x}$$ and the mode are identical but the mean $$\bar{x}$$ is different.
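
The sums and the three measures of central location in these exercises can be checked with Python's standard `statistics` module. The following is a minimal sketch using the sample from Exercise 5; the variable names are illustrative only.

```python
import statistics

sample = [2, 1, 2, 7]  # sample from Exercise 5

# Summation notation: sum of x and sum of x squared
sum_x = sum(sample)
sum_x_sq = sum(x**2 for x in sample)

# Measures of central location
mean = statistics.mean(sample)        # sum of x divided by n
median = statistics.median(sample)    # middle value (average of the two middle values when n is even)
modes = statistics.multimode(sample)  # list of the most frequent value(s)

# Deviations from the mean always sum to zero
sum_dev = sum(x - mean for x in sample)

print(sum_x, sum_x_sq, mean, median, modes, sum_dev)
```

`multimode` is used rather than `mode` so that a sample with several equally frequent values reports all of them.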

### Applications

1. Find the mean and the median for the LDL cholesterol level in a sample of ten heart patients. $\begin{matrix} 132 & 162 & 133 & 145 & 148\\ 139 & 147 & 160 & 150 & 153 \end{matrix}$
2. Find the mean and the median for the LDL cholesterol level in a sample of ten heart patients on a special diet. $\begin{matrix} 127 & 152 & 138 & 110 & 152\\ 113 & 131 & 148 & 135 & 158 \end{matrix}$
3. Find the mean, the median, and the mode for the number of vehicles owned in a survey of $$52$$ households. $\begin{array}{c|c c c c c c c c} x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\ \hline f &2 &12 &15 &11 &6 &3 &1 &2\\ \end{array}$
4. The number of passengers in each of $$120$$ randomly observed vehicles during morning rush hour was recorded, with the following results. $\begin{array}{c|c c c c c } x & 1 & 2 & 3 & 4 & 5\\ \hline f &84 &29 &3 &3 &1\\ \end{array}$ Find the mean, the median, and the mode of this data set.
5. Twenty-five $$1$$-lb boxes of $$16$$d nails were randomly selected and the number of nails in each box was counted, with the following results. $\begin{array}{c|c c c c c } x & 47 & 48 & 49 & 50 & 51\\ \hline f &1 &3 &18 &2 &1\\ \end{array}$ Find the mean, the median, and the mode of this data set.
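
When data arrive as a frequency table, the mean is $$\sum xf / n$$ and the median is located by accumulating frequencies until the middle position is reached. The sketch below works through the vehicles-owned table from Exercise 3 under those definitions; the helper `value_at` is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

# Vehicles-owned table from Exercise 3: value x -> frequency f
freq = {0: 2, 1: 12, 2: 15, 3: 11, 4: 6, 5: 3, 6: 1, 7: 2}

n = sum(freq.values())
# Mean: each value weighted by its frequency, divided by n
mean = Fraction(sum(x * f for x, f in freq.items()), n)

def value_at(position):
    """Return the observation at a 1-based position in the sorted data."""
    running = 0
    for x in sorted(freq):
        running += freq[x]
        if position <= running:
            return x
    raise IndexError("position exceeds n")

# Median: middle observation, or the average of the two middle ones when n is even
if n % 2 == 1:
    median = value_at((n + 1) // 2)
else:
    median = Fraction(value_at(n // 2) + value_at(n // 2 + 1), 2)

# Mode: value(s) with the largest frequency
top = max(freq.values())
modes = [x for x, f in freq.items() if f == top]

print(n, float(mean), median, modes)
```

Working from the table avoids expanding it into all $$52$$ individual observations, which matters more for larger tables.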

### Additional Exercises

1. Five laboratory mice with thymus leukemia are observed for a predetermined period of $$500$$ days. After $$500$$ days, four mice have died but the fifth one survives. The recorded survival times for the five mice are $\begin{matrix} 493 & 421 & 222 & 378 & 500^* \end{matrix}$ where $$500^*$$ indicates that the fifth mouse survived for at least $$500$$ days but the survival time (i.e., the exact value of the observation) is unknown.
   1. Can you find the sample mean for the data set? If so, find it. If not, why not?
   2. Can you find the sample median for the data set? If so, find it. If not, why not?
2. Five laboratory mice with thymus leukemia are observed for a predetermined period of $$500$$ days. After $$450$$ days, three mice have died, and one of the remaining mice is sacrificed for analysis. By the end of the observational period, the last remaining mouse still survives. The recorded survival times for the five mice are $\begin{matrix} 222 & 421 & 378 & 450^* & 500^* \end{matrix}$ where $$^*$$ indicates that the mouse survived for at least the given number of days but the exact value of the observation is unknown.
   1. Can you find the sample mean for the data set? If so, find it. If not, explain why not.
   2. Can you find the sample median for the data set? If so, find it. If not, explain why not.
3. A player keeps track of all the rolls of a pair of dice when playing a board game and obtains the following data. $\begin{array}{c|ccccccccccc} x & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12\\ \hline f &10 &29 &40 &56 &68 &77 &67 &55 &39 &28 &11 \\ \end{array}$ Find the mean, the median, and the mode.
4. Cordelia records her daily commute time to work each day, to the nearest minute, for two months, and obtains the following data. $\begin{array}{c|c c c c c c c } x & 26 & 27 & 28 & 29 & 30 & 31 & 32\\ \hline f &3 &4 &16 &12 &6 &2 &1 \\ \end{array}$
   1. Based on the frequencies, do you expect the mean and the median to be about the same or markedly different, and why?
   2. Compute the mean, the median, and the mode.
5. An ordered stem and leaf diagram gives the scores of $$71$$ students on an exam. $\begin{array}{c|c c c c c c c c c c c c c c c c c c } 10 & 0 & 0 \\ 9 &1 &1 &1 &1 &2 &3\\ 8 &0 &1 &1 &2 &2 &3 &4 &5 &7 &8 &8 &9\\ 7 &0 &0 &0 &1 &1 &2 &4 &4 &5 &6 &6 &6 &7 &7 &7 &8 &8 &9\\ 6 &0 &1 &2 &2 &2 &3 &4 &4 &5 &7 &7 &7 &7 &8 &8\\ 5 &0 &2 &3 &3 &4 &4 &6 &7 &7 &8 &9\\ 4 &2 &5 &6 &8 &8\\ 3 &9 &9 \end{array}$
   1. Based on the shape of the display, do you expect the mean and the median to be about the same or markedly different, and why?
   2. Compute the mean, the median, and the mode.
6. A man tosses a coin repeatedly until it lands heads and records the number of tosses required. (For example, if it lands heads on the first toss he records a $$1$$; if it lands tails on the first two tosses and heads on the third he records a $$3$$.) The data are shown. $\begin{array}{c|c c c c c c c c c c } x & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline f &384 &208 &98 &56 &28 &12 &8 &2 &3 &1 \end{array}$
   1. Find the mean of the data.
   2. Find the median of the data.
7. 1. Construct a data set consisting of ten numbers, all but one of which are above average, where the average is the mean.
   2. Is it possible to construct a data set as in part (a) when the average is the median? Explain.
8. Show that no matter what kind of average is used (mean, median, or mode) it is impossible for all members of a data set to be above average.
9. 1. Twenty sacks of grain weigh a total of $$1{,}003$$ lb. What is the mean weight per sack?
   2. Can the median weight per sack be calculated based on the information given? If not, construct two data sets with the same total but different medians.
10. Begin with the following set of data, call it $$\text{Data Set I}$$. $\begin{matrix} 5 & -2 & 6 & 14 & -3 & 0 & 1 & 4 & 3 & 2 & 5 \end{matrix}$
    1. Compute the mean, median, and mode.
    2. Form a new data set, $$\text{Data Set II}$$, by adding $$3$$ to each number in $$\text{Data Set I}$$. Calculate the mean, median, and mode of $$\text{Data Set II}$$.
    3. Form a new data set, $$\text{Data Set III}$$, by subtracting $$6$$ from each number in $$\text{Data Set I}$$. Calculate the mean, median, and mode of $$\text{Data Set III}$$.
    4. Comparing the answers to parts (a), (b), and (c), can you guess the pattern? State the general principle that you expect to be true.
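
A conjecture for the last exercise can be tested numerically before it is stated as a principle. The sketch below, which uses exact fractions to avoid rounding, compares each statistic of the shifted data with the original statistic plus the same constant; the helper names `summary` and `shift` are hypothetical.

```python
import statistics
from fractions import Fraction

data_set_1 = [5, -2, 6, 14, -3, 0, 1, 4, 3, 2, 5]  # Data Set I from the exercise

def summary(data):
    """Return (mean, median, modes) using exact rational arithmetic."""
    exact = [Fraction(x) for x in data]
    return statistics.mean(exact), statistics.median(exact), statistics.multimode(exact)

def shift(data, c):
    """Add the constant c to every observation."""
    return [x + c for x in data]

mean_1, median_1, modes_1 = summary(data_set_1)

for c in (3, -6):  # the shifts used for Data Sets II and III
    mean_c, median_c, modes_c = summary(shift(data_set_1, c))
    # Compare each shifted statistic with the original statistic plus c
    print(c,
          mean_c == mean_1 + c,
          median_c == median_1 + c,
          modes_c == [m + c for m in modes_1])
```

A numerical check on one data set is only evidence, not a proof; the general statement still needs an argument from the definitions.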

### Large Data Set Exercises

Note: For Large Data Set Exercises below, all of the data sets associated with these questions are missing, but the questions themselves are included here for reference.

1. Large $$\text{Data Set 1}$$ lists the SAT scores and GPAs of $$1,000$$ students.
   1. Compute the mean and median of the $$1,000$$ SAT scores.
   2. Compute the mean and median of the $$1,000$$ GPAs.
2. Large $$\text{Data Set 1}$$ lists the SAT scores of $$1,000$$ students.
   1. Regard the data as arising from a census of all students at a high school, in which the SAT score of every student was measured. Compute the population mean $$\mu$$.
   2. Regard the first $$25$$ observations as a random sample drawn from this population. Compute the sample mean