2.E: Descriptive Statistics (Exercises)

These are homework exercises to accompany the Textmap created for "Introductory Statistics" by Shafer and Zhang.

2.1: Three popular data displays

Basic

Describe one difference between a frequency histogram and a relative frequency histogram.
Describe one advantage of a stem and leaf diagram over a frequency histogram.
Construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram for the following data set. For the histograms use classes $51-60$, $61-70$, and so on. \[\begin{array}{ccccc}69 & 92 & 68 & 77 & 80 \\ 70 & 85 & 88 & 85 & 96 \\ 93 & 75 & 76 & 82 & 100 \\ 53 & 70 & 70 & 82 & 85\end{array}\]
Construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram for the following data set. For the histograms use classes $6.0-6.9$, $7.0-7.9$, and so on. \[\begin{array}{ccccc}8.5 & 8.2 & 7.0 & 7.0 & 4.9 \\ 6.5 & 8.2 & 7.6 & 1.5 & 9.3 \\ 9.6 & 8.5 & 8.8 & 8.5 & 8.7 \\ 8.0 & 7.7 & 2.9 & 9.2 & 6.9\end{array}\]
A data set contains $n = 10$ observations. The values $x$ and their frequencies $f$ are summarized in the following data frequency table. \[\begin{array}{c|cccc}x & -1 & 0 & 1 & 2 \\ \hline f & 3 & 4 & 2 & 1\end{array}\]Construct a frequency histogram and a relative frequency histogram for the data set.
A data set contains $n=20$ observations. The values $x$ and their frequencies $f$ are summarized in the following data frequency table. \[\begin{array}{c|cccc}x & -1 & 0 & 1 & 2 \\ \hline f & 3 & a & 2 & 1\end{array}\]The frequency of the value $0$ is missing. Find $a$ and then sketch a frequency histogram and a relative frequency histogram for the data set.
A data set has the following frequency distribution table: \[\begin{array}{c|cccc}x & 1 & 2 & 3 & 4 \\ \hline f & 3 & a & 2 & 1\end{array}\]The number $a$ is unknown. Can you construct a frequency histogram? If so, construct it. If not, say why not.
A table of some of the relative frequencies computed from a data set is \[\begin{array}{c|cccc}x & 1 & 2 & 3 & 4 \\ \hline f/n & 0.3 & p & 0.2 & 0.1\end{array}\]The number $p$ is yet to be computed. Finish the table and construct the relative frequency histogram for the data set.

Applications

The IQ scores of ten students randomly selected from an elementary school are given. \[\begin{array}{ccccc}108 & 100 & 99 & 125 & 87 \\ 105 & 107 & 105 & 119 & 118\end{array}\]Grouping the measures in the 80s, the 90s, and so on, construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram.
The IQ scores of ten students randomly selected from an elementary school for academically gifted students are given. \[\begin{array}{ccccc}133 & 140 & 152 & 142 & 137 \\ 145 & 160 & 138 & 139 & 138\end{array}\]Grouping the measures by their common hundreds and tens digits, construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram.
During a one-day blood drive $300$ people donated blood at a mobile donation center. The blood types of these $300$ donors are summarized in the table. \[\begin{array}{c|cccc}\text{Blood Type} & O & A & B & AB \\ \hline \text{Frequency} & 136 & 120 & 32 & 12\end{array}\]Construct a relative frequency histogram for the data set.
In a particular kitchen appliance store an electric automatic rice cooker is a popular item. The weekly sales for the last $20$ weeks are shown. \[\begin{array}{ccccc}20 & 15 & 14 & 14 & 18 \\ 15 & 17 & 16 & 16 & 18 \\ 15 & 19 & 12 & 13 & 9 \\ 19 & 15 & 15 & 16 & 15\end{array}\]Construct a relative frequency histogram with classes $6-10$, $11-15$, and $16-20$.

Additional Exercises

Random samples, each of size $n = 10$, were taken of the lengths in centimeters of three kinds of commercial fish, with the following results: \[\begin{array}{lcccccccccc} \text{Sample 1:} & 108 & 100 & 99 & 125 & 87 & 105 & 107 & 105 & 119 & 118 \\ \text{Sample 2:} & 133 & 140 & 152 & 142 & 137 & 145 & 160 & 138 & 139 & 138 \\ \text{Sample 3:} & 82 & 60 & 83 & 82 & 82 & 74 & 79 & 82 & 80 & 80\end{array}\]Grouping the measures by their common hundreds and tens digits, construct a stem and leaf diagram, a frequency histogram, and a relative frequency histogram for each of the samples. Compare the histograms and describe any patterns they exhibit.
During a one-day blood drive $300$ people donated blood at a mobile donation center. The blood types of these $300$ donors are summarized below. \[\begin{array}{c|cccc}\text{Blood Type} & O & A & B & AB \\ \hline \text{Frequency} & 136 & 120 & 32 & 12\end{array}\]Identify the blood type that has the highest relative frequency for these $300$ people. Can you conclude that the blood type you identified is also most common for all people in the population at large? Explain.
In a particular kitchen appliance store, the weekly sales of an electric automatic rice cooker for the last $20$ weeks are as follows. \[\begin{array}{ccccc}20 & 15 & 14 & 14 & 18 \\ 15 & 17 & 16 & 16 & 18 \\ 15 & 19 & 12 & 13 & 9 \\ 19 & 15 & 15 & 16 & 15\end{array}\]In retail sales, too large an inventory ties up capital, while too small an inventory costs lost sales and customer satisfaction. Using the relative frequency histogram for these data, find approximately how many rice cookers must be in stock at the beginning of each week if
1. the store is not to run out of stock by the end of a week for more than $15\%$ of the weeks; and
2. the store is not to run out of stock by the end of a week for more than $5\%$ of the weeks.

Answers

The vertical scale on one is the frequencies and on the other is the relative frequencies.
\[\begin{array}{r|ccccccc}5 & 3 & & & & & & \\ 6 & 8 & 9 & & & & & \\ 7 & 0 & 0 & 0 & 5 & 6 & 7 & \\ 8 & 0 & 2 & 2 & 5 & 5 & 5 & 8 \\ 9 & 2 & 3 & 6 & & & & \\ 10 & 0 & & & & & &\end{array}\]
Noting that $n = 10$ the relative frequency table is: \[\begin{array}{c|cccc}x & -1 & 0 & 1 & 2 \\ \hline f ∕ n & 0.3 & 0.4 & 0.2 & 0.1\end{array}\]
Since $n$ is unknown, $a$ is unknown, so the histogram cannot be constructed.
\[\begin{array}{r|ccccc}8 & 7 & & & & \\ 9 & 9 & & & & \\ 10 & 0 & 5 & 5 & 7 & 8 \\ 11 & 8 & 9 & & & \\ 12 & 5 & & & &\end{array}\] Frequency and relative frequency histograms are similarly generated.
Noting $n = 300$, the relative frequency table is therefore: \[\begin{array}{c|cccc}Blood\hspace{0.167em}Type & O & A & B & AB \\ \hline f ∕ n & 0.4533 & 0.4 & 0.1067 & 0.04\end{array}\] A relative frequency histogram is then generated.
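The relative frequencies in this table can be checked with a few lines of Python. This is only a minimal sketch: the dictionary name `blood_counts` is an illustrative choice, and the rounding to four decimal places simply mirrors the table above.

```python
# Blood type frequencies from the exercise (300 donors in total).
blood_counts = {"O": 136, "A": 120, "B": 32, "AB": 12}

n = sum(blood_counts.values())  # total number of observations

# Relative frequency = frequency / n, rounded to four decimal places.
relative = {t: round(f / n, 4) for t, f in blood_counts.items()}

for blood_type, rel in relative.items():
    print(f"{blood_type}: {rel}")
```

The relative frequencies sum to $1$ up to rounding, which is a useful sanity check before drawing the histogram.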
The stem and leaf diagrams listed for Samples $1,\, 2,\; \text{and}\; 3$ in that order: \[\begin{array}{c|ccccc}6 & & & & & \\ 7 & & & & & \\ 8 & 7 & & & & \\ 9 & 9 & & & & \\ 10 & 0 & 5 & 5 & 7 & 8 \\ 11 & 8 & 9 & & & \\ 12 & 5 & & & & \\ 13 & & & & & \\ 14 & & & & & \\ 15 & & & & & \\ 16 & & & & &\end{array}\]

\[\begin{array}{c|ccccc}6 & & & & & \\ 7 & & & & & \\ 8 & & & & & \\ 9 & & & & & \\ 10 & & & & & \\ 11 & & & & & \\ 12 & & & & & \\ 13 & 3 & 7 & 8 & 8 & 9 \\ 14 & 0 & 2 & 5 & & \\ 15 & 2 & & & & \\ 16 & 0 & & & &\end{array}\]

\[\begin{array}{c|ccccccc}6 & 0 & & & & \\ 7 & 4 & 9 & & & \\ 8 & 0 & 0 & 2 & 2 & 2 & 2 & 3 \\ 9 & & & & & \\ 10 & & & & & \\ 11 & & & & & \\ 12 & & & & & \\ 13 & & & & & \\ 14 & & & & & \\ 15 & & & & & \\ 16 & & & & &\end{array}\]

The frequency tables are given below in the same order:

\[\begin{array}{c|ccccc}\text{Length} & 80 \sim 89 & 90 \sim 99 & 100 \sim 109 & 110 \sim 119 & 120 \sim 129 \\ \hline f & 1 & 1 & 5 & 2 & 1\end{array}\]

\[\begin{array}{c|cccc}\text{Length} & 130 \sim 139 & 140 \sim 149 & 150 \sim 159 & 160 \sim 169 \\ \hline f & 5 & 3 & 1 & 1\end{array}\]

\[\begin{array}{c|ccc}\text{Length} & 60 \sim 69 & 70 \sim 79 & 80 \sim 89 \\ \hline f & 1 & 2 & 7\end{array}\]

The relative frequency tables are also given below in the same order:

\[\begin{array}{c|ccccc}\text{Length} & 80 \sim 89 & 90 \sim 99 & 100 \sim 109 & 110 \sim 119 & 120 \sim 129 \\ \hline f/n & 0.1 & 0.1 & 0.5 & 0.2 & 0.1\end{array}\]

\[\begin{array}{c|cccc}\text{Length} & 130 \sim 139 & 140 \sim 149 & 150 \sim 159 & 160 \sim 169 \\ \hline f/n & 0.5 & 0.3 & 0.1 & 0.1\end{array}\]

\[\begin{array}{c|ccc}\text{Length} & 60 \sim 69 & 70 \sim 79 & 80 \sim 89 \\ \hline f/n & 0.1 & 0.2 & 0.7\end{array}\]

1. 19
2. 20
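The two stock levels in the last answer can be cross-checked against the raw weekly sales. The sketch below works from the raw data rather than the histogram, and it assumes that the store "runs out of stock" in any week where sales are at least as large as the starting inventory. The helper name `min_stock` is hypothetical.

```python
# Weekly rice cooker sales for the last 20 weeks (from the exercise).
sales = [20, 15, 14, 14, 18, 15, 17, 16, 16, 18,
         15, 19, 12, 13, 9, 19, 15, 15, 16, 15]

def min_stock(sales, max_percent):
    """Smallest stock level whose stock-out rate is at most max_percent.

    Assumption: a week counts as a stock-out when sales >= stock.
    """
    n = len(sales)
    for stock in range(min(sales), max(sales) + 2):
        stockouts = sum(1 for s in sales if s >= stock)
        # Compare integers (count * 100 vs. percent * n) to avoid
        # floating-point rounding at the boundary.
        if stockouts * 100 <= max_percent * n:
            return stock

print(min_stock(sales, 15))  # part 1: at most 15% of weeks
print(min_stock(sales, 5))   # part 2: at most 5% of weeks
```

Under a different reading of "run out" (for example, only when demand strictly exceeds stock) the thresholds would shift down, so the assumption matters.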

2.2: Measures of Central Location

Basic

For the sample data set $\{1,2,6\}$ find
1. $\sum x$
2. $\sum x^2$
3. $\sum (x-3)$
4. $\sum (x-3)^2$
For the sample data set $\{-1,0,1,4\}$ find
1. $\sum x$
2. $\sum x^2$
3. $\sum (x-1)$
4. $\sum (x-1)^2$
Find the mean, the median, and the mode for the sample \[1\; 2\; 3\; 4\]
Find the mean, the median, and the mode for the sample \[3\; 3\; 4\; 4\]
Find the mean, the median, and the mode for the sample \[2\; 1\; 2\; 7\]
Find the mean, the median, and the mode for the sample \[-1\; 0\; 1\; 4\; 1\; 1\]
Find the mean, the median, and the mode for the sample data represented by the table \[\begin{array}{c|c c c}x & 1 & 2 & 7 \\ \hline f & 1 & 2 & 1\\ \end{array}\]
Find the mean, the median, and the mode for the sample data represented by the table \[\begin{array}{c|c c c c}x & -1 & 0 & 1 & 4 \\ \hline f & 1 & 1 & 3 & 1\\ \end{array}\]
Create a sample data set of size $n=3$ for which the mean $\bar{x}$ is greater than the median $\tilde{x}$.
Create a sample data set of size $n=3$ for which the mean $\bar{x}$ is less than the median $\tilde{x}$.
Create a sample data set of size $n=4$ for which the mean $\bar{x}$, the median $\tilde{x}$, and the mode are all identical.
Create a sample data set of size $n=4$ for which the median $\tilde{x}$ and the mode are identical but the mean $\bar{x}$ is different.

Applications

Find the mean and the median for the LDL cholesterol level in a sample of ten heart patients. \[\begin{matrix} 132 & 162 & 133 & 145 & 148\\ 139 & 147 & 160 & 150 & 153 \end{matrix}\]
Find the mean and the median for the LDL cholesterol level in a sample of ten heart patients on a special diet. \[\begin{matrix} 127 & 152 & 138 & 110 & 152\\ 113 & 131 & 148 & 135 & 158 \end{matrix}\]
Find the mean, the median, and the mode for the number of vehicles owned in a survey of $52$ households. \[\begin{array}{c|c c c c c c c c} x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7\\ \hline f &2 &12 &15 &11 &6 &3 &1 &2\\ \end{array}\]
The number of passengers in each of $120$ randomly observed vehicles during morning rush hour was recorded, with the following results. \[\begin{array}{c|c c c c c } x & 1 & 2 & 3 & 4 & 5\\ \hline f &84 &29 &3 &3 &1\\ \end{array}\]Find the mean, the median, and the mode of this data set.
Twenty-five 1-lb boxes of 16d nails were randomly selected and the number of nails in each box was counted, with the following results. \[\begin{array}{c|c c c c c } x & 47 & 48 & 49 & 50 & 51\\ \hline f &1 &3 &18 &2 &1\\ \end{array}\]Find the mean, the median, and the mode of this data set.

Additional Exercises

Five laboratory mice with thymus leukemia are observed for a predetermined period of $500$ days. After $500$ days, four mice have died but the fifth one survives. The recorded survival times for the five mice are \[\begin{matrix} 493 & 421 & 222 & 378 & 500^* \end{matrix}\]where $500^*$ indicates that the fifth mouse survived for at least $500$ days but the survival time (i.e., the exact value of the observation) is unknown.
1. Can you find the sample mean for the data set? If so, find it. If not, why not?
2. Can you find the sample median for the data set? If so, find it. If not, why not?
Five laboratory mice with thymus leukemia are observed for a predetermined period of $500$ days. After $450$ days, three mice have died, and one of the remaining mice is sacrificed for analysis. By the end of the observational period, the last remaining mouse still survives. The recorded survival times for the five mice are \[\begin{matrix} 222 & 421 & 378 & 450^* & 500^* \end{matrix}\]where $^*$ indicates that the mouse survived for at least the given number of days but the exact value of the observation is unknown.
1. Can you find the sample mean for the data set? If so, find it. If not, explain why not.
2. Can you find the sample median for the data set? If so, find it. If not, explain why not.
A player keeps track of all the rolls of a pair of dice when playing a board game and obtains the following data. \[\begin{array}{c|c c c c c c } x & 2 & 3 & 4 & 5 & 6 & 7\\ \hline f &10 &29 &40 &56 &68 &77 \\ \end{array}\] \[\begin{array}{c|c c c c c } x & 8 & 9 & 10 & 11 & 12 \\ \hline f &67 &55 &39 &28 &11 \\ \end{array}\]Find the mean, the median, and the mode.
Cordelia records her daily commute time to work each day, to the nearest minute, for two months, and obtains the following data. \[\begin{array}{c|c c c c c c c } x & 26 & 27 & 28 & 29 & 30 & 31 & 32\\ \hline f &3 &4 &16 &12 &6 &2 &1 \\ \end{array}\]
1. Based on the frequencies, do you expect the mean and the median to be about the same or markedly different, and why?
2. Compute the mean, the median, and the mode.
An ordered stem and leaf diagram gives the scores of $71$ students on an exam. \[\begin{array}{c|c c c c c c c c c c c c c c c c c c } 10 & 0 & 0 \\ 9 &1 &1 &1 &1 &2 &3\\ 8 &0 &1 &1 &2 &2 &3 &4 &5 &7 &8 &8 &9\\ 7 &0 &0 &0 &1 &1 &2 &4 &4 &5 &6 &6 &6 &7 &7 &7 &8 &8 &9\\ 6 &0 &1 &2 &2 &2 &3 &4 &4 &5 &7 &7 &7 &7 &8 &8\\ 5 &0 &2 &3 &3 &4 &4 &6 &7 &7 &8 &9\\ 4 &2 &5 &6 &8 &8\\ 3 &9 &9 \end{array}\]
1. Based on the shape of the display, do you expect the mean and the median to be about the same or markedly different, and why?
2. Compute the mean, the median, and the mode.
A man tosses a coin repeatedly until it lands heads and records the number of tosses required. (For example, if it lands heads on the first toss he records a $1$; if it lands tails on the first two tosses and heads on the third he records a $3$.) The data are shown. \[\begin{array}{c|c c c c c c c c c c } x & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 \\ \hline f &384 &208 &98 &56 &28 &12 &8 &2 &3 &1 \end{array}\]
1. Find the mean of the data.
2. Find the median of the data.
1. Construct a data set consisting of ten numbers, all but one of which is above average, where the average is the mean.
2. Is it possible to construct a data set as in part (a) when the average is the median? Explain.
Show that no matter what kind of average is used (mean, median, or mode) it is impossible for all members of a data set to be above average.
1. Twenty sacks of grain weigh a total of $1,003\; lb$. What is the mean weight per sack?
2. Can the median weight per sack be calculated based on the information given? If not, construct two data sets with the same total but different medians.
Begin with the following set of data, call it $\text{Data Set I}$. \[\begin{matrix} 5 & -2 & 6 & 14 & -3 & 0 & 1 & 4 & 3 & 2 & 5 \end{matrix}\]
1. Compute the mean, median, and mode.
2. Form a new data set, $\text{Data Set II}$, by adding $3$ to each number in $\text{Data Set I}$. Calculate the mean, median, and mode of $\text{Data Set II}$.
3. Form a new data set, $\text{Data Set III}$, by subtracting $6$ from each number in $\text{Data Set I}$. Calculate the mean, median, and mode of $\text{Data Set III}$.
4. Comparing the answers to parts (a), (b), and (c), can you guess the pattern? State the general principle that you expect to be true.

Large Data Set Exercises

Note: For Large Data Set Exercises below, all of the data sets associated with these questions are missing, but the questions themselves are included here for reference.

Large $\text{Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students.
1. Compute the mean and median of the $1,000$ SAT scores.
2. Compute the mean and median of the $1,000$ GPAs.
Large $\text{Data Set 1}$ lists the SAT scores of $1,000$ students.
1. Regard the data as arising from a census of all students at a high school, in which the SAT score of every student was measured. Compute the population mean $\mu$.
2. Regard the first $25$ observations as a random sample drawn from this population. Compute the sample mean $\bar{x}$ and compare it to $\mu$.
3. Regard the next $25$ observations as a random sample drawn from this population. Compute the sample mean $\bar{x}$ and compare it to $\mu$.
Large $\text{Data Set 1}$ lists the GPAs of $1,000$ students.
1. Regard the data as arising from a census of all freshmen at a small college at the end of their first academic year of college study, in which the GPA of every such person was measured. Compute the population mean $\mu$.
2. Regard the first $25$ observations as a random sample drawn from this population. Compute the sample mean $\bar{x}$ and compare it to $\mu$.
3. Regard the next $25$ observations as a random sample drawn from this population. Compute the sample mean $\bar{x}$ and compare it to $\mu$.
Large $\text{Data Sets}\: 7,\: 7A,\: \text{and}\: 7B$ list the survival times in days of $140$ laboratory mice with thymic leukemia from onset to death.
1. Compute the mean and median survival time for all mice, without regard to gender.
2. Compute the mean and median survival time for the $65$ male mice (separately recorded in Large $\text{Data Set 7A}$).
3. Compute the mean and median survival time for the $75$ female mice (separately recorded in Large $\text{Data Set 7B}$).

Answers

1. 9
2. 41
3. 0
4. 14
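These four sums are easy to confirm directly. A minimal Python sketch, with variable names chosen here only for illustration:

```python
data = [1, 2, 6]

sum_x = sum(data)                            # sum of x
sum_x_sq = sum(x**2 for x in data)           # sum of x squared
sum_dev = sum(x - 3 for x in data)           # sum of deviations from 3
sum_dev_sq = sum((x - 3)**2 for x in data)   # sum of squared deviations from 3

print(sum_x, sum_x_sq, sum_dev, sum_dev_sq)
```

Because $3$ is the mean of this sample, the deviations in part 3 cancel exactly; this holds for deviations from the mean in any data set.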
$\bar x= 2.5,\; \tilde{x} = 2.5,\; \text{mode} = \{1,2,3,4\}$
$\bar x= 3,\; \tilde{x} = 2,\; \text{mode} = 2$
$\bar x= 3,\; \tilde{x} = 2,\; \text{mode} = 2$
$\{0, 0, 3\}$
$\{0, 1, 1, 2\}$
$\bar x = 146.9,\; \tilde x = 147.5$
$\bar x=2.6 ,\; \tilde{x} = 2,\; \text{mode} = 2$
$\bar x= 48.96,\; \tilde{x} = 49,\; \text{mode} = 49$
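For data given as a frequency table, one way to check such an answer is to expand the table into the full list of observations and use Python's standard `statistics` module. The sketch below does this for the nail-count table; the names `values`, `freqs`, and `sample` are illustrative.

```python
from statistics import mean, median, mode

# Frequency table for the number of nails per box (from the exercise).
values = [47, 48, 49, 50, 51]
freqs = [1, 3, 18, 2, 1]

# Repeat each value x exactly f times to recover the 25 observations.
sample = [x for x, f in zip(values, freqs) for _ in range(f)]

print(len(sample), mean(sample), median(sample), mode(sample))
```

Expanding the table is fine for small samples; for large frequencies it is more economical to compute $\sum xf / \sum f$ directly.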
1. No, the survival times of the fourth and fifth mice are unknown.
2. Yes, $\tilde{x}=421$.
$\bar x= 28.55,\; \tilde{x} = 28,\; \text{mode} = 28$
$\bar x= 2.05,\; \tilde{x} = 2,\; \text{mode} = 1$
Mean: $nx_{min}\leq \sum x$, so dividing by $n$ yields $x_{min}\leq \bar{x}$, so the minimum value is not above average. Median: the middle measurement, or average of the two middle measurements, $\tilde{x}$, is at least as large as $x_{min}$, so the minimum value is not above average. Mode: the mode is one of the measurements, and is not greater than itself.
1. $\bar x= 3.18,\; \tilde{x} = 3,\; \text{mode} = 5$
2. $\bar x= 6.18,\; \tilde{x} = 6,\; \text{mode} = 8$
3. $\bar x= -2.82,\; \tilde{x} = -3,\; \text{mode} = -1$
4. If a number is added to every measurement in a data set, then the mean, median, and mode all change by that number.
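The pattern in part 4 can be explored numerically. The following sketch applies the shifts $+3$ and $-6$ to Data Set I and recomputes the three averages; the helper `summary` is a hypothetical name introduced only for this illustration.

```python
from statistics import mean, median, mode

# Data Set I from the exercise.
data_set_1 = [5, -2, 6, 14, -3, 0, 1, 4, 3, 2, 5]

def summary(data):
    """Return (mean rounded to 2 places, median, mode) of a data set."""
    return round(mean(data), 2), median(data), mode(data)

for shift in (0, 3, -6):
    shifted = [x + shift for x in data_set_1]
    print(shift, summary(shifted))
```

In each case every measure of center moves by exactly the amount of the shift, up to rounding of the mean.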
1. $\mu = 1528.74$
2. $\bar{x}=1502.8$
3. $\bar{x}=1532.2$
1. $\bar x= 553.4286,\; \tilde{x} = 552.5$
2. $\bar x= 665.9692,\; \tilde{x} = 667$
3. $\bar x= 455.8933,\; \tilde{x} = 448$

2.3: Measures of Variability

Basic

Find the range, the variance, and the standard deviation for the following sample.
\[1\; 2\; 3\; 4\]
Find the range, the variance, and the standard deviation for the following sample.
\[2\; -3\; 6\; 0\; 3\; 1\]
Find the range, the variance, and the standard deviation for the following sample.
\[2\; 1\; 2\; 7\]
Find the range, the variance, and the standard deviation for the following sample.
\[-1\; 0\; 1\; 4\; 1\; 1\]
Find the range, the variance, and the standard deviation for the sample represented by the data frequency table.
\[\begin{array}{c|c c c} x & 1 & 2 & 7 \\ \hline f &1 &2 &1\\ \end{array}\]
Find the range, the variance, and the standard deviation for the sample represented by the data frequency table.
\[\begin{array}{c|c c c c} x & -1 & 0 & 1 & 4 \\ \hline f &1 &1 &3 &1\\ \end{array}\]

Applications

Find the range, the variance, and the standard deviation for the sample of ten IQ scores randomly selected from a school for academically gifted students.
\[\begin{matrix} 132 & 162 & 133 & 145 & 148\\ 139 & 147 & 160 & 150 & 153 \end{matrix}\]
Find the range, the variance and the standard deviation for the sample of ten IQ scores randomly selected from a school for academically gifted students.
\[\begin{matrix} 142 & 152 & 138 & 145 & 148\\ 139 & 147 & 155 & 150 & 153 \end{matrix}\]

Additional Exercises

Consider the data set represented by the table \[\begin{array}{c|c c c c c c c} x & 26 & 27 & 28 & 29 & 30 & 31 & 32 \\ \hline f &3 &4 &16 &12 &6 &2 &1\\ \end{array}\]
1. Use the frequency table to find that $\sum x=1256$ and $\sum x^2=35,926$.
2. Use the information in part (a) to compute the sample mean and the sample standard deviation.
Find the sample standard deviation for the data
\[\begin{array}{c|c c c c c} x & 1 & 2 & 3 & 4 & 5 \\ \hline f &384 &208 &98 &56 &28 \\ \end{array}\]

\[\begin{array}{c|c c c c c} x & 6 & 7 & 8 & 9 & 10 \\ \hline f &12 &8 &2 &3 &1 \\ \end{array}\]
A random sample of $49$ invoices for repairs at an automotive body shop is taken. The data are arrayed in the stem and leaf diagram shown. (Stems are thousands of dollars, leaves are hundreds, so that for example the largest observation is $3,800$.)

\[\begin{array}{c|c c c c c c c c c c c} 3 & 5 & 6 & 8 \\ 3 &0 &0 &1 &1 &2 &4 \\ 2 &5 &6 &6 &7 &7 &8 &8 &9 &9 \\ 2 &0 &0 &0 &0 &1 &2 &2 &4 \\ 1 &5 &5 &5 &6 &6 &7 &7 &7 &8 &8 &9 \\ 1 &0 &0 &1 &3 &4 &4 &4 \\ 0 &5 &6 &8 &8 \\ 0 &4 \end{array}\]

For these data, $\sum x=101,100$, $\sum x^2=244,830,000$.
1. Compute the mean, median, and mode.
2. Compute the range.
3. Compute the sample standard deviation.
What must be true of a data set if its standard deviation is $0$?
A data set consisting of $25$ measurements has standard deviation $0$. One of the measurements has value $17$. What are the other $24$ measurements?
Create a sample data set of size $n=3$ for which the range is $0$ and the sample mean is $2$.
Create a sample data set of size $n=3$ for which the sample variance is $0$ and the sample mean is $1$.
The sample $\{-1,0,1\}$ has mean $\bar{x}=0$ and standard deviation $s=1$. Create a sample data set of size $n=3$ for which $\bar{x}=0$ and $s$ is greater than $1$.
The sample $\{-1,0,1\}$ has mean $\bar{x}=0$ and standard deviation $s=1$. Create a sample data set of size $n=3$ for which $\bar{x}=0$ and the standard deviation $s$ is less than $1$.
Begin with the following set of data, call it $\text{Data Set I}$.
\[5\; -2\; 6\; 14\; -3\; 0\; 1\; 4\; 3\; 2\; 5\]
1. Compute the sample standard deviation of $\text{Data Set I}$.
2. Form a new data set, $\text{Data Set II}$, by adding $3$ to each number in $\text{Data Set I}$. Calculate the sample standard deviation of $\text{Data Set II}$.
3. Form a new data set, $\text{Data Set III}$, by subtracting $6$ from each number in $\text{Data Set I}$. Calculate the sample standard deviation of $\text{Data Set III}$.
4. Comparing the answers to parts (a), (b), and (c), can you guess the pattern? State the general principle that you expect to be true.

Large Data Set Exercises

Note: For Large Data Set Exercises below, all of the data sets associated with these questions are missing, but the questions themselves are included here for reference.

$\text{Large Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students.
1. Compute the range and sample standard deviation of the $1,000$ SAT scores.
2. Compute the range and sample standard deviation of the $1,000$ GPAs.
$\text{Large Data Set 1}$ lists the SAT scores of $1,000$ students.
1. Regard the data as arising from a census of all students at a high school, in which the SAT score of every student was measured. Compute the population range and population standard deviation $\sigma$.
2. Regard the first $25$ observations as a random sample drawn from this population. Compute the sample range and sample standard deviation $s$ and compare them to the population range and $\sigma$.
3. Regard the next $25$ observations as a random sample drawn from this population. Compute the sample range and sample standard deviation $s$ and compare them to the population range and $\sigma$.
$\text{Large Data Set 1}$ lists the GPAs of $1,000$ students.
1. Regard the data as arising from a census of all freshmen at a small college at the end of their first academic year of college study, in which the GPA of every such person was measured. Compute the population range and population standard deviation $\sigma$.
2. Regard the first $25$ observations as a random sample drawn from this population. Compute the sample range and sample standard deviation $s$ and compare them to the population range and $\sigma$.
3. Regard the next $25$ observations as a random sample drawn from this population. Compute the sample range and sample standard deviation $s$ and compare them to the population range and $\sigma$.
$\text{Large Data Set 7, 7A, and 7B }$ list the survival times in days of $140$ laboratory mice with thymic leukemia from onset to death.
1. Compute the range and sample standard deviation of survival time for all mice, without regard to gender.
2. Compute the range and sample standard deviation of survival time for the $65$ male mice (separately recorded in $\text{Large Data Set 7A}$).
3. Compute the range and sample standard deviation of survival time for the $75$ female mice (separately recorded in $\text{Large Data Set 7B}$). Do you see a difference in the results for male and female mice? Does it appear to be significant?

Answers

$R = 3,\; s^2 = 1.7,\; s = 1.3$.
$R = 6,\; s^2=7.\bar{3},\; s = 2.7$.
$R = 6,\; s^2=7.3,\; s = 2.7$.

$R = 30,\; s^2 = 103.2,\; s = 10.2$.

$\bar{x}=28.55,\; s = 1.3$.
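This answer follows from the two sums in part (a) through the shortcut formula $s^2 = \dfrac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n-1}$. The sketch below recomputes the sums from the frequency table and then applies the formula; the variable names are illustrative choices.

```python
from math import sqrt

# Frequency table of commute times (from the exercise).
values = [26, 27, 28, 29, 30, 31, 32]
freqs = [3, 4, 16, 12, 6, 2, 1]

n = sum(freqs)                                            # sample size
sum_x = sum(x * f for x, f in zip(values, freqs))         # sum of x
sum_x_sq = sum(x * x * f for x, f in zip(values, freqs))  # sum of x squared

x_bar = sum_x / n
# Shortcut formula for the sample standard deviation.
s = sqrt((sum_x_sq - sum_x**2 / n) / (n - 1))

print(n, sum_x, sum_x_sq, round(x_bar, 2), round(s, 1))
```

Each value is weighted by its frequency when forming the sums, which is what makes the table equivalent to the full list of $44$ observations.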
1. $\bar{x}=2063,\; \tilde{x} =2000,\; \text{mode}=2000$.
2. $R = 3400$.
3. $s = 869$.
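To verify these values, the stem and leaf diagram can be decoded back into dollar amounts. The sketch below is one way to do that, assuming each row is transcribed as a stem (thousands) with a string of leaf digits (hundreds); the `rows` structure is an illustrative encoding, not part of the exercise.

```python
from statistics import mean, median, mode, stdev

# (stem, leaves) rows: stems are thousands of dollars, leaves are hundreds.
rows = [
    (3, "568"), (3, "001124"), (2, "566778899"), (2, "00001224"),
    (1, "55566777889"), (1, "0013444"), (0, "5688"), (0, "4"),
]

# Decode every leaf into a dollar amount, e.g. stem 3, leaf 8 -> 3800.
invoices = [1000 * stem + 100 * int(leaf)
            for stem, leaves in rows for leaf in leaves]

n = len(invoices)
data_range = max(invoices) - min(invoices)

print(n, sum(invoices), sum(x * x for x in invoices))
print(round(mean(invoices)), median(invoices), mode(invoices))
print(data_range, round(stdev(invoices)))
```

Here `statistics.stdev` computes the sample standard deviation (divisor $n-1$), which matches the definition of $s$ used in these exercises.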
All are $17$.
$\{1,1,1\}$
One example is $\{-.5,0,.5\}$.

1. $R = 1350$ and $s = 212.5455$
2. $R = 4.00$ and $s = 0.7407$
1. $R = 4.00$ and $\sigma = 0.740375$
2. $R = 3.04$ and $s = 0.808045$
3. $R = 2.49$ and $s = 0.657843$

2.4: Relative Position of Data

Basic

Consider the data set \[\begin{matrix} 69 & 92 & 68 & 77 & 80\\ 93 & 75 & 76 & 82 & 100\\ 70 & 85 & 88 & 85 & 96\\ 53 & 70 & 70 & 82 & 85 \end{matrix}\]
1. Find the percentile rank of $82$.
2. Find the percentile rank of $68$.
Consider the data set \[\begin{matrix} 8.5 & 8.2 & 7.0 & 7.0 & 4.9\\ 9.6 & 8.5 & 8.8 & 8.5 & 8.7\\ 6.5 & 8.2 & 7.6 & 1.5 & 9.3\\ 8.0 & 7.7 & 2.9 & 9.2 & 6.9 \end{matrix}\]
1. Find the percentile rank of $6.5$.
2. Find the percentile rank of $7.7$.
Consider the data set represented by the ordered stem and leaf diagram \[\begin{array}{c|c c c c c c c c c c c c c c c c c c} 10 & 0 & 0 \\ 9 &1 &1 &1 &1 &2 &3\\ 8 &0 &1 &1 &2 &2 &3 &4 &5 &7 &8 &8 &9\\ 7 &0 &0 &0 &1 &1 &2 &4 &4 &5 &6 &6 &6 &7 &7 &7 &8 &8 &9\\ 6 &0 &1 &2 &2 &2 &3 &4 &4 &5 &7 &7 &7 &7 &8 &8\\ 5 &0 &2 &3 &3 &4 &4 &6 &7 &7 &8 &9\\ 4 &2 &5 &6 &8 &8\\ 3 &9 &9 \end{array}\]
1. Find the percentile rank of the grade $75$.
2. Find the percentile rank of the grade $57$.
Is the $90^{th}$ percentile of a data set always equal to $90\%$? Why or why not?
The $29^{th}$ percentile in a large data set is $5$.
1. Approximately what percentage of the observations are less than $5$?
2. Approximately what percentage of the observations are greater than $5$?
The $54^{th}$ percentile in a large data set is $98.6$.
1. Approximately what percentage of the observations are less than $98.6$?
2. Approximately what percentage of the observations are greater than $98.6$?
In a large data set the $29^{th}$ percentile is $5$ and the $79^{th}$ percentile is $10$. Approximately what percentage of observations lie between $5$ and $10$?
In a large data set the $40^{th}$ percentile is $125$ and the $82^{nd}$ percentile is $158$. Approximately what percentage of observations lie between $125$ and $158$?
Find the five-number summary and the IQR and sketch the box plot for the sample represented by the stem and leaf diagram in Figure 2.1.2 "Ordered Stem and Leaf Diagram".
Find the five-number summary and the IQR and sketch the box plot for the sample explicitly displayed in "Example 2.2.7" in Section 2.2.
Find the five-number summary and the IQR and sketch the box plot for the sample represented by the data frequency table \[\begin{array}{c|c c c c c} x & 1 & 2 & 5 & 8 & 9 \\ \hline f &5 &2 &3 &6 &4\\ \end{array}\]
Find the five-number summary and the IQR and sketch the box plot for the sample represented by the data frequency table \[\begin{array}{c|c c c c c c c c c} x & -5 & -3 & -2 & -1 & 0 & 1 & 3 & 4 & 5 \\ \hline f &2 &1 &3 &2 &4 &1 &1 &2 &1\\ \end{array}\]
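A five-number summary from a frequency table can be sketched as follows, using the first of the two frequency tables above. The quartile convention here is an assumption: the sorted data are split by position into a lower and an upper half, leaving out the middle observation when $n$ is odd, and $Q_1$ and $Q_3$ are the medians of those halves. This convention reproduces the answers listed for this section; other software may use a different one. The helper names `expand` and `five_number_summary` are hypothetical.

```python
from statistics import median


def expand(values, freqs):
    """Turn a frequency table into a sorted list of raw observations."""
    data = []
    for x, f in zip(values, freqs):
        data.extend([x] * f)
    return sorted(data)


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using medians of the positional halves.

    Assumption: when n is odd, the middle observation is excluded from
    both halves.
    """
    data = sorted(data)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Frequency table with x = 1, 2, 5, 8, 9 and f = 5, 2, 3, 6, 4.
sample = expand([1, 2, 5, 8, 9], [5, 2, 3, 6, 4])
summary = five_number_summary(sample)
iqr = summary[3] - summary[1]
print(summary, iqr)
```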
Find the $z$-score of each measurement in the following sample data set. \[-5\; \; 6\; \; 2\; \; -1\; \; 0\]
Find the $z$-score of each measurement in the following sample data set. \[1.6\; \; 5.2\; \; 2.8\; \; 3.7\; \; 4.0\]
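As a minimal sketch of the $z$-score computation for a sample, the following uses Python's standard `statistics` module, whose `stdev` function computes the sample standard deviation (divisor $n-1$). The data are the first of the two samples above; the function name `z_scores` is an illustrative choice.

```python
from statistics import mean, stdev


def z_scores(sample):
    """z = (x - sample mean) / sample standard deviation, for each x."""
    m = mean(sample)
    s = stdev(sample)  # sample standard deviation (divides by n - 1)
    return [(x - m) / s for x in sample]


sample = [-5, 6, 2, -1, 0]
rounded = [round(z, 2) for z in z_scores(sample)]
print(rounded)
```

For a population data set, `statistics.pstdev` (divisor $N$) would take the place of `stdev`.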
The sample with data frequency table \[\begin{array}{c|c c c} x & 1 & 2 & 7 \\ \hline f &1 &2 &1\\ \end{array}\] has mean $\bar{x}=3$ and standard deviation $s\approx 2.71$. Find the $z$-score for every value in the sample.
The sample with data frequency table \[\begin{array}{c|c c c c} x & -1 & 0 & 1 & 4 \\ \hline f &1 &1 &3 &1\\ \end{array}\] has mean $\bar{x}=1$ and standard deviation $s\approx 1.67$. Find the $z$-score for every value in the sample.
For the population \[0\; \; 0\; \; 2\; \; 2\]compute each of the following.
1. The population mean $\mu$.
2. The population variance $\sigma ^2$.
3. The population standard deviation $\sigma $.
4. The $z$-score for every value in the population data set.
For the population \[0.5\; \; 2.1\; \; 4.4\; \; 1.0\]compute each of the following.
1. The population mean $\mu$.
2. The population variance $\sigma ^2$.
3. The population standard deviation $\sigma $.
4. The $z$-score for every value in the population data set.
A measurement $x$ in a sample with mean $\bar{x}=10$ and standard deviation $s=3$ has $z$-score $z=2$. Find $x$.
A measurement $x$ in a sample with mean $\bar{x}=10$ and standard deviation $s=3$ has $z$-score $z=-1$. Find $x$.
A measurement $x$ in a population with mean $\mu =2.3$ and standard deviation $\sigma =1.3$ has $z$-score $z=2$. Find $x$.
A measurement $x$ in a population with mean $\mu =2.3$ and standard deviation $\sigma =1.3$ has $z$-score $z=-1.2$. Find $x$.
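The last four exercises above all rest on solving the $z$-score formula $z=(x-\bar{x})/s$ for $x$. As a worked instance, for a sample with $\bar{x}=10$, $s=3$, and $z=2$: \[x=\bar{x}+zs=10+2(3)=16\]The population version is the same, with $\mu$ and $\sigma$ in place of $\bar{x}$ and $s$.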

Applications

The weekly sales for the last $20$ weeks in a kitchen appliance store for an electric automatic rice cooker are \[\begin{matrix} 20 & 15 & 14 & 14 & 18\\ 15 & 19 & 12 & 13 & 9\\ 15 & 17 & 16 & 16 & 18\\ 19 & 15 & 15 & 16 & 15 \end{matrix}\]
1. Find the percentile rank of $15$.
2. If the sample accurately reflects the population, then what percentage of weeks would an inventory of $15$ rice cookers be adequate?
The table shows the number of vehicles owned in a survey of 52 households. \[\begin{array}{c|c c c c c c c c} x & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 \\ \hline f &2 &12 &15 &11 &6 &3 &1 &2\\ \end{array}\]
1. Find the percentile rank of $2$.
2. If the sample accurately reflects the population, then what percentage of households have at most two vehicles?
For two months Cordelia records her commute time to work each day to the nearest minute and obtains the following data: \[\begin{array}{c|c c c c c c c} x & 26 & 27 & 28 & 29 & 30 & 31 & 32 \\ \hline f &3 &4 &16 &12 &6 &2 &1 \\ \end{array}\]Cordelia is supposed to be at work at 8:00 a.m. but refuses to leave her house before 7:30 a.m.
1. Find the percentile rank of $30$, the time she has to get to work.
2. Assuming that the sample accurately reflects the population of all of Cordelia’s commute times, use your answer to part 1 to predict the proportion of the work days she is late for work.
The mean score on a standardized grammar exam is $49.6$; the standard deviation is $1.35$. Dromio is told that the $z$-score of his exam score is $-1.19$.
1. Is Dromio’s score above average or below average?
2. What was Dromio’s actual score on the exam?
A random sample of $49$ invoices for repairs at an automotive body shop is taken. The data are arrayed in the stem and leaf diagram shown. (Stems are thousands of dollars, leaves are hundreds, so that for example the largest observation is $3,800$.) \[\begin{array}{c|c c c c c c c c c c c} 3 & 5 & 6 & 8 \\ 3 &0 &0 &1 &1 &2 &4 \\ 2 &5 &6 &6 &7 &7 &8 &8 &9 &9 \\ 2 &0 &0 &0 &0 &1 &2 &2 &4 \\ 1 &5 &5 &5 &6 &6 &7 &7 &7 &8 &8 &9 \\ 1 &0 &0 &1 &3 &4 &4 &4 \\ 0 &5 &6 &8 &8 \\ 0 &4 \end{array}\]For these data, $\sum x=101,100$, $\sum x^2=244,830,000$.
1. Find the $z$-score of the repair that cost $\$1,100$.
2. Find the $z$-score of the repair that cost $\$2,700$.
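When only $n$, $\sum x$, and $\sum x^2$ are available, the sample mean and standard deviation can be recovered with the shortcut formula $s^2=\left(\sum x^2-(\sum x)^2/n\right)/(n-1)$, and $z$-scores follow from there. The sketch below applies this to the invoice data; the variable and function names are illustrative only.

```python
from math import sqrt

# Summary values given in the exercise above.
n = 49
sum_x = 101_100
sum_x2 = 244_830_000

mean = sum_x / n
# Shortcut formula for the sample variance (divisor n - 1).
variance = (sum_x2 - sum_x**2 / n) / (n - 1)
s = sqrt(variance)


def z_score(x):
    """z-score of a single repair cost x relative to this sample."""
    return (x - mean) / s


z_1100 = z_score(1100)
z_2700 = z_score(2700)
print(round(mean), round(s), round(z_1100, 2), round(z_2700, 2))
```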
The stem and leaf diagram shows the time in seconds that callers to a telephone-order center were on hold before their call was taken. \[\begin{array}{c|c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c c} 0 &0 &0 &0 &0 &0 &0 &1 &1 &1 &1 &1 &1 &1 &1 &2 &2 &2 &2 &2 &3 &3 &3 &3 &3 &3 &3 &4 &4 &4 &4 &4 \\ 0 &5 &5 &5 &5 &5 &5 &5 &5 &5 &6 &6 &6 &6 &6 &6 &6 &6 &6 &6 &7 &7 &7 &7 &7 &7 &8 &8 &8 &9 &9 \\ 1 &0 &0 &1 &1 &1 &1 &2 &2 &2 &2 &4 &4 \\ 1 &5 &6 &6 &8 &9 \\ 2 &2 &4 \\ 2 &5 \\ 3 &0 \\ \end{array}\]
1. Find the quartiles.
2. Give the five-number summary of the data.
3. Find the range and the IQR.

Additional Exercises

Consider the data set represented by the ordered stem and leaf diagram \[\begin{array}{c|c c c c c c c c c c c c c c c c c c} 10 &0 &0 \\ 9 &1 &1 &1 &1 &2 &3\\ 8 &0 &1 &1 &2 &2 &3 &4 &5 &7 &8 &8 &9\\ 7 &0 &0 &0 &1 &1 &2 &4 &4 &5 &6 &6 &6 &7 &7 &7 &8 &8 &9\\ 6 &0 &1 &2 &2 &2 &3 &4 &4 &5 &7 &7 &7 &7 &8 &8\\ 5 &0 &2 &3 &3 &4 &4 &6 &7 &7 &8 &9\\ 4 &2 &5 &6 &8 &8\\ 3 &9 &9 \end{array}\]
1. Find the three quartiles.
2. Give the five-number summary of the data.
3. Find the range and the IQR.
For the following stem and leaf diagram the units on the stems are thousands and the units on the leaves are hundreds, so that for example the largest observation is $3,800$. \[\begin{array}{c|c c c c c c c c c c c} 3 &5 &6 &8 \\ 3 &0 &0 &1 &1 &2 &4\\ 2 &5 &6 &6 &7 &7 &8 &8 &9 &9 \\ 2 &0 &0 &0 &0 &1 &2 &2 &4 \\ 1 &5 &5 &5 &6 &6 &7 &7 &7 &8 &8 &9 \\ 1 &0 &0 &1 &3 &4 &4 &4 \\ 0 &5 &6 &8 &8\\ 0 &4 \end{array}\]
1. Find the percentile rank of $800$.
2. Find the percentile rank of $3,200$.
Find the five-number summary for the following sample data. \[\begin{array}{c|c c c c c c c} x &26 &27 &28 &29 &30 &31 &32 \\ \hline f &3 &4 &16 &12 &6 &2 &1\\ \end{array}\]
Find the five-number summary for the following sample data. \[\begin{array}{c|c c c c c c c c c c} x &1 &2 &3 &4 &5 &6 &7 &8 &9 &10 \\ \hline f &384 &208 &98 &56 &28 &12 &8 &2 &3 &1\\ \end{array}\]
For the following stem and leaf diagram the units on the stems are thousands and the units on the leaves are hundreds, so that for example the largest observation is $3,800$. \[\begin{array}{c|c c c c c c c c c c c} 3 &5 &6 &8 \\ 3 &0 &0 &1 &1 &2 &4\\ 2 &5 &6 &6 &7 &7 &8 &8 &9 &9 \\ 2 &0 &0 &0 &0 &1 &2 &2 &4 \\ 1 &5 &5 &5 &6 &6 &7 &7 &7 &8 &8 &9 \\ 1 &0 &0 &1 &3 &4 &4 &4 \\ 0 &5 &6 &8 &8\\ 0 &4 \end{array}\]
1. Find the three quartiles.
2. Find the IQR.
3. Give the five-number summary of the data.
Determine whether the following statement is true. “In any data set, if an observation $x_1$ is greater than another observation $x_2$, then the $z$-score of $x_1$ is greater than the $z$-score of $x_2$.”
Emilia and Ferdinand took the same freshman chemistry course, Emilia in the fall, Ferdinand in the spring. Emilia made an $83$ on the common final exam that she took, on which the mean was $76$ and the standard deviation $8$. Ferdinand made a $79$ on the common final exam that he took, which was more difficult, since the mean was $65$ and the standard deviation $12$. The one who has a higher $z$-score did relatively better. Was it Emilia or Ferdinand?
Refer to the previous exercise. On the final exam in the same course the following semester, the mean is $68$ and the standard deviation is $9$. What grade on the exam matches Emilia’s performance? Ferdinand’s?
Rosencrantz and Guildenstern are on a weight-reducing diet. Rosencrantz, who weighs $178\; lb$, belongs to an age and body-type group for which the mean weight is $145\; lb$ and the standard deviation is $15\; lb$. Guildenstern, who weighs $204\; lb$, belongs to an age and body-type group for which the mean weight is $165\; lb$ and the standard deviation is $20\; lb$. Assuming $z$-scores are good measures for comparison in this context, who is more overweight for his age and body type?
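Comparing observations from different groups by $z$-score, as in the Emilia/Ferdinand and Rosencrantz/Guildenstern exercises above, amounts to one small computation per person. The following is a minimal sketch; the function name `z_score` and the variable names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


# Exam scores measured against each student's own exam.
emilia = z_score(83, 76, 8)
ferdinand = z_score(79, 65, 12)

# Weights measured against each person's own age and body-type group.
rosencrantz = z_score(178, 145, 15)
guildenstern = z_score(204, 165, 20)

print(round(emilia, 3), round(ferdinand, 3))
print(round(rosencrantz, 2), round(guildenstern, 2))
```

The larger $z$-score in each pair identifies the observation that is farther above its own group's mean.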

Large Data Set Exercises

Note: For Large Data Set Exercises below, all of the data sets associated with these questions are missing, but the questions themselves are included here for reference.

Large $\text{Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students.
1. Compute the three quartiles and the interquartile range of the $1,000$ SAT scores.
2. Compute the three quartiles and the interquartile range of the $1,000$ GPAs.
Large $\text{Data Set 10}$ records the scores of $72$ students on a statistics exam.
1. Compute the five-number summary of the data.
2. Describe in words the performance of the class on the exam in light of the result in part 1.
Large $\text{Data Sets 3 and 3A}$ list the heights of $174$ customers entering a shoe store.
1. Compute the five-number summary of the heights, without regard to gender.
2. Compute the five-number summary of the heights of the men in the sample.
3. Compute the five-number summary of the heights of the women in the sample.
Large $\text{Data Sets 7, 7A, and 7B}$ list the survival times in days of $140$ laboratory mice with thymic leukemia from onset to death.
1. Compute the three quartiles and the interquartile range of the survival times for all mice, without regard to gender.
2. Compute the three quartiles and the interquartile range of the survival times for the $65$ male mice (separately recorded in $\text{Data Set 7A}$).
3. Compute the three quartiles and the interquartile range of the survival times for the $75$ female mice (separately recorded in $\text{Data Set 7B}$).

Answers

1. 60
2. 10
1. 59
2. 23
1. 29
2. 71
$50\%$
$x_{min}=25,\; \; Q_1=70,\; \; Q_2=77.5,\; \; Q_3=90,\; \; x_{max}=100, \; \; IQR=20$
$x_{min}=1,\; \; Q_1=1.5,\; \; Q_2=6.5,\; \; Q_3=8,\; \; x_{max}=9, \; \; IQR=6.5$
$-1.34,\; 1.39,\; 0.40,\; -0.35,\; -0.10$
$z=-0.74\; \text{for}\; x = 1,\; z=-0.37\; \text{for}\; x = 2,\; z = 1.48\; \text{for}\; x = 7$
1. 1
2. 1
3. 1
4. $z=-1\; \text{for}\; x = 0,\; z=1\; \text{for}\; x = 2$
16
4.9
1. 55
2. 55
1. 93
2. 0.07
1. -1.11
2. 0.73
1. $Q_1=59,\; Q_2=70,\; Q_3=81$
2. $x_{min}=39,\; Q_1=59,\; Q_2=70,\; Q_3=81,\; x_{max}=100$
3. $R = 61,\; IQR=22$
$x_{min}=26,\; Q_1=28,\; Q_2=28,\; Q_3=29,\; x_{max}=32$
1. $Q_1=1450,\; Q_2=2000,\; Q_3=2800$
2. $IQR=1350$
3. $x_{min}=400,\; Q_1=1450,\; Q_2=2000,\; Q_3=2800,\; x_{max}=3800$
Emilia: $z=0.875$, Ferdinand: $z=1.1\bar{6}$
Rosencrantz: $z=2.2$, Guildenstern: $z=1.95$. Rosencrantz is more overweight for his age and body type.
1. $x_{min}=15,\; Q_1=51,\; Q_2=67,\; Q_3=82,\; x_{max}=97$
2. The data set appears to be skewed to the left.
1. $Q_1=440,\; Q_2=552.5,\; Q_3=661\; \; \text{and}\; \; IQR=221$
2. $Q_1=641,\; Q_2=667,\; Q_3=700\; \; \text{and}\; \; IQR=59$
3. $Q_1=407,\; Q_2=448,\; Q_3=504\; \; \text{and}\; \; IQR=97$

2.5 The Empirical Rule and Chebyshev's Theorem

Basic

State the Empirical Rule.
Describe the conditions under which the Empirical Rule may be applied.
State Chebyshev’s Theorem.
Describe the conditions under which Chebyshev’s Theorem may be applied.
A sample data set with a bell-shaped distribution has mean $\bar{x}=6$ and standard deviation $s=2$. Find the approximate proportion of observations in the data set that lie:
1. between $4$ and $8$;
2. between $2$ and $10$;
3. between $0$ and $12$.
A population data set with a bell-shaped distribution has mean $\mu =6$ and standard deviation $\sigma =2$. Find the approximate proportion of observations in the data set that lie:
1. between $4$ and $8$;
2. between $2$ and $10$;
3. between $0$ and $12$.
A population data set with a bell-shaped distribution has mean $\mu =2$ and standard deviation $\sigma =1.1$. Find the approximate proportion of observations in the data set that lie:
1. above $2$;
2. above $3.1$;
3. between $2$ and $3.1$.
A sample data set with a bell-shaped distribution has mean $\bar{x}=2$ and standard deviation $s=1.1$. Find the approximate proportion of observations in the data set that lie:
1. below $-0.2$;
2. below $3.1$;
3. between $-1.3$ and $0.9$.
A population data set with a bell-shaped distribution and size $N=500$ has mean $\mu =2$ and standard deviation $\sigma =1.1$. Find the approximate number of observations in the data set that lie:
1. above $2$;
2. above $3.1$;
3. between $2$ and $3.1$.
A sample data set with a bell-shaped distribution and size $n=128$ has mean $\bar{x}=2$ and standard deviation $s=1.1$. Find the approximate number of observations in the data set that lie:
1. below $-0.2$;
2. below $3.1$;
3. between $-1.3$ and $0.9$.
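The bell-shaped exercises above all combine the three Empirical Rule percentages ($68\%$, $95\%$, $99.7\%$) with the symmetry of the distribution. A rough sketch of that bookkeeping is shown below. It handles only values that lie a whole number of standard deviations (0 to 3) from the mean, and the names `EMPIRICAL_WITHIN`, `proportion_below`, and `proportion_between` are hypothetical helpers, not notation from the text.

```python
# Approximate proportion within k standard deviations of the mean.
EMPIRICAL_WITHIN = {0: 0.0, 1: 0.68, 2: 0.95, 3: 0.997}


def proportion_below(x, mean, sd):
    """Approximate proportion of a bell-shaped data set lying below x."""
    z = (x - mean) / sd
    k = round(z)
    if abs(z - k) > 1e-9 or abs(k) > 3:
        raise ValueError("x must be 0, 1, 2, or 3 standard deviations from the mean")
    half = EMPIRICAL_WITHIN[abs(k)] / 2  # by symmetry, half lies on each side
    return 0.5 + half if k >= 0 else 0.5 - half


def proportion_between(a, b, mean, sd):
    """Approximate proportion lying between a and b (with a < b)."""
    return proportion_below(b, mean, sd) - proportion_below(a, mean, sd)


# Population with mean 2, standard deviation 1.1, and size N = 500.
mu, sigma, N = 2, 1.1, 500
above_mean = 1 - proportion_below(2, mu, sigma)
above_3_1 = 1 - proportion_below(3.1, mu, sigma)
between = proportion_between(2, 3.1, mu, sigma)
print(round(above_mean, 3), round(above_3_1, 3), round(between, 3))
print(round(N * above_mean), round(N * above_3_1), round(N * between))
```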
A sample data set has mean $\bar{x}=6$ and standard deviation $s=2$. Find the minimum proportion of observations in the data set that must lie:
1. between $2$ and $10$;
2. between $0$ and $12$;
3. between $4$ and $8$.
A population data set has mean $\mu =2$ and standard deviation $\sigma =1.1$. Find the minimum proportion of observations in the data set that must lie:
1. between $-0.2$ and $4.2$;
2. between $-1.3$ and $5.3$.
A population data set of size $N=500$ has mean $\mu =5.2$ and standard deviation $\sigma =1.1$. Find the minimum number of observations in the data set that must lie:
1. between $3$ and $7.4$;
2. between $1.9$ and $8.5$.
A sample data set of size $n=128$ has mean $\bar{x}=2$ and standard deviation $s=2$. Find the minimum number of observations in the data set that must lie:
1. between $-2$ and $6$ (including $-2$ and $6$);
2. between $-4$ and $8$ (including $-4$ and $8$).
A sample data set of size $n=30$ has mean $\bar{x}=6$ and standard deviation $s=2$.
1. What is the maximum proportion of observations in the data set that can lie outside the interval $(2,10)$?
2. What can be said about the proportion of observations in the data set that are below $2$?
3. What can be said about the proportion of observations in the data set that are above $10$?
4. What can be said about the number of observations in the data set that are above $10$?
A population data set has mean $\mu =2$ and standard deviation $\sigma =1.1$.
1. What is the maximum proportion of observations in the data set that can lie outside the interval $(-1.3,5.3)$?
2. What can be said about the proportion of observations in the data set that are below $-1.3$?
3. What can be said about the proportion of observations in the data set that are above $5.3$?
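The Chebyshev exercises above use the bound that at least $1-1/k^2$ of the observations lie within $k$ standard deviations of the mean. The sketch below turns that bound into minimum and maximum counts, using exact fractions to avoid rounding surprises. The function names are illustrative; returning $0$ when $k \leq 1$ reflects that the bound carries no information in that case.

```python
from fractions import Fraction
from math import ceil, floor


def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k^2 on the proportion within k standard deviations."""
    k = Fraction(k)
    if k <= 0:
        raise ValueError("k must be positive")
    # For k <= 1 the formula is zero or negative: the bound says nothing.
    return max(Fraction(0), 1 - 1 / k**2)


def chebyshev_min_count(n, k):
    """Smallest whole number of observations guaranteed within k standard deviations."""
    return ceil(n * chebyshev_min_proportion(k))


def chebyshev_max_outside(n, k):
    """Largest whole number of observations that can lie outside k standard deviations."""
    return floor(n * (1 - chebyshev_min_proportion(k)))


# N = 500, mean 5.2, standard deviation 1.1: (3, 7.4) is k = 2, (1.9, 8.5) is k = 3.
within_2 = chebyshev_min_count(500, 2)
within_3 = chebyshev_min_count(500, 3)
# n = 30, mean 6, standard deviation 2: outside (2, 10) is k = 2.
outside_2 = chebyshev_max_outside(30, 2)
print(within_2, within_3, outside_2)
```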

Applications

Scores on a final exam taken by $1,200$ students have a bell-shaped distribution with mean $72$ and standard deviation $9$.
1. What is the median score on the exam?
2. About how many students scored between $63$ and $81$?
3. About how many students scored between $72$ and $90$?
4. About how many students scored below $54$?
Lengths of fish caught by a commercial fishing boat have a bell-shaped distribution with mean $23$ inches and standard deviation $1.5$ inches.
1. About what proportion of all fish caught are between $20$ inches and $26$ inches long?
2. About what proportion of all fish caught are between $20$ inches and $23$ inches long?
3. About how long is the longest fish caught (only a small fraction of a percent are longer)?
Hockey pucks used in professional hockey games must weigh between $5.5$ and $6$ ounces. If the weight of pucks manufactured by a particular process is bell-shaped, has mean $5.75$ ounces and standard deviation $0.125$ ounce, what proportion of the pucks will be usable in professional games?
Hockey pucks used in professional hockey games must weigh between $5.5$ and $6$ ounces. If the weight of pucks manufactured by a particular process is bell-shaped and has mean $5.75$ ounces, how large can the standard deviation be if $99.7\%$ of the pucks are to be usable in professional games?
Speeds of vehicles on a section of highway have a bell-shaped distribution with mean $60\; mph$ and standard deviation $2.5\; mph$.
1. If the speed limit is $55\; mph$, about what proportion of vehicles are speeding?
2. What is the median speed for vehicles on this highway?
3. What is the percentile rank of the speed $65\; mph$?
4. What speed corresponds to the $16^{th}$ percentile?
Suppose that, as in the previous exercise, speeds of vehicles on a section of highway have mean $60\; mph$ and standard deviation $2.5\; mph$, but now the distribution of speeds is unknown.
1. If the speed limit is $55\; mph$, at least what proportion of vehicles must be speeding?
2. What can be said about the proportion of vehicles going $65\; mph$ or faster?
An instructor announces to the class that the scores on a recent exam had a bell-shaped distribution with mean $75$ and standard deviation $5$.
1. What is the median score?
2. Approximately what proportion of students in the class scored between $70$ and $80$?
3. Approximately what proportion of students in the class scored above $85$?
4. What is the percentile rank of the score $85$?
The GPAs of all currently registered students at a large university have a bell-shaped distribution with mean $2.7$ and standard deviation $0.6$. Students with a GPA below $1.5$ are placed on academic probation. Approximately what percentage of currently registered students at the university are on academic probation?
Thirty-six students took an exam on which the average was $80$ and the standard deviation was $6$. A rumor says that five students had scores $61$ or below. Can the rumor be true? Why or why not?

Additional Exercises

For the sample data \[\begin{array}{c|c c c c c c c} x &26 &27 &28 &29 &30 &31 &32 \\ \hline f &3 &4 &16 &12 &6 &2 &1\\ \end{array}\] \[\sum x=1,256\; \; \text{and}\; \; \sum x^2=35,926\]
1. Compute the mean and the standard deviation.
2. About how many of the measurements does the Empirical Rule predict will be in the interval $\left (\bar{x}-s,\bar{x}+s \right )$, the interval $\left (\bar{x}-2s,\bar{x}+2s \right )$, and the interval $\left (\bar{x}-3s,\bar{x}+3s \right )$?
3. Compute the number of measurements that are actually in each of the intervals listed in part 2, and compare to the predicted numbers.
A sample of size $n = 80$ has mean $139$ and standard deviation $13$, but nothing else is known about it.
1. What can be said about the number of observations that lie in the interval $(126,152)$?
2. What can be said about the number of observations that lie in the interval $(113,165)$?
3. What can be said about the number of observations that exceed $165$?
4. What can be said about the number of observations that either exceed $165$ or are less than $113$?
For the sample data \[\begin{array}{c|c c c c c } x &1 &2 &3 &4 &5 \\ \hline f &84 &29 &3 &3 &1\\ \end{array}\] \[\sum x=168\; \; \text{and}\; \; \sum x^2=300\]
1. Compute the sample mean and the sample standard deviation.
2. Considering the shape of the data set, do you expect the Empirical Rule to apply? Count the number of measurements within one standard deviation of the mean and compare it to the number predicted by the Empirical Rule.
3. What does Chebyshev’s Theorem say about the number of measurements within one standard deviation of the mean?
4. Count the number of measurements within two standard deviations of the mean and compare it to the minimum number guaranteed by Chebyshev’s Theorem to lie in that interval.
For the sample data set \[\begin{array}{c|c c c c c } x &47 &48 &49 &50 &51 \\ \hline f &1 &3 &18 &2 &1\\ \end{array}\] \[\sum x=1,224\; \; \text{and}\; \; \sum x^2=59,940\]
1. Compute the sample mean and the sample standard deviation.
2. Considering the shape of the data set, do you expect the Empirical Rule to apply? Count the number of measurements within one standard deviation of the mean and compare it to the number predicted by the Empirical Rule.
3. What does Chebyshev’s Theorem say about the number of measurements within one standard deviation of the mean?
4. Count the number of measurements within two standard deviations of the mean and compare it to the minimum number guaranteed by Chebyshev’s Theorem to lie in that interval.
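For frequency-table exercises like the ones above, the mean, the sample standard deviation, and the counts within one or two standard deviations can be checked with a short script. This sketch uses the last frequency table above ($x = 47,\ldots,51$) and the shortcut variance formula; all names are illustrative, and "within" is taken here to mean strictly inside the interval $(\bar{x}-ks,\bar{x}+ks)$.

```python
from math import sqrt

values = [47, 48, 49, 50, 51]
freqs = [1, 3, 18, 2, 1]

n = sum(freqs)
sum_x = sum(x * f for x, f in zip(values, freqs))
sum_x2 = sum(x * x * f for x, f in zip(values, freqs))

mean = sum_x / n
s = sqrt((sum_x2 - sum_x**2 / n) / (n - 1))  # sample standard deviation


def count_within(k):
    """Number of measurements strictly within k standard deviations of the mean."""
    return sum(f for x, f in zip(values, freqs) if abs(x - mean) < k * s)


within_one = count_within(1)
within_two = count_within(2)
empirical_prediction = 0.68 * n     # Empirical Rule, one standard deviation
chebyshev_minimum = (1 - 1 / 4) * n  # Chebyshev's Theorem, k = 2
print(round(mean, 2), round(s, 4), within_one, within_two)
print(empirical_prediction, chebyshev_minimum)
```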

Answers

See the displayed statement in the text.
See the displayed statement in the text.
1. $0.68$
2. $0.95$
3. $0.997$
1. $0.5$
2. $0.16$
3. $0.34$
1. $250$
2. $80$
3. $170$
1. $3/4$
2. $8/9$
3. $0$
1. $375$
2. $445$
1. At most $0.25$.
2. At most $0.25$.
3. At most $0.25$.
4. At most $7$.
1. $72$
2. $816$
3. $570$
4. $30$
$0.95$
1. $0.975$
2. $60$
3. $97.5$
4. $57.5$
1. $75$
2. $0.68$
3. $0.025$
4. $0.975$
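By Chebyshev’s Theorem at most $1/9$ of the scores can lie more than three standard deviations from the mean, that is, outside the interval from $62$ to $98$. Since $36/9=4$, at most four scores can be $61$ or below, so the rumor cannot be true.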
1. Nothing.
2. It is at least $60$.
3. It is at most $20$.
4. It is at most $20$.
1. $\bar{x}=48.96$, $s = 0.7348$.
2. The data set is roughly bell-shaped, so the Empirical Rule should apply. True count: $18$, Predicted: $17$.
3. Nothing.
4. True count: $23$, Guaranteed: at least $18.75$, hence at least $19$.

Contributor

Anonymous
