10.E: Correlation and Regression (Exercises)

These are homework exercises to accompany the Textmap created for "Introductory Statistics" by Shafer and Zhang.

10.1 Linear Relationships Between Variables

Basic

A line has equation $y=0.5x+2$.
1. Pick five distinct $x$-values, use the equation to compute the corresponding $y$-values, and plot the five points obtained.
2. Give the value of the slope of the line; give the value of the $y$-intercept.
A line has equation $y=x-0.5$.
1. Pick five distinct $x$-values, use the equation to compute the corresponding $y$-values, and plot the five points obtained.
2. Give the value of the slope of the line; give the value of the $y$-intercept.
A line has equation $y=-2x+4$.
1. Pick five distinct $x$-values, use the equation to compute the corresponding $y$-values, and plot the five points obtained.
2. Give the value of the slope of the line; give the value of the $y$-intercept.
A line has equation $y=-1.5x+1$.
1. Pick five distinct $x$-values, use the equation to compute the corresponding $y$-values, and plot the five points obtained.
2. Give the value of the slope of the line; give the value of the $y$-intercept.
Based on the information given about a line, determine how $y$ will change (increase, decrease, or stay the same) when $x$ is increased, and explain. In some cases it might be impossible to tell from the information given.
1. The slope is positive.
2. The $y$-intercept is positive.
3. The slope is zero.
Based on the information given about a line, determine how $y$ will change (increase, decrease, or stay the same) when $x$ is increased, and explain. In some cases it might be impossible to tell from the information given.
1. The $y$-intercept is negative.
2. The $y$-intercept is zero.
3. The slope is negative.
A data set consists of eight $(x,y)$ pairs of numbers: \[\begin{matrix} (0,12) & (4,16) & (8,22) & (15,28)\\ (2,15) & (5,14) & (13,24) & (20,30) \end{matrix}\]
1. Plot the data in a scatter diagram.
2. Based on the plot, explain whether the relationship between $x$ and $y$ appears to be deterministic or to involve randomness.
3. Based on the plot, explain whether the relationship between $x$ and $y$ appears to be linear or not linear.
A data set consists of ten $(x,y)$ pairs of numbers: \[\begin{matrix} (3,20) & (6,9) & (11,0) & (14,1) & (18,9)\\ (5,13) & (8,4) & (12,0) & (17,6) & (20,16) \end{matrix}\]
1. Plot the data in a scatter diagram.
2. Based on the plot, explain whether the relationship between $x$ and $y$ appears to be deterministic or to involve randomness.
3. Based on the plot, explain whether the relationship between $x$ and $y$ appears to be linear or not linear.
A data set consists of nine $(x,y)$ pairs of numbers: \[\begin{matrix} (8,16) & (10,4) & (12,0) & (14,4) & (16,16)\\ (9,9) & (11,1) & (13,1) & (15,9) & \end{matrix}\]
1. Plot the data in a scatter diagram.
2. Based on the plot, explain whether the relationship between $x$ and $y$ appears to be deterministic or to involve randomness.
3. Based on the plot, explain whether the relationship between $x$ and $y$ appears to be linear or not linear.
A data set consists of five $(x,y)$ pairs of numbers: \[\begin{matrix} (0,1) & (2,5) & (3,7) & (5,11) & (8,17) \end{matrix}\]
1. Plot the data in a scatter diagram.
2. Based on the plot, explain whether the relationship between $x$ and $y$ appears to be deterministic or to involve randomness.
3. Based on the plot, explain whether the relationship between $x$ and $y$ appears to be linear or not linear.

Applications

At $60^{\circ}F$ a particular blend of automotive gasoline weighs $6.17$ lb/gal. The weight $y$ of gasoline on a tank truck that is loaded with $x$ gallons of gasoline is given by the linear equation \[y=6.17x\]
1. Explain whether the relationship between the weight $y$ and the amount $x$ of gasoline is deterministic or contains an element of randomness.
2. Predict the weight of gasoline on a tank truck that has just been loaded with $6,750$ gallons of gasoline.
The rate for renting a motor scooter for one day at a beach resort area is $\$25$ plus $30$ cents for each mile the scooter is driven. The total cost $y$ in dollars for renting a scooter and driving it $x$ miles is \[y=0.30x+25\]
1. Explain whether the relationship between the cost $y$ of renting the scooter for a day and the distance $x$ that the scooter is driven that day is deterministic or contains an element of randomness.
2. A person intends to rent a scooter one day for a trip to an attraction $17$ miles away. Assuming that the total distance the scooter is driven is $34$ miles, predict the cost of the rental.
The pricing schedule for labor on a service call by an elevator repair company is $\$150$ plus $\$50$ per hour on site.
1. Write down the linear equation that relates the labor cost $y$ to the number of hours $x$ that the repairman is on site.
2. Calculate the labor cost for a service call that lasts $2.5$ hours.
The cost of a telephone call made through a leased line service is $2.5$ cents per minute.
1. Write down the linear equation that relates the cost $y$ (in cents) of a call to its length $x$.
2. Calculate the cost of a call that lasts $23$ minutes.

Large Data Set Exercises

Large Data Sets not available

Large $\text{Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students. Plot the scatter diagram with SAT score as the independent variable ($x$) and GPA as the dependent variable ($y$). Comment on the appearance and strength of any linear trend.
Large $\text{Data Set 12}$ lists the golf scores on one round of golf for $75$ golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Plot the scatter diagram with golf score using the original clubs as the independent variable ($x$) and golf score using the new clubs as the dependent variable ($y$). Comment on the appearance and strength of any linear trend.
Large $\text{Data Set 13}$ records the number of bidders and sales price of a particular type of antique grandfather clock at $60$ auctions. Plot the scatter diagram with the number of bidders at the auction as the independent variable ($x$) and the sales price as the dependent variable ($y$). Comment on the appearance and strength of any linear trend.

Answers

1. Answers vary.
2. Slope $m=0.5$; $y$-intercept $b=2$.
1. Answers vary.
2. Slope $m=-2$; $y$-intercept $b=4$.
1. $y$ increases.
2. Impossible to tell.
3. $y$ does not change.
1. Scatter diagram needed.
2. Involves randomness.
3. Linear.
1. Scatter diagram needed.
2. Deterministic.
3. Not linear.
1. Deterministic.
2. $41,647.5$ pounds.
1. $y=50x+150$.
2. $\$275$.
There appears to be a hint of some positive correlation.
There appears to be clear positive correlation.
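
The two Applications answers above can be checked by evaluating the linear equations directly. The following is a minimal sketch; the helper name `line` and the variable names are illustrative choices, not part of the text.

```python
def line(slope, intercept):
    """Return a function that computes y = slope * x + intercept."""
    def f(x):
        return slope * x + intercept
    return f

# y = 6.17x: weight in pounds of x gallons of gasoline
gasoline_weight = line(6.17, 0.0)
# y = 50x + 150: labor cost in dollars for x hours on site
labor_cost = line(50.0, 150.0)

print(round(gasoline_weight(6750), 1))  # weight of 6,750 gallons
print(labor_cost(2.5))                  # cost of a 2.5-hour service call
```

Both relationships are deterministic: the same $x$ always yields the same $y$, with no random term.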

10.2 The Linear Correlation Coefficient

Basic

With the exception of the exercises at the end of Section 10.3, the first Basic exercise in each of the following sections through Section 10.7 uses the data from the first exercise here, the second Basic exercise uses the data from the second exercise here, and so on, and similarly for the Application exercises. Save your computations done on these exercises so that you do not need to repeat them later.

For the sample data \[\begin{array}{c|c c c c c} x &0 &1 &3 &5 &8 \\ \hline y &2 &4 &6 &5 &9\\ \end{array}\]
1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
For the sample data \[\begin{array}{c|c c c c c} x &0 &2 &3 &6 &9 \\ \hline y &0 &3 &3 &4 &8\\ \end{array}\]
1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
For the sample data \[\begin{array}{c|c c c c c} x &1 &3 &4 &6 &8 \\ \hline y &4 &1 &3 &-1 &0\\ \end{array}\]
1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
For the sample data \[\begin{array}{c|c c c c c} x &1 &2 &4 &7 &9 \\ \hline y &5 &5 &6 &-3 &0\\ \end{array}\]
1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
For the sample data \[\begin{array}{c|c c c c c} x &1 &1 &3 &4 &5 \\ \hline y &2 &1 &5 &3 &4\\ \end{array}\]
1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
For the sample data \[\begin{array}{c|c c c c c} x &1 &3 &5 &5 &8 \\ \hline y &5 &-2 &2 &-1 &-3\\ \end{array}\]
1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
Compute the linear correlation coefficient for the sample data summarized by the following information: \[n=5\; \; \sum x=25\; \; \sum x^2=165\\ \sum y=24\; \; \sum y^2=134\; \; \sum xy=144\\ 1\leq x\leq 9\]
Compute the linear correlation coefficient for the sample data summarized by the following information: \[n=5\; \; \sum x=31\; \; \sum x^2=253\\ \sum y=18\; \; \sum y^2=90\; \; \sum xy=148\\ 2\leq x\leq 12\]
Compute the linear correlation coefficient for the sample data summarized by the following information: \[n=10\; \; \sum x=0\; \; \sum x^2=60\\ \sum y=24\; \; \sum y^2=234\; \; \sum xy=-87\\ -4\leq x\leq 4\]
Compute the linear correlation coefficient for the sample data summarized by the following information: \[n=10\; \; \sum x=-3\; \; \sum x^2=263\\ \sum y=55\; \; \sum y^2=917\; \; \sum xy=-355\\ -10\leq x\leq 10\]
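
When only summary sums are given, $r$ can be computed from $SS_{xx}$, $SS_{yy}$, and $SS_{xy}$. The sketch below applies this to the first summarized data set above ($n=5$, $\sum x=25$, and so on); the function name `r_from_sums` is a hypothetical helper introduced for illustration.

```python
import math

def r_from_sums(n, sum_x, sum_x2, sum_y, sum_y2, sum_xy):
    """Linear correlation coefficient from summary sums."""
    ss_xx = sum_x2 - sum_x ** 2 / n       # SS_xx
    ss_yy = sum_y2 - sum_y ** 2 / n       # SS_yy
    ss_xy = sum_xy - sum_x * sum_y / n    # SS_xy
    return ss_xy / math.sqrt(ss_xx * ss_yy)

r = r_from_sums(n=5, sum_x=25, sum_x2=165, sum_y=24, sum_y2=134, sum_xy=144)
print(round(r, 3))  # 0.875
```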

Applications

The age $x$ in months and vocabulary $y$ were measured for six children, with the results shown in the table. \[\begin{array}{c|c c c c c c c} x &13 &14 &15 &16 &16 &18 \\ \hline y &8 &10 &15 &20 &27 &30\\ \end{array}\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The curb weight $x$ in hundreds of pounds and braking distance $y$ in feet, at $50$ miles per hour on dry pavement, were measured for five vehicles, with the results shown in the table. \[\begin{array}{c|c c c c c c } x &25 &27.5 &32.5 &35 &45 \\ \hline y &105 &125 &140 &140 &150 \\ \end{array}\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The age $x$ and resting heart rate $y$ were measured for ten men, with the results shown in the table. \[\begin{array}{c|c c c c c c } x &20 &23 &30 &37 &35 \\ \hline y &72 &71 &73 &74 &74 \\ \end{array}\\ \begin{array}{c|c c c c c c } x &45 &51 &55 &60 &63 \\ \hline y &73 &72 &79 &75 &77 \\ \end{array}\\\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The wind speed $x$ in miles per hour and wave height $y$ in feet were measured under various conditions on an enclosed deep water sea, with the results shown in the table, \[\begin{array}{c|c c c c c c } x &0 &0 &2 &7 &7 \\ \hline y &2.0 &0.0 &0.3 &0.7 &3.3 \\ \end{array}\\ \begin{array}{c|c c c c c c } x &9 &13 &20 &22 &31 \\ \hline y &4.9 &4.9 &3.0 &6.9 &5.9 \\ \end{array}\\\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The advertising expenditure $x$ and sales $y$ in thousands of dollars for a small retail business in its first eight years in operation are shown in the table. \[\begin{array}{c|c c c c c } x &1.4 &1.6 &1.6 &2.0 \\ \hline y &180 &184 &190 &220 \\ \end{array}\\ \begin{array}{c|c c c c c c } x &2.0 &2.2 &2.4 &2.6 \\ \hline y &186 &215 &205 &240 \\ \end{array}\\\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The height $x$ at age $2$ and $y$ at age $20$, both in inches, for ten women are tabulated in the table. \[\begin{array}{c|c c c c c } x &31.3 &31.7 &32.5 &33.5 &34.4\\ \hline y &60.7 &61.0 &63.1 &64.2 &65.9 \\ \end{array}\\ \begin{array}{c|c c c c c } x &35.2 &35.8 &32.7 &33.6 &34.8 \\ \hline y &68.2 &67.6 &62.3 &64.9 &66.8 \\ \end{array}\\\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The course average $x$ just before a final exam and the score $y$ on the final exam were recorded for $15$ randomly selected students in a large physics class, with the results shown in the table. \[\begin{array}{c|c c c c c } x &69.3 &87.7 &50.5 &51.9 &82.7\\ \hline y &56 &89 &55 &49 &61 \\ \end{array}\\ \begin{array}{c|c c c c c } x &70.5 &72.4 &91.7 &83.3 &86.5 \\ \hline y &66 &72 &83 &73 &82 \\ \end{array}\\ \begin{array}{c|c c c c c } x &79.3 &78.5 &75.7 &52.3 &62.2 \\ \hline y &92 &80 &64 &18 &76 \\ \end{array}\\\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
The table shows the acres $x$ of corn planted and acres $y$ of corn harvested, in millions of acres, in a particular country in ten successive years. \[\begin{array}{c|c c c c c } x &75.7 &78.9 &78.6 &80.9 &81.8\\ \hline y &68.8 &69.3 &70.9 &73.6 &75.1 \\ \end{array}\\ \begin{array}{c|c c c c c } x &78.3 &93.5 &85.9 &86.4 &88.2 \\ \hline y &70.6 &86.5 &78.6 &79.5 &81.4 \\ \end{array}\\\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
Fifty male subjects drank a measured amount $x$ (in ounces) of a medication and the concentration $y$ (in percent) in their blood of the active ingredient was measured $30$ minutes later. The sample data are summarized by the following information. \[n=50\; \; \sum x=112.5\; \; \sum y=4.83\\ \sum xy=15.255\; \; 0\leq x\leq 4.5\\ \sum x^2=356.25\; \; \sum y^2=0.667\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
In an effort to produce a formula for estimating the age of large free-standing oak trees non-invasively, the girth $x$ (in inches) five feet off the ground of $15$ such trees of known age $y$ (in years) was measured. The sample data are summarized by the following information. \[n=15\; \; \sum x=3368\; \; \sum y=6496\\ \sum xy=1,933,219\; \; 74\leq x\leq 395\\ \sum x^2=917,780\; \; \sum y^2=4,260,666\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
Construction standards specify the strength of concrete $28$ days after it is poured. For $30$ samples of various types of concrete the strength $x$ after $3$ days and the strength $y$ after $28$ days (both in hundreds of pounds per square inch) were measured. The sample data are summarized by the following information. \[n=30\; \; \sum x=501.6\; \; \sum y=1338.8\\ \sum xy=23,246.55\; \; 11\leq x\leq 22\\ \sum x^2=8724.74\; \; \sum y^2=61,980.14\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
Power-generating facilities use forecasts of temperature to forecast energy demand. The average temperature $x$ (degrees Fahrenheit) and the day’s energy demand $y$ (million watt-hours) were recorded on $40$ randomly selected winter days in the region served by a power company. The sample data are summarized by the following information. \[n=40\; \; \sum x=2000\; \; \sum y=2969\\ \sum xy=143,042\; \; 40\leq x\leq 60\\ \sum x^2=101,340\; \; \sum y^2=243,027\]
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

Additional Exercises

In each case state whether you expect the two variables $x$ and $y$ indicated to have positive, negative, or zero correlation.
1. the number $x$ of pages in a book and the age $y$ of the author
2. the number $x$ of pages in a book and the age $y$ of the intended reader
3. the weight $x$ of an automobile and the fuel economy $y$ in miles per gallon
4. the weight $x$ of an automobile and the reading $y$ on its odometer
5. the amount $x$ of a sedative a person took an hour ago and the time $y$ it takes him to respond to a stimulus
In each case state whether you expect the two variables $x$ and $y$ indicated to have positive, negative, or zero correlation.
1. the length $x$ of time an emergency flare will burn and the length $y$ of time the match used to light it burned
2. the average length $x$ of time that calls to a retail call center are on hold one day and the number $y$ of calls received that day
3. the length $x$ of a regularly scheduled commercial flight between two cities and the headwind $y$ encountered by the aircraft
4. the value $x$ of a house and its size $y$ in square feet
5. the average temperature $x$ on a winter day and the energy consumption $y$ of the furnace
Changing the units of measurement on two variables $x$ and $y$ should not change the linear correlation coefficient. Moreover, most changes of units amount to simply multiplying the measurements by a constant (for example, $1$ foot = $12$ inches). Multiply each $x$ value in the table in Exercise 1 by two and compute the linear correlation coefficient for the new data set. Compare the new value of $r$ to the one for the original data.
Refer to the previous exercise. Multiply each $x$ value in the table in Exercise 2 by two, multiply each $y$ value by three, and compute the linear correlation coefficient for the new data set. Compare the new value of $r$ to the one for the original data.
Reversing the roles of $x$ and $y$ in the data set of Exercise 1 produces the data set \[\begin{array}{c|c c c c c} x &2 &4 &6 &5 &9 \\ \hline y &0 &1 &3 &5 &8\\ \end{array}\]
Compute the linear correlation coefficient of the new set of data and compare it to what you got in Exercise 1.
In the context of the previous problem, look at the formula for $r$ and see if you can tell why what you observed there must be true for every data set.
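
The rescaling and role-reversal comparisons in these exercises can be carried out numerically. The following sketch uses the data of Exercise 1; the function name `correlation` is an illustrative choice, and the check shows the outcome for this one data set only, not a general argument.

```python
import math

def correlation(xs, ys):
    """Linear correlation coefficient r = SS_xy / sqrt(SS_xx * SS_yy)."""
    n = len(xs)
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)

xs = [0, 1, 3, 5, 8]   # Exercise 1 data
ys = [2, 4, 6, 5, 9]

r_original = correlation(xs, ys)
r_scaled = correlation([2 * x for x in xs], ys)  # every x doubled
r_swapped = correlation(ys, xs)                  # roles of x and y reversed

print(round(r_original, 3), round(r_scaled, 3), round(r_swapped, 3))
```

All three printed values agree to three decimal places, consistent with the "same value" answers listed for this section.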

Large Data Set Exercises

Large Data Sets not available

Large $\text{Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students. Compute the linear correlation coefficient $r$. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the first large data set problem for Section 10.1.
Large $\text{Data Set 12}$ lists the golf scores on one round of golf for $75$ golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Compute the linear correlation coefficient $r$. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the second large data set problem for Section 10.1.
Large $\text{Data Set 13}$ records the number of bidders and sales price of a particular type of antique grandfather clock at $60$ auctions. Compute the linear correlation coefficient $r$. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the third large data set problem for Section 10.1.

Answers

$r=0.921$
$r=-0.794$
$r=0.707$
$0.875$
$-0.846$
$0.948$
$0.709$
$0.832$
$0.751$
$0.965$
$0.992$
$0.921$
1. zero
2. positive
3. negative
4. zero
5. positive
same value
same value
$r=0.4601$
$r=0.9002$

10.3 Modelling Linear Relationships with Randomness Present

Basic

State the three assumptions that are the basis for the Simple Linear Regression Model.
The Simple Linear Regression Model is summarized by the equation \[y=\beta _1x+\beta _0+\varepsilon\] Identify the deterministic part and the random part.
Is the number $\beta _1$ in the equation $y=\beta _1x+\beta _0$ a statistic or a population parameter? Explain.
Is the number $\sigma$ in the Simple Linear Regression Model a statistic or a population parameter? Explain.
Describe what to look for in a scatter diagram in order to check that the assumptions of the Simple Linear Regression Model are true.
True or false: the assumptions of the Simple Linear Regression Model must hold exactly in order for the procedures and analysis developed in this chapter to be useful.

Answers

1. The mean of $y$ is linearly related to $x$.
2. For each given $x$, $y$ is a normal random variable with mean $\beta _1x+\beta _0$ and a standard deviation $\sigma$.
3. All the observations of $y$ in the sample are independent.
$\beta _1$ is a population parameter.
A linear trend.
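
To see what the deterministic part $\beta _1x+\beta _0$ and the random part $\varepsilon$ contribute, the model can be simulated. This is only a sketch: the function name `simulate` and the parameter values ($\beta _1=0.5$, $\beta _0=2$, $\sigma =1$) are arbitrary assumptions chosen for illustration, not values from the text.

```python
import random

def simulate(xs, beta1, beta0, sigma, seed=0):
    """Draw one y for each x from y = beta1*x + beta0 + epsilon,
    where epsilon is normal with mean 0 and standard deviation sigma."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    return [beta1 * x + beta0 + rng.gauss(0.0, sigma) for x in xs]

xs = [0, 2, 4, 6, 8, 10]

# sigma = 0 removes the random part, so the points fall exactly on the line.
on_line = simulate(xs, beta1=0.5, beta0=2.0, sigma=0.0)

# sigma > 0 scatters the points about the line.
scattered = simulate(xs, beta1=0.5, beta0=2.0, sigma=1.0)

for x, y0, y1 in zip(xs, on_line, scattered):
    print(x, y0, round(y1, 2))
```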

10.4 The Least Squares Regression Line

Basic

For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2.

Compute the least squares regression line for the data in Exercise 1 of Section 10.2.
Compute the least squares regression line for the data in Exercise 2 of Section 10.2.
Compute the least squares regression line for the data in Exercise 3 of Section 10.2.
Compute the least squares regression line for the data in Exercise 4 of Section 10.2.
For the data in Exercise 5 of Section 10.2
1. Compute the least squares regression line.
2. Compute the sum of the squared errors $\text{SSE}$ using the definition $\sum (y-\hat{y})^2$.
3. Compute the sum of the squared errors $\text{SSE}$ using the formula $SSE=SS_{yy}-\widehat{\beta _1}SS_{xy}$.
For the data in Exercise 6 of Section 10.2
1. Compute the least squares regression line.
2. Compute the sum of the squared errors $\text{SSE}$ using the definition $\sum (y-\hat{y})^2$.
3. Compute the sum of the squared errors $\text{SSE}$ using the formula $SSE=SS_{yy}-\widehat{\beta _1}SS_{xy}$.
Compute the least squares regression line for the data in Exercise 7 of Section 10.2.
Compute the least squares regression line for the data in Exercise 8 of Section 10.2.
For the data in Exercise 9 of Section 10.2
1. Compute the least squares regression line.
2. Can you compute the sum of the squared errors $\text{SSE}$ using the definition $\sum (y-\hat{y})^2$? Explain.
3. Compute the sum of the squared errors $\text{SSE}$ using the formula $SSE=SS_{yy}-\widehat{\beta _1}SS_{xy}$.
For the data in Exercise 10 of Section 10.2
1. Compute the least squares regression line.
2. Can you compute the sum of the squared errors $\text{SSE}$ using the definition $\sum (y-\hat{y})^2$? Explain.
3. Compute the sum of the squared errors $\text{SSE}$ using the formula $SSE=SS_{yy}-\widehat{\beta _1}SS_{xy}$.
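
The two routes to $\text{SSE}$ named in these exercises can be compared side by side. The sketch below uses the data of Exercise 5 of Section 10.2; variable names are illustrative. The definition needs the individual data points, while the shortcut formula needs only $SS_{yy}$, $SS_{xy}$, and the slope.

```python
xs = [1, 1, 3, 4, 5]   # Exercise 5 of Section 10.2
ys = [2, 1, 5, 3, 4]
n = len(xs)

ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n

slope = ss_xy / ss_xx                            # estimated beta_1
intercept = sum(ys) / n - slope * sum(xs) / n    # estimated beta_0

# Definition: sum of squared differences between observed and predicted y
sse_definition = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Shortcut formula: SSE = SS_yy - slope * SS_xy
sse_formula = ss_yy - slope * ss_xy

print(round(slope, 3), round(intercept, 3))
print(round(sse_definition, 3), round(sse_formula, 3))
```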

Applications

For the data in Exercise 11 of Section 10.2
1. Compute the least squares regression line.
2. On average, how many new words does a child from $13$ to $18$ months old learn each month? Explain.
3. Estimate the average vocabulary of all $16$-month-old children.
For the data in Exercise 12 of Section 10.2
1. Compute the least squares regression line.
2. On average, how many additional feet are added to the braking distance for each additional $100$ pounds of weight? Explain.
3. Estimate the average braking distance of all cars weighing $3,000$ pounds.
For the data in Exercise 13 of Section 10.2
1. Compute the least squares regression line.
2. Estimate the average resting heart rate of all $40$-year-old men.
3. Estimate the average resting heart rate of all newborn baby boys. Comment on the validity of the estimate.
For the data in Exercise 14 of Section 10.2
1. Compute the least squares regression line.
2. Estimate the average wave height when the wind is blowing at $10$ miles per hour.
3. Estimate the average wave height when there is no wind blowing. Comment on the validity of the estimate.
For the data in Exercise 15 of Section 10.2
1. Compute the least squares regression line.
2. On average, for each additional thousand dollars spent on advertising, how does revenue change? Explain.
3. Estimate the revenue if $\$2,500$ is spent on advertising next year.
For the data in Exercise 16 of Section 10.2
1. Compute the least squares regression line.
2. On average, for each additional inch of height of a two-year-old girl, what is the change in the adult height? Explain.
3. Predict the adult height of a two-year-old girl who is $33$ inches tall.
For the data in Exercise 17 of Section 10.2
1. Compute the least squares regression line.
2. Compute $\text{SSE}$ using the formula $SSE=SS_{yy}-\widehat{\beta _1}SS_{xy}$.
3. Estimate the average final exam score of all students whose course average just before the exam is $85$.
For the data in Exercise 18 of Section 10.2
1. Compute the least squares regression line.
2. Compute $\text{SSE}$ using the formula $SSE=SS_{yy}-\widehat{\beta _1}SS_{xy}$.
3. Estimate the number of acres that would be harvested if $90$ million acres of corn were planted.
For the data in Exercise 19 of Section 10.2
1. Compute the least squares regression line.
2. Interpret the value of the slope of the least squares regression line in the context of the problem.
3. Estimate the average concentration of the active ingredient in the blood in men after consuming $1$ ounce of the medication.
For the data in Exercise 20 of Section 10.2
1. Compute the least squares regression line.
2. Interpret the value of the slope of the least squares regression line in the context of the problem.
3. Estimate the age of an oak tree whose girth five feet off the ground is $92$ inches.
For the data in Exercise 21 of Section 10.2
1. Compute the least squares regression line.
2. The $28$-day strength of concrete used on a certain job must be at least $3,200$ psi. If the $3$-day strength is $1,300$ psi, would we anticipate that the concrete will be sufficiently strong on the $28^{th}$ day? Explain fully.
For the data in Exercise 22 of Section 10.2
1. Compute the least squares regression line.
2. If the power facility is called upon to provide more than $95$ million watt-hours tomorrow then energy will have to be purchased from elsewhere at a premium. The forecast is for an average temperature of $42$ degrees. Should the company plan on purchasing power at a premium?

Additional Exercises

Verify that no matter what the data are, the least squares regression line always passes through the point with coordinates $(\bar{x},\bar{y})$. Hint: Find the predicted value of $y$ when $x=\bar{x}$.
In Exercise 1 you computed the least squares regression line for the data in Exercise 1 of Section 10.2.
1. Reverse the roles of $x$ and $y$ and compute the least squares regression line for the new data set \[\begin{array}{c|c c c c c c} x &2 &4 &6 &5 &9 \\ \hline y &0 &1 &3 &5 &8\\ \end{array}\]
2. Interchanging $x$ and $y$ corresponds geometrically to reflecting the scatter plot in a 45-degree line. Reflecting the regression line for the original data the same way gives a line with the equation $\hat{y}=1.346x-3.600$. Is this the equation that you got in part (a)? Can you figure out why not? Hint: Think about how $x$ and $y$ are treated differently geometrically in the computation of the goodness of fit.
3. Compute $\text{SSE}$ for each line and see if they fit the same, or if one fits the data better than the other.

Large Data Set Exercises

Large Data Sets not available

Large $\text{Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students.
1. Compute the least squares regression line with SAT score as the independent variable ($x$) and GPA as the dependent variable ($y$).
2. Interpret the meaning of the slope $\widehat{\beta _1}$ of the regression line in the context of the problem.
3. Compute $\text{SSE}$, the measure of the goodness of fit of the regression line to the sample data.
4. Estimate the GPA of a student whose SAT score is $1350$.
Large $\text{Data Set 12}$ lists the golf scores on one round of golf for $75$ golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs).
1. Compute the least squares regression line with scores using the original clubs as the independent variable ($x$) and scores using the new clubs as the dependent variable ($y$).
2. Interpret the meaning of the slope $\widehat{\beta _1}$ of the regression line in the context of the problem.
3. Compute $\text{SSE}$, the measure of the goodness of fit of the regression line to the sample data.
4. Estimate the score with the new clubs of a golfer whose score with the old clubs is $73$.
Large $\text{Data Set 13}$ records the number of bidders and sales price of a particular type of antique grandfather clock at $60$ auctions.
1. Compute the least squares regression line with the number of bidders present at the auction as the independent variable ($x$) and sales price as the dependent variable ($y$).
2. Interpret the meaning of the slope $\widehat{\beta _1}$ of the regression line in the context of the problem.
3. Compute $\text{SSE}$, the measure of the goodness of fit of the regression line to the sample data.
4. Estimate the sales price of a clock at an auction at which the number of bidders is seven.

Answers

$\hat{y}=0.743x+2.675$
$\hat{y}=-0.610x+4.082$
$\hat{y}=0.625x+1.25,\; SSE=5$
$\hat{y}=0.6x+1.8$
$\hat{y}=-1.45x+2.4,\; SSE=50.25$ (cannot use the definition to compute)
1. $\hat{y}=4.848x-56$
2. $4.8$
3. $21.6$
1. $\hat{y}=0.114x+69.222$
2. $73.8$
3. $69.2$, invalid extrapolation
1. $\hat{y}=42.024x+119.502$
2. increases by $\$42,024$
3. $\$224,562$
1. $\hat{y}=1.045x-8.527$
2. $2151.93367$
3. $80.3$
1. $\hat{y}=0.043x+0.001$
2. For each additional ounce of medication consumed blood concentration of the active ingredient increases by $0.043\%$
3. $0.044\%$
1. $\hat{y}=2.550x+1.993$
2. Predicted $28$-day strength is $3,514$ psi; sufficiently strong
1. $\hat{y}=0.0016x+0.022$
2. On average, every $100$ point increase in SAT score adds $0.16$ point to the GPA.
3. $SSE=432.10$
4. $\hat{y}=2.182$
1. $\hat{y}=116.62x+6955.1$
2. On average, every $1$ additional bidder at an auction raises the price by $116.62$ dollars.
3. $SSE=1850314.08$
4. $\hat{y}=7771.44$
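
The first answer above can be reproduced from the data of Exercise 1 of Section 10.2. The following is a minimal sketch; the function name `least_squares_line` is a hypothetical helper. It also evaluates the fitted line at $\bar{x}$ for this one data set, which illustrates (but does not prove) the claim in the first Additional Exercise.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept

xs = [0, 1, 3, 5, 8]   # Exercise 1 of Section 10.2
ys = [2, 4, 6, 5, 9]

slope, intercept = least_squares_line(xs, ys)
print(round(slope, 3), round(intercept, 3))  # 0.743 2.675

# Predicted y at the mean of x, compared with the mean of y
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
print(slope * x_bar + intercept, y_bar)
```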

10.5 Statistical Inferences About $\beta _1$

Basic

For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2 and Section 10.4.

Construct the $95\%$ confidence interval for the slope $\beta _1$ of the population regression line based on the sample data set of Exercise 1 of Section 10.2.
Construct the $90\%$ confidence interval for the slope $\beta _1$ of the population regression line based on the sample data set of Exercise 2 of Section 10.2.
Construct the $90\%$ confidence interval for the slope $\beta _1$ of the population regression line based on the sample data set of Exercise 3 of Section 10.2.
Construct the $99\%$ confidence interval for the slope $\beta _1$ of the population regression line based on the sample data set of Exercise 4 of Section 10.2.
For the data in Exercise 5 of Section 10.2 test, at the $10\%$ level of significance, whether $x$ is useful for predicting $y$ (that is, whether $\beta _1\neq 0$).
For the data in Exercise 6 of Section 10.2 test, at the $5\%$ level of significance, whether $x$ is useful for predicting $y$ (that is, whether $\beta _1\neq 0$).
Construct the $90\%$ confidence interval for the slope $\beta _1$ of the population regression line based on the sample data set of Exercise 7 of Section 10.2.
Construct the $95\%$ confidence interval for the slope $\beta _1$ of the population regression line based on the sample data set of Exercise 8 of Section 10.2.
For the data in Exercise 9 of Section 10.2 test, at the $1\%$ level of significance, whether $x$ is useful for predicting $y$ (that is, whether $\beta _1\neq 0$).
For the data in Exercise 10 of Section 10.2 test, at the $1\%$ level of significance, whether $x$ is useful for predicting $y$ (that is, whether $\beta _1\neq 0$).
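The confidence interval $\widehat{\beta _1}\pm t_{\alpha /2}\, s_\varepsilon /\sqrt{SS_{xx}}$ and the test statistic $T=\widehat{\beta _1}/(s_\varepsilon /\sqrt{SS_{xx}})$ for $H_0:\beta _1=0$ can be sketched in Python as below. The function name `slope_inference` is hypothetical, and the critical value is assumed to be read from a $t$-table and passed in by the caller. Since the Section 10.2 data are not reproduced on this page, the sketch uses the data of Exercise 1 of Section 10.8, with $t_{0.025}=2.306$ for $df=8$.

```python
import math


def slope_inference(xs, ys, t_crit):
    """Return (slope, margin, t_stat).

    The confidence interval for beta_1 is slope +/- margin, and t_stat is the
    test statistic for H0: beta_1 = 0. t_crit is the tabulated critical value
    with n - 2 degrees of freedom (supplied by the caller, not computed here).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = ss_xy / ss_xx
    sse = ss_yy - slope * ss_xy          # shortcut formula for SSE
    s_eps = math.sqrt(sse / (n - 2))     # sample standard deviation of errors
    se_slope = s_eps / math.sqrt(ss_xx)  # standard error of the slope
    return slope, t_crit * se_slope, slope / se_slope


# Data copied from Exercise 1 of Section 10.8
xs = [0.0, 0.0, 1.1, 1.4, 1.6, 1.7, 2.0, 2.0, 2.2, 2.2]
ys = [0.3, 0.1, 4.7, 3.2, 5.1, 7.0, 5.0, 6.1, 8.6, 9.5]

slope, margin, t_stat = slope_inference(xs, ys, t_crit=2.306)
print(round(slope - margin, 2), round(slope + margin, 2), round(t_stat, 3))
```

The interval endpoints and the value of $T$ can be compared with the answers listed for that exercise. For a one-sided or non-zero null value such as $H_0:\beta _1=B_0$, the numerator of the statistic would be $\widehat{\beta _1}-B_0$ instead.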

Applications

For the data in Exercise 11 of Section 10.2 construct a $90\%$ confidence interval for the mean number of new words acquired per month by children between $13$ and $18$ months of age.
For the data in Exercise 12 of Section 10.2 construct a $90\%$ confidence interval for the mean increased braking distance for each additional $100$ pounds of vehicle weight.
For the data in Exercise 13 of Section 10.2 test, at the $10\%$ level of significance, whether age is useful for predicting resting heart rate.
For the data in Exercise 14 of Section 10.2 test, at the $10\%$ level of significance, whether wind speed is useful for predicting wave height.
For the situation described in Exercise 15 of Section 10.2
1. Construct the $95\%$ confidence interval for the mean increase in revenue per additional thousand dollars spent on advertising.
2. An advertising agency tells the business owner that for every additional thousand dollars spent on advertising, revenue will increase by over $\$25,000$. Test this claim (which is the alternative hypothesis) at the $5\%$ level of significance.
3. Perform the test of part (b) at the $10\%$ level of significance.
4. Based on the results in (b) and (c), how believable is the ad agency’s claim? (This is a subjective judgement.)
For the situation described in Exercise 16 of Section 10.2
1. Construct the $90\%$ confidence interval for the mean increase in height per additional inch of length at age two.
2. It is claimed that for girls each additional inch of length at age two means more than an additional inch of height at maturity. Test this claim (which is the alternative hypothesis) at the $10\%$ level of significance.
For the data in Exercise 17 of Section 10.2 test, at the $10\%$ level of significance, whether course average before the final exam is useful for predicting the final exam grade.
For the situation described in Exercise 18 of Section 10.2, an agronomist claims that each additional million acres planted results in more than $750,000$ additional acres harvested. Test this claim at the $1\%$ level of significance.
For the data in Exercise 19 of Section 10.2 test, at the $1/10$th of $1\%$ level of significance, whether, ignoring all other facts such as age and body mass, the amount of the medication consumed is a useful predictor of blood concentration of the active ingredient.
For the data in Exercise 20 of Section 10.2 test, at the $1\%$ level of significance, whether for each additional inch of girth the age of the tree increases by at least two and one-half years.
For the data in Exercise 21 of Section 10.2
1. Construct the $95\%$ confidence interval for the mean increase in strength at $28$ days for each additional hundred psi increase in strength at $3$ days.
2. Test, at the $1/10$th of $1\%$ level of significance, whether the $3$-day strength is useful for predicting $28$-day strength.
For the situation described in Exercise 22 of Section 10.2
1. Construct the $99\%$ confidence interval for the mean decrease in energy demand for each one-degree drop in temperature.
2. An engineer with the power company believes that for each one-degree increase in temperature, daily energy demand will decrease by more than $3.6$ million watt-hours. Test this claim at the $1\%$ level of significance.

Large Data Set Exercises

Large Data Sets not available

Large $\text{Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students.
1. Compute the $90\%$ confidence interval for the slope $\beta _1$ of the population regression line with SAT score as the independent variable ($x$) and GPA as the dependent variable ($y$).
2. Test, at the $10\%$ level of significance, the hypothesis that the slope of the population regression line is greater than $0.001$, against the null hypothesis that it is exactly $0.001$.
Large $\text{Data Set 12}$ lists the golf scores on one round of golf for $75$ golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs).
1. Compute the $95\%$ confidence interval for the slope $\beta _1$ of the population regression line with scores using the original clubs as the independent variable ($x$) and scores using the new clubs as the dependent variable ($y$).
2. Test, at the $10\%$ level of significance, the hypothesis that the slope of the population regression line is different from $1$, against the null hypothesis that it is exactly $1$.
Large $\text{Data Set 13}$ records the number of bidders and sales price of a particular type of antique grandfather clock at $60$ auctions.
1. Compute the $95\%$ confidence interval for the slope $\beta _1$ of the population regression line with the number of bidders present at the auction as the independent variable ($x$) and sales price as the dependent variable ($y$).
2. Test, at the $10\%$ level of significance, the hypothesis that the average sales price increases by more than $\$90$ for each additional bidder at an auction, against the default that it increases by exactly $\$90$.

Answers

$0.743\pm 0.578$
$-0.610\pm 0.633$
$T=1.732,\; \pm t_{0.05}=\pm 2.353$, do not reject $H_0$
$0.6\pm 0.451$
$T=-4.481,\; \pm t_{0.005}=\pm 3.355$, reject $H_0$
$4.8\pm 1.7$ words
$T=2.843,\; \pm t_{0.05}=\pm 1.860$, reject $H_0$
1. $42.024\pm 28.011$ thousand dollars
2. $T=1.487,\; \pm t_{0.05}=\pm 1.943$, do not reject $H_0$
3. $t_{0.10}=1.440$, reject $H_0$
$T=4.096,\; \pm t_{0.05}=\pm 1.771$, reject $H_0$
$T=25.524,\; \pm t_{0.0005}=\pm 3.505$, reject $H_0$
1. $2.550\pm 0.127$ hundred psi
2. $T=41.072,\; \pm t_{0.005}=\pm 3.674$, reject $H_0$
1. $(0.0014,0.0018)$
2. $H_0:\beta _1=0.001\; vs\; H_a:\beta _1>0.001$. Test Statistic: $Z=6.1625$. Rejection Region: $[1.28,+\infty )$. Decision: Reject $H_0$
1. $(101.789,131.4435)$
2. $H_0:\beta _1=90\; vs\; H_a:\beta _1>90$. Test Statistic: $T=3.5938,\; d.f.=58$. Rejection Region: $[1.296,+\infty )$. Decision: Reject $H_0$

10.6 The Coefficient of Determination

Basic

For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2, Section 10.4, and Section 10.5.

For the sample data set of Exercise 1 of Section 10.2 find the coefficient of determination using the formula $r^2=\widehat{\beta _1}SS_{xy}/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 2 of Section 10.2 find the coefficient of determination using the formula $r^2=\widehat{\beta _1}SS_{xy}/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 3 of Section 10.2 find the coefficient of determination using the formula $r^2=\widehat{\beta _1}SS_{xy}/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 4 of Section 10.2 find the coefficient of determination using the formula $r^2=\widehat{\beta _1}SS_{xy}/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 5 of Section 10.2 find the coefficient of determination using the formula $r^2=\widehat{\beta _1}SS_{xy}/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 6 of Section 10.2 find the coefficient of determination using the formula $r^2=\widehat{\beta _1}SS_{xy}/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 7 of Section 10.2 find the coefficient of determination using the formula $r^2=(SS_{yy}-SSE)/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 8 of Section 10.2 find the coefficient of determination using the formula $r^2=(SS_{yy}-SSE)/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 9 of Section 10.2 find the coefficient of determination using the formula $r^2=(SS_{yy}-SSE)/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
For the sample data set of Exercise 10 of Section 10.2 find the coefficient of determination using the formula $r^2=(SS_{yy}-SSE)/SS_{yy}$. Confirm your answer by squaring $r$ as computed in that exercise.
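The two formulas used in these exercises, $r^2=\widehat{\beta _1}SS_{xy}/SS_{yy}$ and $r^2=(SS_{yy}-SSE)/SS_{yy}$, give the same number. The Python sketch below checks this numerically; the function name `r_squared_two_ways` is hypothetical, and the data are those of Exercise 1 of Section 10.8, since the Section 10.2 data are not reproduced on this page.

```python
def r_squared_two_ways(xs, ys):
    """Return r^2 computed by the slope formula and by the SSE formula."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    # SSE from the definition, so the two formulas are checked independently
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    via_slope = slope * ss_xy / ss_yy
    via_sse = (ss_yy - sse) / ss_yy
    return via_slope, via_sse


# Data copied from Exercise 1 of Section 10.8
xs = [0.0, 0.0, 1.1, 1.4, 1.6, 1.7, 2.0, 2.0, 2.2, 2.2]
ys = [0.3, 0.1, 4.7, 3.2, 5.1, 7.0, 5.0, 6.1, 8.6, 9.5]

via_slope, via_sse = r_squared_two_ways(xs, ys)
print(round(via_slope, 4), round(via_sse, 4))
```

Both values round to the $r^2$ listed in the answers for that exercise, so about $84\%$ of the variability in $y$ is explained by $x$ for this data set.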

Applications

For the data in Exercise 11 of Section 10.2 compute the coefficient of determination and interpret its value in the context of age and vocabulary.
For the data in Exercise 12 of Section 10.2 compute the coefficient of determination and interpret its value in the context of vehicle weight and braking distance.
For the data in Exercise 13 of Section 10.2 compute the coefficient of determination and interpret its value in the context of age and resting heart rate. In the age range of the data, does age seem to be a very important factor with regard to heart rate?
For the data in Exercise 14 of Section 10.2 compute the coefficient of determination and interpret its value in the context of wind speed and wave height. Does wind speed seem to be a very important factor with regard to wave height?
For the data in Exercise 15 of Section 10.2 find the proportion of the variability in revenue that is explained by level of advertising.
For the data in Exercise 16 of Section 10.2 find the proportion of the variability in adult height that is explained by the variation in length at age two.
For the data in Exercise 17 of Section 10.2 compute the coefficient of determination and interpret its value in the context of course average before the final exam and score on the final exam.
For the data in Exercise 18 of Section 10.2 compute the coefficient of determination and interpret its value in the context of acres planted and acres harvested.
For the data in Exercise 19 of Section 10.2 compute the coefficient of determination and interpret its value in the context of the amount of the medication consumed and blood concentration of the active ingredient.
For the data in Exercise 20 of Section 10.2 compute the coefficient of determination and interpret its value in the context of tree size and age.
For the data in Exercise 21 of Section 10.2 find the proportion of the variability in $28$-day strength of concrete that is accounted for by variation in $3$-day strength.
For the data in Exercise 22 of Section 10.2 find the proportion of the variability in energy demand that is accounted for by variation in average temperature.

Large Data Set Exercises

Large Data Sets not available

Large $\text{Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students. Compute the coefficient of determination and interpret its value in the context of SAT scores and GPAs.
Large $\text{Data Set 12}$ lists the golf scores on one round of golf for $75$ golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Compute the coefficient of determination and interpret its value in the context of golf scores with the two kinds of golf clubs.
Large $\text{Data Set 13}$ records the number of bidders and sales price of a particular type of antique grandfather clock at $60$ auctions. Compute the coefficient of determination and interpret its value in the context of the number of bidders at an auction and the price of this type of antique grandfather clock.

Answers

$0.848$
$0.631$
$0.5$
$0.766$
$0.715$
$0.898$; about $90\%$ of the variability in vocabulary is explained by age
$0.503$; about $50\%$ of the variability in heart rate is explained by age. Age is a significant but not dominant factor in explaining heart rate.
The proportion is $r^2=0.692$
$0.563$; about $56\%$ of the variability in final exam scores is explained by course average before the final exam
$0.931$; about $93\%$ of the variability in the blood concentration of the active ingredient is explained by the amount of the medication consumed
The proportion is $r^2=0.984$
$r^2=21.17\%$
$r^2=81.04\%$

10.7 Estimation and Prediction

Basic

For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in previous sections.

For the sample data set of Exercise 1 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 4$.
2. Construct the $90\%$ confidence interval for that mean value.
For the sample data set of Exercise 2 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 4$.
2. Construct the $90\%$ confidence interval for that mean value.
For the sample data set of Exercise 3 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 7$.
2. Construct the $95\%$ confidence interval for that mean value.
For the sample data set of Exercise 4 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 2$.
2. Construct the $80\%$ confidence interval for that mean value.
For the sample data set of Exercise 5 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 1$.
2. Construct the $80\%$ confidence interval for that mean value.
For the sample data set of Exercise 6 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 5$.
2. Construct the $95\%$ confidence interval for that mean value.
For the sample data set of Exercise 7 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 6$.
2. Construct the $99\%$ confidence interval for that mean value.
3. Is it valid to make the same estimates for $x = 12$? Explain.
For the sample data set of Exercise 8 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 12$.
2. Construct the $80\%$ confidence interval for that mean value.
3. Is it valid to make the same estimates for $x = 0$? Explain.
For the sample data set of Exercise 9 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 0$.
2. Construct the $90\%$ confidence interval for that mean value.
3. Is it valid to make the same estimates for $x = -1$? Explain.
For the sample data set of Exercise 10 of Section 10.2
1. Give a point estimate for the mean value of $y$ in the sub-population determined by the condition $x = 8$.
2. Construct the $95\%$ confidence interval for that mean value.
3. Is it valid to make the same estimates for $x = 0$? Explain.
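Parts 1 and 2 of these exercises use the point estimate $\hat{y}_p=\widehat{\beta _1}x_p+\widehat{\beta _0}$ and the interval $\hat{y}_p\pm t_{\alpha /2}\, s_\varepsilon \sqrt{1/n+(x_p-\bar{x})^2/SS_{xx}}$; a prediction interval for an individual $y$ adds $1$ under the square root. The Python sketch below computes both margins. The function name `intervals_at` is hypothetical, the critical value is assumed to come from a $t$-table, and the data are those of Exercise 1 of Section 10.8 with $x_p=2$ and $t_{0.025}=2.306$ for $df=8$.

```python
import math


def intervals_at(xs, ys, x_p, t_crit):
    """Return (y_hat, ci_margin, pi_margin) at x = x_p.

    ci_margin is for the mean value of y at x_p; pi_margin is for a single
    new observation of y at x_p. t_crit has n - 2 degrees of freedom.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    s_eps = math.sqrt((ss_yy - slope * ss_xy) / (n - 2))
    y_hat = slope * x_p + intercept
    core = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    ci_margin = t_crit * s_eps * math.sqrt(core)
    pi_margin = t_crit * s_eps * math.sqrt(1 + core)
    return y_hat, ci_margin, pi_margin


# Data copied from Exercise 1 of Section 10.8
xs = [0.0, 0.0, 1.1, 1.4, 1.6, 1.7, 2.0, 2.0, 2.2, 2.2]
ys = [0.3, 0.1, 4.7, 3.2, 5.1, 7.0, 5.0, 6.1, 8.6, 9.5]

y_hat, ci_margin, pi_margin = intervals_at(xs, ys, x_p=2, t_crit=2.306)
print(round(y_hat, 3), round(ci_margin, 3), round(pi_margin, 3))
```

The resulting intervals land close to those listed in the answers for that exercise; any small discrepancy presumably reflects rounding of intermediate values. The prediction margin is always the larger of the two, because it also accounts for the scatter of an individual observation about the mean.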

Applications

For the data in Exercise 11 of Section 10.2
1. Give a point estimate for the average number of words in the vocabulary of $18$-month-old children.
2. Construct the $95\%$ confidence interval for that mean value.
3. Is it valid to make the same estimates for two-year-olds? Explain.
For the data in Exercise 12 of Section 10.2
1. Give a point estimate for the average braking distance of automobiles that weigh $3,250$ pounds.
2. Construct the $80\%$ confidence interval for that mean value.
3. Is it valid to make the same estimates for $5,000$-pound automobiles? Explain.
For the data in Exercise 13 of Section 10.2
1. Give a point estimate for the resting heart rate of a man who is $35$ years old.
2. One of the men in the sample is $35$ years old, but his resting heart rate is not what you computed in part (a). Explain why this is not a contradiction.
3. Construct the $90\%$ confidence interval for the mean resting heart rate of all $35$-year-old men.
For the data in Exercise 14 of Section 10.2
1. Give a point estimate for the wave height when the wind speed is $13$ miles per hour.
2. One of the wind speeds in the sample is $13$ miles per hour, but the height of waves that day is not what you computed in part (a). Explain why this is not a contradiction.
3. Construct the $95\%$ confidence interval for the mean wave height on days when the wind speed is $13$ miles per hour.
For the data in Exercise 15 of Section 10.2
1. The business owner intends to spend $\$2,500$ on advertising next year. Give an estimate of next year’s revenue based on this fact.
2. Construct the $90\%$ prediction interval for next year’s revenue, based on the intent to spend $\$2,500$ on advertising.
For the data in Exercise 16 of Section 10.2
1. A two-year-old girl is $32.3$ inches long. Predict her adult height.
2. Construct the $95\%$ prediction interval for the girl’s adult height.
For the data in Exercise 17 of Section 10.2
1. Lodovico has a $78.6$ average in his physics class just before the final. Give a point estimate of what his final exam grade will be.
2. Explain whether an interval estimate for this problem is a confidence interval or a prediction interval.
3. Based on your answer to (b), construct an interval estimate for Lodovico’s final exam grade at the $90\%$ level of confidence.
For the data in Exercise 18 of Section 10.2
1. This year $86.2$ million acres of corn were planted. Give a point estimate of the number of acres that will be harvested this year.
2. Explain whether an interval estimate for this problem is a confidence interval or a prediction interval.
3. Based on your answer to (b), construct an interval estimate for the number of acres that will be harvested this year, at the $99\%$ level of confidence.
For the data in Exercise 19 of Section 10.2
1. Give a point estimate for the blood concentration of the active ingredient of this medication in a man who has consumed $1.5$ ounces of the medication just recently.
2. Gratiano just consumed $1.5$ ounces of this medication $30$ minutes ago. Construct a $95\%$ prediction interval for the concentration of the active ingredient in his blood right now.
For the data in Exercise 20 of Section 10.2
1. You measure the girth of a free-standing oak tree five feet off the ground and obtain the value $127$ inches. How old do you estimate the tree to be?
2. Construct a $90\%$ prediction interval for the age of this tree.
For the data in Exercise 21 of Section 10.2
1. A test cylinder of concrete three days old fails at $1,750$ psi. Predict what the $28$-day strength of the concrete will be.
2. Construct a $99\%$ prediction interval for the $28$-day strength of this concrete.
3. Based on your answer to (b), what would be the minimum $28$-day strength you could expect this concrete to exhibit?
For the data in Exercise 22 of Section 10.2
1. Tomorrow’s average temperature is forecast to be $53$ degrees. Estimate the energy demand tomorrow.
2. Construct a $99\%$ prediction interval for the energy demand tomorrow.
3. Based on your answer to (b), what would be the minimum demand you could expect?

Large Data Set Exercises

Large Data Sets not available

Large $\text{Data Set 1}$ lists the SAT scores and GPAs of $1,000$ students.
1. Give a point estimate of the mean GPA of all students who score $1350$ on the SAT.
2. Construct a $90\%$ confidence interval for the mean GPA of all students who score $1350$ on the SAT.
Large $\text{Data Set 12}$ lists the golf scores on one round of golf for $75$ golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs).
1. Thurio averages $72$ strokes per round with his own clubs. Give a point estimate for his score on one round if he switches to the new clubs.
2. Explain whether an interval estimate for this problem is a confidence interval or a prediction interval.
3. Based on your answer to (b), construct an interval estimate for Thurio’s score on one round if he switches to the new clubs, at $90\%$ confidence.
Large $\text{Data Set 13}$ records the number of bidders and sales price of a particular type of antique grandfather clock at $60$ auctions.
1. There are seven likely bidders at the Verona auction today. Give a point estimate for the price of such a clock at today’s auction.
2. Explain whether an interval estimate for this problem is a confidence interval or a prediction interval.
3. Based on your answer to (b), construct an interval estimate for the likely sale price of such a clock at today’s sale, at $95\%$ confidence.

Answers

1. $5.647$
2. $5.647\pm 1.253$
1. $-0.188$
2. $-0.188\pm 3.041$
1. $1.875$
2. $1.875\pm 1.423$
1. $5.4$
2. $5.4\pm 3.355$
3. invalid (extrapolation)
1. $2.4$
2. $2.4\pm 1.474$
3. valid ($-1$ is in the range of the $x$-values in the data set)
1. $31.3$ words
2. $31.3\pm 7.1$ words
3. not valid, since two years is $24$ months, hence this is extrapolation
1. $73.2$ beats/min
2. The man’s heart rate is not the predicted average for all men his age.
3. $73.2\pm 1.2$ beats/min
1. $\$224,562$
2. $\$224,562 \pm \$28,699$
1. $74$
2. Prediction (one person, not an average for all who have average $78.6$ before the final exam)
3. $74\pm 24$
1. $0.066\%$
2. $0.066\pm 0.034\%$
1. $4,656$ psi
2. $4,656\pm 321$ psi
3. $4,656-321=4,335$ psi
1. $2.19$
2. $(2.1421,2.2316)$
1. $7771.39$
2. A prediction interval.
3. $(7410.41,8132.38)$

10.8 A Complete Example

Basic

The exercises in this section are unrelated to those in previous sections.

The data give the amount $x$ of silicofluoride in the water (mg/L) and the amount $y$ of lead in the bloodstream (μg/dL) of ten children in various communities with and without municipal water. Perform a complete analysis of the data, in analogy with the discussion in this section (that is, make a scatter plot, do preliminary computations, find the least squares regression line, find $SSE$, $s_\varepsilon$ and $r$, and so on). In the hypothesis test use as the alternative hypothesis $\beta _1>0$, and test at the $5\%$ level of significance. Use confidence level $95\%$ for the confidence interval for $\beta _1$. Construct $95\%$ confidence and prediction intervals at $x_p=2$ at the end. \[\begin{array}{c|c c c c c} x &0.0 &0.0 &1.1 &1.4 &1.6 \\ \hline y &0.3 &0.1 &4.7 &3.2 &5.1\\ \end{array}\\ \begin{array}{c|c c c c c} x &1.7 &2.0 &2.0 &2.2 &2.2 \\ \hline y &7.0 &5.0 &6.1 &8.6 &9.5\\ \end{array}\]
The table gives the weight $x$ (thousands of pounds) and available heat energy $y$ (million BTU) of a standard cord of various species of wood typically used for heating. Perform a complete analysis of the data, in analogy with the discussion in this section (that is, make a scatter plot, do preliminary computations, find the least squares regression line, find $SSE$, $s_\varepsilon$ and $r$, and so on). In the hypothesis test use as the alternative hypothesis $\beta _1>0$, and test at the $5\%$ level of significance. Use confidence level $95\%$ for the confidence interval for $\beta _1$. Construct $95\%$ confidence and prediction intervals at $x_p=5$ at the end. \[\begin{array}{c|c c c c c} x &3.37 &3.50 &4.29 &4.00 &4.64 \\ \hline y &23.6 &17.5 &20.1 &21.6 &28.1\\ \end{array}\\ \begin{array}{c|c c c c c} x &4.99 &4.94 &5.48 &3.26 &4.16 \\ \hline y &25.3 &27.0 &30.7 &18.9 &20.7\\ \end{array}\]

Large Data Set Exercises

Large Data Sets not available

Large Data Sets 3 and 3A list the shoe sizes and heights of $174$ customers entering a shoe store. The gender of the customer is not indicated in Large Data Set 3. However, men’s and women’s shoes are not measured on the same scale; for example, a size $8$ shoe for men is not the same size as a size $8$ shoe for women. Thus it would not be meaningful to apply regression analysis to Large Data Set 3. Nevertheless, compute the scatter diagrams, with shoe size as the independent variable ($x$) and height as the dependent variable ($y$), for (i) just the data on men, (ii) just the data on women, and (iii) the full mixed data set with both men and women. Does the third, invalid scatter diagram look markedly different from the other two?
Separate out from Large Data Set 3A just the data on men and do a complete analysis, with shoe size as the independent variable ($x$) and height as the dependent variable ($y$). Use $\alpha =0.05$ and $x_p=10$ whenever appropriate.
Separate out from Large Data Set 3A just the data on women and do a complete analysis, with shoe size as the independent variable ($x$) and height as the dependent variable ($y$). Use $\alpha =0.05$ and $x_p=10$ whenever appropriate.

Answers

\[\sum x=14.2,\; \sum y=49.6,\; \sum xy=91.73,\; \sum x^2=26.3,\; \sum y^2=333.86\\ SS_{xx}=6.136,\; SS_{xy}=21.298,\; SS_{yy}=87.844\\ \bar{x}=1.42,\; \bar{y}=4.96\\ \widehat{\beta _1}=3.47,\; \widehat{\beta _0}=0.03\\ SSE=13.92\\ s_\varepsilon =1.32\\ r = 0.9174, r^2 = 0.8416\\ df=8, T = 6.518\]
The $95\%$ confidence interval for $\beta _1$ is: $(2.24,4.70)$
At $x_p=2$ the $95\%$ confidence interval for $E(y)$ is $(5.77,8.17)$
At $x_p=2$ the $95\%$ prediction interval for $y$ is $(3.73,10.21)$
The positively correlated trend seems less profound than that in each of the previous plots.
The regression line: $\hat{y}=3.3426x+138.7692$. Coefficient of Correlation: $r = 0.9431$. Coefficient of Determination: $r^2 = 0.8894$. $SSE=283.2473$. $s_\varepsilon =1.9305$. A $95\%$ confidence interval for $\beta _1$: $(3.0733,3.6120)$. Test Statistic for $H_0: \beta _1=0$: $T=24.7209$. At $x_p=10$, $\hat{y}=172.1956$; a $95\%$ confidence interval for the mean value of $y$ is: $(171.5577,172.8335)$; and a $95\%$ prediction interval for an individual value of $y$ is: $(168.2974,176.0938)$.