10.E: Correlation and Regression (Exercises)
These are homework exercises to accompany the Textmap created for "Introductory Statistics" by Shafer and Zhang. Complementary General Chemistry question banks can be found for other Textmaps. In addition to these publicly available questions, access to a private problem bank for use in exams and homework is available to faculty only, on an individual basis; please contact Delmar Larsen for an account with access permission.
10.1 Linear Relationships Between Variables
Exercises
Basic

A line has equation y=0.5x+2.
 Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
 Give the value of the slope of the line; give the value of the y-intercept.

A line has equation y=x−0.5.
 Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
 Give the value of the slope of the line; give the value of the y-intercept.

A line has equation y=−2x+4.
 Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
 Give the value of the slope of the line; give the value of the y-intercept.

A line has equation y=−1.5x+1.
 Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
 Give the value of the slope of the line; give the value of the y-intercept.

Based on the information given about a line, determine how y will change (increase, decrease, or stay the same) when x is increased, and explain. In some cases it might be impossible to tell from the information given.
 The slope is positive.
 The y-intercept is positive.
 The slope is zero.

Based on the information given about a line, determine how y will change (increase, decrease, or stay the same) when x is increased, and explain. In some cases it might be impossible to tell from the information given.
 The y-intercept is negative.
 The y-intercept is zero.
 The slope is negative.

A data set consists of eight (x,y) pairs of numbers:
(0,12) (2,15) (4,16) (5,14) (8,22) (13,24) (15,28) (20,30)
 Plot the data in a scatter diagram.
 Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
 Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.

A data set consists of ten (x,y) pairs of numbers:
(3,20) (5,13) (6,9) (8,4) (11,0) (12,0) (14,1) (17,6) (18,9) (20,16)
 Plot the data in a scatter diagram.
 Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
 Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.

A data set consists of nine (x,y) pairs of numbers:
(8,16) (9,9) (10,4) (11,1) (12,0) (13,1) (14,4) (15,9) (16,16)
 Plot the data in a scatter diagram.
 Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
 Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.

A data set consists of five (x,y) pairs of numbers:
(0,1) (2,5) (3,7) (5,11) (8,17)
 Plot the data in a scatter diagram.
 Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
 Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
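A plot is the intended tool for these exercises, but an exact check can confirm what the eye suggests. The sketch below tests whether every point of a data set lies on the line through its first two points. The function name `is_exactly_linear` is an illustrative choice, and the check assumes the first two points have different x-values.

```python
from fractions import Fraction

def is_exactly_linear(points):
    """Return True if every point lies on the line through the first two points."""
    (x0, y0), (x1, y1) = points[0], points[1]
    slope = Fraction(y1 - y0, x1 - x0)  # exact rational slope, no rounding
    return all(Fraction(y - y0) == slope * (x - x0) for x, y in points)

# The five-point and nine-point data sets from the exercises above.
five_points = [(0, 1), (2, 5), (3, 7), (5, 11), (8, 17)]
nine_points = [(8, 16), (9, 9), (10, 4), (11, 1), (12, 0),
               (13, 1), (14, 4), (15, 9), (16, 16)]

print(is_exactly_linear(five_points))  # True
print(is_exactly_linear(nine_points))  # False
```

A result of False only says the points are not exactly collinear; it does not by itself distinguish a curved deterministic relationship from a noisy linear one, which is why the scatter diagram is still needed.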
Applications

At 60°F a particular blend of automotive gasoline weighs 6.17 lb/gal. The weight y of gasoline on a tank truck that is loaded with x gallons of gasoline is given by the linear equation
y=6.17x
 Explain whether the relationship between the weight y and the amount x of gasoline is deterministic or contains an element of randomness.
 Predict the weight of gasoline on a tank truck that has just been loaded with 6,750 gallons of gasoline.

The rate for renting a motor scooter for one day at a beach resort area is $25 plus 30 cents for each mile the scooter is driven. The total cost y in dollars for renting a scooter and driving it x miles is
y=0.30x+25
 Explain whether the relationship between the cost y of renting the scooter for a day and the distance x that the scooter is driven that day is deterministic or contains an element of randomness.
 A person intends to rent a scooter one day for a trip to an attraction 17 miles away. Assuming that the total distance the scooter is driven is 34 miles, predict the cost of the rental.

The pricing schedule for labor on a service call by an elevator repair company is $150 plus $50 per hour on site.
 Write down the linear equation that relates the labor cost y to the number of hours x that the repairman is on site.
 Calculate the labor cost for a service call that lasts 2.5 hours.

The cost of a telephone call made through a leased line service is 2.5 cents per minute.
 Write down the linear equation that relates the cost y (in cents) of a call to its length x.
 Calculate the cost of a call that lasts 23 minutes.
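The application exercises above all evaluate a deterministic linear equation at a given x. As a minimal sketch, the helper `linear` below (a hypothetical name) builds such an equation from a slope and an intercept; the three models use the equations stated in the exercises and in the answer key.

```python
def linear(slope, intercept):
    """Return the function x -> slope * x + intercept."""
    return lambda x: slope * x + intercept

gasoline_weight = linear(6.17, 0)   # pounds as a function of gallons
scooter_cost = linear(0.30, 25)     # dollars as a function of miles
labor_cost = linear(50, 150)        # dollars as a function of hours on site

# Rounding guards against floating-point noise in the last digits.
print(round(gasoline_weight(6750), 1))  # 41647.5
print(round(scooter_cost(34), 2))       # 35.2
print(round(labor_cost(2.5), 2))        # 275.0
```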
Large Data Set Exercises

Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Plot the scatter diagram with SAT score as the independent variable (x) and GPA as the dependent variable (y). Comment on the appearance and strength of any linear trend.
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls

Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Plot the scatter diagram with golf score using the original clubs as the independent variable (x) and golf score using the new clubs as the dependent variable (y). Comment on the appearance and strength of any linear trend.
http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls

Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Plot the scatter diagram with the number of bidders at the auction as the independent variable (x) and the sales price as the dependent variable (y). Comment on the appearance and strength of any linear trend.
http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls
Answers

 Answers vary.
 Slope m=0.5; y-intercept b=2.


 Answers vary.
 Slope m=−2; y-intercept b=4.


 y increases.
 Impossible to tell.
 y does not change.


 Scatter diagram needed.
 Involves randomness.
 Linear.


 Scatter diagram needed.
 Deterministic.
 Not linear.


 Deterministic.
 41,647.5 pounds.


 y=50x+150.
 $275.


There appears to be a hint of some positive correlation.


There appears to be clear positive correlation.
10.2 The Linear Correlation Coefficient
Exercises
Basic
With the exception of the exercises at the end of Section 10.3, the first Basic exercise in each of the following sections through Section 10.7 uses the data from the first exercise here, the second Basic exercise uses the data from the second exercise here, and so on, and similarly for the Application exercises. Save your computations done on these exercises so that you do not need to repeat them later.

For the sample data
 Draw the scatter plot.
 Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
 Compute the linear correlation coefficient and compare its sign to your answer to part (b).

For the sample data
 Draw the scatter plot.
 Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
 Compute the linear correlation coefficient and compare its sign to your answer to part (b).

For the sample data
 Draw the scatter plot.
 Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
 Compute the linear correlation coefficient and compare its sign to your answer to part (b).

For the sample data
 Draw the scatter plot.
 Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
 Compute the linear correlation coefficient and compare its sign to your answer to part (b).

For the sample data
 Draw the scatter plot.
 Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
 Compute the linear correlation coefficient and compare its sign to your answer to part (b).

For the sample data
 Draw the scatter plot.
 Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
 Compute the linear correlation coefficient and compare its sign to your answer to part (b).
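For the computations in these exercises, a small script can serve as a check on hand work. The sketch below uses the shortcut sums \(SS_{xx}\), \(SS_{xy}\), and \(SS_{yy}\) and the formula \(r = SS_{xy}/\sqrt{SS_{xx}SS_{yy}}\). The five data pairs are illustrative only (an assumption, since the exercise tables are not reproduced here), and the function names are hypothetical.

```python
from math import sqrt

def sums_of_squares(xs, ys):
    """Return (SS_xx, SS_xy, SS_yy) using the shortcut formulas."""
    n = len(xs)
    ss_xx = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    ss_yy = sum(y * y for y in ys) - sum(ys) ** 2 / n
    return ss_xx, ss_xy, ss_yy

def correlation(xs, ys):
    """Linear correlation coefficient r = SS_xy / sqrt(SS_xx * SS_yy)."""
    ss_xx, ss_xy, ss_yy = sums_of_squares(xs, ys)
    return ss_xy / sqrt(ss_xx * ss_yy)

# Illustrative data (assumed for this sketch).
xs = [0, 1, 3, 5, 8]
ys = [2, 4, 6, 5, 9]
r = correlation(xs, ys)
print(round(r, 3))
```

Keeping `sums_of_squares` as a separate function mirrors the advice to save intermediate computations, since the same three sums are reused in later sections.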

Compute the linear correlation coefficient for the sample data summarized by the following information:

Compute the linear correlation coefficient for the sample data summarized by the following information:

Compute the linear correlation coefficient for the sample data summarized by the following information:
Applications

The age \(x\) in months and vocabulary \(y\) were measured for six children, with the results shown in the table.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

The curb weight \(x\) in hundreds of pounds and braking distance \(y\) in feet, at 50 miles per hour on dry pavement, were measured for five vehicles, with the results shown in the table.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

The age \(x\) and resting heart rate \(y\) were measured for ten men, with the results shown in the table.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

The wind speed \(x\) in miles per hour and wave height \(y\) in feet were measured under various conditions on an enclosed deep water sea, with the results shown in the table.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

The advertising expenditure \(x\) and sales \(y\) in thousands of dollars for a small retail business in its first eight years in operation are shown in the table.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

The height \(x\) at age 2 and the height \(y\) at age 20, both in inches, for ten women are tabulated in the table.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

The course average \(x\) just before a final exam and the score \(y\) on the final exam were recorded for 15 randomly selected students in a large physics class, with the results shown in the table.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

The table shows the acres \(x\) of corn planted and acres \(y\) of corn harvested, in millions of acres, in a particular country in ten successive years.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

Fifty male subjects drank a measured amount \(x\) (in ounces) of a medication and the concentration \(y\) (in percent) in their blood of the active ingredient was measured 30 minutes later. The sample data are summarized by the following information.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

In an effort to produce a formula for estimating the age of large freestanding oak trees noninvasively, the girth \(x\) (in inches) five feet off the ground of 15 such trees of known age \(y\) (in years) was measured. The sample data are summarized by the following information.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

Construction standards specify the strength of concrete 28 days after it is poured. For 30 samples of various types of concrete the strength \(x\) after 3 days and the strength \(y\) after 28 days (both in hundreds of pounds per square inch) were measured. The sample data are summarized by the following information.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

Power-generating facilities use forecasts of temperature to forecast energy demand. The average temperature \(x\) (degrees Fahrenheit) and the day’s energy demand \(y\) (million watt-hours) were recorded on 40 randomly selected winter days in the region served by a power company. The sample data are summarized by the following information.
Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.
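When only summary information is given, as in the last four application exercises, r can be computed directly from the sample size and the five sums. The sketch below assumes the summary consists of n, Σx, Σy, Σx², Σxy, and Σy²; the function name and the example numbers are hypothetical and are not taken from any exercise table.

```python
from math import sqrt

def correlation_from_sums(n, sum_x, sum_y, sum_x2, sum_xy, sum_y2):
    """Compute r from summary statistics via SS_xx, SS_xy, SS_yy."""
    ss_xx = sum_x2 - sum_x ** 2 / n
    ss_xy = sum_xy - sum_x * sum_y / n
    ss_yy = sum_y2 - sum_y ** 2 / n
    return ss_xy / sqrt(ss_xx * ss_yy)

# Hypothetical summary values for a five-point sample.
r = correlation_from_sums(n=5, sum_x=17, sum_y=26,
                          sum_x2=99, sum_xy=119, sum_y2=162)
print(round(r, 3))
```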
Additional Exercises

In each case state whether you expect the two variables \(x\) and \(y\) indicated to have positive, negative, or zero correlation.
 the number \(x\) of pages in a book and the age \(y\) of the author
 the number \(x\) of pages in a book and the age \(y\) of the intended reader
 the weight \(x\) of an automobile and the fuel economy \(y\) in miles per gallon
 the weight \(x\) of an automobile and the reading \(y\) on its odometer
 the amount \(x\) of a sedative a person took an hour ago and the time \(y\) it takes him to respond to a stimulus

In each case state whether you expect the two variables \(x\) and \(y\) indicated to have positive, negative, or zero correlation.
 the length \(x\) of time an emergency flare will burn and the length \(y\) of time the match used to light it burned
 the average length \(x\) of time that calls to a retail call center are on hold one day and the number \(y\) of calls received that day
 the length \(x\) of a regularly scheduled commercial flight between two cities and the headwind \(y\) encountered by the aircraft
 the value \(x\) of a house and its size \(y\) in square feet
 the average temperature \(x\) on a winter day and the energy consumption \(y\) of the furnace

Changing the units of measurement on two variables \(x\) and \(y\) should not change the linear correlation coefficient. Moreover, most changes of units amount to simply multiplying a measurement in one unit by a constant to obtain the other (for example, 1 foot = 12 inches). Multiply each \(x\) value in the table in Exercise 1 by two and compute the linear correlation coefficient for the new data set. Compare the new value of \(r\) to the one for the original data.

Refer to the previous exercise. Multiply each \(x\) value in the table in Exercise 2 by two, multiply each \(y\) value by three, and compute the linear correlation coefficient for the new data set. Compare the new value of \(r\) to the one for the original data.
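The effect of rescaling can be explored numerically before working it by hand. The following sketch multiplies x by two and y by three and compares the two values of r. The data are illustrative (an assumption), and `correlation` is a hypothetical helper written for this example.

```python
from math import sqrt

def correlation(xs, ys):
    """Linear correlation coefficient from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return ss_xy / sqrt(ss_xx * ss_yy)

xs = [0, 1, 3, 5, 8]   # illustrative data, assumed for this sketch
ys = [2, 4, 6, 5, 9]

r_original = correlation(xs, ys)
r_rescaled = correlation([2 * x for x in xs], [3 * y for y in ys])
print(abs(r_original - r_rescaled) < 1e-12)  # True
```

The positive scale factors multiply the numerator and the denominator of r by the same amount, so they cancel.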

Reversing the roles of \(x\) and \(y\) in the data set of Exercise 1 produces the data set
Compute the linear correlation coefficient of the new set of data and compare it to what you got in Exercise 1.

In the context of the previous problem, look at the formula for \(r\) and see if you can tell why what you observed there must be true for every data set.
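One way to organize the argument, written with the shortcut sums used in this chapter, is sketched below. The subscript notation \(r_{x,y}\) for "the correlation with x as the first variable" is introduced here only for illustration.

```latex
\[
SS_{xy} = \sum (x-\bar{x})(y-\bar{y}) = \sum (y-\bar{y})(x-\bar{x}) = SS_{yx}
\]
\[
r_{y,x} = \frac{SS_{yx}}{\sqrt{SS_{yy}\,SS_{xx}}}
        = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}
        = r_{x,y}
\]
```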
Large Data Set Exercises

Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the first large data set problem for Section 10.1 "Linear Relationships Between Variables".
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls

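Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the second large data set problem for Section 10.1 "Linear Relationships Between Variables".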
http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls

Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the third large data set problem for Section 10.1 "Linear Relationships Between Variables".
http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls
10.3 Modelling Linear Relationships with Randomness Present

State the three assumptions that are the basis for the Simple Linear Regression Model.

The Simple Linear Regression Model is summarized by the equation \(y=\beta_1 x+\beta_0+\varepsilon\).
Identify the deterministic part and the random part.

Is the number σ in the Simple Linear Regression Model a statistic or a population parameter? Explain.

Describe what to look for in a scatter diagram in order to check that the assumptions of the Simple Linear Regression Model are true.

True or false: the assumptions of the Simple Linear Regression Model must hold exactly in order for the procedures and analysis developed in this chapter to be useful.
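To get a feel for the model, it can help to generate data from it. The sketch below simulates observations whose mean is \(\beta_1 x+\beta_0\) with independent normal noise of standard deviation σ. The parameter values, the seed, and the function name `simulate` are all arbitrary choices made for illustration.

```python
import random

def simulate(xs, beta1, beta0, sigma, seed=0):
    """Draw one y for each x: deterministic part plus independent normal noise."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [beta1 * x + beta0 + rng.gauss(0, sigma) for x in xs]

xs = list(range(1, 21))
noisy = simulate(xs, beta1=2.0, beta0=1.0, sigma=0.5)
noise_free = simulate(xs, beta1=2.0, beta0=1.0, sigma=0.0)

# With sigma = 0 only the deterministic part remains.
print(noise_free[:3])  # [3.0, 5.0, 7.0]
```

Plotting `noisy` against `xs` should show a linear trend with roughly constant vertical scatter, which is the pattern the scatter-diagram exercise asks about.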
Answers

 The mean of y is linearly related to x.
 For each given x, y is a normal random variable with mean \(\beta_1 x+\beta_0\) and standard deviation σ.
 All the observations of y in the sample are independent.


σ is a population parameter.


A linear trend.
10.4 The Least Squares Regression Line
Basic
For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2 "The Linear Correlation Coefficient".

Compute the least squares regression line for the data in Exercise 1 of Section 10.2 "The Linear Correlation Coefficient".

Compute the least squares regression line for the data in Exercise 2 of Section 10.2 "The Linear Correlation Coefficient".

Compute the least squares regression line for the data in Exercise 3 of Section 10.2 "The Linear Correlation Coefficient".

Compute the least squares regression line for the data in Exercise 4 of Section 10.2 "The Linear Correlation Coefficient".

For the data in Exercise 5 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Compute the sum of the squared errors SSE using the definition \(\sum (y-\hat{y})^2\).
 Compute the sum of the squared errors SSE using the formula \(SSE=SS_{yy}-\hat{\beta}_1 SS_{xy}\).

For the data in Exercise 6 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Compute the sum of the squared errors SSE using the definition \(\sum (y-\hat{y})^2\).
 Compute the sum of the squared errors SSE using the formula \(SSE=SS_{yy}-\hat{\beta}_1 SS_{xy}\).

Compute the least squares regression line for the data in Exercise 7 of Section 10.2 "The Linear Correlation Coefficient".

Compute the least squares regression line for the data in Exercise 8 of Section 10.2 "The Linear Correlation Coefficient".

For the data in Exercise 9 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Can you compute the sum of the squared errors SSE using the definition \(\sum (y-\hat{y})^2\)? Explain.
 Compute the sum of the squared errors SSE using the formula \(SSE=SS_{yy}-\hat{\beta}_1 SS_{xy}\).

For the data in Exercise 10 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Can you compute the sum of the squared errors SSE using the definition \(\sum (y-\hat{y})^2\)? Explain.
 Compute the sum of the squared errors SSE using the formula \(SSE=SS_{yy}-\hat{\beta}_1 SS_{xy}\).
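The two routes to SSE in these exercises can be compared in a few lines. The sketch below fits the least squares line and computes SSE both from the definition and from the shortcut formula. The data are the five pairs obtained by un-reversing the table in the Additional Exercises of this section (an inference from that table, which reproduces the first line in the answer key); the function name is hypothetical.

```python
def least_squares(xs, ys):
    """Return (slope, intercept, SS_xy, SS_yy) for the least squares line."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar
    return slope, intercept, ss_xy, ss_yy

xs = [0, 1, 3, 5, 8]
ys = [2, 4, 6, 5, 9]
slope, intercept, ss_xy, ss_yy = least_squares(xs, ys)

# SSE from the definition: sum of squared vertical residuals.
sse_definition = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
# SSE from the shortcut formula.
sse_formula = ss_yy - slope * ss_xy

print(round(slope, 3), round(intercept, 3))  # 0.743 2.675
print(abs(sse_definition - sse_formula) < 1e-9)  # True
```

The definition needs the individual data points, while the formula needs only the summary sums, which is the distinction the "Can you compute..." parts are probing.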
Applications

For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 On average, how many new words does a child from 13 to 18 months old learn each month? Explain.
 Estimate the average vocabulary of all 16-month-old children.

For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 On average, how many additional feet are added to the braking distance for each additional 100 pounds of weight? Explain.
 Estimate the average braking distance of all cars weighing 3,000 pounds.

For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Estimate the average resting heart rate of all 40-year-old men.
 Estimate the average resting heart rate of all newborn baby boys. Comment on the validity of the estimate.

For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Estimate the average wave height when the wind is blowing at 10 miles per hour.
 Estimate the average wave height when there is no wind blowing. Comment on the validity of the estimate.

For the data in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 On average, for each additional thousand dollars spent on advertising, how does revenue change? Explain.
 Estimate the revenue if $2,500 is spent on advertising next year.

For the data in Exercise 16 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 On average, for each additional inch of height of a two-year-old girl, what is the change in the adult height? Explain.
 Predict the adult height of a two-year-old girl who is 33 inches tall.

For the data in Exercise 17 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Compute SSE using the formula \(SSE=SS_{yy}-\hat{\beta}_1 SS_{xy}\).
 Estimate the average final exam score of all students whose course average just before the exam is 85.

For the data in Exercise 18 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Compute SSE using the formula \(SSE=SS_{yy}-\hat{\beta}_1 SS_{xy}\).
 Estimate the number of acres that would be harvested if 90 million acres of corn were planted.

For the data in Exercise 19 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Interpret the value of the slope of the least squares regression line in the context of the problem.
 Estimate the average concentration of the active ingredient in the blood in men after consuming 1 ounce of the medication.

For the data in Exercise 20 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 Interpret the value of the slope of the least squares regression line in the context of the problem.
 Estimate the age of an oak tree whose girth five feet off the ground is 92 inches.

For the data in Exercise 21 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 The 28-day strength of concrete used on a certain job must be at least 3,200 psi. If the 3-day strength is 1,300 psi, would we anticipate that the concrete will be sufficiently strong on the 28th day? Explain fully.

For the data in Exercise 22 of Section 10.2 "The Linear Correlation Coefficient"
 Compute the least squares regression line.
 If the power facility is called upon to provide more than 95 million watt-hours tomorrow then energy will have to be purchased from elsewhere at a premium. The forecast is for an average temperature of 42 degrees. Should the company plan on purchasing power at a premium?
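Several of these applications hinge on matching units before substituting into the regression line. As a sketch, the concrete-strength question can be handled as below, using the line \(\hat{y}=2.550x+1.993\) from the answer key, where both variables are in hundreds of psi. The function name and its default arguments are illustrative choices.

```python
def predict_28_day_strength_psi(three_day_psi, slope=2.550, intercept=1.993):
    """Predict 28-day strength in psi from 3-day strength in psi.

    The fitted line works in hundreds of psi, so convert on the way in and out.
    """
    x = three_day_psi / 100           # psi -> hundreds of psi
    y_hat = slope * x + intercept     # predicted strength, hundreds of psi
    return 100 * y_hat                # back to psi

predicted = predict_28_day_strength_psi(1300)
print(round(predicted))   # 3514
print(predicted >= 3200)  # True
```

The same pattern applies to the advertising exercise, where x and y are in thousands of dollars, so $2,500 enters the equation as x = 2.5.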
Additional Exercises

Verify that no matter what the data are, the least squares regression line always passes through the point with coordinates \((\bar{x},\bar{y})\). Hint: Find the predicted value of y when \(x=\bar{x}\).

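Following the hint, the verification can be laid out as below. It assumes the usual expression for the least squares intercept, \(\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}\), which is not restated in these exercises.

```latex
\[
\hat{y}\,\Big|_{x=\bar{x}}
  = \hat{\beta}_1\bar{x} + \hat{\beta}_0
  = \hat{\beta}_1\bar{x} + \left(\bar{y} - \hat{\beta}_1\bar{x}\right)
  = \bar{y}
\]
```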
In Exercise 1 you computed the least squares regression line for the data in Exercise 1 of Section 10.2 "The Linear Correlation Coefficient".

 Reverse the roles of x and y and compute the least squares regression line for the new data set
x  2  4  6  5  9
y  0  1  3  5  8
 Interchanging x and y corresponds geometrically to reflecting the scatter plot in a 45-degree line. Reflecting the regression line for the original data the same way gives a line with the equation \(\hat{y}=1.346x-3.600\). Is this the equation that you got in part (a)? Can you figure out why not? Hint: Think about how x and y are treated differently geometrically in the computation of the goodness of fit.
 Compute SSE for each line and see if they fit the same, or if one fits the data better than the other.
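The comparison in the last part can be checked numerically. The sketch below reads the reversed table as the pairs (2,0), (4,1), (6,3), (5,5), (9,8), fits the least squares line to them, and compares its SSE with the SSE of the reflected line \(\hat{y}=1.346x-3.600\). The helper names are hypothetical.

```python
def least_squares(points):
    """Return (slope, intercept) of the least squares line of y on x."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    ss_xx = sum((x - x_bar) ** 2 for x, _ in points)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = ss_xy / ss_xx
    return slope, y_bar - slope * x_bar

def sse(points, slope, intercept):
    """Sum of squared vertical distances from the points to the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in points)

reversed_data = [(2, 0), (4, 1), (6, 3), (5, 5), (9, 8)]
fit_slope, fit_intercept = least_squares(reversed_data)

sse_fit = sse(reversed_data, fit_slope, fit_intercept)
sse_reflected = sse(reversed_data, 1.346, -3.600)
print(sse_fit < sse_reflected)  # True
```

Least squares minimizes vertical distances, so reflecting a line that was fitted with the roles of x and y swapped does not in general give the minimizer for the new data.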

Large Data Set Exercises

Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
 Compute the least squares regression line with SAT score as the independent variable (x) and GPA as the dependent variable (y).
 Interpret the meaning of the slope \(\hat{\beta}_1\) of the regression line in the context of the problem.
 Compute SSE, the measure of the goodness of fit of the regression line to the sample data.
 Estimate the GPA of a student whose SAT score is 1350.

Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs).
http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls
 Compute the least squares regression line with scores using the original clubs as the independent variable (x) and scores using the new clubs as the dependent variable (y).
 Interpret the meaning of the slope \(\hat{\beta}_1\) of the regression line in the context of the problem.
 Compute SSE, the measure of the goodness of fit of the regression line to the sample data.
 Estimate the score with the new clubs of a golfer whose score with the old clubs is 73.

Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions.
http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls
 Compute the least squares regression line with the number of bidders present at the auction as the independent variable (x) and sales price as the dependent variable (y).
 Interpret the meaning of the slope \(\hat{\beta}_1\) of the regression line in the context of the problem.
 Compute SSE, the measure of the goodness of fit of the regression line to the sample data.
 Estimate the sales price of a clock at an auction at which the number of bidders is seven.
Answers

\(\hat{y}=0.743x+2.675\)

\(\hat{y}=-0.610x+4.082\)

\(\hat{y}=0.625x+1.25\), \(SSE=5\)

\(\hat{y}=0.6x+1.8\)

\(\hat{y}=-1.45x+2.4\), \(SSE=50.25\) (cannot use the definition to compute)

 \(\hat{y}=4.848x-56\)
 4.8
 21.6


 \(\hat{y}=0.114x+69.222\)
 73.8
 69.2, invalid extrapolation


 \(\hat{y}=42.024x+119.502\)
 Increases by $42,024
 $224,562


 \(\hat{y}=1.045x-8.527\)
 2151.93367
 80.3


 \(\hat{y}=0.043x+0.001\)
 For each additional ounce of medication consumed, blood concentration of the active ingredient increases by 0.043%.
 0.044%


 \(\hat{y}=2.550x+1.993\)
 Predicted 28-day strength is 3,514 psi; sufficiently strong.


 \(\hat{y}=0.0016x+0.022\)
 On average, every 100-point increase in SAT score adds 0.16 point to the GPA.
 \(SSE=432.10\)
 \(\hat{y}=2.182\)


 \(\hat{y}=116.62x+6955.1\)
 On average, each additional bidder at an auction raises the price by 116.62 dollars.
 \(SSE=1850314.08\)
10.5 Statistical Inferences About β1
Basic
For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2 "The Linear Correlation Coefficient" and Section 10.4 "The Least Squares Regression Line".
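Construct the 95% confidence interval for the slope \(\beta_1\) of the population regression line based on the sample data set of Exercise 1 of Section 10.2 "The Linear Correlation Coefficient".
Construct the 90% confidence interval for the slope \(\beta_1\) of the population regression line based on the sample data set of Exercise 2 of Section 10.2 "The Linear Correlation Coefficient".
Construct the 90% confidence interval for the slope \(\beta_1\) of the population regression line based on the sample data set of Exercise 3 of Section 10.2 "The Linear Correlation Coefficient".
Construct the 99% confidence interval for the slope \(\beta_1\) of the population regression line based on the sample data set of Exercise 4 of Section 10.2 "The Linear Correlation Coefficient".
For the data in Exercise 5 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether x is useful for predicting y (that is, whether \(\beta_1 \neq 0\)).
For the data in Exercise 6 of Section 10.2 "The Linear Correlation Coefficient" test, at the 5% level of significance, whether x is useful for predicting y (that is, whether \(\beta_1 \neq 0\)).
Construct the 90% confidence interval for the slope \(\beta_1\) of the population regression line based on the sample data set of Exercise 7 of Section 10.2 "The Linear Correlation Coefficient".
Construct the 95% confidence interval for the slope \(\beta_1\) of the population regression line based on the sample data set of Exercise 8 of Section 10.2 "The Linear Correlation Coefficient".
For the data in Exercise 9 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether x is useful for predicting y (that is, whether \(\beta_1 \neq 0\)).
For the data in Exercise 10 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether x is useful for predicting y (that is, whether \(\beta_1 \neq 0\)).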
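A confidence interval for the slope has the form \(\hat{\beta}_1 \pm t_{\alpha/2}\, s_\varepsilon/\sqrt{SS_{xx}}\) with \(s_\varepsilon=\sqrt{SSE/(n-2)}\) and \(n-2\) degrees of freedom. The sketch below assumes that form and takes the critical value as an argument, to be looked up in a t-table. The data are the five pairs implied by the reversed table in Section 10.4 (an inference), the function name is hypothetical, and 3.182 is the tabulated \(t_{0.025}\) for 3 degrees of freedom.

```python
from math import sqrt

def slope_confidence_interval(xs, ys, t_critical):
    """Return (low, high) for the slope, given a t critical value with n-2 df."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = ss_xy / ss_xx
    sse = ss_yy - slope * ss_xy
    s_eps = sqrt(sse / (n - 2))               # sample standard deviation of errors
    margin = t_critical * s_eps / sqrt(ss_xx)
    return slope - margin, slope + margin

low, high = slope_confidence_interval([0, 1, 3, 5, 8], [2, 4, 6, 5, 9],
                                      t_critical=3.182)
print(round(low, 3), round(high, 3))
```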
Applications
For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient" construct a 90% confidence interval for the mean number of new words acquired per month by children between 13 and 18 months of age.
For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient" construct a 90% confidence interval for the mean increased braking distance for each additional 100 pounds of vehicle weight.
For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether age is useful for predicting resting heart rate.
For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether wind speed is useful for predicting wave height.
For the situation described in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient"
 Construct the 95% confidence interval for the mean increase in revenue per additional thousand dollars spent on advertising.
 An advertising agency tells the business owner that for every additional thousand dollars spent on advertising, revenue will increase by over $25,000. Test this claim (which is the alternative hypothesis) at the 5% level of significance.
 Perform the test of part (b) at the 10% level of significance.
 Based on the results in (b) and (c), how believable is the ad agency’s claim? (This is a subjective judgement.)
For the situation described in Exercise 16 of Section 10.2 "The Linear Correlation Coefficient"
 Construct the 90% confidence interval for the mean increase in height per additional inch of length at age two.
 It is claimed that for girls each additional inch of length at age two means more than an additional inch of height at maturity. Test this claim (which is the alternative hypothesis) at the 10% level of significance.
For the data in Exercise 17 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether course average before the final exam is useful for predicting the final exam grade.
For the situation described in Exercise 18 of Section 10.2 "The Linear Correlation Coefficient", an agronomist claims that each additional million acres planted results in more than 750,000 additional acres harvested. Test this claim at the 1% level of significance.
For the data in Exercise 19 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1/10th of 1% level of significance, whether, ignoring all other facts such as age and body mass, the amount of the medication consumed is a useful predictor of blood concentration of the active ingredient.
For the data in Exercise 20 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether for each additional inch of girth the age of the tree increases by at least two and one-half years.
For the data in Exercise 21 of Section 10.2 "The Linear Correlation Coefficient"
 Construct the 95% confidence interval for the mean increase in strength at 28 days for each additional hundred psi increase in strength at 3 days.
 Test, at the 1/10th of 1% level of significance, whether the 3-day strength is useful for predicting 28-day strength.
For the situation described in Exercise 22 of Section 10.2 "The Linear Correlation Coefficient"
 Construct the 99% confidence interval for the mean decrease in energy demand for each one-degree drop in temperature.
 An engineer with the power company believes that for each one-degree increase in temperature, daily energy demand will decrease by more than 3.6 million watt-hours. Test this claim at the 1% level of significance.
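The tests in this section share one standardized statistic, \(T=(\hat{\beta}_1-B_0)/(s_\varepsilon/\sqrt{SS_{xx}})\), where \(B_0\) is the slope stated in the null hypothesis. The sketch below assumes that form. The data are the five pairs implied by the reversed table in Section 10.4 (an inference), the names are hypothetical, and 3.182 is the tabulated two-tailed 5% critical value for 3 degrees of freedom.

```python
from math import sqrt

def slope_test_statistic(xs, ys, null_slope=0.0):
    """Standardized test statistic for H0: beta_1 = null_slope (n-2 df)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    slope = ss_xy / ss_xx
    sse = ss_yy - slope * ss_xy
    standard_error = sqrt(sse / (n - 2)) / sqrt(ss_xx)
    return (slope - null_slope) / standard_error

t_stat = slope_test_statistic([0, 1, 3, 5, 8], [2, 4, 6, 5, 9])
T_CRITICAL = 3.182  # two-tailed, 5% level, 3 degrees of freedom (from a t-table)
reject_null = abs(t_stat) >= T_CRITICAL
print(round(t_stat, 2), reject_null)
```

For one-sided claims such as "revenue will increase by over $25,000", `null_slope` would be set to the claimed value in the units of the fitted line and T compared with a one-tailed critical value instead.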
Large Data Set Exercises
Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
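 Compute the 90% confidence interval for the slope \(\beta_1\) of the population regression line with SAT score as the independent variable (x) and GPA as the dependent variable (y).
 Test, at the 10% level of significance, the hypothesis that the slope of the population regression line is greater than 0.001, against the null hypothesis that it is exactly 0.001.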
Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs).
http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls
Compute the 95% confidence interval for the slope $\beta_1$ of the population regression line.
Test, at the 10% level of significance, the hypothesis that the slope of the population regression line is different from 1, against the null hypothesis that it is exactly 1.
Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions.
http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls
Compute the 95% confidence interval for the slope $\beta_1$ of the population regression line.
Test, at the 10% level of significance, the hypothesis that the average sales price increases by more than $90 for each additional bidder at an auction, against the default that it increases by exactly $90.
Answers
10.7 Estimation and Prediction
Exercises
Basic
For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2 "The Linear Correlation Coefficient", Section 10.4 "The Least Squares Regression Line", and Section 10.5 "Statistical Inferences About $\beta_1$".
For the sample data set of Exercise 1 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 2 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 3 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 4 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 5 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 6 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 7 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 8 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 9 of Section 10.2 find the coefficient of determination using the formula
For the sample data set of Exercise 10 of Section 10.2 find the coefficient of determination using the formula
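The coefficient of determination has three standard, algebraically equivalent forms: $r^2 = SS_{xy}^2/(SS_{xx}\,SS_{yy})$, $r^2 = \hat{\beta}_1 SS_{xy}/SS_{yy}$, and $r^2 = (SS_{yy} - SSE)/SS_{yy}$. The following is a minimal Python sketch that checks this numerically. The five data points are hypothetical and are not taken from any exercise, and the variable names are illustrative.

```python
# Hypothetical five-point data set, for illustration only
# (not the data of any exercise in Section 10.2).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Sums of squares and cross products about the means.
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least squares slope and intercept.
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# SSE computed directly from the residuals, so the third
# formula is an independent check rather than a restatement.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

r2_from_r = ss_xy ** 2 / (ss_xx * ss_yy)   # square of r
r2_from_slope = b1 * ss_xy / ss_yy         # uses the fitted slope
r2_from_sse = (ss_yy - sse) / ss_yy        # explained share of variability

print(r2_from_r, r2_from_slope, r2_from_sse)
```

All three expressions should agree up to floating-point rounding, which is why squaring $r$ is a valid confirmation of either formula.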
Applications
For the data in Exercise 11 of Section 10.2 compute the coefficient of determination and interpret its value in the context of age and vocabulary.
For the data in Exercise 12 of Section 10.2 compute the coefficient of determination and interpret its value in the context of vehicle weight and braking distance.
For the data in Exercise 13 of Section 10.2 compute the coefficient of determination and interpret its value in the context of age and resting heart rate. In the age range of the data, does age seem to be a very important factor with regard to heart rate?
For the data in Exercise 14 of Section 10.2 compute the coefficient of determination and interpret its value in the context of wind speed and wave height. Does wind speed seem to be a very important factor with regard to wave height?
For the data in Exercise 15 of Section 10.2 find the proportion of the variability in revenue that is explained by level of advertising.
For the data in Exercise 16 of Section 10.2 find the proportion of the variability in adult height that is explained by the variation in length at age two.
For the data in Exercise 17 of Section 10.2 compute the coefficient of determination and interpret its value in the context of course average before the final exam and score on the final exam.
For the data in Exercise 18 of Section 10.2 compute the coefficient of determination and interpret its value in the context of acres planted and acres harvested.
For the data in Exercise 19 of Section 10.2 compute the coefficient of determination and interpret its value in the context of the amount of the medication consumed and blood concentration of the active ingredient.
For the data in Exercise 20 of Section 10.2 compute the coefficient of determination and interpret its value in the context of tree size and age.
For the data in Exercise 21 of Section 10.2 find the proportion of the variability in 28-day strength of concrete that is accounted for by variation in 3-day strength.
For the data in Exercise 22 of Section 10.2 find the proportion of the variability in energy demand that is accounted for by variation in average temperature.
Large Data Set Exercises
Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Compute the coefficient of determination and interpret its value in the context of SAT scores and GPAs.
http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls
Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Compute the coefficient of determination and interpret its value in the context of golf scores with the two kinds of golf clubs.
http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls
Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Compute the coefficient of determination and interpret its value in the context of the number of bidders at an auction and the price of this type of antique grandfather clock.
http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls
Answers
0.848
0.631
0.5
0.766
0.715
0.898; about 90% of the variability in vocabulary is explained by age
0.503; about 50% of the variability in heart rate is explained by age. Age is a significant but not dominant factor in explaining heart rate.
The proportion is $r^2 = 0.692$.
0.563; about 56% of the variability in final exam scores is explained by course average before the final exam
0.931; about 93% of the variability in the blood concentration of the active ingredient is explained by the amount of the medication consumed
The proportion is $r^2 = 0.984$.
10.8 A Complete Example
The exercises in this section are unrelated to those in previous sections.
The data give the amount x of silicofluoride in the water (mg/L) and the amount y of lead in the bloodstream (μg/dL) of ten children in various communities with and without municipal water. Perform a complete analysis of the data, in analogy with the discussion in this section (that is, make a scatter plot, do preliminary computations, find the least squares regression line, find
The table gives the weight x (thousands of pounds) and available heat energy y (million BTU) of a standard cord of various species of wood typically used for heating. Perform a complete analysis of the data, in analogy with the discussion in this section (that is, make a scatter plot, do preliminary computations, find the least squares regression line, find
Large Data Set Exercises

Large Data Sets 3 and 3A list the shoe sizes and heights of 174 customers entering a shoe store. The gender of the customer is not indicated in Large Data Set 3. However, men’s and women’s shoes are not measured on the same scale; for example, a size 8 shoe for men is not the same size as a size 8 shoe for women. Thus it would not be meaningful to apply regression analysis to Large Data Set 3. Nevertheless, compute the scatter diagrams, with shoe size as the independent variable (x) and height as the dependent variable (y), for (i) just the data on men, (ii) just the data on women, and (iii) the full mixed data set with both men and women. Does the third, invalid scatter diagram look markedly different from the other two?
http://www.gone.2012books.lardbucket.org/sites/all/files/data3.xls
http://www.gone.2012books.lardbucket.org/sites/all/files/data3A.xls

Separate out from Large Data Set 3A just the data on men and do a complete analysis, with shoe size as the independent variable (x) and height as the dependent variable (y). Use $\alpha = 0.05$ and $x_p = 10$ whenever appropriate.
http://www.gone.2012books.lardbucket.org/sites/all/files/data3A.xls

Separate out from Large Data Set 3A just the data on women and do a complete analysis, with shoe size as the independent variable (x) and height as the dependent variable (y). Use $\alpha = 0.05$ and $x_p = 10$ whenever appropriate.
http://www.gone.2012books.lardbucket.org/sites/all/files/data3A.xls
Answers

$\Sigma x = 14.2$, $\Sigma y = 49.6$, $\Sigma xy = 91.73$, $\Sigma x^2 = 26.3$, $\Sigma y^2 = 333.86$. $SS_{xx} = 6.136$, $SS_{xy} = 21.298$, $SS_{yy} = 87.844$. $\bar{x} = 1.42$, $\bar{y} = 4.96$. $\hat{\beta}_1 = 3.47$, $\hat{\beta}_0 = 0.03$. $SSE = 13.92$. $s_\varepsilon = 1.32$. $r = 0.9174$, $r^2 = 0.8416$. $df = 8$, $T = 6.518$.
The 95% confidence interval for $\beta_1$ is $(2.24, 4.70)$.
At $x_p = 2$, the 95% confidence interval for $E(y)$ is $(5.77, 8.17)$.
At $x_p = 2$, the 95% prediction interval for $y$ is $(3.73, 10.21)$.
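The preliminary computations in this answer can be retraced from the five sums alone. The following is a minimal Python sketch, assuming $n = 10$ (the ten children in the exercise) and taking the critical value $t_{0.025} = 2.306$ for $df = 8$ from a standard $t$-table. The variable names are illustrative. It covers the sums of squares, the fitted line, $SSE$, $s_\varepsilon$, $r$, the test statistic for $H_0: \beta_1 = 0$, and the confidence interval for the slope.

```python
import math

# Sums quoted in the answer; n = 10 children is taken from the exercise.
n = 10
sum_x, sum_y = 14.2, 49.6
sum_xy, sum_x2, sum_y2 = 91.73, 26.3, 333.86

# Shortcut formulas for the sums of squares.
ss_xx = sum_x2 - sum_x ** 2 / n
ss_xy = sum_xy - sum_x * sum_y / n
ss_yy = sum_y2 - sum_y ** 2 / n

# Means and least squares coefficients.
x_bar, y_bar = sum_x / n, sum_y / n
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

# Error sum of squares, sample standard deviation of errors, correlation.
sse = ss_yy - b1 * ss_xy
s_eps = math.sqrt(sse / (n - 2))
r = ss_xy / math.sqrt(ss_xx * ss_yy)

# Test statistic for H0: beta_1 = 0 and 95% confidence interval for beta_1.
se_b1 = s_eps / math.sqrt(ss_xx)
t_stat = b1 / se_b1
T_CRIT = 2.306  # t_{0.025} with df = 8, read from a t-table (assumption: table value)
ci_slope = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)

print(f"SSxx={ss_xx:.3f} SSxy={ss_xy:.3f} SSyy={ss_yy:.3f}")
print(f"b1={b1:.2f} b0={b0:.2f} SSE={sse:.2f} s_eps={s_eps:.2f}")
print(f"r={r:.4f} r^2={r * r:.4f} T={t_stat:.3f}")
print(f"95% CI for slope: ({ci_slope[0]:.2f}, {ci_slope[1]:.2f})")
```

Carrying full precision through each step and rounding only when printing keeps the results in line with the rounded figures listed above.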
The regression line:

The positively correlated trend seems less profound than that in each of the previous plots.
