# 10.E: Correlation and Regression (Exercises)

These are homework exercises to accompany the Textmap created for "Introductory Statistics" by Shafer and Zhang. Complementary General Chemistry question banks can be found for other Textmaps. In addition to these publicly available questions, access to a private problem bank for use in exams and homework is available to faculty only, on an individual basis; please contact Delmar Larsen for an account with access permission.

## 10.1 Linear Relationships Between Variables

### Basic

1. A line has equation y=0.5x+2.

1. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
2. Give the value of the slope of the line; give the value of the y-intercept.
2. A line has equation y=x−0.5.

1. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
2. Give the value of the slope of the line; give the value of the y-intercept.
3. A line has equation y=2x+4.

1. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
2. Give the value of the slope of the line; give the value of the y-intercept.
4. A line has equation y=1.5x+1.

1. Pick five distinct x-values, use the equation to compute the corresponding y-values, and plot the five points obtained.
2. Give the value of the slope of the line; give the value of the y-intercept.
5. Based on the information given about a line, determine how y will change (increase, decrease, or stay the same) when x is increased, and explain. In some cases it might be impossible to tell from the information given.

1. The slope is positive.
2. The y-intercept is positive.
3. The slope is zero.
6. Based on the information given about a line, determine how y will change (increase, decrease, or stay the same) when x is increased, and explain. In some cases it might be impossible to tell from the information given.

1. The y-intercept is negative.
2. The y-intercept is zero.
3. The slope is negative.
7. A data set consists of eight (x,y) pairs of numbers:

(0,12)(2,15)(4,16)(5,14)(8,22)(13,24)(15,28)(20,30)
1. Plot the data in a scatter diagram.
2. Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
3. Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
8. A data set consists of ten (x,y) pairs of numbers:

(3,20)(5,13)(6,9)(8,4)(11,0)(12,0)(14,1)(17,6)(18,9)(20,16)
1. Plot the data in a scatter diagram.
2. Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
3. Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
9. A data set consists of nine (x,y) pairs of numbers:

(8,16)(9,9)(10,4)(11,1)(12,0)(13,1)(14,4)(15,9)(16,16)
1. Plot the data in a scatter diagram.
2. Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
3. Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
10. A data set consists of five (x,y) pairs of numbers:

(0,1)(2,5)(3,7)(5,11)(8,17)
1. Plot the data in a scatter diagram.
2. Based on the plot, explain whether the relationship between x and y appears to be deterministic or to involve randomness.
3. Based on the plot, explain whether the relationship between x and y appears to be linear or not linear.
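
As a numerical companion to the scatter diagrams, one can compare the slopes between successive points: if they are all equal, the points lie on a single line. The sketch below is only an illustration using the nine pairs from Exercise 9; the helper names `successive_slopes` and `is_exactly_linear` are made up for this example, and the code assumes all x-values are distinct.

```python
from fractions import Fraction

def successive_slopes(points):
    """Slopes of the segments joining consecutive points (sorted by x).

    Assumes all x-values are distinct, so no division by zero occurs.
    """
    pts = sorted(points)
    return [Fraction(y2 - y1, x2 - x1)
            for (x1, y1), (x2, y2) in zip(pts, pts[1:])]

def is_exactly_linear(points):
    """True when every consecutive pair of points has the same slope."""
    return len(set(successive_slopes(points))) <= 1

# Data from Exercise 9 of this section.
exercise_9 = [(8, 16), (9, 9), (10, 4), (11, 1), (12, 0),
              (13, 1), (14, 4), (15, 9), (16, 16)]

slopes = successive_slopes(exercise_9)
print([int(s) for s in slopes])       # [-7, -5, -3, -1, 1, 3, 5, 7]
print(is_exactly_linear(exercise_9))  # False
```

The slopes change steadily from point to point, which agrees with the "Not linear" entry in the answers for this exercise. A check like this says nothing about whether a relationship is deterministic; that judgement still comes from the plot.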

### Applications

1. At 60°F a particular blend of automotive gasoline weighs 6.17 lb/gal. The weight y of gasoline on a tank truck that is loaded with x gallons of gasoline is given by the linear equation

y=6.17x
1. Explain whether the relationship between the weight y and the amount x of gasoline is deterministic or contains an element of randomness.
2. Predict the weight of gasoline on a tank truck that has just been loaded with 6,750 gallons of gasoline.
2. The rate for renting a motor scooter for one day at a beach resort area is $25 plus 30 cents for each mile the scooter is driven. The total cost y in dollars for renting a scooter and driving it x miles is

y=0.30x+25
1. Explain whether the relationship between the cost y of renting the scooter for a day and the distance x that the scooter is driven that day is deterministic or contains an element of randomness.
2. A person intends to rent a scooter one day for a trip to an attraction 17 miles away. Assuming that the total distance the scooter is driven is 34 miles, predict the cost of the rental.
3. The pricing schedule for labor on a service call by an elevator repair company is $150 plus $50 per hour on site.

1. Write down the linear equation that relates the labor cost y to the number of hours x that the repairman is on site.
2. Calculate the labor cost for a service call that lasts 2.5 hours.
4. The cost of a telephone call made through a leased line service is 2.5 cents per minute.

1. Write down the linear equation that relates the cost y (in cents) of a call to its length x.
2. Calculate the cost of a call that lasts 23 minutes.

### Large Data Set Exercises

1. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Plot the scatter diagram with SAT score as the independent variable (x) and GPA as the dependent variable (y). Comment on the appearance and strength of any linear trend.

http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls

2. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Plot the scatter diagram with golf score using the original clubs as the independent variable (x) and golf score using the new clubs as the dependent variable (y). Comment on the appearance and strength of any linear trend.

http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls

3. Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Plot the scatter diagram with the number of bidders at the auction as the independent variable (x) and the sales price as the dependent variable (y). Comment on the appearance and strength of any linear trend.

http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls

### Answers

Basic

1. (a) Answers vary. (b) Slope m=0.5; y-intercept b=2.
3. (a) Answers vary. (b) Slope m=2; y-intercept b=4.
5. (a) y increases. (b) Impossible to tell. (c) y does not change.
7. (a) Scatter diagram needed. (b) Involves randomness. (c) Linear.
9. (a) Scatter diagram needed. (b) Deterministic. (c) Not linear.

Applications

1. (a) Deterministic. (b) 41,647.5 pounds.
3. (a) y=50x+150. (b) $275.

Large Data Set Exercises

1. There appears to be a hint of some positive correlation.
3. There appears to be clear positive correlation.
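
The two Applications answers listed above (41,647.5 pounds and $275) follow directly from evaluating the linear equations. A minimal sketch of that arithmetic is shown here; the function names are hypothetical, and `Decimal` is used only so that the printed values are exact rather than subject to floating-point rounding.

```python
from decimal import Decimal

def gasoline_weight(gallons):
    """Weight in pounds of `gallons` of gasoline at 6.17 lb/gal (y = 6.17x)."""
    return Decimal("6.17") * gallons

def labor_cost(hours):
    """Labor cost in dollars: $150 flat plus $50 per hour (y = 50x + 150)."""
    return Decimal("50") * Decimal(str(hours)) + Decimal("150")

print(gasoline_weight(6750))  # 41647.50
print(labor_cost(2.5))        # 275.0
```

Because both relationships are deterministic, the same input always produces the same output; there is no random term to estimate.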

## 10.2 The Linear Correlation Coefficient

### Basic

With the exception of the exercises at the end of Section 10.3, the first Basic exercise in each of the following sections through Section 10.7 uses the data from the first exercise here, the second Basic exercise uses the data from the second exercise here, and so on, and similarly for the Application exercises. Save your computations done on these exercises so that you do not need to repeat them later.

1. For the sample data

1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
2. For the sample data

1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
3. For the sample data

1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
4. For the sample data

1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
5. For the sample data

1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
6. For the sample data

1. Draw the scatter plot.
2. Based on the scatter plot, predict the sign of the linear correlation coefficient. Explain your answer.
3. Compute the linear correlation coefficient and compare its sign to your answer to part (b).
7. Compute the linear correlation coefficient for the sample data summarized by the following information:

8. Compute the linear correlation coefficient for the sample data summarized by the following information:

9. Compute the linear correlation coefficient for the sample data summarized by the following information:

10.

### Applications

1. The age $$x$$ in months and vocabulary $$y$$ were measured for six children, with the results shown in the table.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

2. The curb weight $$x$$ in hundreds of pounds and braking distance $$y$$ in feet, at 50 miles per hour on dry pavement, were measured for five vehicles, with the results shown in the table.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

3. The age $$x$$ and resting heart rate $$y$$ were measured for ten men, with the results shown in the table.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

4. The wind speed $$x$$ in miles per hour and wave height $$y$$ in feet were measured under various conditions on an enclosed deep water sea, with the results shown in the table.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

5. The advertising expenditure $$x$$ and sales $$y$$ in thousands of dollars for a small retail business in its first eight years in operation are shown in the table.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

6. The height $$x$$ at age 2 and $$y$$ at age 20, both in inches, for ten women are tabulated in the table.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

7. The course average $$x$$ just before a final exam and the score $$y$$ on the final exam were recorded for 15 randomly selected students in a large physics class, with the results shown in the table.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

8. The table shows the acres $$x$$ of corn planted and acres $$y$$ of corn harvested, in millions of acres, in a particular country in ten successive years.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

9. Fifty male subjects drank a measured amount $$x$$ (in ounces) of a medication and the concentration $$y$$ (in percent) in their blood of the active ingredient was measured 30 minutes later. The sample data are summarized by the following information.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

10. In an effort to produce a formula for estimating the age of large free-standing oak trees non-invasively, the girth $$x$$ (in inches) five feet off the ground of 15 such trees of known age $$y$$ (in years) was measured. The sample data are summarized by the following information.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

11. Construction standards specify the strength of concrete 28 days after it is poured. For 30 samples of various types of concrete the strength $$x$$ after 3 days and the strength $$y$$ after 28 days (both in hundreds of pounds per square inch) were measured. The sample data are summarized by the following information.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

12. Power-generating facilities used forecasts of temperature to forecast energy demand. The average temperature $$x$$ (degrees Fahrenheit) and the day’s energy demand $$y$$ (million watt-hours) were recorded on 40 randomly selected winter days in the region served by a power company. The sample data are summarized by the following information.

Compute the linear correlation coefficient for these sample data and interpret its meaning in the context of the problem.

1. In each case state whether you expect the two variables $$x$$ and $$y$$ indicated to have positive, negative, or zero correlation.

1. the number $$x$$ of pages in a book and the age $$y$$ of the author
2. the number $$x$$ of pages in a book and the age $$y$$ of the intended reader
3. the weight $$x$$ of an automobile and the fuel economy $$y$$ in miles per gallon
4. the weight $$x$$ of an automobile and the reading $$y$$ on its odometer
5. the amount $$x$$ of a sedative a person took an hour ago and the time $$y$$ it takes him to respond to a stimulus
2. In each case state whether you expect the two variables $$x$$ and $$y$$ indicated to have positive, negative, or zero correlation.

1. the length $$x$$ of time an emergency flare will burn and the length $$y$$ of time the match used to light it burned
2. the average length $$x$$ of time that calls to a retail call center are on hold one day and the number $$y$$ of calls received that day
3. the length $$x$$ of a regularly scheduled commercial flight between two cities and the headwind $$y$$ encountered by the aircraft
4. the value $$x$$ of a house and its size $$y$$ in square feet
5. the average temperature $$x$$ on a winter day and the energy consumption $$y$$ of the furnace
3. Changing the units of measurement on two variables $$x$$ and $$y$$ should not change the linear correlation coefficient. Moreover, most changes of units amount to simply multiplying the measurements by a constant (for example, 1 foot = 12 inches). Multiply each $$x$$ value in the table in Exercise 1 by two and compute the linear correlation coefficient for the new data set. Compare the new value of $$r$$ to the one for the original data.

4. Refer to the previous exercise. Multiply each $$x$$ value in the table in Exercise 2 by two, multiply each $$y$$ value by three, and compute the linear correlation coefficient for the new data set. Compare the new value of $$r$$ to the one for the original data.

5. Reversing the roles of $$x$$ and $$y$$ in the data set of Exercise 1 produces the data set

Compute the linear correlation coefficient of the new set of data and compare it to what you got in Exercise 1.

6. In the context of the previous problem, look at the formula for $$r$$ and see if you can tell why what you observed there must be true for every data set.
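
The last four exercises can be explored numerically. The sketch below computes $$r$$ from the sums $$SS_{xx}$$, $$SS_{yy}$$, and $$SS_{xy}$$ using the standard formula $$r=SS_{xy}/\sqrt{SS_{xx}SS_{yy}}$$, then repeats the computation after rescaling and after swapping the variables. Because the data tables for this section are not reproduced here, it borrows the eight pairs from Exercise 7 of Section 10.1 as illustrative data; the helper names `ss` and `corr` are assumptions of this example.

```python
from math import sqrt

def ss(us, vs):
    """Sum of cross-deviations: sum(u*v) - sum(u)*sum(v)/n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n

def corr(xs, ys):
    """Linear correlation coefficient r = SSxy / sqrt(SSxx * SSyy)."""
    return ss(xs, ys) / sqrt(ss(xs, xs) * ss(ys, ys))

# Illustrative data: the eight pairs from Exercise 7 of Section 10.1.
xs = [0, 2, 4, 5, 8, 13, 15, 20]
ys = [12, 15, 16, 14, 22, 24, 28, 30]

r = corr(xs, ys)
r_scaled = corr([2 * x for x in xs], [3 * y for y in ys])  # change of units
r_swapped = corr(ys, xs)                                   # roles reversed

print(round(r, 3), round(r_scaled, 3), round(r_swapped, 3))
```

All three values agree. Multiplying $$x$$ by a positive constant $$a$$ and $$y$$ by a positive constant $$b$$ multiplies $$SS_{xy}$$ by $$ab$$ and the square root in the denominator by $$ab$$ as well, and the formula treats $$x$$ and $$y$$ symmetrically, so neither change can alter $$r$$.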

### Large Data Set Exercises

1. Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the first large data set problem for Section 10.1 "Linear Relationships Between Variables".

http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls

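2. Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the second large data set problem for Section 10.1 "Linear Relationships Between Variables".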

http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls

3. Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Compute the linear correlation coefficient r. Compare its value to your comments on the appearance and strength of any linear trend in the scatter diagram that you constructed in the third large data set problem for Section 10.1 "Linear Relationships Between Variables".

http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls

### Answers

1. r=0.921

2.
3.
4.
5.
6.
7. 0.875

8.
9. −0.846

10.
1. 0.948

2.
3. 0.709

4.
5. 0.832

6.
7. 0.751

8.
9. 0.965

10.
11. 0.992

12.
1. (a) zero (b) positive (c) negative (d) zero (e) positive
2.
3. same value
4.
5. same value
6.

## 10.3 Modelling Linear Relationships with Randomness Present

### Exercises

1. State the three assumptions that are the basis for the Simple Linear Regression Model.

2. The Simple Linear Regression Model is summarized by the equation

$$y=\beta_1 x+\beta_0+\varepsilon$$

Identify the deterministic part and the random part.

3. a statistic or a population parameter? Explain.

4. Is the number σ in the Simple Linear Regression Model a statistic or a population parameter? Explain.

5. Describe what to look for in a scatter diagram in order to check that the assumptions of the Simple Linear Regression Model are true.

6. True or false: the assumptions of the Simple Linear Regression Model must hold exactly in order for the procedures and analysis developed in this chapter to be useful.

### Answers

1. (a) The mean of y is linearly related to x. (b) For each given x, y is a normal random variable with mean β1x+β0 and standard deviation σ. (c) All the observations of y in the sample are independent.
3. is a population parameter.
5. A linear trend.
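
The second assumption listed in the answers says that, for each given x, y is normal with mean β1x+β0 and standard deviation σ. One way to get a feel for this is to simulate it. The following is a rough sketch, not part of the text: the function name `simulate_responses` and the parameter values are hypothetical and chosen only for illustration.

```python
import random

def simulate_responses(xs, beta1, beta0, sigma, seed=0):
    """Draw one y for each x from y = beta1*x + beta0 + epsilon.

    epsilon is normal with mean 0 and standard deviation sigma, drawn
    independently for each observation.
    """
    rng = random.Random(seed)
    return [beta1 * x + beta0 + rng.gauss(0.0, sigma) for x in xs]

xs = [1, 2, 3, 4, 5]
# Hypothetical parameter values, chosen only for illustration.
noisy = simulate_responses(xs, beta1=2.0, beta0=1.0, sigma=0.5)
exact = simulate_responses(xs, beta1=2.0, beta0=1.0, sigma=0.0)

print(exact)  # deterministic part only: [3.0, 5.0, 7.0, 9.0, 11.0]
print(noisy)  # the same line plus random scatter
```

Setting `sigma` to zero leaves only the deterministic part β1x+β0; any positive `sigma` adds the random part, which is what produces scatter around a linear trend in a scatter diagram.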

## 10.4 The Least Squares Regression Line

### Basic

For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2 "The Linear Correlation Coefficient".

1. Compute the least squares regression line for the data in Exercise 1 of Section 10.2 "The Linear Correlation Coefficient".

2. Compute the least squares regression line for the data in Exercise 2 of Section 10.2 "The Linear Correlation Coefficient".

3. Compute the least squares regression line for the data in Exercise 3 of Section 10.2 "The Linear Correlation Coefficient".

4. Compute the least squares regression line for the data in Exercise 4 of Section 10.2 "The Linear Correlation Coefficient".

5. For the data in Exercise 5 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. Compute the sum of the squared errors SSE using the definition $$\sum(y-\hat{y})^2$$.
3. Compute the sum of the squared errors SSE using the formula $$SSE=SS_{yy}-\hat{\beta}_1SS_{xy}$$.
6. For the data in Exercise 6 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. Compute the sum of the squared errors SSE using the definition $$\sum(y-\hat{y})^2$$.
3. Compute the sum of the squared errors SSE using the formula $$SSE=SS_{yy}-\hat{\beta}_1SS_{xy}$$.
7. Compute the least squares regression line for the data in Exercise 7 of Section 10.2 "The Linear Correlation Coefficient".

8. Compute the least squares regression line for the data in Exercise 8 of Section 10.2 "The Linear Correlation Coefficient".

9. For the data in Exercise 9 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. Can you compute the sum of the squared errors SSE using the definition $$\sum(y-\hat{y})^2$$? Explain.
3. Compute the sum of the squared errors SSE using the formula $$SSE=SS_{yy}-\hat{\beta}_1SS_{xy}$$.
10. For the data in Exercise 10 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. Can you compute the sum of the squared errors SSE using the definition $$\sum(y-\hat{y})^2$$? Explain.
3. Compute the sum of the squared errors SSE using the formula $$SSE=SS_{yy}-\hat{\beta}_1SS_{xy}$$.
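
Several of the exercises above ask for SSE both from the definition and from the shortcut formula. The sketch below shows that the two routes agree, using the standard least squares formulas $$\hat{\beta}_1=SS_{xy}/SS_{xx}$$ and $$\hat{\beta}_0=\bar{y}-\hat{\beta}_1\bar{x}$$. Since the Section 10.2 data tables are not reproduced here, it again uses the eight pairs from Exercise 7 of Section 10.1 as stand-in data; `ss` and `least_squares` are names invented for this example.

```python
def ss(us, vs):
    """Sum of cross-deviations: sum(u*v) - sum(u)*sum(v)/n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n

def least_squares(xs, ys):
    """Return (slope, intercept) of the least squares regression line."""
    n = len(xs)
    slope = ss(xs, ys) / ss(xs, xs)
    intercept = sum(ys) / n - slope * sum(xs) / n
    return slope, intercept

# Stand-in data: the eight pairs from Exercise 7 of Section 10.1.
xs = [0, 2, 4, 5, 8, 13, 15, 20]
ys = [12, 15, 16, 14, 22, 24, 28, 30]

b1, b0 = least_squares(xs, ys)

# SSE from the definition: the sum of squared residuals.
sse_definition = sum((y - (b1 * x + b0)) ** 2 for x, y in zip(xs, ys))
# SSE from the shortcut formula SSyy - b1 * SSxy.
sse_formula = ss(ys, ys) - b1 * ss(xs, ys)

print(round(b1, 4), round(b0, 4))
print(round(sse_definition, 4), round(sse_formula, 4))
```

The definition needs every individual data pair, while the shortcut needs only the summary quantities. That difference is what Exercises 9 and 10 are probing.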

### Applications

1. For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. On average, how many new words does a child from 13 to 18 months old learn each month? Explain.
3. Estimate the average vocabulary of all 16-month-old children.
2. For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. On average, how many additional feet are added to the braking distance for each additional 100 pounds of weight? Explain.
3. Estimate the average braking distance of all cars weighing 3,000 pounds.
3. For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. Estimate the average resting heart rate of all 40-year-old men.
3. Estimate the average resting heart rate of all newborn baby boys. Comment on the validity of the estimate.
4. For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. Estimate the average wave height when the wind is blowing at 10 miles per hour.
3. Estimate the average wave height when there is no wind blowing. Comment on the validity of the estimate.
5. For the data in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient"

1. Compute the least squares regression line.
2. On average, for each additional thousand dollars spent on advertising, how does revenue change? Explain.

### Answers

3. $224,562

1. $$\hat{y}=1.045x-8.527$$
2. 2151.93367
3. 80.3

1. $$\hat{y}=0.043x+0.001$$
2. For each additional ounce of medication consumed, blood concentration of the active ingredient increases by 0.043%.
3. 0.044%

1. $$\hat{y}=2.550x+1.993$$
2. Predicted 28-day strength is 3,514 psi; sufficiently strong.

1. $$\hat{y}=0.0016x+0.022$$
2. On average, every 100 point increase in SAT score adds 0.16 point to the GPA.
3. SSE=432.10
4. $$\hat{y}=2.182$$

1. $$\hat{y}=116.62x+6955.1$$
2. On average, every 1 additional bidder at an auction raises the price by 116.62 dollars.
3. SSE=1850314.08

## 10.5 Statistical Inferences About β1

### Basic

For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2 "The Linear Correlation Coefficient" and Section 10.4 "The Least Squares Regression Line".

1. Construct the 95% confidence interval for the slope β1 of the population regression line based on the sample data set of Exercise 1 of Section 10.2 "The Linear Correlation Coefficient".

2. Construct the 90% confidence interval for the slope β1 of the population regression line based on the sample data set of Exercise 2 of Section 10.2 "The Linear Correlation Coefficient".

3. Construct the 90% confidence interval for the slope β1 of the population regression line based on the sample data set of Exercise 3 of Section 10.2 "The Linear Correlation Coefficient".

4. Construct the 99% confidence interval for the slope β1 of the population regression line based on the sample data set of Exercise 4 of Section 10.2 "The Linear Correlation Coefficient".

5. For the data in Exercise 5 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether x is useful for predicting y (that is, whether β1≠0).

6. For the data in Exercise 6 of Section 10.2 "The Linear Correlation Coefficient" test, at the 5% level of significance, whether x is useful for predicting y (that is, whether β1≠0).

7. Construct the 90% confidence interval for the slope β1 of the population regression line based on the sample data set of Exercise 7 of Section 10.2 "The Linear Correlation Coefficient".

8. Construct the 95% confidence interval for the slope β1 of the population regression line based on the sample data set of Exercise 8 of Section 10.2 "The Linear Correlation Coefficient".

9. For the data in Exercise 9 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether x is useful for predicting y (that is, whether β1≠0).

10. For the data in Exercise 10 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether x is useful for predicting y (that is, whether β1≠0).

### Applications

1. For the data in Exercise 11 of Section 10.2 "The Linear Correlation Coefficient" construct a 90% confidence interval for the mean number of new words acquired per month by children between 13 and 18 months of age.

2. For the data in Exercise 12 of Section 10.2 "The Linear Correlation Coefficient" construct a 90% confidence interval for the mean increased braking distance for each additional 100 pounds of vehicle weight.

3. For the data in Exercise 13 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether age is useful for predicting resting heart rate.

4. For the data in Exercise 14 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether wind speed is useful for predicting wave height.

5. For the situation described in Exercise 15 of Section 10.2 "The Linear Correlation Coefficient"

1. Construct the 95% confidence interval for the mean increase in revenue per additional thousand dollars spent on advertising.
2. An advertising agency tells the business owner that for every additional thousand dollars spent on advertising, revenue will increase by over $25,000. Test this claim (which is the alternative hypothesis) at the 5% level of significance.
3. Perform the test of part (b) at the 10% level of significance.
4. Based on the results in (b) and (c), how believable is the ad agency’s claim? (This is a subjective judgement.)
6. For the situation described in Exercise 16 of Section 10.2 "The Linear Correlation Coefficient"

1. Construct the 90% confidence interval for the mean increase in height per additional inch of length at age two.
2. It is claimed that for girls each additional inch of length at age two means more than an additional inch of height at maturity. Test this claim (which is the alternative hypothesis) at the 10% level of significance.
7. For the data in Exercise 17 of Section 10.2 "The Linear Correlation Coefficient" test, at the 10% level of significance, whether course average before the final exam is useful for predicting the final exam grade.

8. For the situation described in Exercise 18 of Section 10.2 "The Linear Correlation Coefficient", an agronomist claims that each additional million acres planted results in more than 750,000 additional acres harvested. Test this claim at the 1% level of significance.

9. For the data in Exercise 19 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1/10th of 1% level of significance, whether, ignoring all other facts such as age and body mass, the amount of the medication consumed is a useful predictor of blood concentration of the active ingredient.

10. For the data in Exercise 20 of Section 10.2 "The Linear Correlation Coefficient" test, at the 1% level of significance, whether for each additional inch of girth the age of the tree increases by at least two and one-half years.

11. For the data in Exercise 21 of Section 10.2 "The Linear Correlation Coefficient"

1. Construct the 95% confidence interval for the mean increase in strength at 28 days for each additional hundred psi increase in strength at 3 days.
2. Test, at the 1/10th of 1% level of significance, whether the 3-day strength is useful for predicting 28-day strength.
12. For the situation described in Exercise 22 of Section 10.2 "The Linear Correlation Coefficient"

1. Construct the 99% confidence interval for the mean decrease in energy demand for each one-degree drop in temperature.
2. An engineer with the power company believes that for each one-degree increase in temperature, daily energy demand will decrease by more than 3.6 million watt-hours. Test this claim at the 1% level of significance.

### Large Data Set Exercises

Large Data Set 1 lists the SAT scores and GPAs of 1,000 students.

http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls

Compute the 90% confidence interval for the slope β1 of the population regression line with SAT score as the independent variable (x) and GPA as the dependent variable (y).

Test, at the 10% level of significance, the hypothesis that the slope of the population regression line is greater than 0.001, against the null hypothesis that it is exactly 0.001.

Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs).

http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls

Compute the 95% confidence interval for the slope β1 of the population regression line with scores using the original clubs as the independent variable (x) and scores using the new clubs as the dependent variable (y).

Test, at the 10% level of significance, the hypothesis that the slope of the population regression line is different from 1, against the null hypothesis that it is exactly 1.

Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions.

http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls

Compute the 95% confidence interval for the slope β1 of the population regression line with the number of bidders present at the auction as the independent variable (x) and sales price as the dependent variable (y).

Test, at the 10% level of significance, the hypothesis that the average sales price increases by more than $90 for each additional bidder at an auction, against the default that it increases by exactly $90.

### Answers

0.743±0.578

−0.610±0.633

$$T=1.732$$, $$\pm t_{0.05}=\pm 2.353$$, do not reject $$H_0$$

0.6±0.451

$$T=-4.481$$, $$\pm t_{0.005}=\pm 3.355$$, reject $$H_0$$

4.8±1.7 words

$$T=2.843$$, $$\pm t_{0.05}=\pm 1.860$$, reject $$H_0$$

42.024±28.011 thousand dollars,

$$T=1.487$$, $$t_{0.05}=1.943$$, do not reject $$H_0$$;

$$t_{0.10}=1.440$$, reject $$H_0$$

$$T=4.096$$, $$\pm t_{0.05}=\pm 1.771$$, reject $$H_0$$

$$T=25.524$$, $$\pm t_{0.0005}=\pm 3.505$$, reject $$H_0$$

2.550±0.127 hundred psi,

$$T=41.072$$, $$\pm t_{0.005}=\pm 3.674$$, reject $$H_0$$

(0.0014,0.0018)

$$H_0:\beta_1=0.001$$ vs. $$H_a:\beta_1>0.001$$. Test Statistic: $$Z=6.1625$$. Rejection Region: $$[1.28,+\infty)$$. Decision: Reject $$H_0$$.

(101.789,131.4435)

$$H_0:\beta_1=90$$ vs. $$H_a:\beta_1>90$$. Test Statistic: $$T=3.5938$$. d.f.=58. Rejection Region: $$[1.296,+\infty)$$. Decision: Reject $$H_0$$.
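
The answers above all have the form "estimate ± margin" or "test statistic versus critical value". As a hedged illustration of where such numbers come from, the sketch below uses the standard formulas $$s_\varepsilon=\sqrt{SSE/(n-2)}$$, a slope standard error of $$s_\varepsilon/\sqrt{SS_{xx}}$$, and $$T=(\hat{\beta}_1-B_0)/(s_\varepsilon/\sqrt{SS_{xx}})$$. The data are the eight pairs from Exercise 7 of Section 10.1, used only as a stand-in, so the output does not correspond to any answer listed above. The function name `slope_inference` is hypothetical, and the critical value is passed in by hand from a t-table rather than computed.

```python
from math import sqrt

def ss(us, vs):
    """Sum of cross-deviations: sum(u*v) - sum(u)*sum(v)/n."""
    n = len(us)
    return sum(u * v for u, v in zip(us, vs)) - sum(us) * sum(vs) / n

def slope_inference(xs, ys, t_crit, beta1_null=0.0):
    """Return (b1, margin, T) for the slope of the regression line.

    t_crit is the critical value with n - 2 degrees of freedom,
    looked up in a t-table by the caller.
    """
    n = len(xs)
    ssxx, ssxy, ssyy = ss(xs, xs), ss(xs, ys), ss(ys, ys)
    b1 = ssxy / ssxx
    sse = ssyy - b1 * ssxy
    s_eps = sqrt(sse / (n - 2))   # sample standard deviation of errors
    se_b1 = s_eps / sqrt(ssxx)    # standard error of the slope
    return b1, t_crit * se_b1, (b1 - beta1_null) / se_b1

# Stand-in data: the eight pairs from Exercise 7 of Section 10.1.
xs = [0, 2, 4, 5, 8, 13, 15, 20]
ys = [12, 15, 16, 14, 22, 24, 28, 30]

# n = 8, so df = 6; t_{0.025} = 2.447 from a standard t-table (95% interval).
b1, margin, T = slope_inference(xs, ys, t_crit=2.447)
print(f"{b1:.3f} ± {margin:.3f}, T = {T:.3f}")
```

For a one-sided claim such as "the slope exceeds 25", the same function can be called with `beta1_null=25` and the resulting `T` compared with a one-tailed critical value.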

## 10.7 Estimation and Prediction

### Basic

For the Basic and Application exercises in this section use the computations that were done for the exercises with the same number in Section 10.2 "The Linear Correlation Coefficient", Section 10.4 "The Least Squares Regression Line", and Section 10.5 "Statistical Inferences About β1".

For the sample data set of Exercise 1 of Section 10.2 find the coefficient of determination using the formula $$r^2=\hat{\beta}_1SS_{xy}/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 2 of Section 10.2 find the coefficient of determination using the formula $$r^2=\hat{\beta}_1SS_{xy}/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 3 of Section 10.2 find the coefficient of determination using the formula $$r^2=\hat{\beta}_1SS_{xy}/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 4 of Section 10.2 find the coefficient of determination using the formula $$r^2=\hat{\beta}_1SS_{xy}/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 5 of Section 10.2 find the coefficient of determination using the formula $$r^2=\hat{\beta}_1SS_{xy}/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 6 of Section 10.2 find the coefficient of determination using the formula $$r^2=\hat{\beta}_1SS_{xy}/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 7 of Section 10.2 find the coefficient of determination using the formula $$r^2=(SS_{yy}-SSE)/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 8 of Section 10.2 find the coefficient of determination using the formula $$r^2=(SS_{yy}-SSE)/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 9 of Section 10.2 find the coefficient of determination using the formula $$r^2=(SS_{yy}-SSE)/SS_{yy}$$. Confirm your answer by squaring r as computed in that exercise.

For the sample data set of Exercise 9 of Section 10.2 find the coefficient of determination using the formula r2=(SSyy−SSE)/SSyy. Confirm your answer by squaring r as computed in that exercise.
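The two formulas used in the exercises above, and the check by squaring r, can be compared side by side. The following is a minimal sketch on a small hypothetical data set; the numbers are not taken from Section 10.2, and the function names are made up for illustration.

```python
import math


def sums_of_squares(xs, ys):
    """Return (SSxx, SSxy, SSyy) computed with the shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_yy = sum(y * y for y in ys) - sum_y ** 2 / n
    return ss_xx, ss_xy, ss_yy


def r_squared_three_ways(xs, ys):
    """Compute r^2 by both exercise formulas and by squaring r."""
    n = len(xs)
    ss_xx, ss_xy, ss_yy = sums_of_squares(xs, ys)
    beta1 = ss_xy / ss_xx
    beta0 = sum(ys) / n - beta1 * sum(xs) / n
    # SSE taken directly from the residuals, so the second formula is an independent check.
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    return beta1 * ss_xy / ss_yy, (ss_yy - sse) / ss_yy, r * r


# Hypothetical sample data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
by_slope, by_sse, by_r = r_squared_three_ways(xs, ys)
print(by_slope, by_sse, by_r)
```

All three values agree up to floating-point rounding, which is the confirmation each exercise asks for.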

### Applications

For the data in Exercise 11 of Section 10.2 compute the coefficient of determination and interpret its value in the context of age and vocabulary.

For the data in Exercise 12 of Section 10.2 compute the coefficient of determination and interpret its value in the context of vehicle weight and braking distance.

For the data in Exercise 13 of Section 10.2 compute the coefficient of determination and interpret its value in the context of age and resting heart rate. In the age range of the data, does age seem to be a very important factor with regard to heart rate?

For the data in Exercise 14 of Section 10.2 compute the coefficient of determination and interpret its value in the context of wind speed and wave height. Does wind speed seem to be a very important factor with regard to wave height?

For the data in Exercise 15 of Section 10.2 find the proportion of the variability in revenue that is explained by level of advertising.

For the data in Exercise 16 of Section 10.2 find the proportion of the variability in adult height that is explained by the variation in length at age two.

For the data in Exercise 17 of Section 10.2 compute the coefficient of determination and interpret its value in the context of course average before the final exam and score on the final exam.

For the data in Exercise 18 of Section 10.2 compute the coefficient of determination and interpret its value in the context of acres planted and acres harvested.

For the data in Exercise 19 of Section 10.2 compute the coefficient of determination and interpret its value in the context of the amount of the medication consumed and blood concentration of the active ingredient.

For the data in Exercise 20 of Section 10.2 compute the coefficient of determination and interpret its value in the context of tree size and age.

For the data in Exercise 21 of Section 10.2 find the proportion of the variability in 28-day strength of concrete that is accounted for by variation in 3-day strength.

For the data in Exercise 22 of Section 10.2 find the proportion of the variability in energy demand that is accounted for by variation in average temperature.

### Large Data Set Exercises

Large Data Set 1 lists the SAT scores and GPAs of 1,000 students. Compute the coefficient of determination and interpret its value in the context of SAT scores and GPAs.

http://www.gone.2012books.lardbucket.org/sites/all/files/data1.xls

Large Data Set 12 lists the golf scores on one round of golf for 75 golfers first using their own original clubs, then using clubs of a new, experimental design (after two months of familiarization with the new clubs). Compute the coefficient of determination and interpret its value in the context of golf scores with the two kinds of golf clubs.

http://www.gone.2012books.lardbucket.org/sites/all/files/data12.xls

Large Data Set 13 records the number of bidders and sales price of a particular type of antique grandfather clock at 60 auctions. Compute the coefficient of determination and interpret its value in the context of the number of bidders at an auction and the price of this type of antique grandfather clock.

http://www.gone.2012books.lardbucket.org/sites/all/files/data13.xls

0.848

0.631

0.5

0.766

0.715

0.898; about 90% of the variability in vocabulary is explained by age

0.503; about 50% of the variability in heart rate is explained by age. Age is a significant but not dominant factor in explaining heart rate.

The proportion is $r^2$ = 0.692.

0.563; about 56% of the variability in final exam scores is explained by course average before the final exam

0.931; about 93% of the variability in the blood concentration of the active ingredient is explained by the amount of the medication consumed

The proportion is $r^2$ = 0.984.

$r^2$=21.17%.

$r^2$=81.04%.

## 10.8 A Complete Example

The exercises in this section are unrelated to those in previous sections.

The data give the amount x of silicofluoride in the water (mg/L) and the amount y of lead in the bloodstream (μg/dL) of ten children in various communities with and without municipal water. Perform a complete analysis of the data, in analogy with the discussion in this section (that is, make a scatter plot, do preliminary computations, find the least squares regression line, find SSE, $s_\varepsilon$, and r, and so on). In the hypothesis test use as the alternative hypothesis β1>0, and test at the 5% level of significance. Use confidence level 95% for the confidence interval for β1. Construct 95% confidence and prediction intervals at xp=2 at the end.

| x | y |
|---|---|
| 0.0 | 0.3 |
| 0.0 | 0.1 |
| 1.1 | 4.7 |
| 1.4 | 3.2 |
| 1.6 | 5.1 |
| 1.7 | 7.0 |
| 2.0 | 5.0 |
| 2.0 | 6.1 |
| 2.2 | 8.6 |
| 2.2 | 9.5 |

The table gives the weight x (thousands of pounds) and available heat energy y (million BTU) of a standard cord of various species of wood typically used for heating. Perform a complete analysis of the data, in analogy with the discussion in this section (that is, make a scatter plot, do preliminary computations, find the least squares regression line, find SSE, $s_\varepsilon$, and r, and so on). In the hypothesis test use as the alternative hypothesis β1>0, and test at the 5% level of significance. Use confidence level 95% for the confidence interval for β1. Construct 95% confidence and prediction intervals at xp=5 at the end.

| x | y |
|---|---|
| 3.37 | 23.6 |
| 3.50 | 17.5 |
| 4.29 | 20.1 |
| 4.00 | 21.6 |
| 4.64 | 28.1 |
| 4.99 | 25.3 |
| 4.94 | 27.0 |
| 5.48 | 30.7 |
| 3.26 | 18.9 |
| 4.16 | 20.7 |
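For the first exercise above (silicofluoride and lead), the chain of computations in a complete analysis can be sketched in a few lines. This is a minimal sketch rather than a prescribed method: the two critical values are assumed to come from a standard t table with df = n − 2 = 8, and the function and variable names are made up for illustration. The scatter plot is omitted.

```python
import math

# Data from the silicofluoride/lead exercise above.
x = [0.0, 0.0, 1.1, 1.4, 1.6, 1.7, 2.0, 2.0, 2.2, 2.2]
y = [0.3, 0.1, 4.7, 3.2, 5.1, 7.0, 5.0, 6.1, 8.6, 9.5]

# Assumed t-table values for df = 8 (not given in the exercise text).
T_ONE_TAIL_05 = 1.860  # t_{0.05}: right-tailed test at the 5% level
T_TWO_TAIL_95 = 2.306  # t_{0.025}: 95% intervals


def complete_analysis(x, y, x_p, t_test, t_int):
    """Run the regression computations and return them in a dict."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xx = sum(v * v for v in x) - sum_x ** 2 / n
    ss_xy = sum(a * b for a, b in zip(x, y)) - sum_x * sum_y / n
    ss_yy = sum(v * v for v in y) - sum_y ** 2 / n

    x_bar, y_bar = sum_x / n, sum_y / n
    beta1 = ss_xy / ss_xx            # slope of the least squares line
    beta0 = y_bar - beta1 * x_bar    # intercept

    sse = ss_yy - beta1 * ss_xy      # sum of squared errors
    s_eps = math.sqrt(sse / (n - 2))  # sample standard deviation of errors
    r = ss_xy / math.sqrt(ss_xx * ss_yy)

    se_beta1 = s_eps / math.sqrt(ss_xx)
    t_stat = beta1 / se_beta1        # test of H0: beta1 = 0

    y_hat = beta0 + beta1 * x_p
    spread = 1 / n + (x_p - x_bar) ** 2 / ss_xx
    half_mean = t_int * s_eps * math.sqrt(spread)      # for E(y) at x_p
    half_pred = t_int * s_eps * math.sqrt(1 + spread)  # for a single y at x_p

    return {
        "ss_xx": ss_xx, "ss_xy": ss_xy, "ss_yy": ss_yy,
        "beta1": beta1, "beta0": beta0,
        "sse": sse, "s_eps": s_eps, "r": r, "r2": r * r,
        "t_stat": t_stat, "reject_h0": t_stat >= t_test,
        "ci_beta1": (beta1 - t_int * se_beta1, beta1 + t_int * se_beta1),
        "ci_mean": (y_hat - half_mean, y_hat + half_mean),
        "pi_single": (y_hat - half_pred, y_hat + half_pred),
    }


result = complete_analysis(x, y, 2.0, T_ONE_TAIL_05, T_TWO_TAIL_95)
for name, value in result.items():
    print(name, value)
```

If SciPy is available, the hard-coded critical values could instead be obtained from `scipy.stats.t.ppf`; they are fixed here only to keep the sketch free of dependencies.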

### Large Data Set Exercises

1. Large Data Sets 3 and 3A list the shoe sizes and heights of 174 customers entering a shoe store. The gender of the customer is not indicated in Large Data Set 3. However, men’s and women’s shoes are not measured on the same scale; for example, a size 8 shoe for men is not the same size as a size 8 shoe for women. Thus it would not be meaningful to apply regression analysis to Large Data Set 3. Nevertheless, compute the scatter diagrams, with shoe size as the independent variable (x) and height as the dependent variable (y), for (i) just the data on men, (ii) just the data on women, and (iii) the full mixed data set with both men and women. Does the third, invalid scatter diagram look markedly different from the other two?

http://www.gone.2012books.lardbucket.org/sites/all/files/data3.xls

http://www.gone.2012books.lardbucket.org/sites/all/files/data3A.xls

2. Separate out from Large Data Set 3A just the data on men and do a complete analysis, with shoe size as the independent variable (x) and height as the dependent variable (y). Use α=0.05 and xp=10 whenever appropriate.

http://www.gone.2012books.lardbucket.org/sites/all/files/data3A.xls

3. Separate out from Large Data Set 3A just the data on women and do a complete analysis, with shoe size as the independent variable (x) and height as the dependent variable (y). Use α=0.05 and xp=10 whenever appropriate.

http://www.gone.2012books.lardbucket.org/sites/all/files/data3A.xls

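1. $\sum x=14.2$, $\sum y=49.6$, $\sum xy=91.73$, $\sum x^2=26.3$, $\sum y^2=333.86$.

$SS_{xx}=6.136$, $SS_{xy}=21.298$, $SS_{yy}=87.844$.

$\bar{x}=1.42$, $\bar{y}=4.96$.

$\hat{\beta}_1=3.47$, $\hat{\beta}_0=0.03$.

SSE=13.92.

$s_\varepsilon=1.32$.

r = 0.9174, $r^2$ = 0.8416.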

df=8, T = 6.518.

The 95% confidence interval for β1 is: (2.24,4.70).

At xp=2, the 95% confidence interval for E(y) is (5.77,8.17).

At xp=2, the 95% prediction interval for y is (3.73,10.21).

2.

The regression line: $\hat{y}=3.3426x+138.7692$. Coefficient of Correlation: r = 0.9431. Coefficient of Determination: $r^2$ = 0.8894. SSE=283.2473. $s_\varepsilon$=1.9305. A 95% confidence interval for β1: (3.0733,3.6120). Test Statistic for H0:β1=0: T = 24.7209. At xp=10, $\hat{y}$=172.1956; a 95% confidence interval for the mean value of y is: (171.5577,172.8335); and a 95% prediction interval for an individual value of y is: (168.2974,176.0938).

1. The positive linear trend seems less pronounced than in each of the two previous plots.

2.