# 9: End of chapter exercise solutions


## Introduction to data

**1.1** (a) Treatment: 10/43 = 0.23 \(\rightarrow\) 23%. Control: 2/46 = 0.04 \(\rightarrow\) 4%. (b) There is a 19% difference between the pain reduction rates in the two groups. At first glance, it appears patients in the treatment group are more likely to experience pain reduction from the acupuncture treatment. (c) Answers may vary but should be sensible. Two possible answers: (1) Though the groups' difference is big, I'm skeptical the results show a real difference and think this might be due to chance. (2) The difference in these rates looks pretty big, so I suspect acupuncture is having a positive impact on pain.

**1.3** (a-i) 143,196 eligible study subjects born in Southern California between 1989 and 1993. (a-ii) Measurements of carbon monoxide, nitrogen dioxide, ozone, and particulate matter less than 10 \(\mu\)m (PM10) collected at air-quality-monitoring stations as well as length of gestation. These are continuous numerical variables. (a-iii) The research question: "Is there an association between air pollution exposure and preterm births?" (b-i) 600 adult patients aged 18-69 years diagnosed and currently treated for asthma. (b-ii) The variables were whether or not the patient practiced the Buteyko method (categorical) and measures of quality of life, activity, asthma symptoms and medication reduction of the patients (categorical, ordinal). It may also be reasonable to treat the ratings on a scale of 1 to 10 as discrete numerical variables. (b-iii) The research question: "Do asthmatic patients who practice the Buteyko method experience improvement in their condition?"

**1.5** (a) \(50 \times 3 = 150\). (b) Four continuous numerical variables: sepal length, sepal width, petal length, and petal width. (c) One categorical variable, species, with three levels: setosa, versicolor, and virginica.

**1.7** (a) Population of interest: all births in Southern California. Sample: 143,196 births between 1989 and 1993 in Southern California. If births in this time span can be considered to be representative of all births, then the results are generalizable to the population of Southern California. However, since the study is observational, the findings do not imply causal relationships. (b) Population: all 18-69 year olds diagnosed and currently treated for asthma. Sample: 600 adult patients aged 18-69 years diagnosed and currently treated for asthma. Since the sample consists of voluntary patients, the results cannot necessarily be generalized to the population at large. However, since the study is an experiment, the findings can be used to establish causal relationships.

**1.9** (a) Explanatory: number of study hours per week. Response: GPA. (b) There is a slight positive relationship between the two variables. One respondent reported a GPA above 4.0, which is a data error. There are also a few respondents who reported unusually high study hours (60 and 70 hours/week). The variability in GPA also appears to be larger for students who study less than those who study more. Since the data become sparse as the number of study hours increases, it is somewhat difficult to evaluate the strength of the relationship and also the variability across different numbers of study hours. (c) Observational. (d) Since this is an observational study, a causal relationship is not implied.

**1.11** (a) Observational. (b) The professor suspects students in a given section may have similar feelings about the course. To ensure each section is reasonably represented, she may choose to randomly select a fixed number of students, say 10, from each section for a total sample size of 40 students. Since a random sample of fixed size was taken within each section in this scenario, this represents stratified sampling.

**1.13** Sampling from the phone book would miss unlisted phone numbers, so this would result in bias. People who do not have their numbers listed may share certain characteristics. For example, cell phones are not listed in phone books, so a sample from the phone book would not necessarily be representative of the population.

**1.15** The estimate will be biased, and it will tend to overestimate the true family size. For example, suppose we had just two families: the first with 2 parents and 5 children, and the second with 2 parents and 1 child. Then if we draw one of the six children at random, 5 times out of 6 we would sample the larger family.

**1.17** (a) No, this is an observational study. (b) This statement is not justified; it implies a causal association between sleep disorders and bullying. However, this was an observational study. A better conclusion would be "School children identified as bullies are more likely to suffer from sleep disorders than non-bullies."

**1.19** (a) Experiment, as the treatment was assigned to each patient. (b) Response: Duration of the cold. Explanatory: Treatment, with 4 levels: placebo, 1g, 3g, 3g with additives. (c) Patients were blinded. (d) Double-blind with respect to the researchers evaluating the patients, but the nurses who briefly interacted with patients during the distribution of the medication were not blinded. We could say the study was partly double-blind. (e) No. The patients were randomly assigned to treatment groups and were blinded, so we would expect about an equal number of patients in each group to not adhere to the treatment.

**1.21 **(a) Experiment. (b) Treatment is exercise twice a week. Control is no exercise. (c) Yes, the blocking variable is age. (d) No. (e) This is an experiment, so a causal conclusion is reasonable. Since the sample is random, the conclusion can be generalized to the population at large. However, we must consider that a placebo effect is possible. (f) Yes. Randomly sampled people should not be required to participate in a clinical trial, and there are also ethical concerns about the plan to instruct one group not to participate in a healthy behavior, which in this case is exercise.

**1.23** (a) Positive association: mammals with longer gestation periods tend to live longer as well. (b) Association would still be positive. (c) No, they are not independent. See part (a).

**1.25 **(a) 1/linear and 3/nonlinear. (b) 4/some curvature (nonlinearity) may be present on the right side. "Linear" would also be acceptable for the type of relationship for plot 4. (c) 2.

**1.27** (a) Decrease: the new score is smaller than the mean of the 24 previous scores. (b) Calculate a weighted mean. Use a weight of 24 for the old mean and 1 for the new score: \(\frac {(24 \times 74 + 1 \times 64)}{(24 + 1)} = 73.6\). There are other ways to solve this exercise that do not use a weighted mean. (c) The new score is more than 1 standard deviation away from the previous mean, so increase.
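The weighted-mean update in part (b) can be checked numerically. This is a minimal sketch; the function name `updated_mean` and its parameter names are illustrative assumptions, not part of the exercise.

```python
def updated_mean(old_mean, old_n, new_value):
    """Mean of old_n values with mean old_mean after adding one new value."""
    # Weight the old mean by the number of scores it summarizes.
    return (old_n * old_mean + new_value) / (old_n + 1)

new_mean = updated_mean(old_mean=74, old_n=24, new_value=64)
print(round(new_mean, 1))  # 73.6
```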

**1.29** Both distributions are right skewed and bimodal with modes at 10 and 20 cigarettes; note that people may be rounding their answers to half a pack or a whole pack. The median of each distribution is between 10 and 15 cigarettes. The middle 50% of the data (the IQR) appears to be spread equally in each group and have a width of about 10 to 15. There are potential outliers above 40 cigarettes per day. It appears that more respondents smoke only a few cigarettes (0 to 5) on weekdays than on weekends.

**1.31 **(a) \(\bar {x}_{amtWeekends} = 20, \bar {x}_{amtWeekdays} = 16\). (b) \(s_{amtWeekends} = 0, s_{amtWeekdays} = 4.18\).

In this very small sample, variability is higher on weekdays.

**1.33 **(a) Both distributions have the same median, 6, and the same IQR. (b) Same IQR, but second distribution has higher median. (c) Second distribution has higher median. IQRs are equal. (d) Second distribution has higher median and larger IQR.

**1.35**

**1.37 **Descriptions will vary a little. (a) 2. Unimodal, symmetric, centered at 60, standard deviation of roughly 3. (b) 3. Symmetric and approximately evenly distributed from 0 to 100. (c) 1. Right skewed, unimodal, centered at about 1.5, with most observations falling between 0 and 3. A very small fraction of observations exceed a value of 5.

**1.39** The histogram shows that the distribution is bimodal, which is not apparent in the box plot. The box plot makes it easy to identify more precise values of observations outside of the whiskers.

**1.41 **(a) The median is better; the mean is substantially affected by the two extreme observations. (b) The IQR is better; the standard deviation, like the mean, is substantially affected by the two high salaries.

**1.43** The distribution is unimodal and symmetric with a mean of about 25 minutes and a standard deviation of about 5 minutes. There do not appear to be any counties with unusually high or low mean travel times. Since the distribution is already unimodal and symmetric, a log transformation is not necessary.

**1.45 **Answers will vary. There are pockets of longer travel time around DC, Southeastern NY, Chicago, Minneapolis, Los Angeles, and many other big cities. There is also a large section of shorter average commute times that overlap with farmland in the Midwest. Many farmers' homes are adjacent to their farmland, so their commute would be 0 minutes, which may explain why the average commute time for these counties is relatively low.

**1.47 **(a) We see the order of the categories and the relative frequencies in the bar plot. (b) There are no features that are apparent in the pie chart but not in the bar plot. (c) We usually prefer to use a bar plot as we can also see the relative frequencies of the categories in this graph.

**1.49 **The vertical locations at which the ideological groups break into the Yes, No, and Not Sure categories differ, which indicates the variables are dependent.

**1.51 **(a) False. Instead of comparing counts, we should compare percentages. (b) True. (c) False. We cannot infer a causal relationship from an association in an observational study. However, we can say the drug a person is on affects his risk in this case, as he chose that drug and his choice may be associated with other variables, which is why part (b) is true. The difference in these statements is subtle but important. (d) True.

**1.53 **(a) Proportion who had heart attack: \(\frac {7,979}{227,571} \approx 0.035\) (b) Expected number of cardiovascular problems in the rosiglitazone group if having cardiovascular problems and treatment were independent can be calculated as the number of patients in that group multiplied by the overall rate of cardiovascular problems in the study: \(67,593 \times \frac {7,979}{227,571} \approx 2370\). (c-i) H0: Independence model. The treatment and cardiovascular problems are independent. They have no relationship, and the difference in incidence rates between the rosiglitazone and pioglitazone groups is due to chance. HA: Alternate model. The treatment and cardiovascular problems are not independent. The difference in the incidence rates between the rosiglitazone and pioglitazone groups is not due to chance, and rosiglitazone is associated with an increased risk of serious cardiovascular problems. (c-ii) A higher number of patients with cardiovascular problems in the rosiglitazone group than expected under the assumption of independence would provide support for the alternative hypothesis. This would suggest that rosiglitazone increases the risk of such problems. (c-iii) In the actual study, we observed 2,593 cardiovascular events in the rosiglitazone group. In the 1,000 simulations under the independence model, we observed somewhat less than 2,593 in all but one or two simulations, which suggests that the actual results did not come from the independence model. That is, the analysis provides strong evidence that the variables are not independent, and we reject the independence model in favor of the alternative. The study's results provide strong evidence that rosiglitazone is associated with an increased risk of cardiovascular problems.

## Probability

**2.1** (a) False. These are independent trials. (b) False. There are red face cards. (c) True. A card cannot be both a face card and an ace.

**2.3** (a) 10 tosses. Fewer tosses mean more variability in the sample fraction of heads, meaning there's a better chance of getting at least 60% heads. (b) 100 tosses. More flips means the observed proportion of heads would often be closer to the average, 0.50, and therefore also above 0.40. (c) 100 tosses. With more flips, the observed proportion of heads would often be closer to the average, 0.50. (d) 10 tosses. Fewer flips would increase variability in the fraction of tosses that are heads.

**2.5** (a) \(0.5^{10} = 0.00098\). (b) \(0.5^{10} = 0.00098\). (c) P(at least one tails) = 1 - P(no tails) = \(1 - (0.5^{10}) \approx 1 - 0.001 = 0.999\).

**2.7** (a) No, there are voters who are both politically Independent and also swing voters. (b) Venn diagram below: (c) 24%. (d) Add up the corresponding disjoint sections in the Venn diagram: 0.24 + 0.11 + 0.12 = 0.47. Alternatively, use the General Addition Rule: 0.35 + 0.23 - 0.11 = 0.47. (e) 1 - 0.47 = 0.53. (f) \(P(Independent) \times P(swing) = 0.35 \times 0.23 = 0.08\), which does not equal P(Independent and swing) = 0.11, so the events are dependent. If you stated that this difference might be due to sampling variability in the survey, that answer would also be reasonable (we'll dive into this topic more in later chapters).
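The General Addition Rule in part (d) and the independence check in part (f) can be reproduced in a few lines. This is a minimal sketch using the probabilities quoted in the solution; the variable names are illustrative assumptions.

```python
p_independent = 0.35   # P(Independent)
p_swing = 0.23         # P(swing)
p_both = 0.11          # P(Independent and swing)

# General Addition Rule: P(A or B) = P(A) + P(B) - P(A and B)
p_either = p_independent + p_swing - p_both
p_neither = 1 - p_either

# If the events were independent, P(A and B) would equal P(A) * P(B).
p_both_if_independent = p_independent * p_swing

print(round(p_either, 2), round(p_neither, 2))   # 0.47 0.53
print(round(p_both_if_independent, 4))           # 0.0805
```

The product 0.0805 differs from the observed 0.11, which is the comparison part (f) relies on.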

**2.9** (a) If the class is not graded on a curve, they are independent. If graded on a curve, then neither independent nor disjoint (unless the instructor will only give one A, which is a situation we will ignore in parts (b) and (c)). (b) They are probably not independent: if you study together, your study habits would be related, which suggests your course performances are also related. (c) No. See the answer to part (a) when the course is not graded on a curve. More generally: if two things are unrelated (independent), then one occurring does not preclude the other from occurring.

**2.11** (a) \(0.16 + 0.09 = 0.25\). (b) \(0.17 + 0.09 = 0.26\). (c) Assuming that the education level of the husband and wife are independent: \(0.25 \times 0.26 = 0.065\). You might also notice we actually made a second assumption: that the decision to get married is unrelated to education level. (d) The husband/wife independence assumption is probably not reasonable, because people often marry another person with a comparable level of education. We will leave it to you to think about whether the second assumption noted in part (c) is reasonable.

**2.13 **(a) Invalid. Sum is greater than 1. (b) Valid. Probabilities are between 0 and 1, and they sum to 1. In this class, every student gets a C. (c) Invalid. Sum is less than 1. (d) Invalid. There is a negative probability. (e) Valid. Probabilities are between 0 and 1, and they sum to 1. (f) Invalid. There is a negative probability.

**2.15 **(a) No, but we could if A and B are independent. (b-i) 0.21. (b-ii) 0.3+0.7-0.21 = 0.79. (b-iii) Same as P(A): 0.3. (c) No, because \(0.1 \ne 0.21\), where 0.21 was the value computed under independence from part (a). (d) P(A|B) = 0.1/0.7 = 0.143.

**2.17 **(a) 0.60 + 0.20 - 0.18 = 0.62. (b) 0.18/0.20 = 0.90. (c) \(0.11/0.33 \approx 0.33\). (d) No, otherwise the final answers of parts (b) and (c) would have been equal. (e) \(0.06/0.34\approx 0.18\).

**2.19** (a) 162/248 = 0.65. (b) 181/252 = 0.72. (c) Under the assumption of dating choices being independent of hamburger preference, which on the surface seems reasonable: \(0.65 \times 0.72 = 0.468\). (d) (252 + 6 - 1)/500 = 0.514.

**2.21** (a) The tree diagram:

(b) \(P(\text{can construct} \mid \text{pass}) = \frac {P(\text{can construct and pass})}{P(\text{pass})} = \frac {0.80 \times 0.86}{0.80 \times 0.86 + 0.20 \times 0.65} = \frac {0.688}{0.818} \approx 0.84\).

**2.23** First draw a tree diagram:

Then compute the probability: \(P(\text{HIV} \mid +) = \frac {P(\text{HIV and } +)}{P(+)} = \frac {0.259 \times 0.997}{0.259 \times 0.997+0.741 \times 0.074} = \frac {0.2582}{0.3131} = 0.8247\).

**2.25** A tree diagram of the situation:

\(P(\text{lupus} \mid \text{positive}) = \frac {P(\text{lupus and positive})}{P(\text{positive})} = \frac {0.0196}{0.0196+0.2548} = 0.0714\). Even when a patient tests positive for lupus, there is only a 7.14% chance that he actually has lupus. While House is not exactly right - it is possible that the patient has lupus - his implied skepticism is warranted.
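Solutions 2.21 through 2.25 all apply the same tree-diagram calculation: divide the joint probability of the branch of interest by the total probability of the observed outcome. The sketch below is illustrative; the helper name `posterior` and its parameter names are assumptions, and the inputs are the branch probabilities that appear in the formulas for 2.21 and 2.23.

```python
def posterior(prior, p_obs_given_true, p_obs_given_false):
    """P(condition | observation) from a two-branch tree diagram."""
    joint_true = prior * p_obs_given_true          # condition holds and outcome observed
    joint_false = (1 - prior) * p_obs_given_false  # condition fails and outcome observed
    return joint_true / (joint_true + joint_false)

# 2.21: P(can construct | pass)
p_construct = posterior(prior=0.80, p_obs_given_true=0.86, p_obs_given_false=0.65)

# 2.23: P(HIV | positive test)
p_hiv = posterior(prior=0.259, p_obs_given_true=0.997, p_obs_given_false=0.074)

print(round(p_construct, 2), round(p_hiv, 2))  # 0.84 0.82
```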

**2.27** (a) 0.3. (b) 0.3. (c) 0.3. (d) \(0.3 \times 0.3 = 0.09\). (e) Yes, the population that is being sampled from is identical in each draw.

**2.29** (a) 2/9. (b) 3/9 = 1/3. (c) \((3/10) \times (2/9) \approx 0.067\). (d) No. In this small population of marbles, removing one marble meaningfully changes the probability of what might be drawn next.

**2.31 **For 1 leggings (L) and 2 jeans (J), there are three possible orderings: LJJ, JLJ, and JJL. The probability for LJJ is \( (5/24) \times (7/23) \times (6/22) = 0.0173\). The other two orderings have the same probability, and these three possible orderings are disjoint events. Final answer: 0.0519.
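The claim that all three orderings share the same probability can be verified by enumerating them. This is a sketch under stated assumptions: the counts (24 people in total, 5 wearing leggings, 7 wearing jeans) are read off the fractions in the product above, and the function name `ordering_probability` is illustrative.

```python
from fractions import Fraction
from itertools import permutations

TOTAL, LEGGINGS, JEANS = 24, 5, 7  # counts inferred from (5/24) x (7/23) x (6/22)

def ordering_probability(order):
    """Probability of drawing the given sequence without replacement."""
    remaining = {"L": LEGGINGS, "J": JEANS}
    left = TOTAL
    prob = Fraction(1)
    for item in order:
        prob *= Fraction(remaining[item], left)
        remaining[item] -= 1
        left -= 1
    return prob

# Distinct orderings of one L and two J: JJL, JLJ, LJJ
orderings = sorted(set(permutations("LJJ")))
probs = {"".join(o): ordering_probability(o) for o in orderings}
total = sum(probs.values())

for name, p in probs.items():
    print(name, round(float(p), 4))
print("total", round(float(total), 4))  # total 0.0519
```

Because the orderings are disjoint, their probabilities add, which is why the final answer is three times the probability of a single ordering.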

**2.33** (a) 13. (b) No. The students are not a random sample.

**2.35** (a) The table below summarizes the probability model:

| Event | X | P(X) | X · P(X) | \({(X - E(X))}^2\) | \({(X - E(X))}^2 \times P(X)\) |
| --- | --- | --- | --- | --- | --- |
| 3 hearts | 50 | \(\frac {13}{52} \times \frac {12}{51} \times \frac {11}{50} = 0.0129\) | 0.65 | \({(50-3.59)}^2 = 2153.8881\) | \(2153.8881 \times 0.0129 = 27.7852\) |
| 3 blacks | 25 | \(\frac {26}{52} \times \frac {25}{51} \times \frac {24}{50} = 0.1176\) | 2.94 | \({(25-3.59)}^2 = 458.3881\) | \(458.3881 \times 0.1176 = 53.9064\) |
| Else | 0 | 1 - (0.0129 + 0.1176) = 0.8695 | 0 | \({(0-3.59)}^2 = 12.8881\) | \(12.8881 \times 0.8695 = 11.2062\) |
| | | | E(X) = $3.59 | | V(X) = 92.8978, \(SD(X) = \sqrt {V(X)}\) = $9.64 |

Note that the squared deviations are taken from the payout X itself (50, 25, 0), not from the product X · P(X).

(b) E(X-5) = E(X)-5 = 3.59-5 = -$1.41. Subtracting a constant does not change the spread, so the standard deviation is the same as the standard deviation of X: $9.64. (c) No. The expected earnings is negative, so on average you would lose money playing the game.
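The expected value and spread of a probability model like the one in 2.35 can be checked with a short script. The sketch below is illustrative only: the variable names are assumptions, exact fractions replace the rounded probabilities, and the variance is computed as the probability-weighted average of the squared deviations of each payout X from E(X).

```python
from fractions import Fraction
from math import sqrt

# Probabilities of drawing 3 hearts / 3 black cards without replacement.
p_hearts = Fraction(13, 52) * Fraction(12, 51) * Fraction(11, 50)
p_blacks = Fraction(26, 52) * Fraction(25, 51) * Fraction(24, 50)

# Payout X (in dollars) mapped to its probability.
model = {50: p_hearts, 25: p_blacks, 0: 1 - p_hearts - p_blacks}

expected = sum(x * p for x, p in model.items())
variance = sum((x - expected) ** 2 * p for x, p in model.items())
sd = sqrt(variance)

# Paying $5 to play shifts the mean but leaves the spread unchanged.
net_expected = expected - 5

print(round(float(expected), 2), round(float(net_expected), 2))  # 3.59 -1.41
print(round(sd, 2))
```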

**2.37**

| Event | X | P(X) | X · P(X) |
| --- | --- | --- | --- |
| Boom | 0.18 | 1/3 | \(0.18 \times 1/3 = 0.06\) |
| Normal | 0.09 | 1/3 | \(0.09 \times 1/3 = 0.03\) |
| Recession | -0.12 | 1/3 | \(-0.12 \times 1/3 = -0.04\) |
| | | | E(X) = 0.05 |

The expected return is a 5% increase in value for a single year.

**2.39** (a) Expected: -$0.16. Variance: 8.95. SD: $2.99. (b) Expected: -$0.16. SD: $1.73. (c) Expected values are the same, but the SDs differ. The SD from the game with tripled winnings/losses is larger, since the three independent games might go in different directions (e.g., could win one game and lose two games). So the three independent games are lower risk, but in this context it just means we are likely to lose a more stable amount since the expected value is still negative.

**2.41** A fair game has an expected value of zero: \(\$5 \times 0.46 + x \times 0.54 = 0\). Solving for x: -$4.26. You would bet $4.26 for the Padres to make the game fair.
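The fair-game condition is a single linear equation in x, so it can be solved directly. This is a minimal sketch; the variable names are illustrative assumptions.

```python
win_amount = 5.00      # payout if the bet wins
p_win = 0.46
p_lose = 1 - p_win     # 0.54

# Fair game: win_amount * p_win + x * p_lose = 0, solved for x.
x = -win_amount * p_win / p_lose

expected_value = win_amount * p_win + x * p_lose
print(round(x, 2))  # -4.26
```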

**2.43** (a) Expected: $3.90. SD: $0.34. (b) Expected: $27.30. SD: $0.89. If you computed part (b) using part (a), you should have obtained an SD of $0.90.

**2.45** Approximate answers are OK. Answers are only estimates based on the sample. (a) (29 + 32)/144 = 0.42. (b) 21/144 = 0.15. (c) (26 + 12 + 15)/144 = 0.37.

## Distributions of random variables

**3.1** (a) 8.85%. (b) 6.94%. (c) 58.86%. (d) 4.56%.

**3.3** (a) Verbal: \(N(\mu = 462; \sigma = 119)\), Quant: \(N(\mu = 584; \sigma = 151)\). (b) \(Z_{VR} = 1.33, Z_{QR} = 0.57\). (c) She scored 1.33 standard deviations above the mean on the Verbal Reasoning section and 0.57 standard deviations above the mean on the Quantitative Reasoning section. (d) She did better on the Verbal Reasoning section since her Z score on that section was higher. (e) \(Perc_{VR} = 0.9082 \approx 91\%, Perc_{QR} = 0.7157 \approx 72\%\). (f) 100% - 91% = 9% did better than her on VR, and 100% - 72% = 28% did better than her on QR. (g) We cannot compare the raw scores since they are on different scales. Comparing her percentile scores is more appropriate when comparing her performance to others. (h) Answer to part (b) would not change as Z scores can be calculated for distributions that are not normal. However, we could not answer parts (c)-(f) since we cannot use the normal probability table to calculate probabilities and percentiles without a normal model.

**3.5** (a) Z = 0.84, which corresponds to 711 on QR. (b) Z = -0.52, which corresponds to 400 on VR.
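The table lookups in 3.3 and 3.5 can be reproduced with Python's standard library instead of a normal probability table. This is a sketch using the Z scores and distribution parameters quoted above; the helper name `score_from_z` is an illustrative assumption.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

# 3.3(e): percentiles that correspond to the two Z scores.
perc_vr = std_normal.cdf(1.33)
perc_qr = std_normal.cdf(0.57)

def score_from_z(z, mu, sigma):
    """Convert a Z score back to the original scale: x = mu + z * sigma."""
    return mu + z * sigma

# 3.5: scores that correspond to the given Z values.
qr_cutoff = score_from_z(0.84, mu=584, sigma=151)
vr_cutoff = score_from_z(-0.52, mu=462, sigma=119)

print(round(perc_vr, 4), round(perc_qr, 4))  # 0.9082 0.7157
print(round(qr_cutoff), round(vr_cutoff))    # 711 400
```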

**3.7** (a) \(Z = 1.2 \rightarrow 0.1151\). (b) \(Z = -1.28 \rightarrow\) 70.6°F or colder.

**3.9** (a) N(25; 2.78). (b) \(Z = 1.08 \rightarrow 0.1401\). (c) The answers are very close because only the units were changed. (The only reason why they are a little different is because 28°C is 82.4°F, not precisely 83°F.)

**3.11** (a) Z = 0.67. (b) \(\mu\) = $1650, x = $1800. (c) \(0.67 = \frac {1800-1650}{\sigma} \rightarrow \sigma = \$223.88\).

**3.13** \(Z = 1.56 \rightarrow 0.0594\), i.e. 6%.

**3.15** (a) \(Z = 0.73 \rightarrow 0.2327\). (b) If you are bidding on only one auction and set a low maximum bid price, someone will probably outbid you. If you set a high maximum bid price, you may win the auction but pay more than is necessary. If bidding on more than one auction, and you set your maximum bid price very low, you probably won't win any of the auctions. However, if the maximum bid price is even modestly high, you are likely to win multiple auctions. (c) An answer roughly equal to the 10th percentile would be reasonable. Regrettably, no percentile cutoff point guarantees beyond any possible event that you win at least one auction. However, you may pick a higher percentile if you want to be more sure of winning an auction. (d) Answers will vary a little but should correspond to the answer in part (c). We use the 10th percentile: \(Z = -1.28 \rightarrow \$69.80\).

**3.17 **14/20 = 70% are within 1 SD. Within 2 SD: 19/20 = 95%. Within 3 SD: 20/20 = 100%. They follow this rule closely.

**3.19 **The distribution is unimodal and symmetric. The superimposed normal curve approximates the distribution pretty well. The points on the normal probability plot also follow a relatively straight line. There is one slightly distant observation on the lower end, but it is not extreme. The data appear to be reasonably approximated by the normal distribution.

**3.21** (a) No. The cards are not independent. For example, if the first card is an ace of clubs, that implies the second card cannot be an ace of clubs. Additionally, there are many possible categories, which would need to be simplified. (b) No. There are six events under consideration. The Bernoulli distribution allows for only two events or categories. Note that rolling a die could be a Bernoulli trial if we simplify to two events, e.g. rolling a 6 and not rolling a 6, though specifying such details would be necessary.

**3.23** (a) \({(1 - 0.471)}^2 \times 0.471 = 0.1318\). (b) \(0.471^3 = 0.1045\). (c) \(\mu = 1/0.471 = 2.12, \sigma = \sqrt {(1 - 0.471)/0.471^2} = \sqrt {2.38} = 1.54\). (d) \(\mu = 1/0.30 = 3.33, \sigma = 2.79\). (e) When p is smaller, the event is rarer, meaning the expected number of trials before a success and the standard deviation of the waiting time are higher.

**3.25 **(a) \(0.875^2 \times 0.125 = 0.096\). (b) \(\mu = 8, \sigma = 7.48\).
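The geometric-distribution values in 3.25 follow from the standard formulas \(P(X = k) = (1-p)^{k-1} p\), \(\mu = 1/p\), and \(\sigma = \sqrt{(1-p)/p^2}\). The sketch below checks them for p = 0.125; the function names are illustrative assumptions.

```python
from math import sqrt

def geometric_pmf(k, p):
    """Probability that the first success occurs on trial k."""
    return (1 - p) ** (k - 1) * p

def geometric_mean_sd(p):
    """Mean and standard deviation of the number of trials until the first success."""
    return 1 / p, sqrt((1 - p) / p ** 2)

p = 0.125
prob_third = geometric_pmf(3, p)   # two failures, then a success
mu, sigma = geometric_mean_sd(p)

print(round(prob_third, 3))        # 0.096
print(mu, round(sigma, 2))         # 8.0 7.48
```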

**3.27** (a) Yes. The conditions are satisfied: independence, fixed number of trials, either success or failure for each trial, and probability of success being constant across trials. (b) 0.200. (c) 0.200. (d) \(0.0024+0.0284+0.1323 = 0.1631\). (e) 1 - 0.0024 = 0.9976.

**3.29 **(a) \(\mu = 35, \sigma = 3.24\). (b) Yes. Z = 3.09. Since 45 is more than 2 standard deviations from the mean, it would be considered unusual. Note that the normal model is not required to apply this rule of thumb. (c) Using a normal model: 0.0010. This does indeed appear to be an unusual observation. If using a normal model with a 0.5 correction, the probability would be calculated as 0.0017.
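The two normal-model probabilities in part (c) differ only in where the tail starts. This is a sketch assuming the event of interest is "45 or more", so the 0.5 correction moves the cutoff to 44.5; that reading of the event is an assumption based on the quoted results.

```python
from statistics import NormalDist

model = NormalDist(mu=35, sigma=3.24)

z = (45 - 35) / 3.24                      # about 3.09
tail_plain = 1 - model.cdf(45)            # no continuity correction
tail_corrected = 1 - model.cdf(44.5)      # with the 0.5 correction

print(round(z, 2))               # 3.09
print(round(tail_plain, 4))      # 0.001
print(round(tail_corrected, 4))  # 0.0017
```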

**3.31** We want to find the probability that there will be more than 1,786 enrollees. Using the normal model: 0.0537. With a 0.5 correction: 0.0559.

**3.33** (a) \(1 - 0.75^3 = 0.5781\). (b) 0.1406. (c) 0.4219. (d) \(1 - 0.25^3 = 0.9844\).
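The four values in 3.33 are consistent with a binomial model with n = 3 trials and success probability p = 0.25. That reading, and the mapping of each part to a particular count, is an assumption inferred from the numbers rather than from the exercise text. A minimal sketch:

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 3, 0.25  # assumed from the expressions 1 - 0.75^3 and 1 - 0.25^3

at_least_one = 1 - binom_pmf(0, n, p)   # 0.5781
exactly_two = binom_pmf(2, n, p)        # 0.1406
exactly_one = binom_pmf(1, n, p)        # 0.4219
at_most_two = 1 - binom_pmf(3, n, p)    # 0.9844

print(round(at_least_one, 4), round(exactly_two, 4),
      round(exactly_one, 4), round(at_most_two, 4))
```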

**3.35** (a) Geometric distribution: 0.109. (b) Binomial: 0.219. (c) Binomial: 0.137. (d) \(1 - 0.875^6 = 0.551\). (e) Geometric: 0.084. (f) Using a binomial distribution with n = 6 and p = 0.125, we see that \(\mu = 4, \sigma = 1.06\), and Z = -1.89. Since this is within 2 SD, it may not be considered unusual, though this is a borderline case, so we might say the observation is somewhat unusual.

**3.37 **0 wins (-$3): 0.1458. 1 win (-$1): 0.3936. 2 wins (+$1): 0.3543. 3 wins (+$3): 0.1063.

**3.39 **(a) \(\overset {Anna}{1/5} \times \overset {Ben}{1/4} \times \overset {Carl}{1/3} \times \overset {Damian}{1/2} \times \overset {Eddy}{1/1} = 1/5! = 1/120\). (b) Since the probabilities must add to 1, there must be 5! = 120 possible orderings. (c) 8! = 40,320.

**3.41 **(a) Geometric: \((5/6)^4 \times (1/6) = 0.0804\). Note that the geometric distribution is just a special case of the negative binomial distribution when there is a single success on the last trial. (b) Binomial: 0.0322. (c) Negative binomial: 0.0193.

**3.43** (a) Negative binomial with n = 4 and p = 0.55, where a success is defined here as a female student. The negative binomial setting is appropriate since the last trial is fixed but the order of the first 3 trials is unknown. (b) 0.1838. (c) \(\binom {3}{1} = 3\). (d) In the binomial model there are no restrictions on the outcome of the last trial. In the negative binomial model the last trial is fixed. Therefore we are interested in the number of ways of ordering the other k - 1 successes in the first n - 1 trials.

**3.45 **(a) Poisson with \(\lambda = 75\). (b) \(\mu = \lambda = 75, \sigma = \sqrt {\lambda} = 8.66\). (c) Z = -1.73. Since 60 is within 2 standard deviations of the mean, it would not generally be considered unusual. Note that we often use this rule of thumb even when the normal model does not apply.

**3.47** Using Poisson with \(\lambda = 75\): 0.0402.

## Foundations for inference

**4.1** (a) Mean. Each student reports a numerical value: a number of hours. (b) Mean. Each student reports a number, which is a percentage, and we can average over these percentages. (c) Proportion. Each student reports Yes or No, so this is a categorical variable and we use a proportion. (d) Mean. Each student reports a number, which is a percentage like in part (b). (e) Proportion. Each student reports whether or not he got a job, so this is a categorical variable and we use a proportion.

**4.3** (a) Mean: 13.65. Median: 14. (b) SD: 1.91. IQR: 15 - 13 = 2. (c) \(Z_{16} = 1.23\), which is not unusual since it is within 2 SD of the mean. \(Z_{18} = (18 - 13.65)/1.91 = 2.28\), which is generally considered unusual. (d) No. Point estimates that are based on samples only approximate the population parameter, and they vary from one sample to another. (e) We use the SE, which is \(1.91/\sqrt {100} = 0.191\) for this sample's mean.

**4.5** (a) SE = 2.89. (b) Z = 1.73, which indicates that the two values are not unusually distant from each other when accounting for the uncertainty in John's point estimate.

**4.7** (a) We are 95% confident that US residents spend an average of 3.53 to 3.83 hours per day relaxing or pursuing activities they enjoy after an average work day. (b) 95% of such random samples will yield a 95% CI that contains the true average hours per day that US residents spend relaxing or pursuing activities they enjoy after an average work day. (c) They can be a little less confident in capturing the parameter, so the interval will be a little slimmer.

**4.9** A common way to decrease the width of the interval without losing confidence is to increase the sample size. It may also be possible to use a more advanced sampling method, such as stratified sampling, though the required analysis is beyond the scope of this course, and such a sampling method may be difficult in this context.

**4.11** (a) False. Provided the data distribution is not very strongly skewed (n = 64 in this sample, so we can be slightly lenient with the skew), the sample mean will be nearly normal, allowing for the normal approximation method described. (b) False. Inference is made on the population parameter, not the point estimate. The point estimate is always in the confidence interval. (c) True. (d) False. The confidence interval is not about a sample mean. (e) False. To be more confident that we capture the parameter, we need a wider interval. Think about needing a bigger net to be more sure of catching a fish in a murky lake. (f) True. Optional explanation: This is true since the normal model was used to model the sample mean. The margin of error is half the width of the interval, and the sample mean is the midpoint of the interval. (g) False. In the calculation of the standard error, we divide the standard deviation by the square root of the sample size. To cut the SE (or margin of error) in half, we would need to sample \(2^2 = 4\) times the number of people in the initial sample.

**4.13** Independence: sample from < 10% of population. We must assume it is a simple random sample to move forward; in practice, we would investigate whether this is the case, but here we will just report that we are making this assumption. Notice that there are no students who have had no exclusive relationships in the sample, which suggests some student responses are likely missing (perhaps only positive values were reported). The sample size is at least 30. The skew is strong, but the sample is very large so this is not a concern. 90% CI: (2.97, 3.43). We are 90% confident that the average number of exclusive relationships that Duke students have been in is between 2.97 and 3.43.

**4.15** (a) \(H_0: \mu = 8\) (On average, New Yorkers sleep 8 hours a night.) \(H_A: \mu < 8\) (On average, New Yorkers sleep less than 8 hours a night.) (b) \(H_0: \mu = 15\) (The average amount of company time each employee spends not working is 15 minutes for March Madness.) \(H_A: \mu > 15\) (The average amount of company time each employee spends not working is greater than 15 minutes for March Madness.)

**4.17** First, the hypotheses should be about the population mean (\(\mu\)), not the sample mean. Second, the null hypothesis should have an equal sign and the alternative hypothesis should be about the null hypothesized value, not the observed sample mean. The correct way to set up these hypotheses is shown below:

\[H_0 : \mu = \text {2 hours}\]

\[H_A : \mu > \text {2 hours}\]

The one-sided test indicates that we are only interested in showing that 2 is an underestimate. Here the interest is in only one direction, so a one-sided test seems most appropriate. If we would also be interested if the data showed strong evidence that 2 was an overestimate, then the test should be two-sided.

**4.19** (a) This claim does not seem plausible since 3 hours (180 minutes) is not in the interval. (b) 2.2 hours (132 minutes) is in the 95% confidence interval, so we do not have evidence to say she is wrong. However, it would be more appropriate to use the point estimate of the sample. (c) A 99% confidence interval will be wider than a 95% confidence interval, meaning it would enclose this smaller interval. This means 132 minutes would be in the wider interval, and we would not reject her claim based on a 99% confidence level.

**4.21** Independence: The sample is presumably a simple random sample, though we should verify that is the case. Generally, this is what is meant by "random sample", though it is a good idea to actually check. For all following questions and solutions, it may be assumed that "random sample" actually means "simple random sample". 75 ball bearings is smaller than 10% of the population of ball bearings. The sample size is at least 30. The data are only slightly skewed. Under the assumption that the random sample is a simple random sample, \(\bar {x}\) will be approximately normally distributed. \(H_0 : \mu = 7\) hours. \(H_A : \mu \ne 7\) hours. \(Z = -1.04 \rightarrow\) p-value = \(2 \times 0.1492 = 0.2984\). Since the p-value is greater than 0.05, we fail to reject \(H_0\). The data do not provide convincing evidence that the average lifespan of all ball bearings produced by this machine is different than 7 hours. (Comment on using a one-sided alternative: the worker may be interested in learning whether the ball bearings underperform or over-perform the manufacturer's claim, which is why we suggest a two-sided test.)

**4.23** (a) Independence: The sample is random and 64 patients would almost certainly make up less than 10% of the ER residents. The sample size is at least 30. No information is provided about the skew. In practice, we would ask to see the data to check this condition, but here we will make the assumption that the skew is not very strong. (b) \(H_0 : \mu = 127. H_A : \mu \ne 127\). \(Z = 2.15 \rightarrow\) p-value = \(2 \times 0.0158 = 0.0316\). Since the p-value is less than \(\alpha = 0.05\), we reject H_{0}. The data provide convincing evidence that the average ER wait time has increased over the last year. (c) Yes, it would change. The p-value is greater than 0.01, meaning we would fail to reject H_{0} at \(\alpha = 0.01\).

**4.25** \(H_0 : \mu = 130. H_A : \mu \ne 130\). \(Z = 1.39 \rightarrow\) p-value = \(2 \times 0.0823 = 0.1646\), which is larger than \(\alpha = 0.05\). The data do not provide convincing evidence that the true average calorie content in bags of potato chips is different than 130 calories.

**4.27** (a) H_{0}: Anti-depressants do not help symptoms of Fibromyalgia. H_{A}: Anti-depressants do treat symptoms of Fibromyalgia. Remark: Diana might also have taken special note if her symptoms got much worse, so a more scientific approach would have been to use a two-sided test. While parts (b)-(d) use the one-sided version, your answers will be a little different if you used a two-sided test. (b) Concluding that anti-depressants work for the treatment of Fibromyalgia symptoms when they actually do not. (c) Concluding that anti-depressants do not work for the treatment of Fibromyalgia symptoms when they actually do. (d) If she makes a Type 1 error, she will continue taking medication that does not actually treat her disorder. If she makes a Type 2 error, she will stop taking medication that could treat her disorder.

**4.29** (a) If the null hypothesis is rejected in error, then the regulators concluded that the adverse effect was higher in those taking the drug than those who did not take the drug when in reality the rates are the same for the two groups. (b) If the null hypothesis is not rejected but should have been, then the regulators failed to identify that the adverse effect was higher in those taking the drug. (c) Answers may vary a little. If all 403 drugs are actually okay, then about \(403 \times 0.05 \approx 20\) drugs will have a Type 1 error. Of the 42 suspect drugs, we would expect about 20/42 would represent an error while about \(22/42 \approx 52\%\) would actually be drugs with adverse effects. (d) There is not enough information to tell.

**4.31** (a) Independence: The sample is random. In practice, we should ask whether 70 customers is less than 10% of the population (we'll assume this is the case for this exercise). The sample size is at least 30. No information is provided about the skew, so this is another item we would typically ask about. For now, we'll assume the skew is not very strong. (b) \(H_0 : \mu = 18. H_A : \mu > 18\). \(Z = 3.46 \rightarrow\) p-value = 0.0003, which is less than \(\alpha = 0.05\), so we reject H_{0}. There is strong evidence that the average revenue per customer is greater than $18. (c) (18.65, 19.85). (d) Yes. The hypothesis test rejects the notion that \(\mu = 18\), and this value is not in the confidence interval. (e) Even though the increase in average revenue per customer appears to be significant, the restaurant owner may want to consider other criteria, such as total profits. With a longer happy hour, the revenue over the entire evening may actually drop since lower prices are offered for a longer time. Also, costs usually rise when prices are lowered. A better measure to consider may be an increase in total profits for the entire evening.

**4.33** (a) The distribution is unimodal and strongly right skewed with a median between 5 and 10 years old. Ages range from 0 to slightly over 50 years old, and the middle 50% of the distribution is roughly between 5 and 15 years old. There are potential outliers on the higher end. (b) When the sample size is small, the sampling distribution is right skewed, just like the population distribution. As the sample size increases, the sampling distribution gets more unimodal, symmetric, and approaches normality. The variability also decreases. This is consistent with the Central Limit Theorem.

**4.35** The centers are the same in each plot, and each data set is from a nearly normal distribution (see Section 4.2.6), though the histograms may not look very normal since each represents only 100 data points. The only way to tell which plot corresponds to which scenario is to examine the variability of each distribution. Plot B is the most variable, followed by Plot A, then Plot C. This means Plot B will correspond to the original data, Plot A to the sample means with size 5, and Plot C to the sample means with size 25.

**4.37** (a) Right skewed. There is a long tail on the higher end of the distribution but a much shorter tail on the lower end. (b) Less than, as the median would be less than the mean in a right skewed distribution. (c) We should not. (d) Even though the population distribution is not normal, the conditions for inference are reasonably satisfied, with the possible exception of skew. If the skew isn't very strong (we should ask to see the data), then we can use the Central Limit Theorem to estimate this probability. For now, we'll assume the skew isn't very strong, though the description suggests it is at least moderate to strong. Use \(N(1.3, SE_{\bar {x}} = 0.3/\sqrt {60})\): \(Z = 2.58 \rightarrow 0.0049\). (e) It would decrease it by a factor of \(1/\sqrt {2}\).

**4.39** (a) \(Z = -3.33 \rightarrow 0.0004\). (b) The population SD is known and the data are nearly normal, so the sample mean will be nearly normal with distribution \(N(\mu, \sigma / \sqrt {n})\), i.e. \(N(2.5, 0.0055)\). (c) \(Z = -10.54 \rightarrow \approx 0\). (d) See below:

(e) We could not estimate (a) without a nearly normal population distribution. We also could not estimate (c) since the sample size is not sufficient to yield a nearly normal sampling distribution if the population distribution is not nearly normal.

**4.41** (a) We cannot use the normal model for this calculation, but we can use the histogram. About 500 songs are shown to be longer than 5 minutes, so the probability is about \(500/3000 = 0.167\). (b) Two different answers are reasonable. Option 1: Since the population distribution is only slightly skewed to the right, even a small sample size will yield a nearly normal sampling distribution. We also know that the songs are sampled randomly and the sample size is less than 10% of the population, so the length of one song in the sample is independent of another. We are looking for the probability that the total length of 15 songs is more than 60 minutes, which means that the average song should last at least 60/15 = 4 minutes. Using \(SE = 1.62/ \sqrt {15}\), \(Z = 1.31 \rightarrow 0.0951\). Option 2: Since the population distribution is not normal, a small sample size may not be sufficient to yield a nearly normal sampling distribution. Therefore, we cannot estimate the probability using the tools we have learned so far. (c) We can now be confident that the conditions are satisfied. \(Z = 0.92 \rightarrow 0.1788\).

**4.43** (a) \(H_0 : \mu _{2009} = \mu _{2004}\). \(H_A : \mu _{2009} \ne \mu _{2004}\). (b) \(\bar {x}_{2009} - \bar {x}_{2004} = -3.6\) spam emails per day. (c) The null hypothesis was not rejected, and the data do not provide convincing evidence that the true average number of spam emails per day in years 2004 and 2009 are different. The observed difference is about what we might expect from sampling variability alone. (d) Yes, since the hypothesis of no difference was not rejected in part (c).

**4.45** (a) \(H_0 : p_{2009} = p_{2004}\). \(H_A : p_{2009} \ne p_{2004}\). (b) -7%. (c) The null hypothesis was rejected. The data provide strong evidence that the true proportion of those who once a month or less frequently delete their spam email was higher in 2004 than in 2009. The difference is so large that it cannot easily be explained as being due to chance. (d) No, since the null difference, 0, was rejected in part (c).

**4.47** (a) Scenario I is higher. Recall that a sample mean based on less data tends to be less accurate and have larger standard errors. (b) Scenario I is higher. The higher the confidence level, the higher the corresponding margin of error. (c) They are equal. The sample size does not affect the calculation of the p-value for a given Z score. (d) Scenario I is higher. If the null hypothesis is harder to reject (lower \(\alpha\)), then we are more likely to make a Type 2 error.

**4.49** \(10 \ge 2.58 \times \frac {102}{\sqrt {n}} \rightarrow n \ge 692.5319\). He should survey at least 693 customers.
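The inequality in 4.49 can be solved for n directly. The sketch below is one way to do that, assuming the values shown in the solution (z* = 2.58, standard deviation 102, margin of error 10); the helper name `min_sample_size_for_mean` is hypothetical.

```python
import math


def min_sample_size_for_mean(z_star: float, sd: float, margin: float) -> int:
    """Smallest integer n satisfying z_star * sd / sqrt(n) <= margin."""
    # Rearranging the inequality gives n >= (z_star * sd / margin)^2.
    return math.ceil((z_star * sd / margin) ** 2)


n_required = min_sample_size_for_mean(z_star=2.58, sd=102, margin=10)
print(n_required)  # 693
```

Rounding up rather than to the nearest integer matters here: 692 customers would leave the margin of error slightly above 10.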

**4.51** (a) The null hypothesis would be that the mean this year is also 128 minutes. The alternative hypothesis would be that the mean is different from 128 minutes. (b) First calculate the SE: \(\frac {39}{\sqrt {64}} = 4.875\). Next, identify the Z scores that would result in rejecting H_{0}: \(Z_{lower} = -1.96\), \(Z_{upper} = 1.96\). In each case, calculate the corresponding sample mean cutoff: \(\bar {x}_{lower} = 118.445\) and \(\bar {x}_{upper} = 137.555\). (c) Construct Z scores for the values from part (b) but using the supposed true distribution (i.e. \(\mu\) = 135), i.e. not using the null value (\(\mu\) = 128). The probability of correctly rejecting the null hypothesis would be 0.0003 + 0.3015 = 0.3018 using these two cutoffs, and the probability of a Type 2 error would then be 1 - 0.3018 = 0.6982.
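The steps in 4.51(b) and (c) can be sketched in a few lines of Python. This is an illustration under the values stated in the solution (null mean 128, supposed true mean 135, s = 39, n = 64, cutoffs at ±1.96); the variable names are our own. Because it does not round the Z scores to two decimals, the probabilities differ slightly from the table-based 0.3018 and 0.6982.

```python
from statistics import NormalDist

null_mean, true_mean = 128, 135  # values from the solution
se = 39 / 64 ** 0.5              # standard error, 4.875
z_crit = 1.96                    # two-sided cutoff at alpha = 0.05

# Sample-mean cutoffs for rejecting H0, computed under the null value.
lower_cutoff = null_mean - z_crit * se
upper_cutoff = null_mean + z_crit * se

# Probability of landing beyond either cutoff when the true mean is 135.
sampling_dist = NormalDist(mu=true_mean, sigma=se)
power = sampling_dist.cdf(lower_cutoff) + (1 - sampling_dist.cdf(upper_cutoff))
type2_error = 1 - power

print(round(lower_cutoff, 3), round(upper_cutoff, 3))
print(round(power, 2), round(type2_error, 2))
```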

## Inference for numerical data

**5.1** (a) For each observation in one data set, there is exactly one specially-corresponding observation in the other data set for the same geographic location. The data are paired. (b) \(H_0 : \mu_{diff} = 0\) (There is no difference in average daily high temperature between January 1, 1968 and January 1, 2008 in the continental US.) \(H_A : \mu_{diff} > 0\) (Average daily high temperature on January 1, 1968 was lower than average daily high temperature on January 1, 2008 in the continental US.) If you chose a two-sided test, that would also be acceptable. If this is the case, note that your p-value will be a little bigger than what is reported here in part (d). (c) Independence: locations are random and represent less than 10% of all possible locations in the US. The sample size is at least 30. We are not given the distribution to check the skew. In practice, we would ask to see the data to check this condition, but here we will move forward under the assumption that it is not strongly skewed. (d) \(Z = 1.60 \rightarrow\) p-value = 0.0548. (e) Since the p-value > \(\alpha\) (not given, so use 0.05), fail to reject H_{0}. The data do not provide strong evidence of temperature warming in the continental US. However, it should be noted that the p-value is very close to 0.05. (f) Type 2, since we may have incorrectly failed to reject H_{0}. There may be an increase, but we were unable to detect it. (g) Yes, since we failed to reject H_{0}, which had a null value of 0.

**5.3** (a) (-0.03, 2.23). (b) We are 90% confident that the average daily high on January 1, 2008 in the continental US was 0.03 degrees lower to 2.23 degrees higher than the average daily high on January 1, 1968. (c) No, since 0 is included in the interval.

**5.5** (a) Each of the 36 mothers is related to exactly one of the 36 fathers (and vice-versa), so there is a special correspondence between the mothers and fathers. (b) \(H_0 : \mu _{diff}\) = 0. \(H_A : \mu _{diff} \ne 0\). Independence: random sample from less than 10% of population. Sample size of at least 30. The skew of the differences is, at worst, slight. \(Z = 2.72 \rightarrow\) p-value = 0.0066. Since p-value < 0.05, reject H_{0}. The data provide strong evidence that the average IQ scores of mothers and fathers of gifted children are different, and the data indicate that mothers' scores are higher than fathers' scores for the parents of gifted children.

**5.7** Independence: Random samples that are less than 10% of the population. Both samples are at least of size 30. In practice, we'd ask for the data to check the skew (which is not provided), but here we will move forward under the assumption that the skew is not extreme (there is some leeway in the skew for such large samples). Use z* = 1.65. 90% CI: (0.16, 5.84). We are 90% confident that the average score in 2008 was 0.16 to 5.84 points higher than the average score in 2004.

**5.9** (a) \(H_0 : \mu _{2008} = \mu _{2004} \rightarrow \mu _{2004} - \mu _{2008} = 0\) (Average math score in 2008 is equal to average math score in 2004.) \(H_A : \mu _{2008} \ne \mu _{2004} \rightarrow \mu _{2004} - \mu _{2008} \ne 0\) (Average math score in 2008 is different than average math score in 2004.) Conditions necessary for inference were checked in Exercise 5.7. Z = -1.74 \(\rightarrow\) p-value = 0.0818. Since the p-value < \(\alpha\), reject H_{0}. The data provide strong evidence that the average math score for 13 year old students has changed between 2004 and 2008. (b) Yes, a Type 1 error is possible. We rejected H_{0}, but it is possible H_{0} is actually true. (c) No, since we rejected H_{0} in part (a).

**5.11** (a) We are 95% confident that those on the Paleo diet lose 0.891 pounds less to 4.891 pounds more than those in the control group. (b) No. The value representing no difference between the diets, 0, is included in the confidence interval. (c) The change would have shifted the confidence interval by 1 pound, yielding CI = (0.109, 5.891), which does not include 0. Had we observed this result, we would have rejected H_{0}.

**5.13** Independence and sample size conditions are satisfied. Almost any degree of skew is reasonable with such large samples. Compute the joint SE: \(\sqrt {SE^2_M + SE^2_W} = 0.114\). The 95% CI: (-11.32, -10.88). We are 95% confident that the average body fat percentage in men is 11.32% to 10.88% lower than the average body fat percentage in women.

**5.15** (a) df = 6 - 1 = 5, \(t^*_{5} = 2.02\) (column with two tails of 0.10, row with df = 5). (b) df = 21 - 1 = 20, \(t^*_{20} = 2.53\) (column with two tails of 0.02, row with df = 20). (c) df = 28, \(t^*_{28} = 2.05\). (d) df = 11, \(t^*_{11} = 3.11\).

**5.17** The mean is the midpoint: \(\bar {x} = 20\). Identify the margin of error: ME = 1.015, then use \(t^*_{35} = 2.03\) and \(SE = s/\sqrt {n}\) in the formula for margin of error to identify s = 3.
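The back-calculation in 5.17 amounts to solving ME = t* × s/√n for s. The sketch below shows this step, assuming n = 36 (inferred from the 35 degrees of freedom in the solution, not stated there explicitly); the function name `sd_from_margin` is hypothetical.

```python
import math


def sd_from_margin(margin: float, t_star: float, n: int) -> float:
    """Solve margin = t_star * s / sqrt(n) for the sample standard deviation s."""
    return margin * math.sqrt(n) / t_star


# ME = 1.015 and t* = 2.03 come from the solution; n = 36 is implied by df = 35.
s = sd_from_margin(margin=1.015, t_star=2.03, n=36)
print(round(s, 2))  # 3.0
```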

**5.19** (a) \(H_0: \mu = 8\) (New Yorkers sleep 8 hrs per night on average.) \(H_A: \mu < 8\) (New Yorkers sleep less than 8 hrs per night on average.) (b) Independence: The sample is random and from less than 10% of New Yorkers. The sample is small, so we will use a t distribution. For this size sample, slight skew is acceptable, and the min/max suggest there is not much skew in the data. T = -1.75. df = 25 - 1 = 24. (c) 0.025 < p-value < 0.05. If in fact the true population mean of the amount New Yorkers sleep per night was 8 hours, the probability of getting a random sample of 25 New Yorkers where the average amount of sleep is 7.73 hrs per night or less is between 0.025 and 0.05. (d) Since p-value < 0.05, reject H_{0}. The data provide strong evidence that New Yorkers sleep less than 8 hours per night on average. (e) No, as we rejected H_{0}.

**5.21** \(t^*_{19}\) is 1.73 for a one-tail. We want the lower tail, so set -1.73 equal to the T score, then solve for \(\bar {x}\): 56.91.

**5.23** No, he should not move forward with the test since the distributions of total personal income are very strongly skewed. When sample sizes are large, we can be a bit lenient with skew. However, skew as strong as that observed in this exercise would require sample sizes somewhat larger than 30.

**5.25** (a) These data are paired. For example, the Friday the 13th in say, September 1991, would probably be more similar to the Friday the 6th in September 1991 than to Friday the 6th in another month or year. (b) Let \(\mu _{diff} = \mu _{sixth} - \mu _{thirteenth}\). \(H_0 : \mu _{diff} = 0\). \(H_A : \mu _{diff} \ne 0\). (c) Independence: The months selected are not random. However, if we think these dates are roughly equivalent to a simple random sample of all such Friday 6th/13th date pairs, then independence is reasonable. To proceed, we must make this strong assumption, though we should note this assumption in any reported results. With fewer than 10 observations, we would need to use the t distribution to model the sample mean. The normal probability plot of the differences shows an approximately straight line. There isn't a clear reason why this distribution would be skewed, and since the normal quantile plot looks reasonable, we can mark this condition as reasonably satisfied. (d) T = 4.94 for df = 10 - 1 = 9 \(\rightarrow\) p-value < 0.01. (e) Since p-value < 0.05, reject H_{0}. The data provide strong evidence that the average number of cars at the intersection is higher on Friday the 6th than on Friday the 13th. (We might believe this intersection is representative of all roads, i.e. there is higher traffic on Friday the 6th relative to Friday the 13th. However, we should be cautious of the required assumption for such a generalization.) (f) If the average number of cars passing the intersection actually was the same on Friday the 6th and 13th, then the probability that we would observe a test statistic so far from zero is less than 0.01. (g) We might have made a Type 1 error, i.e. incorrectly rejected the null hypothesis.

**5.27** (a) \(H_0 : \mu _{diff} = 0\). \(H_A : \mu _{diff} \ne 0\). T = -2.71. df = 5. 0.02 < p-value < 0.05. Since p-value < 0.05, reject H_{0}. The data provide strong evidence that the average number of traffic accident related emergency room admissions are different between Friday the 6th and Friday the 13th. Furthermore, the data indicate that the direction of that difference is that accidents are lower on Friday the 6th relative to Friday the 13th. (b) (-6.49, -0.17). (c) This is an observational study, not an experiment, so we cannot so easily infer a causal intervention implied by this statement. It is true that there is a difference. However, for example, this does not mean that a responsible adult going out on Friday the 13th has a higher chance of harm than on any other night.

**5.29** (a) Chickens fed linseed weighed an average of 218.75 grams while those fed horsebean weighed an average of 160.20 grams. Both distributions are relatively symmetric with no apparent outliers. There is more variability in the weights of chickens fed linseed. (b) \(H_0 : \mu _{ls} = \mu _{hb}\). \(H_A : \mu _{ls} \ne \mu _{hb}\). We leave the conditions to you to consider. T = 3.02, df = min(11, 9) = 9 \(\rightarrow\) 0.01 < p-value < 0.02. Since p-value < 0.05, reject H_{0}. The data provide strong evidence that there is a significant difference between the average weights of chickens that were fed linseed and horsebean. (c) Type 1, since we rejected H_{0}. (d) Yes, since p-value > 0.01, we would have failed to reject H_{0}.

**5.31** \(H_0 : \mu _C = \mu _S\). \(H_A : \mu _C \ne \mu _S\). T = 3.48, df = 11 \(\rightarrow\) p-value < 0.01. Since p-value < 0.05, reject H_{0}. The data provide strong evidence that the average weight of chickens that were fed casein is different than the average weight of chickens that were fed soybean (with weights from casein being higher). Since this is a randomized experiment, the observed difference can be attributed to the diet.

**5.33** \(H_0 : \mu _T = \mu _C\). \(H_A : \mu _T \ne \mu _C\). T = 2.24, df = 21 \(\rightarrow\) 0.02 < p-value < 0.05. Since p-value < 0.05, reject H_{0}. The data provide strong evidence that the average food consumption by the patients in the treatment and control groups are different. Furthermore, the data indicate patients in the distracted eating (treatment) group consume more food than patients in the control group.

**5.35** Let \(\mu _{diff} = \mu _{pre} - \mu _{post}\). \(H_0 : \mu _{diff} = 0\): Treatment has no effect. \(H_A : \mu _{diff} > 0\): Treatment is effective in reducing Pd T scores, the average pre-treatment score is higher than the average post-treatment score. Note that the reported values are pre minus post, so we are looking for a positive difference, which would correspond to a reduction in the psychopathic deviant T score. Conditions are checked as follows. Independence: The subjects are randomly assigned to treatments, so the patients in each group are independent. All three sample sizes are smaller than 30, so we use t tests. Distributions of differences are somewhat skewed. The sample sizes are small, so we cannot reliably relax this assumption. (We will proceed, but we would not report the results of this specific analysis, at least for treatment group 1.) For all three groups: \(df = 13\). \(T_1 = 1.89\) (0.025 < p-value < 0.05), \(T_2 = 1.35\) (p-value = 0.10), \(T_3 = -1.40\) (p-value > 0.10). The only significant test reduction is found in Treatment 1; however, we had earlier noted that this result might not be reliable due to the skew in the distribution. Note that the calculation of the p-value for Treatment 3 was unnecessary: the sample mean indicated an increase in Pd T scores under this treatment (as opposed to a decrease, which was the result of interest). That is, we could tell without formally completing the hypothesis test that the p-value would be large for this treatment group.

**5.37** \(H_0: \mu _1 = \mu _2 = \dots = \mu _6\). H_{A}: The average weight varies across some (or all) groups. Independence: Chicks are randomly assigned to feed types (presumably kept separate from one another), therefore independence of observations is reasonable. Approx. normal: the distributions of weights within each feed type appear to be fairly symmetric. Constant variance: Based on the side-by-side box plots, the constant variance assumption appears to be reasonable. There are differences in the actual computed standard deviations, but these might be due to chance as these are quite small samples. \(F_{5,65} = 15.36\) and the p-value is approximately 0. With such a small p-value, we reject H_{0}. The data provide convincing evidence that the average weight of chicks varies across some (or all) feed supplement groups.

**5.39** (a) H_{0}: The mean MET for each group is equal to the others. H_{A}: At least one pair of means is different. (b) Independence: We don't have any information on how the data were collected, so we cannot assess independence. To proceed, we must assume the subjects in each group are independent. In practice, we would inquire for more details. Approx. normal: The data are bound below by zero and the standard deviations are larger than the means, indicating very strong skew. However, since the sample sizes are extremely large, even extreme skew is acceptable. Constant variance: This condition is sufficiently met, as the standard deviations are reasonably consistent across groups. (c) See below, with the last column omitted:

| | Df | Sum Sq | Mean Sq |
|---|---|---|---|
| Total | 50738 | 25575327 | |

(d) Since p-value is very small, reject H_{0}. The data provide convincing evidence that the average MET differs between at least one pair of groups.

**5.41** (a) H_{0}: Average GPA is the same for all majors. H_{A}: At least one pair of means are different. (b) Since p-value > 0.05, fail to reject H_{0}. The data do not provide convincing evidence of a difference between the average GPAs across three groups of majors. (c) The total degrees of freedom is 195 + 2 = 197, so the sample size is 197 + 1 = 198.

**5.43** (a) False. As the number of groups increases, so does the number of comparisons and hence the modified significance level decreases. (b) True. (c) True. (d) False. We need observations to be independent regardless of sample size.

**5.45** (a) H_{0}: Average score difference is the same for all treatments. H_{A}: At least one pair of means are different. (b) We should check conditions. If we look back to the earlier exercise, we will see that the patients were randomized, so independence is satisfied. There are some minor concerns about skew, especially with the third group, though this may be acceptable. The standard deviations across the groups are reasonably similar. Since the p-value is less than 0.05, reject H_{0}. The data provide convincing evidence of a difference between the average reduction in score among treatments. (c) We determined that at least two means are different in part (b), so we now conduct \(K = 3 \times 2/2 = 3\) pairwise t tests that each use \(\alpha = 0.05/3 = 0.0167\) for a significance level. Use the following hypotheses for each pairwise test. H_{0}: The two means are equal. H_{A}: The two means are different. The sample sizes are equal and we use the pooled SD, so we can compute SE = 3.7 with the pooled df = 39. The p-value only for Trmt 1 vs. Trmt 3 may be statistically significant: 0.01 < p-value < 0.02. Since we cannot tell, we should use a computer to get the p-value, 0.015, which is statistically significant for the adjusted significance level. That is, we have identified Treatment 1 and Treatment 3 as having different effects. Checking the other two comparisons, the differences are not statistically significant.
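The adjusted significance level in 5.45(c) follows from counting the pairwise comparisons among the groups. The sketch below illustrates that arithmetic with the numbers from the solution (3 treatments, \(\alpha = 0.05\), reported p-value 0.015); the function name `bonferroni_level` is our own.

```python
import math


def bonferroni_level(num_groups: int, alpha: float = 0.05) -> tuple[int, float]:
    """Return the number of pairwise comparisons K and the adjusted level alpha / K."""
    # K = k(k - 1)/2 is the number of distinct pairs among k groups.
    num_comparisons = math.comb(num_groups, 2)
    return num_comparisons, alpha / num_comparisons


num_comparisons, adjusted_alpha = bonferroni_level(3)

# p-value reported in the solution for Treatment 1 vs. Treatment 3.
is_significant = 0.015 < adjusted_alpha
print(num_comparisons, round(adjusted_alpha, 4), is_significant)
```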

## Inference for categorical data

**6.1** (a) False. Doesn't satisfy success-failure condition. (b) True. The success-failure condition is not satisfied. In most samples we would expect \(\hat {p}\) to be close to 0.08, the true population proportion. While \(\hat {p}\) can be much above 0.08, it is bound below by 0, suggesting it would take on a right skewed shape. Plotting the sampling distribution would confirm this suspicion. (c) False. \(SE_{\hat {p}} = 0.0243\), and \(\hat {p} = 0.12\) is only \( \frac {0.12-0.08}{0.0243} = 1.65\) SEs away from the mean, which would not be considered unusual. (d) True. \(\hat {p} = 0.12\) is 2.32 standard errors away from the mean, which is often considered unusual. (e) False. Decreases the SE by a factor of \(1/\sqrt {2}\).

**6.3** (a) True. See the reasoning of 6.1(b). (b) True. We take the square root of the sample size in the SE formula. (c) True. The independence and success-failure conditions are satisfied. (d) True. The independence and success-failure conditions are satisfied.

**6.5** (a) False. A confidence interval is constructed to estimate the population proportion, not the sample proportion. (b) True. 95% CI: 70% \(\pm\) 8%. (c) True. By the definition of a confidence interval. (d) True. Quadrupling the sample size decreases the SE and ME by a factor of \(1/\sqrt {4}\). (e) True. The 95% CI is entirely above 50%.

**6.7** With a random sample from < 10% of the population, independence is satisfied. The success-failure condition is also satisfied. \(ME = z^* \sqrt {\frac {\hat {p}(1- \hat {p})}{n}} = 1.96 \sqrt {\frac {0.56 \times 0.44}{600}} = 0.0397 \approx 4\%\)

**6.9** (a) Proportion of graduates from this university who found a job within one year of graduating. \(\hat {p} = 348/400 = 0.87\). (b) This is a random sample from less than 10% of the population, so the observations are independent. Success-failure condition is satisfied: 348 successes, 52 failures, both well above 10. (c) (0.8371, 0.9029). We are 95% confident that approximately 84% to 90% of graduates from this university found a job within one year of completing their undergraduate degree. (d) 95% of such random samples would produce a 95% confidence interval that includes the true proportion of students at this university who found a job within one year of graduating from college. (e) (0.8267, 0.9133). Similar interpretation as before. (f) 99% CI is wider, as we are more confident that the true proportion is within the interval and so need to cover a wider range.
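The intervals in 6.9(c) and (e) can be reproduced as follows. This is a minimal sketch assuming z* = 1.96 for 95% and z* = 2.58 for 99%, the values used elsewhere in these solutions; the function name `proportion_ci` is hypothetical, and the endpoints may differ from the printed ones in the fourth decimal because of rounding.

```python
import math


def proportion_ci(successes: int, n: int, z_star: float) -> tuple[float, float]:
    """Normal-approximation confidence interval for a single proportion."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # SE uses the sample proportion
    return p_hat - z_star * se, p_hat + z_star * se


ci_95 = proportion_ci(348, 400, z_star=1.96)
ci_99 = proportion_ci(348, 400, z_star=2.58)
print([round(x, 3) for x in ci_95])
print([round(x, 3) for x in ci_99])
```

The 99% interval contains the 95% interval, which matches the reasoning in part (f).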

**6.11** (a) No. The sample only represents students who took the SAT, and this was also an online survey. (b) (0.5289, 0.5711). We are 95% confident that 53% to 57% of high school seniors are fairly certain that they will participate in a study abroad program in college. (c) 90% of such random samples would produce a 90% confidence interval that includes the true proportion. (d) Yes. The interval lies entirely above 50%.

**6.13** (a) This is an appropriate setting for a hypothesis test. H_{0} : p = 0.50. H_{A} : p > 0.50. Both independence and the success-failure condition are satisfied. \(Z = 1.12 \rightarrow\) p-value = 0.1314. Since the p-value > \(\alpha\) = 0.05, we fail to reject H_{0}. The data do not provide strong evidence in favor of the claim. (b) Yes, since we did not reject H_{0} in part (a).

**6.15** (a) \(H_0 : p = 0.38\). \(H_A : p \ne 0.38\). Independence (random sample, < 10% of population) and the success-failure condition are satisfied. \(Z = -20 \rightarrow\) p-value \(\approx 0\). Since the p-value is very small, we reject H_{0}. The data provide strong evidence that the proportion of Americans who only use their cell phones to access the internet is different than the Chinese proportion of 38%, and the data indicate that the proportion is lower in the US. (b) If in fact 38% of Americans used their cell phones as a primary access point to the internet, the probability of obtaining a random sample of 2,254 Americans where 17% or less or 59% or more use only their cell phones to access the internet would be approximately 0. (c) (0.1545, 0.1855). We are 95% confident that approximately 15.5% to 18.6% of all Americans primarily use their cell phones to browse the internet.

**6.17** (a) \(H_0 : p = 0.5. H_A : p > 0.5\). Independence (random sample, < 10% of population) is satisfied, as is the success-failure condition (using p_{0} = 0.5, we expect 40 successes and 40 failures). \(Z = 2.91 \rightarrow\) p-value = 0.0018. Since the p-value < 0.05, we reject the null hypothesis. The data provide strong evidence that the rate of correctly identifying a soda for these people is significantly better than just by random guessing. (b) If in fact people cannot tell the difference between diet and regular soda and they randomly guess, the probability of getting a random sample of 80 people where 53 or more identify a soda correctly would be 0.0018.
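The test statistic and p-value in 6.17 can be checked with a short calculation. The sketch below assumes the counts given in the solution (53 correct out of 80, null value 0.5) and uses the standard library's `statistics.NormalDist` rather than a normal table; the function name `one_proportion_z_test` is our own.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float]:
    """One-sided (upper tail) Z test for a single proportion against p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)  # SE is computed under the null value p0
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # upper tail, matching H_A: p > p0
    return z, p_value


z, p_value = one_proportion_z_test(successes=53, n=80, p0=0.5)
print(round(z, 2), round(p_value, 4))
```

Note that the standard error here uses the null proportion, whereas a confidence interval would use the sample proportion.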

**6.19** (a) Independence is satisfied (random sample from < 10% of the population), as is the success-failure condition (40 smokers, 160 non-smokers). The 95% CI: (0.145, 0.255). We are 95% confident that 14.5% to 25.5% of all students at this university smoke. (b) We want z*SE to be no larger than 0.02 for a 95% confidence level. We use z* = 1.96 and plug in the point estimate \(\hat {p} = 0.2\) within the SE formula: \(1.96 \sqrt {\frac {0.2(1 - 0.2)}{n}} \le 0.02\). The sample size n should be at least 1,537.

**6.21** The margin of error, which is computed as z*SE, must be smaller than 0.01 for a 90% confidence level. We use z* = 1.65 for a 90% confidence level, and we can use the point estimate \(\hat {p} = 0.52\) in the formula for SE. \(1.65 \sqrt {\frac {0.52(1 - 0.52)}{n}} \le 0.01\). Therefore, the sample size n must be at least 6,796.
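The sample-size inequalities in 6.19(b) and 6.21 have the same form, so one helper covers both. This is a sketch using the values from the two solutions; the function name `min_sample_size_for_proportion` is hypothetical.

```python
import math


def min_sample_size_for_proportion(p_hat: float, margin: float, z_star: float) -> int:
    """Smallest integer n with z_star * sqrt(p_hat * (1 - p_hat) / n) <= margin."""
    # Squaring both sides and solving for n gives n >= z*^2 * p(1 - p) / margin^2.
    return math.ceil(z_star ** 2 * p_hat * (1 - p_hat) / margin ** 2)


n_619 = min_sample_size_for_proportion(p_hat=0.2, margin=0.02, z_star=1.96)   # 6.19(b)
n_621 = min_sample_size_for_proportion(p_hat=0.52, margin=0.01, z_star=1.65)  # 6.21
print(n_619, n_621)  # 1537 6796
```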

**6.23** This is not a randomized experiment, and it is unclear whether people would be affected by the behavior of their peers. That is, independence may not hold. Additionally, there are only 5 interventions under the provocative scenario, so the success-failure condition does not hold. Even if we consider a hypothesis test where we pool the proportions, the success-failure condition will not be satisfied. Since one condition is questionable and the other is not satisfied, the difference in sample proportions will not follow a nearly normal distribution.

**6.25** (a) False. The entire confidence interval is above 0. (b) True. (c) True. (d) True. (e) False. It is simply the negated and reordered values: (-0.06, -0.02).

**6.27** (a) (0.23, 0.33). We are 95% confident that the proportion of Democrats who support the plan is 23% to 33% higher than the proportion of Independents who do. (b) True.

**6.29** (a) College grads: 23.7%. Non-college grads: 33.7%. (b) Let \(p_{CG}\) and \(p_{NCG}\) represent the proportion of college graduates and non-college graduates who responded "do not know". \(H_0 : p_{CG} = p_{NCG}\). \(H_A : p_{CG} \ne p_{NCG}\). Independence is satisfied (random sample, < 10% of the population), and the success-failure condition, which we would check using the pooled proportion (\(\hat {p} = 235/827 = 0.284\)), is also satisfied. \(Z = -3.18 \rightarrow\) p-value = 0.0014. Since the p-value is very small, we reject \(H_0\). The data provide strong evidence that the proportion of college graduates who do not have an opinion on this issue is different than that of non-college graduates. The data also indicate that fewer college grads say they "do not know" than non-college grads (i.e. the data indicate the direction after we reject \(H_0\)).

**6.31** (a) College grads: 35.2%. Non-college grads: 33.9%. (b) Let \(p_{CG}\) and \(p_{NCG}\) represent the proportion of college graduates and non-college grads who support offshore drilling. \(H_0 : p_{CG} = p_{NCG}\). \(H_A : p_{CG} \ne p_{NCG}\). Independence is satisfied (random sample, < 10% of the population), and the success-failure condition, which we would check using the pooled proportion (\(\hat {p} = 286/827 = 0.346\)), is also satisfied. \(Z = 0.39 \rightarrow\) p-value = 0.6966. Since the p-value > \(\alpha\) (0.05), we fail to reject \(H_0\). The data do not provide strong evidence of a difference between the proportions of college graduates and non-college graduates who support offshore drilling in California.

**6.33** Subscript C means control group. Subscript T means truck drivers. (a) \(H_0 : p_C = p_T\). \(H_A : p_C \ne p_T\). Independence is satisfied (random samples, < 10% of the population), as is the success-failure condition, which we would check using the pooled proportion (\(\hat {p} = 70/495 = 0.141\)). \(Z = -1.58 \rightarrow\) p-value = 0.1164. Since the p-value is high, we fail to reject \(H_0\). The data do not provide strong evidence that the rates of sleep deprivation are different for non-transportation workers and truck drivers.

**6.35** (a) Summary of the study:

| | Virol. failure: Yes | Virol. failure: No | Total |
|---|---|---|---|
| Nevaripine | 26 | 94 | 120 |
| Lopinavir | 10 | 110 | 120 |
| Total | 36 | 204 | 240 |

(b) \(H_0 : p_N = p_L\). There is no difference in virologic failure rates between the Nevaripine and Lopinavir groups. \(H_A : p_N \ne p_L\). There is some difference in virologic failure rates between the Nevaripine and Lopinavir groups. (c) Random assignment was used, so the observations in each group are independent. If the patients in the study are representative of those in the general population (something impossible to check with the given information), then we can also confidently generalize the findings to the population. The success-failure condition, which we would check using the pooled proportion (\(\hat {p} = 36/240 = 0.15\)), is satisfied. \(Z = 3.04 \rightarrow\) p-value = 0.0024. Since the p-value is low, we reject \(H_0\). There is strong evidence of a difference in virologic failure rates between the Nevaripine and Lopinavir groups; treatment and virologic failure do not appear to be independent.

**6.37** (a) False. The chi-square distribution has one parameter called degrees of freedom. (b) True. (c) True. (d) False. As the degrees of freedom increase, the shape of the chi-square distribution becomes more symmetric.

**6.39** (a) \(H_0\): The distribution of the format of the book used by the students follows the professor's predictions. \(H_A\): The distribution of the format of the book used by the students does not follow the professor's predictions. (b) \(E_{\text{hard copy}} = 126 \times 0.60 = 75.6\). \(E_{\text{print}} = 126 \times 0.25 = 31.5\). \(E_{\text{online}} = 126 \times 0.15 = 18.9\). (c) Independence: The sample is not random. However, if the professor has reason to believe that the proportions are stable from one term to the next and students are not affecting each other's study habits, independence is probably reasonable. Sample size: All expected counts are at least 5. Degrees of freedom: \(df = k - 1 = 3 - 1 = 2\) is more than 1. (d) \(X^2 = 2.32\), \(df = 2\), p-value > 0.3. (e) Since the p-value is large, we fail to reject \(H_0\). The data do not provide strong evidence indicating the professor's predictions were statistically inaccurate.
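The expected counts and the p-value bound in 6.39 can be checked with a short sketch. It relies on a property specific to 2 degrees of freedom: the chi-square upper-tail area is exactly \(e^{-x/2}\), so no table or external library is needed. The function name `chi2_upper_tail_df2` is an assumption made for illustration, and the test statistic 2.32 is taken from part (d) because the observed counts are not restated in the solution.

```python
import math

n = 126
predicted = {"hard copy": 0.60, "print": 0.25, "online": 0.15}

# Expected count for each format under the professor's predictions
expected = {fmt: n * share for fmt, share in predicted.items()}

def chi2_upper_tail_df2(x):
    """Upper-tail area of the chi-square distribution with df = 2 only."""
    return math.exp(-x / 2)

p_value = chi2_upper_tail_df2(2.32)  # X^2 as reported in part (d)
print(expected, round(p_value, 3))
```

For other degrees of freedom the closed form above does not apply, and a chi-square table or statistical software would be used instead.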

**6.41** (a) Two-way table:

| Treatment | Quit: Yes | Quit: No | Total |
|---|---|---|---|
| Patch + support group | 40 | 110 | 150 |
| Only patch | 30 | 120 | 150 |
| Total | 70 | 230 | 300 |

(b-i) \(E_{\text{row 1, col 1}} = \frac {(\text{row 1 total}) \times (\text{col 1 total})}{\text{table total}} = \frac {150 \times 70}{300} = 35\). This is lower than the observed value. (b-ii) \(E_{\text{row 2, col 2}} = \frac {(\text{row 2 total}) \times (\text{col 2 total})}{\text{table total}} = \frac {150 \times 230}{300} = 115\). This is lower than the observed value.
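The expected-count formula in 6.41(b) applies to every cell of a two-way table. A minimal sketch, using the observed counts from part (a) and a hypothetical helper name `expected_counts`:

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

# Rows: patch + support group, only patch. Columns: quit yes, quit no.
observed = [[40, 110], [30, 120]]
expected = expected_counts(observed)
print(expected)
```

Comparing `expected[0][0]` with the observed 40 and `expected[1][1]` with the observed 120 reproduces the two comparisons made in the solution.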

**6.43** \(H_0\): The opinion of college grads and nongrads is not different on the topic of drilling for oil and natural gas off the coast of California. \(H_A\): Opinions regarding the drilling for oil and natural gas off the coast of California have an association with earning a college degree.

\[E_{\text{row 1, col 1}} = 151.5 \qquad E_{\text{row 1, col 2}} = 134.5\]

\[E_{\text{row 2, col 1}} = 162.1 \qquad E_{\text{row 2, col 2}} = 143.9\]

\[E_{\text{row 3, col 1}} = 124.5 \qquad E_{\text{row 3, col 2}} = 110.5\]

Independence: The samples are both random, unrelated, and from less than 10% of the population, so independence between observations is reasonable. Sample size: All expected counts are at least 5. Degrees of freedom: \(df = (R - 1) \times (C - 1) = (3 - 1) \times (2 - 1) = 2\), which is greater than 1. \(X^2 = 11.47\), \(df = 2 \rightarrow 0.001 <\) p-value \(< 0.005\). Since the p-value < \(\alpha\), we reject \(H_0\). There is strong evidence that there is an association between support for offshore drilling and having a college degree.

**6.45** (a) \(H_0\): There is no relationship between gender and how informed Facebook users are about adjusting their privacy settings. \(H_A\): There is a relationship between gender and how informed Facebook users are about adjusting their privacy settings. (b) The expected counts:

\[E_{\text{row 1, col 1}} = 296.6 \qquad E_{\text{row 1, col 2}} = 369.3\]

\[E_{\text{row 2, col 1}} = 54.8 \qquad E_{\text{row 2, col 2}} = 68.2\]

\[E_{\text{row 3, col 1}} = 7.6 \qquad E_{\text{row 3, col 2}} = 9.4\]

The sample is random, all expected counts are above 5, and \(df = (3 - 1) \times (2 - 1) = 2 > 1\), so we may proceed with the test.

**6.47** It is not appropriate. There are only 9 successes in the sample, so the success-failure condition is not met.

**6.49** (a) \(H_0 : p = 0.69\). \(H_A : p \ne 0.69\). (b) \(\hat {p} = \frac {17}{30} = 0.57\). (c) The success-failure condition is not satisfied; note that it is appropriate to use the null value (\(p_0 = 0.69\)) to compute the expected number of successes and failures. (d) Answers may vary. Each student can be represented with a card. Take 100 cards, 69 black cards representing those who follow the news about Egypt and 31 red cards representing those who do not. Shuffle the cards and draw with replacement (shuffling each time in between draws) 30 cards representing the 30 high school students. Calculate the proportion of black cards in this sample, \(\hat {p}_{sim}\), i.e. the proportion of those who follow the news in the simulation. Repeat this many times (e.g. 10,000 times) and plot the resulting sample proportions. Because the observed proportion falls below the null value of 0.69, the p-value will be two times the proportion of simulations where \(\hat {p}_{sim} \le 0.57\). (Note: we would generally use a computer to perform these simulations.) (e) The one-tail proportion is about 0.001 + 0.005 + 0.020 + 0.035 + 0.075 = 0.136, meaning the two-sided p-value is about 0.272. Your p-value may vary slightly since it is based on a visual estimate. Since the p-value is greater than 0.05, we fail to reject \(H_0\). The data do not provide strong evidence that the proportion of high school students who followed the news about Egypt is different than the proportion of American adults who did.
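The card-drawing procedure in 6.49(d) can be sketched on a computer as follows. This is an illustration under stated assumptions, not the simulation used to produce the figure in the exercise: the function names, the seed, and the choice to count simulated samples with 17 or fewer followers (the tail on the side of the observed proportion, since 0.57 is below 0.69) are all choices made here. An exact binomial tail is included for comparison.

```python
import math
import random

def simulate_two_sided_p(n=30, p0=0.69, observed=17, reps=10_000, seed=1):
    """Two times the share of simulated samples with at most `observed` followers."""
    rng = random.Random(seed)
    at_most = 0
    for _ in range(reps):
        # One simulated sample: each student follows the news with probability p0
        followers = sum(rng.random() < p0 for _ in range(n))
        if followers <= observed:
            at_most += 1
    return 2 * at_most / reps

def exact_two_sided_p(n=30, p0=0.69, observed=17):
    """Two times the exact binomial probability of at most `observed` followers."""
    lower_tail = sum(
        math.comb(n, k) * p0 ** k * (1 - p0) ** (n - k)
        for k in range(observed + 1)
    )
    return 2 * lower_tail

p_sim = simulate_two_sided_p()
p_exact = exact_two_sided_p()
print(f"simulated: {p_sim:.3f}, exact: {p_exact:.3f}")
```

Both values may differ somewhat from the 0.272 read off the histogram in part (e), since that figure is a visual estimate; the conclusion (fail to reject at the 0.05 level) depends only on the p-value exceeding 0.05.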

**6.51** The subscript pr corresponds to provocative and con to conservative. (a) \(H_0 : p_{pr} = p_{con}\). \(H_A : p_{pr} \ne p_{con}\). (b) -0.35. (c) The left tail for the p-value is calculated by adding up the two left bins: 0.005 + 0.015 = 0.02. Doubling the one tail, the p-value is 0.04. (Students may have approximate results, and a small number of students may have a p-value of about 0.05.) Since the p-value is low, we reject \(H_0\). The data provide strong evidence that people react differently under the two scenarios.

## Introduction to linear regression

**7.1** (a) The residual plot will show randomly distributed residuals around 0. The variance is also approximately constant. (b) The residuals will show a fan shape, with higher variability for smaller x. There will also be many points on the right above the line. There is trouble with the model being fit here.

**7.3** (a) Strong relationship, but a straight line would not fit the data. (b) Strong relationship, and a linear fit would be reasonable. (c) Weak relationship, and trying a linear fit would be reasonable. (d) Moderate relationship, but a straight line would not fit the data. (e) Strong relationship, and a linear fit would be reasonable. (f) Weak relationship, and trying a linear fit would be reasonable.

**7.5** (a) Exam 2 since there is less of a scatter in the plot of final exam grade versus exam 2. Notice that the relationship between Exam 1 and the Final Exam appears to be slightly nonlinear. (b) Exam 2 and the final are relatively close to each other chronologically, or Exam 2 may be cumulative so has greater similarities in material to the final exam. Answers may vary for part (b).

**7.7** (a) \(R = -0.7 \rightarrow\) (4). (b) \(R = 0.45 \rightarrow\) (3). (c) \(R = 0.06 \rightarrow\) (1). (d) \(R = 0.92 \rightarrow\) (2).

**7.9** (a) The relationship is positive, weak, and possibly linear. However, there do appear to be some anomalous observations along the left where several students have the same height that is notably far from the cloud of the other points. Additionally, there are many students who appear not to have driven a car, and they are represented by a set of points along the bottom of the scatterplot. (b) There is no obvious explanation why simply being tall should lead a person to drive faster. However, one confounding factor is gender. Males tend to be taller than females on average, and personal experiences (anecdotal) may suggest they drive faster. If we were to follow up on this suspicion, we would find that sociological studies confirm this suspicion. (c) Males are taller on average and they drive faster. The gender variable is indeed an important confounding variable.

**7.11** (a) There is a somewhat weak, positive, possibly linear relationship between the distance traveled and travel time. There is clustering near the lower left corner that we should take special note of. (b) Changing the units will not change the form, direction or strength of the relationship between the two variables. If longer distances measured in miles are associated with longer travel time measured in minutes, longer distances measured in kilometers will be associated with longer travel time measured in hours. (c) Changing units doesn't affect correlation: R = 0.636.

**7.13** (a) There is a moderate, positive, and linear relationship between shoulder girth and height. (b) Changing the units, even if just for one of the variables, will not change the form, direction or strength of the relationship between the two variables.

**7.15** In each part, we may write the husband ages as a linear function of the wife ages: (a) \(age_H = age_W + 3\); (b) \(age_H = age_W - 2\); and (c) \(age_H = age_W/2\). Therefore, the correlation will be exactly 1 in all three parts. An alternative way to gain insight into this solution is to create a mock data set, such as a data set of 5 women with ages 26, 27, 28, 29, and 30 (or some other set of ages). Then, based on the description, say for part (a), we can compute their husbands' ages as 29, 30, 31, 32, and 33. We can plot these points to see they fall on a straight line, and they always will. The same approach can be applied to the other parts as well.
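The mock data set suggested in 7.15 can be checked numerically. The sketch below uses the five wife ages from the solution, derives the husband ages from each of the three linear relationships, and computes the correlation directly from its definition; the helper name `correlation` is an assumption made for illustration.

```python
import math

def correlation(x, y):
    """Pearson correlation computed from sums of squares and cross-products."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

wives = [26, 27, 28, 29, 30]
husbands = {
    "(a) age_W + 3": [w + 3 for w in wives],
    "(b) age_W - 2": [w - 2 for w in wives],
    "(c) age_W / 2": [w / 2 for w in wives],
}
correlations = {part: correlation(wives, ages) for part, ages in husbands.items()}
print(correlations)
```

Any other set of distinct wife ages would give the same result, because a linear function with a positive slope preserves the ordering and relative spacing of the points.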

**7.17** (a) There is a positive, very strong, linear association between the number of tourists and spending. (b) Explanatory: number of tourists (in thousands). Response: spending (in millions of US dollars). (c) We can predict spending for a given number of tourists using a regression line. This may be useful information for determining how much the country may want to spend in advertising abroad, or to forecast expected revenues from tourism. (d) Even though the relationship appears linear in the scatterplot, the residual plot actually shows a nonlinear relationship. This is not a contradiction: residual plots can show divergences from linearity that can be difficult to see in a scatterplot. A simple linear model is inadequate for modeling these data. It is also important to consider that these data are observed sequentially, which means there may be a hidden structure that is not evident in the current data but that is important to consider.

**7.19** (a) First calculate the slope: \(b_1 = R \times \frac {s_y}{s_x} = 0.636 \times \frac {113}{99} = 0.726\). Next, make use of the fact that the regression line passes through the point \((\bar {x}, \bar {y})\): \(\bar {y} = b_0 + b_1 \times \bar {x}\). Plug in \(\bar {x}\), \(\bar {y}\), and \(b_1\), and solve for \(b_0\): 51. Solution: \(\widehat {travel\ time} = 51 + 0.726 \times distance\). (b) \(b_1\): For each additional mile in distance, the model predicts an additional 0.726 minutes in travel time. \(b_0\): When the distance traveled is 0 miles, the travel time is expected to be 51 minutes. It does not make sense to have a travel distance of 0 miles in this context. Here, the y-intercept serves only to adjust the height of the line and is meaningless by itself. (c) \(R^2 = 0.636^2 = 0.40\). About 40% of the variability in travel time is accounted for by the model, i.e. explained by the distance traveled. (d) \(\widehat {travel\ time} = 51 + 0.726 \times distance = 51 + 0.726 \times 103 \approx 126\) minutes. (Note: we should be cautious in our predictions with this model since we have not yet evaluated whether it is a well-fit model.) (e) \(e_i = y_i - \hat {y}_i = 168 - 126 = 42\) minutes. A positive residual means that the model underestimates the travel time. (f) No, this calculation would require extrapolation.
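The arithmetic in 7.19 can be traced step by step with a short sketch. The intercept is taken as the reported value of 51 because the sample means are not restated in the solution, and the residual uses the prediction rounded to the nearest minute, as the solution does. Variable names are chosen here for illustration.

```python
R, s_x, s_y = 0.636, 99, 113

# (a) Slope from the correlation and the two standard deviations
b1 = R * s_y / s_x
b0 = 51  # intercept as reported in the solution (sample means not restated)

# (c) Share of variability in travel time explained by distance
r_squared = R ** 2

# (d) Predicted travel time for a 103-mile trip, using the rounded slope 0.726
predicted = b0 + 0.726 * 103

# (e) Residual for an observed travel time of 168 minutes
residual = 168 - round(predicted)

print(round(b1, 3), round(r_squared, 2), round(predicted), residual)
```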

**7.21** The relationship between the variables is somewhat linear. However, there are two apparent outliers. The residuals do not show a random scatter around 0. A simple linear model may not be appropriate for these data, and we should investigate the two outliers.

**7.23** (a) \(\sqrt {R^2} = 0.849\). Since the trend is negative, R is also negative: \(R = -0.849\). (b) \(b_0 = 55.34\). \(b_1 = -0.537\). (c) For a neighborhood with 0% reduced-fee lunch, we would expect 55.34% of the bike riders to wear helmets. (d) For every additional percentage point of reduced-fee lunches in a neighborhood, we would expect the share of kids wearing helmets to be 0.537 percentage points lower. (e) \(\hat {y} = 40 \times (-0.537) + 55.34 = 33.86\), \(e = 40 - \hat {y} = 6.14\). In this neighborhood, the percentage of bike riders wearing helmets is 6.14 percentage points higher than predicted by the regression model.

**7.25** (a) The outlier is in the upper-left corner. Since it is horizontally far from the center of the data, it is a point with high leverage. Since the slope of the regression line would be very different if fit without this point, it is also an influential point. (b) The outlier is located in the lower-left corner. It is horizontally far from the rest of the data, so it is a high-leverage point. The line again would look notably different if the fit excluded this point, meaning the outlier is influential. (c) The outlier is in the upper-middle of the plot. Since it is near the horizontal center of the data, it is not a high-leverage point. This means it also will have little or no influence on the slope of the regression line.

**7.27** (a) There is a negative, moderate-to-strong, somewhat linear relationship between percent of families who own their home and the percent of the population living in urban areas in 2010. There is one outlier: a state where 100% of the population is urban. The variability in the percent of homeownership also increases as we move from left to right in the plot. (b) The outlier is located in the bottom right corner, horizontally far from the center of the other points, so it is a point with high leverage. It is an influential point since excluding this point from the analysis would greatly affect the slope of the regression line.

**7.29** (a) The relationship is positive, moderate-to-strong, and linear. There are a few outliers but no points that appear to be influential. (b) \(\widehat {weight} = -105.0113 + 1.0176 \times height\). Slope: For each additional centimeter in height, the model predicts the average weight to be 1.0176 additional kilograms (about 2.2 pounds). Intercept: People who are 0 centimeters tall are expected to weigh -105.0113 kilograms. This is obviously not possible. Here, the y-intercept serves only to adjust the height of the line and is meaningless by itself. (c) \(H_0\): The true slope coefficient of height is zero (\(\beta _1 = 0\)). \(H_A\): The true slope coefficient of height is greater than zero (\(\beta _1 > 0\)). A two-sided test would also be acceptable for this application. The p-value for the two-sided alternative hypothesis (\(\beta _1 \ne 0\)) is incredibly small, so the p-value for the one-sided hypothesis will be even smaller. That is, we reject \(H_0\). The data provide convincing evidence that height and weight are positively correlated. The true slope parameter is indeed greater than 0. (d) \(R^2 = 0.72^2 = 0.52\). Approximately 52% of the variability in weight can be explained by the height of individuals.

**7.31** (a) \(H_0: \beta _1 = 0\). \(H_A: \beta _1 > 0\). A two-sided test would also be acceptable for this application. The p-value, as reported in the table, is incredibly small. Thus, for a one-sided test, the p-value will also be incredibly small, and we reject \(H_0\). The data provide convincing evidence that wives' and husbands' heights are positively correlated. (b) \(\widehat {height}_W = 43.5755 + 0.2863 \times height_H\). (c) Slope: For each additional inch in husband's height, the wife's height is expected to be an additional 0.2863 inches on average. Intercept: Men who are 0 inches tall are expected to have wives who are, on average, 43.5755 inches tall. The intercept here is meaningless, and it serves only to adjust the height of the line. (d) The slope is positive, so R must also be positive. \(R = \sqrt {0.09} = 0.30\). (e) 63.2612. Since \(R^2\) is low, the prediction based on this regression model is not very reliable. (f) No, we should avoid extrapolating.

**7.33** (a) 25.75. (b) \(H_0: \beta _1 = 0\). \(H_A: \beta _1 \ne 0\). A one-sided test also may be reasonable for this application. \(T = 2.23\), \(df = 23 \rightarrow\) p-value between 0.02 and 0.05. So we reject \(H_0\). There is an association between gestational age and head circumference. We can also say that the association is positive.

## Multiple and logistic regression

**8.1** (a) \(\widehat {baby\_weight} = 123.05 - 8.94 \times smoke\). (b) The estimated body weight of babies born to smoking mothers is 8.94 ounces lower than that of babies born to non-smoking mothers. Smoker: \(123.05 - 8.94 \times 1 = 114.11\) ounces. Non-smoker: \(123.05 - 8.94 \times 0 = 123.05\) ounces. (c) \(H_0: \beta _1 = 0\). \(H_A: \beta _1 \ne 0\). \(T = -8.65\), and the p-value is approximately 0. Since the p-value is very small, we reject \(H_0\). The data provide strong evidence that the true slope parameter is different than 0 and that there is an association between birth weight and smoking. Furthermore, having rejected \(H_0\), we can conclude that smoking is associated with lower birth weights.

**8.3** (a) \(\widehat {baby\_weight} = -80.41 + 0.44 \times gestation - 3.33 \times parity - 0.01 \times age + 1.15 \times height + 0.05 \times weight - 8.40 \times smoke\). (b) gestation: The model predicts a 0.44 ounce increase in the birth weight of the baby for each additional day of pregnancy, all else held constant. age: The model predicts a 0.01 ounce decrease in the birth weight of the baby for each additional year in mother's age, all else held constant. (c) Parity might be correlated with one of the other variables in the model, which complicates model estimation. (d) \(\widehat {baby\_weight} = 120.58\). \(e = 120 - 120.58 = -0.58\). The model over-predicts this baby's birth weight. (e) \(R^2 = 0.2504\). \(R^2_{adj} = 0.2468\).

**8.5** (a) (-0.32, 0.16). We are 95% confident that male students on average have GPAs 0.32 points lower to 0.16 points higher than females when controlling for the other variables in the model. (b) Yes, since the p-value is larger than 0.05 in all cases (not including the intercept).

**8.7** (a) There is not a significant relationship between the age of the mother and the birth weight of the baby. We should consider removing this variable from the model. (b) All other variables are statistically significant at the 5% level.

**8.9** Based on the p-value alone, either gestation or smoke should be added to the model first. However, since the adjusted \(R^2\) for the model with gestation is higher, it would be preferable to add gestation in the first step of the forward-selection algorithm. (Other explanations are possible. For instance, it would be reasonable to only use the adjusted \(R^2\).)

**8.11** Nearly normal residuals: The normal probability plot shows a nearly normal distribution of the residuals; however, there are some minor irregularities at the tails. With a data set so large, these would not be a concern. Constant variability of residuals: The scatterplot of the residuals versus the fitted values does not show any overall structure. However, values that have very low or very high fitted values appear to also have somewhat larger outliers. In addition, the residuals do appear to have constant variability between the two parity and smoking status groups, though these items are relatively minor.

Independent residuals: The scatterplot of residuals versus the order of data collection shows a random scatter, suggesting that there is no apparent structure related to the order in which the data were collected.

Linear relationships between the response variable and numerical explanatory variables: The residuals vs. height and weight of mother are randomly distributed around 0. The residuals vs. length of gestation plot also does not show any clear or strong remaining structures, with the possible exception of very short or long gestations. The rest of the residuals do appear to be randomly distributed around 0. All concerns raised here are relatively mild. There are some outliers, but there is so much data that the influence of such observations will be minor.

**8.13** (a) There are a few potential outliers, e.g. on the left in the total length variable, but nothing that will be of serious concern in a data set this large. (b) When coefficient estimates are sensitive to which variables are included in the model, this typically indicates that some variables are collinear. For example, a possum's gender may be related to its head length, which would explain why the coefficient (and p-value) for sex male changed when we removed the head length variable. Likewise, a possum's skull width is likely to be related to its head length, probably even much more closely related than the head length was to gender.

**8.15** (a) The logistic model relating \(\hat {p}_i\) to the predictors may be written as \(\log \left( \frac {\hat {p}_i}{1 - \hat {p}_i} \right) = 33.5095 - 1.4207 \times sex\_male_i - 0.2787 \times skull\_width_i + 0.5687 \times total\_length_i\). Only total_length has a positive association with a possum being from Victoria. (b) \(\hat {p} = 0.0062\). While the probability is very near zero, we have not run diagnostics on the model. We might also be a little skeptical that the model will remain accurate for a possum found in a US zoo. For example, perhaps the zoo selected a possum with specific characteristics but only looked in one region. On the other hand, it is encouraging that the possum was caught in the wild. (Answers regarding the reliability of the model probability will vary.)

## Contributors

David M Diez (Google/YouTube), Christopher D Barr (Harvard School of Public Health), Mine Çetinkaya-Rundel (Duke University)