Statistics LibreTexts

9.7: Practice (Chapter 9)

    9.1: Scatter plots and Direction of Association

    1. What is a scatterplot and when is it appropriate to use one?
    2. Draw a scatterplot of the following data by hand or using technology. What pattern do you observe?
      1. 1, 2, 3, 4, 5
      2. 2, 4, 6, 7, 10
    3. Suppose you were to collect data on the following pair of variables for a group of people: age and grip strength. You want to make a scatterplot.
      1. Which variable would you use as the explanatory variable?
      2. Which variable would you use as the response variable?
      3. Would you expect to see a positive or negative association?
    4. What does it mean to say two variables are positively associated?
      1. There is a nonlinear relationship between the variables.
      2. When the value of one variable increases, the value of the other variable decreases.
      3. There is a linear relationship between the variables.
      4. When the value of one variable increases, the value of the other variable also increases.
    5. Does the scatter plot below appear to have a positive or negative association?
      Scatterplot with points that form a clear pattern, moving upward to the right.
    6. Does the scatter plot below appear to have a positive or negative association?
      Scatterplot with points that move downward to the right.
    7. A random sample of ten celebrities produced the following data where \(x\) is the number of endorsements the celebrity has and \(y\) is the amount of money made (in millions of dollars). Draw a scatter plot of the data.

      Celebrity endorsements and income
      \(x\) \(y\) \(x\) \(y\)
      0 2 5 12
      3 8 4 9
      2 7 3 9
      1 3 0 3
      5 13 4 10
    8. Office productivity is relatively low when the employees feel no stress about their work or job security. However, high levels of stress can also lead to reduced employee productivity. Sketch a plot to represent the relationship between stress and productivity.
    9. Explain the difference between a positive association and a negative association. Sketch a rough example of each.
    10. Describe what a scatterplot with no association would look like. What might that tell you about the relationship between two variables?
    11. Why is it important to always visualize your data before doing further analysis?

    9.2: Correlation Coefficient (r)

    1. What are the possible values for the correlation coefficient?
    2. What does a correlation coefficient of \( r = 0.94 \) tell us about a dataset?
    3. List three things the correlation coefficient tells us, and two things it does not tell us.
    4. How does the value of \( r \) change as the points in a scatterplot move further from a perfect straight line?
    5. In order to have a correlation coefficient between traits A and B, it is necessary to have:
      1. one group of subjects, some of whom possess characteristics of trait A, the remainder possessing those of trait B
      2. measures of trait A on one group of subjects and of trait B on another group
      3. two groups of subjects, one which could be classified as A or not A, the other as B or not B
      4. one group of subjects, with measures of both trait A and trait B on each subject
    6. If the correlation between the age of an auto and the money spent on repairs is +0.90, then
      1. 81% of the variation in the money spent for repairs is explained by the age of the auto
      2. 81% of money spent for repairs is unexplained by the age of the auto
      3. 90% of the money spent for repairs is explained by the age of the auto
      4. none of the above
    7. Consider a sample least squares regression analysis between a dependent variable (Y) and an independent variable (X). A sample correlation coefficient of −1 (minus one) tells us that
      1. there is no relationship between Y and X in the sample
      2. there is no relationship between Y and X in the population
      3. there is a perfect negative relationship between Y and X in the population
      4. there is a perfect negative relationship between Y and X in the sample.
    8. If the correlation coefficient is negative, this means
      1. the slope of the regression line is negative
      2. the test statistic cannot be calculated
      3. there is a strong correlation between the variables
      4. there is no correlation between the variables
      5. the calculation was done wrong
      6. the regression line can be used for prediction
    9. Match each \( r \) value below with the best description:
      1. Strong negative linear relationship
      2. No linear relationship
      3. Moderate positive correlation
      4. Strong positive linear relationship
        1. \( r = -0.15 \)
        2. \( r = -0.92 \)
        3. \( r = 0 \)
        4. \( r = 0.83 \)
    10. A national consumer magazine reported the following correlations. The correlation between car weight and car reliability is -0.9. The correlation between car weight and annual maintenance cost is 0.3. According to this information, which of the following statements are true?
      1. Heavier cars tend to be less reliable.
      2. Heavier cars tend to cost more to maintain.
      3. Car weight is related more strongly to reliability than to maintenance cost.
    11. Does the scatter plot below appear to have a strong or weak linear relationship?
      Scatterplot with points that form a pattern, moving upward to the right.
    12. Does the scatter plot below appear to have a strong or weak linear relationship?
      Scatter plot with several points plotted all over the first quadrant. There is no pattern.
    13. A random sample of ten celebrities produced the following data where \(x\) is the number of endorsements the celebrity has and \(y\) is the amount of money made (in millions of dollars). Find the value of the correlation coefficient for this data set. What does this mean in this context?
      Celebrity endorsements and income
      \(x\) \(y\) \(x\) \(y\)
      0 2 5 12
      3 8 4 9
      2 7 3 9
      1 3 0 3
      5 13 4 10
    14. Eduardo and Rosie are both collecting data on the number of rainy days in a year and the total rainfall for the year. Eduardo records rainfall in inches and Rosie in centimeters. How will their correlation coefficients compare?
    15. What are some limitations of using only \( r \) to describe a relationship?
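
    The celebrity exercise above asks for the correlation coefficient. As a minimal sketch (not part of the original exercise set), the Python below computes \( r \) from its definition using only the standard library. The function name `pearson_r` is illustrative. The last lines rescale \(y\) by 2.54 as a stand-in for the inches-versus-centimeters question; they reuse the celebrity data because no rainfall data is given.

```python
import math

# Celebrity data from the exercise: x = endorsements, y = income (millions of dollars)
x = [0, 3, 2, 1, 5, 5, 4, 3, 0, 4]
y = [2, 8, 7, 3, 13, 12, 9, 9, 3, 10]


def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    syy = sum((b - mean_y) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(x, y)
print(f"r = {r:.4f}")

# Assumption for illustration: multiply every y by 2.54, as a unit change would.
r_rescaled = pearson_r(x, [2.54 * value for value in y])
print(f"r after rescaling y = {r_rescaled:.4f}")
```

    Comparing the two printed values is one way to check an answer to the units question; interpreting \( r \) in context is still left to the exercise.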

    9.3: Least-Squares Regression Line

    1. Explain what the "least-squares" in "least-squares regression line" means and why it's useful.
    2. A researcher finds the regression equation: \( \hat{y} = 45.1 + 2.3x \)
      1. Interpret the slope in context.
      2. What does the y-intercept mean in this example (if it's meaningful)?
      3. Predict \( y \) when \( x = 5 \).
    3. Given a regression line \( \hat{y} = 30 + 4.2x \), what is the predicted value of \( y \) when \( x = 6 \)?
    4. Suppose one collected the following information where X is diameter of tree trunk and Y is tree height.
      Tree diameter and height
      X Y
      4 8
      2 4
      8 18
      6 22
      10 30
      6 8

      Regression equation: \(\hat{y}_i=-3.6+3.1 \cdot X_i\)

      What is your estimate of the average height of all trees having a trunk diameter of 7 inches?
    5. A random sample of ten celebrities produced the following data where \(x\) is the number of endorsements the celebrity has and \(y\) is the amount of money made (in millions of dollars). Find the equation of the least-squares regression line for this data.
      Celebrity endorsements and income
      \(x\) \(y\) \(x\) \(y\)
      0 2 5 12
      3 8 4 9
      2 7 3 9
      1 3 0 3
      5 13 4 10
    6. What is interpolation? What is extrapolation? Which one is riskier, and why?
    7. An electronics retailer used regression to find a simple model to predict sales growth in the first quarter of the new year (January through March). The model is good for 90 days, where \(x\) is the day. The model can be written as follows: \[\hat{y} = 101.32 + 2.48x\] where \(\hat{y}\) is in thousands of dollars.
      1. What would you predict the sales to be after one month?
      2. What is the meaning of \(\hat{y}(100)\)?
      3. When do we predict the sales to reach one million dollars?
    8. Ornithologists, scientists who study birds, tag sparrow hawks in 13 different colonies to study their population. They gather data for the percent of new sparrow hawks in each colony and the percent of those that have returned from migration.

      Percent return: 74; 66; 81; 52; 73; 62; 52; 45; 62; 46; 60; 46; 38

      Percent new: 5; 6; 8; 11; 12; 15; 16; 17; 18; 18; 19; 20; 20

      1. Enter the data into your calculator and make a scatter plot.
      2. Use your calculator’s regression function to find the equation of the least-squares regression line. Add this to your scatter plot from part 1.
      3. Explain in words what the slope and \(y\)-intercept of the regression line tell us.
      4. How well does the regression line fit the data? Explain your response.
      5. Which point has the largest residual? Explain what the residual means in context. Is this point an outlier? An influential point? Explain.
      6. An ecologist wants to predict how many birds will join another colony of sparrow hawks to which 70% of the adults from the previous year have returned. What is the prediction?
    9. The following table shows data on average per capita wine consumption and heart disease rate in a random sample of 10 countries.
      Wine Consumption and Heart Disease
      Yearly wine consumption in liters 2.5 3.9 2.9 2.4 2.9 0.8 9.1 2.7 0.8 0.7
      Death from heart diseases 221 167 131 191 220 297 71 172 211 300
      1. Make a scatter plot of the data
      2. Find the equation of the least-squares regression line. Add this to your scatter plot from part 1.
      3. How well does the regression line fit the data? Explain your response.
      4. Do the data provide convincing evidence that there is a linear relationship between the amount of alcohol consumed and the heart disease death rate?
    10. A regression was run to determine if there is a relationship between hours of TV watched per day (x) and number of sit ups a person can do (y).

      The results of the regression were:
      \(y = ax + b\)
      \(a = -0.948\)
      \(b = 31.264\)
      \(r^2 = 0.583696\)
      \(r = -0.764\)

      Use this to predict the number of sit ups a person who watches 10 hours of TV can do (to one decimal place).
    11. What happens to the regression line if the correlation is close to 0?
    12. What is the problem with using a regression model to predict what will happen in the distant future?
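
    The tree exercise above quotes the regression equation \(\hat{y}_i=-3.6+3.1 \cdot X_i\). As a hedged sketch, the closed-form least-squares formulas (slope \(= S_{xy}/S_{xx}\), intercept \(= \bar{y} - \text{slope} \cdot \bar{x}\)) can be coded to check that equation against the table. The names `least_squares` and `predict` are illustrative, not from the text.

```python
# Tree data from the exercise: X = trunk diameter, Y = tree height
diameter = [4, 2, 8, 6, 10, 6]
height = [8, 4, 18, 22, 30, 8]


def least_squares(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    sxx = sum((a - mean_x) ** 2 for a in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_value):
    """Evaluate the fitted line at x_value."""
    return intercept + slope * x_value


intercept, slope = least_squares(diameter, height)
# Agrees with the equation quoted in the exercise.
print(f"y-hat = {intercept:.1f} + {slope:.1f} x")
```

    Calling `predict(intercept, slope, x_value)` with a diameter inside the observed range (2 to 10) is interpolation; values far outside that range are extrapolation and deserve more caution.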

    9.4: Interpreting Slope, Intercept, and Residuals

    1. Calculate the residual for a point where the actual value is 74 and the predicted value is 68. Interpret the result.
    2. What does it tell us when many residuals are large (positive or negative)?
    3. A regression equation is \( \hat{y} = 55 + 0.7x \)
      1. What is the predicted value when \( x = 10 \)?
      2. A student with \( x = 10 \) has an actual value of 66. What is the residual?
    4. A researcher wishes to examine the relationship between years of schooling completed and the number of pregnancies in young women. Her research discovers a linear relationship, and the least squares line is \( \hat{y} = 1 - 4x \), where \(x\) is the number of years of schooling completed and \(\hat{y}\) is the number of pregnancies. Which of the following best interprets the slope of the regression line?
      1. When amount of schooling increases by one year, the number of pregnancies increases by 4.
      2. When amount of schooling increases by one year, the number of pregnancies decreases by 1.
      3. When amount of schooling increases by one year, the number of pregnancies increases by 1.
      4. When amount of schooling increases by one year, the number of pregnancies decreases by 4.
    5. The equation of the regression line that relates blood alcohol level \(x\) (in hundredths of a percent) to the number of names a person can remember five minutes after hearing them is \( \hat{y} = 12 - 0.5x \).
      1. Interpret the slope in this context.
      2. Interpret the intercept in this context.
    6. The linear regression equation for a data set is \( \hat{y} = -4.1 + 1.6x \). The actual value at \(x = 15\) is 20.4. What is the residual value at \(x = 15\)?
    7. A random sample of ten celebrities produced the following data where \(x\) is the number of endorsements the celebrity has and \(y\) is the amount of money made (in millions of dollars). Interpret the slope and intercept(s) of the line of best fit for this data.
      Celebrity endorsements and income
      \(x\) \(y\) \(x\) \(y\)
      0 2 5 12
      3 8 4 9
      2 7 3 9
      1 3 0 3
      5 13 4 10
    8. The following table shows the life expectancy for an individual born in the United States in certain years.
      Life Expectancies
      Year of Birth Life Expectancy
      1930 59.7
      1940 62.9
      1950 70.2
      1965 69.7
      1973 71.4
      1982 74.5
      1987 75
      1992 75.7
      2010 78.7
      1. Decide which variable should be the independent variable and which should be the dependent variable.
      2. Draw a scatter plot of the ordered pairs.
      3. Calculate the least squares line.
      4. Find the correlation coefficient. Is it significant?
      5. Find the estimated life expectancy for an individual born in 1950 and for one born in 1982.
      6. Why aren’t the answers to part 5 the same as the values in the table that correspond to those years?
      7. Use the two points in part 5 to plot the least squares line on your graph from part 2.
      8. Based on the data, is there a linear relationship between the year of birth and life expectancy?
      9. Which point has the largest residual? Explain what the residual means in context. Is this point an outlier? An influential point? Explain.
      10. Using the least squares line, find the estimated life expectancy for an individual born in 1850. Does the least squares line give an accurate estimate for that year? Explain why or why not.
      11. What is the slope of the least-squares (best-fit) line? Interpret the slope.
      12. What does the sign (positive or negative) of a residual tell you about the prediction?
    9. The height (sidewalk to roof) of notable tall buildings in America is compared to the number of stories of the building (beginning at street level).
      Heights of buildings
      Height (in feet) Stories
      1,050 57
      428 28
      362 26
      529 40
      790 60
      401 22
      380 38
      1,454 110
      1,127 100
      700 46
      1. Using “stories” as the independent variable and “height” as the dependent variable, make a scatter plot of the data.
      2. Does it appear from inspection that there is a relationship between the variables?
      3. Calculate the least squares line.
      4. Find the correlation coefficient. Is it significant?
      5. Find the estimated heights for 32 stories and for 94 stories.
      6. Based on the data in the table, is there a linear relationship between the number of stories in tall buildings and the height of the buildings?
      7. Are there any outliers in the data? If so, which point(s)?
      8. What is the estimated height of a building with six stories? Does the least squares line give an accurate estimate of height? Explain why or why not.
      9. Based on the least squares line, adding an extra story is predicted to add about how many feet to a building?
      10. What is the slope of the least squares (best-fit) line? Interpret the slope.
    10. A group of students measure the height and width of a random sample of beans. They are interested in investigating the relationship between the height and width. Their summary statistics are displayed in the table below. All units, if applicable, are millimeters.
      Summary Statistics
      Mean width: 7.571
      Stdev width: 0.955
      Mean height: 13.943
      Stdev height: 2.008
      Correlation coefficient: 0.8435
      1. The students are interested in using the width of the beans to predict the height. Calculate the slope of the regression equation.
      2. Write the equation of the best-fit line that can be used to predict bean heights. Use x to represent width and y to represent height.
      3. What fraction of the variability in bean heights can be explained by the linear model of bean height vs width? Express your answer as a decimal.
      4. If, instead, the students are interested in using the height of the beans to predict the width, calculate the slope of this new regression equation.
      5. Write the equation of the best-fit line that can be used to predict bean widths. Use x to represent height and y to represent width.
    11. Explain why residuals are important when assessing the usefulness of a model.
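
    Several exercises in this section rely on two ideas: a residual is the observed value minus the predicted value, and the slope can be recovered from summary statistics as \( b = r \cdot s_y / s_x \) with intercept \( a = \bar{y} - b\bar{x} \). The sketch below illustrates both with the tree diameter and height data from Section 9.3 and the regression equation quoted there. It is one possible illustration under those assumptions, and the function name `predict` is illustrative.

```python
import statistics

# Tree data from Section 9.3: X = trunk diameter, Y = tree height
diameter = [4, 2, 8, 6, 10, 6]
height = [8, 4, 18, 22, 30, 8]


def predict(x_value):
    """Regression equation quoted in the tree exercise: y-hat = -3.6 + 3.1 x."""
    return -3.6 + 3.1 * x_value


# Residual = observed - predicted. Positive means the line under-predicted.
residuals = [obs - predict(x_value) for x_value, obs in zip(diameter, height)]
for x_value, obs, res in zip(diameter, height, residuals):
    print(f"x = {x_value:2d}  observed = {obs:2d}  residual = {res:+.1f}")

# Slope and intercept from summary statistics: b = r * s_y / s_x, a = mean_y - b * mean_x
n = len(diameter)
mean_x = statistics.mean(diameter)
mean_y = statistics.mean(height)
s_x = statistics.stdev(diameter)  # sample standard deviation (divides by n - 1)
s_y = statistics.stdev(height)
sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(diameter, height))
r = sxy / ((n - 1) * s_x * s_y)
slope = r * s_y / s_x
intercept = mean_y - slope * mean_x
print(f"slope = {slope:.1f}, intercept = {intercept:.1f}")
```

    The residuals of a least-squares line sum to zero (up to rounding), which is a quick sanity check on a fitted equation.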

    9.5: Cautions with Correlation and Causation

    1. What is the difference between correlation and causation?
    2. A study of the CO2 concentration and global temperature data found that the correlation between CO2 concentration and global temperature was 0.993. Based on this correlation, it can be concluded that the increased level of CO2 concentration is causing global warming.
    3. A study was done on smoking and lung capacity. 200 smokers took part in a study that asked them how many cigarettes a day they smoked and then measured their lung capacity. The correlation was found to be \(r = -0.992\). Based solely on this study it can be concluded that smoking causes lung cancer.
    4. Give an example of two variables that might have a strong correlation but are not causally related. Explain what the "lurking variable" might be.
    5. Give an example of two variables that might have a strong correlation and are causally related. Explain how the causality might be proven.
    6. Select True or False for the following statements.
      1. When ice cream sales go up, so do crime rates. A reasonable conclusion is that there is an association between ice cream sales and the crime rate.
      2. Association does not imply causation.
      3. If there is an association identified between the explanatory and response variable, the relationship is causal.
      4. The response variable is always caused by the explanatory variable.
      5. When ice cream sales go up, so do crime rates. A reasonable conclusion is that the increase in ice cream sales causes an increase in the crime rate.
    7. Can correlation help us form hypotheses about cause and effect? Why or why not?
    8. Explain why a randomized controlled experiment is better than an observational study when trying to establish causation.
    9. Review one recent article or claim you’ve seen online or in the media that may confuse correlation with causation. What evidence would you need to decide if a causal relationship exists?

    9.7: Practice (Chapter 9) is shared under a not declared license and was authored, remixed, and/or curated by LibreTexts.
