15.4.1: Practice with Nutrition
Let’s try a second example using the same nutrition data set from fast food restaurants from OpenIntro.org as the chapter on correlations. If you grab your calculations and Sum of Products table from that practice with the nutrition data, you’ll be ahead of the game!
Scenario
When Dr. MO was on a low-carb diet, she noticed that things that were low in carbohydrates tended to be high in fat, and vice versa. Our prior analysis found a strong correlation between fat and carbs, but not in the direction that Dr. MO expected! Now, we will use our knowledge of this strong correlation to see if we can use the Total Fat in our sample of chicken dishes from one fast food restaurant to predict the Total Carbs in those same dishes.
Step 1: State the Hypotheses
Because we are trying to keep regression simple, our hypotheses are still a little clunky, but hopefully you’ll see the value of regression analyses in the end.
Example \(\PageIndex{1}\)
What could be a research hypothesis for this scenario? State the research hypothesis in words and symbols.
Solution
The research hypothesis should probably be something like, "There will be a positive slope in the regression line for Total Fat and Total Carbs."
Symbols: \( \beta > 0 \)
In other words, we’re trying to construct a statistically significant (as determined by an ANOVA) regression line equation that we could use to predict Total Carbs from Total Fat.
Example \(\PageIndex{2}\)
What is the null hypothesis for this scenario? State the null hypothesis in words and symbols.
Solution
The null hypothesis is that there is no relationship between Total Fat and Total Carbs, so "The slope in the regression line for Total Fat and Total Carbs will be zero." Neither variable can predict or explain the other variable.
Symbols: \( \beta = 0 \)
Step 2: Find the Critical Value
Our critical value for regression will come from the Table of Critical Values of F (found in the first chapter discussing ANOVAs, or found through the Common Critical Values page at the end of the book). With 1 degree of freedom for the Model and \(N - 2 = 13 - 2 = 11\) degrees of freedom for the Error, this table shows that the appropriate critical value at p = .05 is \(F_{Critical}(1, 11) = 4.84\).
Step 3: Calculate the Test Statistic
Okay, here’s where the tedious part starts! Use the data in Table \(\PageIndex{1}\) to start your calculations!
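| Total Fat | Difference: Fat - Mean | Fat Difference Squared | Total Carbs | Difference: Carbs - Mean | Carbs Difference Squared | Fat Diff * Carb Diff |
|---|---|---|---|---|---|---|
| 11 | | | 19 | | | |
| 12 | | | 30 | | | |
| 13 | | | 31 | | | |
| 17 | | | 28 | | | |
| 24 | | | 53 | | | |
| 24 | | | 26 | | | |
| 26 | | | 52 | | | |
| 28 | | | 47 | | | |
| 31 | | | 66 | | | |
| 31 | | | 56 | | | |
| 35 | | | 53 | | | |
| 44 | | | 81 | | | |
| 46 | | | 53 | | | |
| \(\sum = 342.00 \) | \(\sum = ? \) | \(\sum = ? \) | \(\sum = 595.00 \) | \(\sum = ? \) | \(\sum = ? \) | \(\sum = ? \) |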
To start filling in the Table \(\PageIndex{1}\), we’ll need to find the means again for Total Fat and Total Carbs. Since we did that in the last chapter, let’s just put them in a Table \(\PageIndex{2}\).

| | Mean | Standard Deviation | N |
|---|---|---|---|
| Total Fat | 26.31 | 11.32 | 13 |
| Total Carb | 45.77 | 17.89 | 13 |
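If you would like to double-check the means and standard deviations, here is a minimal Python sketch (not part of the original activity). It assumes the 13 pairs of scores from Table \(\PageIndex{1}\) and uses the sample standard deviation (dividing by N - 1); the variable names are just illustrative.

```python
from statistics import mean, stdev

# Scores copied from the Sum of Products table (13 chicken dishes)
total_fat = [11, 12, 13, 17, 24, 24, 26, 28, 31, 31, 35, 44, 46]
total_carbs = [19, 30, 31, 28, 53, 26, 52, 47, 66, 56, 53, 81, 53]

# Mean and sample standard deviation (N - 1 in the denominator)
fat_mean, fat_sd = mean(total_fat), stdev(total_fat)
carb_mean, carb_sd = mean(total_carbs), stdev(total_carbs)

print(f"Total Fat:   mean = {fat_mean:.2f}, SD = {fat_sd:.2f}, N = {len(total_fat)}")
print(f"Total Carbs: mean = {carb_mean:.2f}, SD = {carb_sd:.2f}, N = {len(total_carbs)}")
```

The rounded results should match the values in the table of means.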
Now that we have the means, we can fill in the complete Sum of Products table.
Exercise \(\PageIndex{1}\)
Fill in Table \(\PageIndex{1}\) by finding the differences of each score from that variable's mean, squaring the differences, multiplying them, then finding the sums of each of these.
 Answer

Completed Sum of Products table:
Table \(\PageIndex{3}\): Completed Sum of Products Table

| Total Fat | Difference: Fat - Mean | Fat Difference Squared | Total Carbs | Difference: Carbs - Mean | Carbs Difference Squared | Fat Diff * Carb Diff |
|---|---|---|---|---|---|---|
| 11 | -15.31 | 234.40 | 19 | -26.77 | 716.63 | 409.85 |
| 12 | -14.31 | 204.78 | 30 | -15.77 | 248.69 | 225.67 |
| 13 | -13.31 | 177.16 | 31 | -14.77 | 218.15 | 196.59 |
| 17 | -9.31 | 86.68 | 28 | -17.77 | 315.77 | 165.44 |
| 24 | -2.31 | 5.34 | 53 | 7.23 | 52.27 | -16.70 |
| 24 | -2.31 | 5.34 | 26 | -19.77 | 390.85 | 45.67 |
| 26 | -0.31 | 0.10 | 52 | 6.23 | 38.81 | -1.93 |
| 28 | 1.69 | 2.86 | 47 | 1.23 | 1.51 | 2.08 |
| 31 | 4.69 | 22.00 | 66 | 20.23 | 409.25 | 94.88 |
| 31 | 4.69 | 22.00 | 56 | 10.23 | 104.65 | 47.98 |
| 35 | 8.69 | 75.52 | 53 | 7.23 | 52.27 | 62.83 |
| 44 | 17.69 | 312.94 | 81 | 35.23 | 1241.15 | 623.22 |
| 46 | 19.69 | 387.70 | 53 | 7.23 | 52.27 | 142.36 |
| \(\sum = 342.00 \) | \(\sum = 0.00 \) | \(\sum = 1536.77 \) | \(\sum = 595.00 \) | \(\sum = 0.01 \) | \(\sum = 3842.31 \) | \(\sum = 1997.92 \) |
We can do a computation check to see if our difference scores (each score minus the mean for that variable) were calculated correctly by seeing if they each sum to nearly zero, which they do; yay! We could use the sums of the squared differences to calculate the standard deviation for each variable, but that was already provided in Table \(\PageIndex{2}\).
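The tedious part can also be handed to a computer. The following is a minimal Python sketch, assuming the same 13 pairs of scores; the variable names are illustrative and not from the original activity. Because it uses unrounded means, an individual cell may differ from the hand-calculated table in the last decimal place, but the sums should agree after rounding.

```python
# Scores copied from the Sum of Products table
total_fat = [11, 12, 13, 17, 24, 24, 26, 28, 31, 31, 35, 44, 46]
total_carbs = [19, 30, 31, 28, 53, 26, 52, 47, 66, 56, 53, 81, 53]

n = len(total_fat)
fat_mean = sum(total_fat) / n
carb_mean = sum(total_carbs) / n

# Difference scores: each score minus the mean of its variable
fat_diffs = [x - fat_mean for x in total_fat]
carb_diffs = [y - carb_mean for y in total_carbs]

# Column sums of the Sum of Products table
ss_fat = sum(d ** 2 for d in fat_diffs)        # sum of squared Fat differences
ss_carbs = sum(d ** 2 for d in carb_diffs)     # sum of squared Carbs differences
sum_products = sum(dx * dy for dx, dy in zip(fat_diffs, carb_diffs))

print(f"Sum of Fat differences:   {sum(fat_diffs):.2f}")
print(f"Sum of Carbs differences: {sum(carb_diffs):.2f}")
print(f"Sum of squared Fat differences:   {ss_fat:.2f}")
print(f"Sum of squared Carbs differences: {ss_carbs:.2f}")
print(f"Sum of Products: {sum_products:.2f}")
```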
So, we have what we need to complete this activity. But what is this activity? Ultimately, our goal is to predict Total Carbs using Total Fat with a regression line equation (\(\widehat{y} = a + b\text{x}\)). So, we need to find the slope (b) and the intercept (a).
Here’s the formula for slope:
\[\mathrm{b}=\dfrac{\sum(Diff_{x} \times Diff_{y})}{\sum(Diff_{x}^2)} = \dfrac{\sum(Diff_{F} \times Diff_{C})}{\sum(Diff_{F}^2)}\nonumber \]
What this equation is telling us to do is pretty simple since we have the Sum of Products table filled out.
Example \(\PageIndex{3}\)
Calculate the slope for this scenario.
Solution
\[\mathrm{b}= \dfrac{1997.92}{1536.77} = 1.30 \nonumber \]
The result means that as Total Fat (\(X\)) increases by 1 unit, Total Carb (\(Y\)) is predicted to increase by 1.30 units. This is a positive relation.
Next, we use this slope (b = 1.30), along with the means of each variable, to compute the intercept:
\[a =\overline{X_y} - (b \times \overline{X_x}) = \overline{X_C} - (b \times \overline{X_F}) \nonumber \]
Example \(\PageIndex{4}\)
Using the means and the slope (b) that we just calculated, calculate the intercept (a).
Solution
\[a =45.77 - (1.30 \times 26.31) \nonumber \]
\[a =45.77 - (34.20) \nonumber \]
\[a =11.57 \nonumber \]
Now that we have all of our parameters estimated, we can give the full equation for our line of best fit: \(\widehat{y} = a + b\text{x}\)
Example \(\PageIndex{5}\)
Construct the regression line equation to predict Total Carbs (\(\widehat{y}\)).
Solution
\[\widehat{y} = 11.57 + 1.30\text{x} \nonumber \]
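As a check on the hand calculations, here is a short Python sketch that computes the slope and intercept directly from the raw scores. It assumes the same 13 pairs of values from the Sum of Products table, and the variable names are made up for illustration. It works with unrounded values and only rounds when printing.

```python
# Scores copied from the Sum of Products table
total_fat = [11, 12, 13, 17, 24, 24, 26, 28, 31, 31, 35, 44, 46]
total_carbs = [19, 30, 31, 28, 53, 26, 52, 47, 66, 56, 53, 81, 53]

n = len(total_fat)
fat_mean = sum(total_fat) / n
carb_mean = sum(total_carbs) / n

# Sum of Products and sum of squared Fat differences
sum_products = sum((x - fat_mean) * (y - carb_mean)
                   for x, y in zip(total_fat, total_carbs))
ss_fat = sum((x - fat_mean) ** 2 for x in total_fat)

# Slope: Sum of Products divided by the sum of squared differences of the predictor
b = sum_products / ss_fat
# Intercept: mean of the outcome minus slope times mean of the predictor
a = carb_mean - b * fat_mean

print(f"y-hat = {a:.2f} + {b:.2f}x")
```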
Let’s look at that regression line on our scatterplot generated using statistical software.
It looks like our regression line is a pretty good match for our data! But to be sure, let’s check that we have a statistically significant model using this regression line. Table \(\PageIndex{4}\) shows an ANOVA Summary Table with the Sum of Squares already included from an analysis conducted using statistical software.
| Source | \(SS\) | \(df\) | \(MS\) | \(F\) |
|---|---|---|---|---|
| Model | 1038.88 | | | |
| Error | 497.89 | | | |
| Total | | | | |
Example \(\PageIndex{6}\)
Fill in the rest of the ANOVA Summary Table from Table \(\PageIndex{4}\).
Solution
| Source | \(SS\) | \(df\) | \(MS\) | \(F\) |
|---|---|---|---|---|
| Model | 1038.88 | 1 | \(MS_M = \dfrac{SS_M}{df_M} = \dfrac{1038.88}{1} = 1038.88 \) | \(F = \dfrac{MS_M}{MS_E} = \dfrac{1038.88}{45.26} = 22.95\) |
| Error | 497.89 | \(N - 2 = 13 - 2 = 11\) | \(MS_E = \dfrac{SS_E}{df_E} = \dfrac{497.89}{11} = 45.26 \) | N/A |
| Total | \(1038.88 + 497.89 = 1536.77\) | \(N - 1 = 13 - 1 = 12\) | N/A | N/A |
We can do a computation check to make sure that our Degrees of Freedom are correct since the sum of the Model's df and the Error's df should equal the Total df. And since 1 + 11 = 12, we're on the right track.
This gives us an obtained \(F\) statistic of 22.95, which we will now use to test our hypothesis.
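The arithmetic in the ANOVA Summary Table can be sketched in a few lines of Python. This example simply takes the two Sum of Squares values given in Table \(\PageIndex{4}\) as its starting point (it does not recompute them from the raw data), and the variable names are illustrative.

```python
# Sum of Squares values as given in the ANOVA Summary Table
ss_model = 1038.88
ss_error = 497.89
n = 13  # number of chicken dishes

# Degrees of freedom for a regression with one predictor
df_model = 1
df_error = n - 2
df_total = n - 1

# Mean Squares are each Sum of Squares divided by its degrees of freedom
ms_model = ss_model / df_model
ms_error = ss_error / df_error

# F is the ratio of the Model's Mean Square to the Error's Mean Square
f_calc = ms_model / ms_error

print(f"df: model = {df_model}, error = {df_error}, total = {df_total}")
print(f"MS model = {ms_model:.2f}, MS error = {ms_error:.2f}")
print(f"F({df_model}, {df_error}) = {f_calc:.2f}")
```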
Step 4: Make the Decision
We now have everything we need to make our final hypothesis testing decision. Our calculated test statistic was \(F_{Calc} = 22.95\) and our critical value was \(F_{Crit} = 4.84\). Since our calculated test statistic is greater than our critical value, we can reject the null hypothesis because this is still true:
Note
Critical \(<\) Calculated \(=\) Reject null \(=\) There is a linear relationship. \(= p<.05 \)
Critical \(>\) Calculated \(=\) Retain null \(=\) There is not a linear relationship. \(= p>.05\)
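The decision rule in the note can be written as a tiny Python function. This is only an illustrative sketch; the function name `decide` is made up for this example.

```python
def decide(f_calc, f_crit):
    """Apply the decision rule: reject the null when Calculated exceeds Critical."""
    if f_calc > f_crit:
        return "reject the null hypothesis"
    return "retain the null hypothesis"

# Values from this scenario
decision = decide(f_calc=22.95, f_crit=4.84)
print(decision)
```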
Write-Up: Reporting the Results
Let’s use the four components needed for reporting results to organize our conclusions so far.
Example \(\PageIndex{7}\)
Describe the four components for reporting results for this scenario in complete sentences.
Solution
 For our sample of 13 chicken dishes from one fast food restaurant, the average Total Fat was 26.31 and the average Total Carb was 45.77.
 The research hypothesis was that there will be a positive slope in the regression line for Total Fat and Total Carbs.
 The ANOVA results were F(1, 11) = 22.95, p < .05, showing that our regression line could be used to make predictions.
 The results are statistically significant, and the positive Sum of Products shows a positive slope, so the research hypothesis is supported.
Using Regression
But really, what does that even mean? How is any of that useful? Well, it turns out that because our regression equation was found to be statistically significant, we can use it to make predictions.
Example \(\PageIndex{8}\)
Imagine that you went to the fast food restaurant that this data is from, and they had a new chicken meal with even lower fat than anything currently on their menu (10 grams of Total Fat). Use the regression line equation to estimate how many Total Carbs would be in this new meal. Don’t forget to end with a complete sentence.
Solution
With this regression line equation:
\[\widehat{y} = 11.57 + 1.30\text{x} \nonumber \]
We would replace the “x” placeholder with our new value of 10 grams of Total Fat to find:
\[\widehat{y} = 11.57 + (1.30 * 10) \nonumber \]
\[\widehat{y} = 11.57 + (13.00) \nonumber \]
\[\widehat{y} = 24.57 \nonumber \]
The estimated Total Carbs of this new chicken meal with 10 grams of Total Fat is 24.57 grams of carbohydrates.
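The same substitution can be sketched as a small Python function. The function name and parameter names are hypothetical; the intercept and slope are the rounded values from the regression line equation above.

```python
def predict_total_carbs(total_fat_grams, intercept=11.57, slope=1.30):
    """Plug a Total Fat value into y-hat = a + bx to estimate Total Carbs."""
    return intercept + slope * total_fat_grams

# New low-fat chicken meal with 10 grams of Total Fat
predicted = predict_total_carbs(10)
print(f"Predicted Total Carbs: {predicted:.2f} grams")
```

Changing the argument lets you try other Total Fat values, although predictions far outside the range of the sample (11 to 46 grams of fat) should be treated with caution.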
That’s not what we hypothesized, but this is the whole point of regression, to use a variable (or variables) that we have to predict variables that we don’t have. Knowing this, let’s try a final conclusion.
In our sample of 13 chicken dishes from one fast food restaurant, we found the average Total Fat to be 26.31 and the average Total Carbs to be 45.77. This data was used to create a statistically significant regression equation (F(1, 11) = 22.95, p < .05), which supports the research hypothesis that there would be a positive slope. If the fast food restaurant came out with a new low-fat chicken dish (10 grams of fat), we can predict that the Total Carbs of that dish would be 24.57 grams.
Accuracy in Prediction
Wanna know how accurate that prediction is? Square root the Mean Square of the Error term from the ANOVA Summary Table. As this formula shows, this is similar to the standard deviation formula because it follows the same logic:
\[s_{(Y-\widehat{Y})}=\sqrt{\dfrac{\sum(Y-\widehat{Y})^{2}}{N-2}} \nonumber \]
But since we have the MS of the Error in Table \(\PageIndex{5}\), we can just square root 45.26:
\[s_{(Y-\widehat{Y})}=\sqrt{MS_E} = \sqrt{45.26} = 6.73 \nonumber\]
So on average, our predictions will be almost 7 (6.73) grams away from the actual values. There are no specific cutoffs or guidelines for how big our standard error of the estimate can or should be. What do you think? Is a variation around the prediction of 7 grams of carbohydrates precise enough? There’s no right or wrong answer, just your thoughts and opinion!
Hopefully you saw how regression equations can be useful for making predictions about values that we don’t have yet based on samples of two variables that we do already have data for. In the next section, you’ll learn how we can combine multiple variables to predict one variable.
Contributors and Attributions
Foster et al. (University of Missouri-St. Louis, Rice University, & University of Houston, Downtown Campus)