16.4: Practice Goodness of Fit- Pineapple on Pizza


There is a very passionate and ongoing debate about whether or not pineapple should go on pizza. Being the objective, rational data analysts that we are, we will collect empirical data to see if we can settle this debate once and for all. We gather data from a group of adults, asking each person for a simple Like/Dislike answer.

Step 1: State the Hypotheses

We start, as always, with our hypotheses. Chi-Square focuses on patterns of relationship, so that's what the hypotheses in words should talk about. Let's start with the research hypothesis to see how this all works out.

Example \(\PageIndex{1}\)

What is the research hypothesis in words for this scenario? Make sure to list which group you think will have a higher frequency.

Solution

Research hypothesis in words: There will be a pattern of difference such that there will be more people who dislike pineapple on their pizza than people who like pineapple on their pizza.

The hypotheses in symbols focus on probabilities, but because of how Chi-Square works, we can only say that the probabilities will not be equal.

Research hypothesis in symbols: \(P_{Like}\neq 0.50\), \(P_{Dislike} \neq 0.50\), or \(P \neq (0.50, 0.50)\)

The probability of 0.50 (a 50% chance) comes from having only two options: Like or Dislike. The probabilities of all options must add up to 100%, so with only two options we find \(\dfrac{100\%}{2} = 50\%\), which means that P (probability) is 0.50.
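The same reasoning works for any number of options. Here is a minimal Python sketch, assuming every option is equally likely under the null hypothesis; `null_probabilities` is a hypothetical helper name, not part of any library.

```python
def null_probabilities(k):
    """Return the probability of each of k equally likely categories."""
    # All probabilities must add up to 1.0 (100%), so each category gets 1/k.
    return [1 / k] * k


probs = null_probabilities(2)  # Like, Dislike
print(probs)  # [0.5, 0.5]
```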

If this research hypothesis in symbols doesn't make sense, it might be easier to start with a null hypothesis in words and symbols, then figure out how that works out for the research hypothesis.

Example \(\PageIndex{2}\)

What is the null hypothesis in words and symbols for this scenario?

Solution

Null hypothesis in words: There is no pattern of difference based on liking pineapple on pizza.
Null hypothesis in symbols: \(P_{Like} = 0.50\), \(P_{Dislike} = 0.50\), or \(P = (0.50, 0.50)\)

Let's move on to an easier step!

Step 2: Find the Critical Value

As usual, we will leave \(\alpha\) at its typical level of 0.05. You can find the Critical Values of Chi-Square Table earlier in this chapter, or follow the link on the Common Critical Values page at the end of this book.

Exercise \(\PageIndex{1}\)

What is the critical value for this scenario?

Answer: We have two options in our data (Like or Dislike), which gives us two categories (k = 2). The degrees of freedom are found with k - 1, so we have 1 df (k - 1 = 2 - 1 = 1). From our \(\chi^{2}\) table of critical values, we find a critical value of 3.841 for \(\alpha = 0.05\).
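As a cross-check on the table value, the sketch below computes the df and the critical value with Python's standard library only. It relies on a fact that holds only when df = 1: a chi-square variable with one degree of freedom is the square of a standard normal variable, so the critical value equals the two-tailed z critical value squared. For larger df, use the table (or a statistics package).

```python
from statistics import NormalDist

categories = ["Like", "Dislike"]
k = len(categories)
df = k - 1  # degrees of freedom for a goodness-of-fit test

alpha = 0.05
# Shortcut valid ONLY for df = 1: chi-square(1) is Z squared,
# so the critical value is z(1 - alpha/2) squared.
assert df == 1, "this shortcut only works with one degree of freedom"
critical_value = NormalDist().inv_cdf(1 - alpha / 2) ** 2
print(df, round(critical_value, 3))  # 1 3.841
```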

See, that was easy! Now for the slightly-less-easy-but-not-that-hard step: calculating the Chi-Square test statistic.

Step 3: Calculate the Test Statistic

The results of the data collection are presented in Table \(\PageIndex{1}\).

Table \(\PageIndex{1}\): Results of Data collection
	Like	Dislike	Total
Observed	19	26	19+26=45

First, let's find the Expected values; then we'll have what we need to complete the full calculation.

Example \(\PageIndex{3}\)

With two categories and 45 scores, what is the Expected frequency?

Solution

\[E = \dfrac{45}{2} = 22.50 \nonumber \]
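The Expected frequency is just the total number of scores divided by the number of categories. A minimal Python sketch of that arithmetic, using the observed counts from Table 1 (the variable names are illustrative only):

```python
observed = {"Like": 19, "Dislike": 26}

n = sum(observed.values())  # 45 scores in total
k = len(observed)           # 2 categories
expected = n / k            # same Expected frequency for every category under the null
print(expected)  # 22.5
```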

We can use the Observed and the Expected frequencies to calculate our \(\chi^{2}\) statistic either through a table or individually in the equation:

\[\chi^{2}=\sum_{Each}\left(\dfrac{\left(E-O\right)^{2}}{E} \right)\nonumber \]

The first example will use the table.

Example \(\PageIndex{4}\)

Complete the labeled calculations to fill in Table \(\PageIndex{2}\).

Table \(\PageIndex{2}\)- Table to Complete Chi-Square Formula
	Like	Dislike	Total
Observed	19	26	45.00
Expected	22.50	22.50	22.50+22.50=45.00
Difference Score (E Minus O)
Difference Score Squared
Diff² divided by Expected

Solution

Table \(\PageIndex{3}\)- Table to Complete Chi-Square Formula
	Like	Dislike	Total
Observed	19	26	45.00
Expected	22.50	22.50	45.00
Difference Score (E Minus O)	22.50-19=3.50	22.50-26=-3.50	N/A
Difference Score Squared	\(3.50^2 = 12.25 \)	\((-3.50)^2 = 12.25 \)	N/A
Diff² divided by Expected	\(\dfrac{12.25}{22.50} = 0.54 \)	\(\dfrac{12.25}{22.50} = 0.54 \)	0.54+0.54=1.08

You might have noticed that two cells in the Total column are marked N/A. You could add up the Difference Scores (they equal zero in this example) and the squared Difference Scores (they equal 24.50), but we don't use these totals in the \(\chi^2\) formula, so you can save some time and skip them.

Also, if you use a spreadsheet, the final sum of \(\dfrac{Diff^2}{E}\) is 1.09, because the spreadsheet doesn't round each 0.544 to 0.54 before adding them; those darn rounding differences!
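To see where the 1.08 versus 1.09 difference comes from, here is a minimal Python sketch that fills in the table rows one category at a time. It keeps two running lists: one with each term rounded to two decimals (as you would write it by hand) and one with the unrounded terms (as a spreadsheet keeps them). The variable names are illustrative, not from any library.

```python
observed = {"Like": 19, "Dislike": 26}
expected = sum(observed.values()) / len(observed)  # 22.5

rounded_terms = []  # each Diff^2 / E rounded to 2 decimals, as in the hand-worked table
exact_terms = []    # unrounded terms, as a spreadsheet would keep them
for category, o in observed.items():
    diff = expected - o        # Difference Score (E minus O)
    diff_sq = diff ** 2        # Difference Score Squared
    term = diff_sq / expected  # Diff^2 divided by Expected
    exact_terms.append(term)
    rounded_terms.append(round(term, 2))
    print(category, diff, diff_sq, round(term, 2))

print(round(sum(rounded_terms), 2))  # 1.08 (rounded before adding)
print(round(sum(exact_terms), 2))    # 1.09 (no intermediate rounding)
```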

What would this look like in the Chi-Square formula?

Example \(\PageIndex{5}\)

Use the \(\chi^2\) formula with the Observed frequencies and Expected frequencies to calculate the test statistic for \(\chi^2\):

\[\chi^{2}=\sum_{Each}\left(\dfrac{\left(E-O\right)^{2}}{E} \right)\nonumber \]

Solution

\[\chi^{2}= \dfrac{(22.50-19)^{2}}{22.50} + \dfrac{(22.50-26)^{2}}{22.50} = 0.54 + 0.54 = 1.08 \nonumber \]

Using the table and using the formula gave the same \(\chi^2\) (because you are doing the same things mathematically). Which option you use is your choice. The formula seems easier when there are only a few categories (k), while the table seems easier when there are more categories. The table is also easier to use if you're working in a spreadsheet.
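The formula also translates directly into a short function that works for any number of categories. This is a minimal sketch; `chi_square_gof` is a hypothetical name, and it assumes you supply one Expected frequency per Observed frequency. Like a spreadsheet, it does not round the intermediate terms, so it gives 1.09 rather than 1.08.

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (E - O)^2 / E over every category."""
    if len(observed) != len(expected):
        raise ValueError("need one expected frequency per observed frequency")
    return sum((e - o) ** 2 / e for o, e in zip(observed, expected))


observed = [19, 26]      # Like, Dislike
expected = [22.5, 22.5]  # 45 / 2 for each category
chi2 = chi_square_gof(observed, expected)
print(round(chi2, 2))  # 1.09
```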

Now that we have the calculated \(\chi^2\), we can make the decision!

Step 4: Make the Decision

Our calculated test statistic had a value of 1.08, and our critical value was 3.841. What do we do with these two values?

Note

Slightly modified from earlier versions to fit the hypotheses, but the idea is the same:

Critical \(<\) Calculated \(=\) Reject null \(=\) There is a pattern of relationship. \(= p<.05\)

Critical \(>\) Calculated \(=\) Retain null \(=\) There is no pattern of relationship. \(= p>.05\)
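The decision rule in this note can be written as a tiny function. This is a minimal sketch with a hypothetical name, `decide`; the note does not cover a calculated value exactly equal to the critical value, and this sketch assumes "retain" in that case.

```python
def decide(calculated, critical):
    """Compare the calculated chi-square to the critical value (rule from the note)."""
    if calculated > critical:
        return "Reject null: there is a pattern of relationship (p < .05)"
    # Assumption: a tie (calculated == critical) is treated as "retain".
    return "Retain null: there is no pattern of relationship (p > .05)"


print(decide(1.08, 3.841))  # Retain null: there is no pattern of relationship (p > .05)
```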

Based on this note...

Example \(\PageIndex{6}\)

Do we retain or reject the null hypothesis?

Solution

Because our critical value is larger than our calculated value, we retain the null hypothesis.

The debate rages on.

Exercise \(\PageIndex{2}\)

What would our results look like in the statistical sentence?

Answer: \(\chi^2\)(1)=1.08, p>.05
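The statistical sentence only needs to say whether p is above or below .05, which we already know from comparing against the critical value. As an optional cross-check, the sketch below computes an approximate exact p using only Python's standard library. Like the critical-value check earlier, it relies on the df = 1 fact that a chi-square(1) variable is a squared standard normal, so it does not apply to larger df.

```python
from math import sqrt
from statistics import NormalDist

chi2 = 1.08
df = 1
# Valid ONLY for df = 1: P(chi-square > x) = P(|Z| > sqrt(x)) = 2 * (1 - Phi(sqrt(x)))
p = 2 * (1 - NormalDist().cdf(sqrt(chi2)))
comparison = "p < .05" if p < 0.05 else "p > .05"
print(f"chi2({df}) = {chi2:.2f}, {comparison}")  # chi2(1) = 1.08, p > .05
```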

Write-Up

How might we write this up? We can't quite fulfill the four requirements for reporting results because there are no means to include. Instead, let's include the Observed frequencies.

Example \(\PageIndex{7}\)

Report the results in a concluding paragraph that includes the four requirements (but use Observed frequencies instead of descriptive statistics).

Solution

The research hypothesis was that there would be a pattern of difference such that more people would dislike pineapple on pizza than like it. This research hypothesis was not supported (\(\chi^2\)(1)=1.08, p>.05). There does not seem to be a pattern of difference; of our 45 participants, 19 people liked pineapple on pizza, and 26 people disliked it.

That's it! If you want more practice, check out this blog post about the frequency of the different colors of M&Ms.

We now move on to the other kind of Chi-Square analysis, the Test of Independence.

Contributors and Attributions

Foster et al. (University of Missouri-St. Louis, Rice University, & University of Houston, Downtown Campus)
Dr. MO (Taft College)