2.2: Tabular Displays
( \newcommand{\kernel}{\mathrm{null}\,}\)
Relative Frequency Tables
Frequencies by themselves are not as useful to tell other people what is going on in the data. If you want to know what percentage the category is of the total sample then we can use the relative frequency of each category. The relative frequency is just the frequency divided by the total. The relative frequency is the proportion in each category and may be given as a decimal, percentage, or fraction.
Using the car data’s frequency distribution, we will create a third column labeled relative frequency. Take each frequency and divide by the sample size, see Figure 2-2. The relative frequencies should add up to one (ignoring rounding error).
Type of Car | Frequency | Relative Frequency |
---|---|---|
Chevy | 12 | 12/50 = 0.24 |
Ford | 5 | 5/50 = 0.1 |
Honda | 6 | 6/50 = 0.12 |
Nissan | 10 | 10/50 = 0.2 |
Toyota | 12 | 12/50 = 0.24 |
Other | 5 | 5/50 = 0.1 |
Total | 50 | 1 |
Figure 2-2
Many people understand percentages better than proportions so we may want to multiply each of these decimals by 100% to get the following relative frequency percent table.
Type of Car | Percent |
---|---|
Chevy | 24 |
Ford | 10 |
Honda | 12% |
Nissan | 20% |
Toyota | 24% |
Other | 10% |
Total | 100% |
Figure 2-3
We can summarize the car data and see that for college students Chevy and Toyota make up 48% of the car models.
Excel
Recall the frequency table for the credit card applicants. The relative frequency table for the random sample of 35 ages from credit card applications follows.
ClassFrequencyRelative Frequency 20-24 4 4/35 = 0.1143 25-29 12 12/35 = 0.3429 30-34 6 6/35 = 0.1714 35-39 4 4/35 = 0.1143 40-44 2 2/35 = 0.0571 45-49 7 7/35 = 0.2 Total 35 1Making a relative frequency table using Excel
Solution
In Excel, type your frequencies into a column and then in the next column type in =(cell reference number)/35. Then copy and paste the formula in cell B2 down the page. If you used the cell reference number, Excel will automatically change the copied cells to the next row down.
You get the following relative frequency table. The sum of the relative frequencies will be one (the sum may add to 0.9999 or 1.0001 if you are doing the calculations by hand due to rounding).
To get Excel to show the percentage instead of the proportion by highlighting the relative frequencies and selecting the percent % button on the Home tab.
Class | Frequency | Relative Frequency | Relative Frequency Percent |
20-24 | 4 | 0.1143 | 11% |
25-29 | 12 | 0.3429 | 34% |
30-34 | 6 | 0.1714 | 17% |
35-39 | 4 | 0.1143 | 11% |
40-44 | 2 | 0.0571 | 6% |
45-49 | 7 | 0.2 | 20% |
Total | 35 | 1 | 100% |
The relative frequency table lets us quickly see that a little more than half 34% + 20% = 54% of the ages of the credit card holders are between the ages of 25 – 29 and 45 – 49 years-old.
Cumulative & Cumulative Relative Frequency Tables
Another useful piece of information is how many data points fall below a particular class. As an example, a teacher may want to know how many students received below a 70%, a doctor may want to know how many adults have cholesterol above 160, or a manager may want to know how many stores gross less than $2,000 per day. This calculation is known as a cumulative frequency and is used for ordinal or quantitative data. If you want to know what percent of the data falls below a certain class, then this fact would be a cumulative relative frequency.
To create a cumulative frequency distribution, count the number of data points that are below the upper class limit, starting with the first class and working up to the top class. The last upper class should have all of the data points below it.
Recall the credit card applicants. Make a cumulative frequency table.
ClassFrequencyRelative Frequency Percent 20-24 4 11% 25-29 12 34% 30-34 6 17% 35-39 4 11% 40-44 2 6% 45-49 7 20% Total 35 100%Solution
To find the cumulative frequency, carry over the first frequency of 4 to the first row of the cumulative frequency column. Then take this 4 and add it to the next frequency of 12 to get 16 for the second cumulative frequency value. For the third cumulative frequency, take 16 and add to the third frequency of 6 to get 22. Keep doing this additive process until you finish the column.
The cumulative frequency in the last class should have the last number as the sample size. The cumulative relative frequency is just the cumulative frequency divided by the sample size. The cumulative relative frequencies should have one as the last number.
Class | Frequency | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency |
20-24 | 4 | 0.1143 | 4 | 4/35 = 0.1143 |
25-29 | 12 | 0.3429 | 4 + 12 = 16 | 16/35 = 0.4571 |
30-34 | 6 | 0.1714 | 16 + 6 = 22 | 22/35 = 0.6286 |
35-39 | 4 | 0.1143 | 22 + 4 = 26 | 26/35 = 0.7429 |
40-44 | 2 | 0.0571 | 26 + 2 = 28 | 28/35 = 0.8 |
45-49 | 7 | 0.2 | 28 + 7 = 35 | 35/35 = 1 |
Total | 35 | 1 |
Use Excel to add up the values for you.
You can also express the cumulative relative frequencies as a percentage instead of a proportion.
Class | Cumulative Frequency | Cumulative Relative Frequency |
20-24 | 4 | 11 |
25-29 | 16 | 46 |
30-34 | 22 | 63 |
35-39 | 26 | 74 |
40-44 | 28 | 80 |
45-49 | 35 | 100 |
If a manager wanted to know how many applicants were under the age of 40 we could look across the 35-39 year-old class to see that there were 26 applicants that were under 40 (39 years old or younger), or if we used the cumulative relative frequency, 74% of the applicants were under 40 years old. If the manager wants to know how many applicants were over 44, we could subtract 100% – 80% = 20%.
Contingency Tables
A contingency table provides a way of portraying data that can facilitate calculating probabilities. A contingency table summarizes the frequency of two qualitative variables. The table displays sample values in relation to two different variables that may be dependent or contingent on one another. There are other names for contingency tables. Excel calls the contingency table a pivot table. Other common names are two-way table, cross-tabulation, or cross-tab for short.
A fitness center coach kept track of members over the last year. They recorded if the person stretched before they exercised, and whether they sustained an injury. The following contingency table shows their results. Find the relative frequency for each value in the table.
Injury No Injury Stretched 52 270 Did Not Stretch 21 57Solution
Each value in the table represents the number of times a particular combination of variable outcomes occurred, for example, there were 57 members that did not stretch and did not sustain an injury. It is helpful to total up the categories. The row totals provide the total counts across each row (i.e. 52 + 270 = 322), and column totals are total counts down each column (i.e. 52 + 21 = 73). See Figure 2-4.
Injury | No Injury | Total | |
Stretched | 52 | 270 | 322 |
Did Not Stretch | 21 | 57 | 78 |
Total | 73 | 327 | 400 |
Figure 2-4
We can quickly summarize the number of athletes for each category. The bottom right-hand number in Figure 2-4 is the grand total and represents the total number of 400 people. There were 322 people that stretched before exercising. There were 73 people that sustained an injury while exercising, etc.
If we find the relative frequency for each value in the table, we can find the proportion of the 400 people for each category. To find a relative frequency we divide each value in the table by the grand total. See Figure 2-5.
Injury | No Injury | Total | |
Stretched | 52/400 = =0.13 | 270/400 = 0.675 | 322/400 = 0.805 or 0.13 + 0.675 = 0.805 |
Did Not Stretch | 21/400 = 0.0525 | 57/400 = 0.1425 | 78/400 = 0.195 or 0.0525 + 0.1425 = 0.195 |
Total | 73/400 = 0.1825 or 0.13 + 0.0525 = 0.1825 | 327/400 = 0.8175 or 0.675 + 0.1425 = 0.8175 | 400/400 = 1 or the sum of either the row or column totals |
Figure 2-5
When data is collected, it is usually presented in a spreadsheet where each row represents the responses from an individual or case.
“Of course, one never has the slightest notion what size or shape different species are going to turn out to be, but if you were to take the findings of the latest Mid‐Galactic Census report as any kind of accurate Guide to statistical averages you would probably guess that the craft would hold about six people, and you would be right. You'd probably guessed that anyway. The Census report, like most such surveys, had cost an awful lot of money and didn't tell anybody anything they didn't already know - except that every single person in the Galaxy had 2.4 legs and owned a hyena. Since this was clearly not true the whole thing had eventually to be scrapped.” (Adams, 2002)
Make a pivot table using Excel. A random sample of 500 records from the 2010 United States Census were downloaded to Excel. Below is an image of just the first 20 people.
There are seven variables:
- State
- Total family income (in United States dollars)
- Age
- Biological Sex (with reported categories female and male)
- Race (with reported categories American Indian or Alaska Native, Black, Chinese, Japanese, Other Asian or Pacific Islander, Two major races, White, Other)
- Marital status (with reported categories Divorced, Married/spouse absent, Married/spouse present, Never married/single, Separated, Widowed)
- Total personal income (in United States dollars).
Solution
In Excel, select the Insert tab, then select Pivot Table.
Excel should automatically select all 500 rows in the Table/Range cell, if not then use your mouse and highlight all the data including the labels. Then select OK.
Each version of Excel may look different at this point. One common area, though, is the bottom right-hand drag and drop area of the Pivot Table dialogue box.
Drag the sex variable to the COLUMNS box, and marital status variable to the ROWS box.
You will see the contingency table column and row headers appear as you drop the variables in the boxes.
To get the counts to appear, drag and drop marital status into the Values box and the default will usually say, “Count of maritalStatus,” if not then change it to count in the drop-down menu. A contingency table should appear on your spreadsheet as you fill in the pivot table dialogue box. Contingency tables go by many names including pivot tables, two-way tables, cross tabulations, or cross-tabs.
Count of Marital Status | Column Labels | ||
Row Labels | Female | Male | Grand Total |
Divorced | 21 | 17 | 38 |
Married/spouse absent | 5 | 9 | 14 |
Married/spouse present | 92 | 100 | 192 |
Never married/single | 93 | 129 | 222 |
Separated | 1 | 2 | 3 |
Widowed | 20 | 11 | 31 |
Grand Total | 232 | 268 | 500 |