2.2: Tabular Displays
- Page ID
- 24024
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\( \newcommand{\id}{\mathrm{id}}\) \( \newcommand{\Span}{\mathrm{span}}\)
( \newcommand{\kernel}{\mathrm{null}\,}\) \( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\) \( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\) \( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\id}{\mathrm{id}}\)
\( \newcommand{\Span}{\mathrm{span}}\)
\( \newcommand{\kernel}{\mathrm{null}\,}\)
\( \newcommand{\range}{\mathrm{range}\,}\)
\( \newcommand{\RealPart}{\mathrm{Re}}\)
\( \newcommand{\ImaginaryPart}{\mathrm{Im}}\)
\( \newcommand{\Argument}{\mathrm{Arg}}\)
\( \newcommand{\norm}[1]{\| #1 \|}\)
\( \newcommand{\inner}[2]{\langle #1, #2 \rangle}\)
\( \newcommand{\Span}{\mathrm{span}}\) \( \newcommand{\AA}{\unicode[.8,0]{x212B}}\)
\( \newcommand{\vectorA}[1]{\vec{#1}} % arrow\)
\( \newcommand{\vectorAt}[1]{\vec{\text{#1}}} % arrow\)
\( \newcommand{\vectorB}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vectorC}[1]{\textbf{#1}} \)
\( \newcommand{\vectorD}[1]{\overrightarrow{#1}} \)
\( \newcommand{\vectorDt}[1]{\overrightarrow{\text{#1}}} \)
\( \newcommand{\vectE}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash{\mathbf {#1}}}} \)
\( \newcommand{\vecs}[1]{\overset { \scriptstyle \rightharpoonup} {\mathbf{#1}} } \)
\( \newcommand{\vecd}[1]{\overset{-\!-\!\rightharpoonup}{\vphantom{a}\smash {#1}}} \)
\(\newcommand{\avec}{\mathbf a}\) \(\newcommand{\bvec}{\mathbf b}\) \(\newcommand{\cvec}{\mathbf c}\) \(\newcommand{\dvec}{\mathbf d}\) \(\newcommand{\dtil}{\widetilde{\mathbf d}}\) \(\newcommand{\evec}{\mathbf e}\) \(\newcommand{\fvec}{\mathbf f}\) \(\newcommand{\nvec}{\mathbf n}\) \(\newcommand{\pvec}{\mathbf p}\) \(\newcommand{\qvec}{\mathbf q}\) \(\newcommand{\svec}{\mathbf s}\) \(\newcommand{\tvec}{\mathbf t}\) \(\newcommand{\uvec}{\mathbf u}\) \(\newcommand{\vvec}{\mathbf v}\) \(\newcommand{\wvec}{\mathbf w}\) \(\newcommand{\xvec}{\mathbf x}\) \(\newcommand{\yvec}{\mathbf y}\) \(\newcommand{\zvec}{\mathbf z}\) \(\newcommand{\rvec}{\mathbf r}\) \(\newcommand{\mvec}{\mathbf m}\) \(\newcommand{\zerovec}{\mathbf 0}\) \(\newcommand{\onevec}{\mathbf 1}\) \(\newcommand{\real}{\mathbb R}\) \(\newcommand{\twovec}[2]{\left[\begin{array}{r}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\ctwovec}[2]{\left[\begin{array}{c}#1 \\ #2 \end{array}\right]}\) \(\newcommand{\threevec}[3]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\cthreevec}[3]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \end{array}\right]}\) \(\newcommand{\fourvec}[4]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\cfourvec}[4]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \end{array}\right]}\) \(\newcommand{\fivevec}[5]{\left[\begin{array}{r}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\cfivevec}[5]{\left[\begin{array}{c}#1 \\ #2 \\ #3 \\ #4 \\ #5 \\ \end{array}\right]}\) \(\newcommand{\mattwo}[4]{\left[\begin{array}{rr}#1 \amp #2 \\ #3 \amp #4 \\ \end{array}\right]}\) \(\newcommand{\laspan}[1]{\text{Span}\{#1\}}\) \(\newcommand{\bcal}{\cal B}\) \(\newcommand{\ccal}{\cal C}\) \(\newcommand{\scal}{\cal S}\) \(\newcommand{\wcal}{\cal W}\) \(\newcommand{\ecal}{\cal E}\) \(\newcommand{\coords}[2]{\left\{#1\right\}_{#2}}\) \(\newcommand{\gray}[1]{\color{gray}{#1}}\) \(\newcommand{\lgray}[1]{\color{lightgray}{#1}}\) \(\newcommand{\rank}{\operatorname{rank}}\) \(\newcommand{\row}{\text{Row}}\) \(\newcommand{\col}{\text{Col}}\) \(\renewcommand{\row}{\text{Row}}\) \(\newcommand{\nul}{\text{Nul}}\) \(\newcommand{\var}{\text{Var}}\) \(\newcommand{\corr}{\text{corr}}\) \(\newcommand{\len}[1]{\left|#1\right|}\) \(\newcommand{\bbar}{\overline{\bvec}}\) \(\newcommand{\bhat}{\widehat{\bvec}}\) \(\newcommand{\bperp}{\bvec^\perp}\) \(\newcommand{\xhat}{\widehat{\xvec}}\) \(\newcommand{\vhat}{\widehat{\vvec}}\) \(\newcommand{\uhat}{\widehat{\uvec}}\) \(\newcommand{\what}{\widehat{\wvec}}\) \(\newcommand{\Sighat}{\widehat{\Sigma}}\) \(\newcommand{\lt}{<}\) \(\newcommand{\gt}{>}\) \(\newcommand{\amp}{&}\) \(\definecolor{fillinmathshade}{gray}{0.9}\)Frequency Tables for Quantitative Data
To create many of these graphs, you must first create the frequency distribution. The idea of a frequency distribution is to take the interval that the data spans and divide it into equal sized subintervals called classes. The grouped frequency distribution gives either the frequency (count) or the relative frequency (usually expressed as a percent) of individuals who fall into each class. When creating frequency distributions, it is important to note that the number of classes that are used and the value of the first class boundary will change the shape of, and hence the impression given by, the distribution. There is no one correct width to the class boundaries. It usually takes several tries to create a frequency distribution that looks just the way you want. As a reader and interpreter of such tables, you should be aware that such features are selected so that the table looks the way it does to show a particular point of view. For small samples, we usually have between four and seven classes, and as you get more data you will need more classes.
We will start with an example of a random sample of 35 ages from credit card applications given below in no particular order. Organize the data in a frequency distribution table.
Note that the class limits should never overlap; we call these mutually exclusive classes. For example, if your class went from 20-25, 25-30 etc., the 25-year-olds would fall within both classes. Also, make sure each of the classes have the same width. A more formal way to pick your classes uses the following process.
Steps involved in making a frequency distribution table:
- Find the range = largest value – smallest value.
- Pick the number of classes to use. Usually the number of classes is between five and twenty. Five classes are used if there are a small number of data points and twenty classes if there are a large number of data points (over 1,000 data points).
- Class width = \(\frac{\text { range }}{\text { # of classes }}\). Always round up to the next integer (if the answer is already a whole number, go to the next integer). If you do not round up, your last class will not contain your largest data value, and you would have to add another class just for it. If you round up, then your largest data value will fall in the last class, and there are no issues.
- Create the classes. Each class has limits that determine which values fall in each class. To find the class limits, set the smallest value in the data set as the lower class limit for the first class. Then add the class width to the lower class limit to get the next lower class limit. Repeat until you get all the classes. The upper class limit for a class is one less than the lower limit for the next class.
- If your data value has decimal places, then round up the class width to the nearest value with the same number of decimal places as the original data. As an example, if your data was out to two decimal places, and you divided your range by the number of classes to get 4.8333, then the class width would round the second decimal place up and end on 4.84.
- The frequency for a class is the number of data values that fall in the class.
For the age data let us use 6 classes. Find the range by taking 49 – 20 = 29 and divide this by the number of classes 29/6 = 4.8333. Round this number up to 5 and use 5 for the class width. Once you determine your class width and class limits, place each of these classes in a table and then tally up the ages that fall within each class. Count the tally marks and record the number in the frequency table.
The total of the frequency column should be the number of observations in the data. You may want to total the frequencies to make sure you did not leave any of the numbers out of the table.
Class Limits | Tally | Class | Frequency | |
20-24 | 4 | 20-24 | 4 | |
25-29 | 12 | 25-29 | 12 | |
30-34 | 6 | 30-34 | 6 | |
35-39 | 4 | 35-39 | 4 | |
40-44 | 2 | 40-44 | 2 | |
45-49 | 7 | 45-49 | 7 | |
Total | 35 |
Figure 2-1
Using the frequency table, we can now see that there are more people in the 25- 29 year-old class, followed by the 45-49 year-old class.
We call this most frequent category the modal class. There may be no mode at all or more than one mode.
Frequency Tables for Qualitative Data
A frequency distribution can also be made for qualitative data.
Suppose you have the following data for which type of car students at a college drive. Make a frequency table to summarize the data.
Category | Frequency |
---|---|
Chevy | 12 |
Ford | 5 |
Honda | 6 |
Nissan | 10 |
Toyota | 12 |
Other | 5 |
Total | 50 |
For nominal data, either alphabetize the classes or arrange the classes from most frequent to least frequent, with the “other” category always at the end. For ordinal data put the classes in their order with the “other” category at the end.
Relative Frequency Tables
Frequencies by themselves are not as useful to tell other people what is going on in the data. If you want to know what percentage the category is of the total sample then we can use the relative frequency of each category. The relative frequency is just the frequency divided by the total. The relative frequency is the proportion in each category and may be given as a decimal, percentage, or fraction.
Using the car data’s frequency distribution, we will create a third column labeled relative frequency. Take each frequency and divide by the sample size, see Figure 2-2. The relative frequencies should add up to one (ignoring rounding error).
Type of Car | Frequency | Relative Frequency |
---|---|---|
Chevy | 12 | 12/50 = 0.24 |
Ford | 5 | 5/50 = 0.1 |
Honda | 6 | 6/50 = 0.12 |
Nissan | 10 | 10/50 = 0.2 |
Toyota | 12 | 12/50 = 0.24 |
Other | 5 | 5/50 = 0.1 |
Total | 50 | 1 |
Figure 2-2
Many people understand percentages better than proportions so we may want to multiply each of these decimals by 100% to get the following relative frequency percent table.
Type of Car | Percent |
---|---|
Chevy | 24 |
Ford | 10 |
Honda | 12% |
Nissan | 20% |
Toyota | 24% |
Other | 10% |
Total | 100% |
Figure 2-3
We can summarize the car data and see that for college students Chevy and Toyota make up 48% of the car models.
Excel
Recall the frequency table for the credit card applicants. The relative frequency table for the random sample of 35 ages from credit card applications follows.
Class | Frequency | Relative Frequency | Relative Frequency Percent |
20-24 | 4 | 0.1143 | 11% |
25-29 | 12 | 0.3429 | 34% |
30-34 | 6 | 0.1714 | 17% |
35-39 | 4 | 0.1143 | 11% |
40-44 | 2 | 0.0571 | 6% |
45-49 | 7 | 0.2 | 20% |
Total | 35 | 1 | 100% |
The relative frequency table lets us quickly see that a little more than half 34% + 20% = 54% of the ages of the credit card holders are between the ages of 25 – 29 and 45 – 49 years-old.
Cumulative & Cumulative Relative Frequency Tables
Another useful piece of information is how many data points fall below a particular class. As an example, a teacher may want to know how many students received below a 70%, a doctor may want to know how many adults have cholesterol above 160, or a manager may want to know how many stores gross less than $2,000 per day. This calculation is known as a cumulative frequency and is used for ordinal or quantitative data. If you want to know what percent of the data falls below a certain class, then this fact would be a cumulative relative frequency.
To create a cumulative frequency distribution, count the number of data points that are below the upper class limit, starting with the first class and working up to the top class. The last upper class should have all of the data points below it.
Recall the credit card applicants. Make a cumulative frequency table.
Class | Frequency | Relative Frequency | Cumulative Frequency | Cumulative Relative Frequency |
20-24 | 4 | 0.1143 | 4 | 4/35 = 0.1143 |
25-29 | 12 | 0.3429 | 4 + 12 = 16 | 16/35 = 0.4571 |
30-34 | 6 | 0.1714 | 16 + 6 = 22 | 22/35 = 0.6286 |
35-39 | 4 | 0.1143 | 22 + 4 = 26 | 26/35 = 0.7429 |
40-44 | 2 | 0.0571 | 26 + 2 = 28 | 28/35 = 0.8 |
45-49 | 7 | 0.2 | 28 + 7 = 35 | 35/35 = 1 |
Total | 35 | 1 |
Use Excel to add up the values for you.
You can also express the cumulative relative frequencies as a percentage instead of a proportion.
Class | Cumulative Frequency | Cumulative Relative Frequency |
20-24 | 4 | 11 |
25-29 | 16 | 46 |
30-34 | 22 | 63 |
35-39 | 26 | 74 |
40-44 | 28 | 80 |
45-49 | 35 | 100 |
If a manager wanted to know how many applicants were under the age of 40 we could look across the 35-39 year-old class to see that there were 26 applicants that were under 40 (39 years old or younger), or if we used the cumulative relative frequency, 74% of the applicants were under 40 years old. If the manager wants to know how many applicants were over 44, we could subtract 100% – 80% = 20%.
Contingency Tables
A contingency table provides a way of portraying data that can facilitate calculating probabilities. A contingency table summarizes the frequency of two qualitative variables. The table displays sample values in relation to two different variables that may be dependent or contingent on one another. There are other names for contingency tables. Excel calls the contingency table a pivot table. Other common names are two-way table, cross-tabulation, or cross-tab for short.
A fitness center coach kept track of members over the last year. They recorded if the person stretched before they exercised, and whether they sustained an injury. The following contingency table shows their results. Find the relative frequency for each value in the table.
Injury | No Injury | Total | |
Stretched | 52 | 270 | 322 |
Did Not Stretch | 21 | 57 | 78 |
Total | 73 | 327 | 400 |
Figure 2-4
We can quickly summarize the number of athletes for each category. The bottom right-hand number in Figure 2-4 is the grand total and represents the total number of 400 people. There were 322 people that stretched before exercising. There were 73 people that sustained an injury while exercising, etc.
If we find the relative frequency for each value in the table, we can find the proportion of the 400 people for each category. To find a relative frequency we divide each value in the table by the grand total. See Figure 2-5.
Injury | No Injury | Total | |
Stretched | 52/400 = =0.13 | 270/400 = 0.675 | 322/400 = 0.805 or 0.13 + 0.675 = 0.805 |
Did Not Stretch | 21/400 = 0.0525 | 57/400 = 0.1425 | 78/400 = 0.195 or 0.0525 + 0.1425 = 0.195 |
Total | 73/400 = 0.1825 or 0.13 + 0.0525 = 0.1825 | 327/400 = 0.8175 or 0.675 + 0.1425 = 0.8175 | 400/400 = 1 or the sum of either the row or column totals |
Figure 2-5
When data is collected, it is usually presented in a spreadsheet where each row represents the responses from an individual or case.
“Of course, one never has the slightest notion what size or shape different species are going to turn out to be, but if you were to take the findings of the latest Mid‐Galactic Census report as any kind of accurate Guide to statistical averages you would probably guess that the craft would hold about six people, and you would be right. You'd probably guessed that anyway. The Census report, like most such surveys, had cost an awful lot of money and didn't tell anybody anything they didn't already know - except that every single person in the Galaxy had 2.4 legs and owned a hyena. Since this was clearly not true the whole thing had eventually to be scrapped.” (Adams, 2002)
Make a pivot table using Excel. A random sample of 500 records from the 2010 United States Census were downloaded to Excel. Below is an image of just the first 20 people.
There are seven variables:
- State
- Total family income (in United States dollars)
- Age
- Biological Sex (with reported categories female and male)
- Race (with reported categories American Indian or Alaska Native, Black, Chinese, Japanese, Other Asian or Pacific Islander, Two major races, White, Other)
- Marital status (with reported categories Divorced, Married/spouse absent, Married/spouse present, Never married/single, Separated, Widowed)
- Total personal income (in United States dollars).
Solution
In Excel, select the Insert tab, then select Pivot Table.
Excel should automatically select all 500 rows in the Table/Range cell, if not then use your mouse and highlight all the data including the labels. Then select OK.
Each version of Excel may look different at this point. One common area, though, is the bottom right-hand drag and drop area of the Pivot Table dialogue box.
Drag the sex variable to the COLUMNS box, and marital status variable to the ROWS box.
You will see the contingency table column and row headers appear as you drop the variables in the boxes.
To get the counts to appear, drag and drop marital status into the Values box and the default will usually say, “Count of maritalStatus,” if not then change it to count in the drop-down menu. A contingency table should appear on your spreadsheet as you fill in the pivot table dialogue box. Contingency tables go by many names including pivot tables, two-way tables, cross tabulations, or cross-tabs.
Count of Marital Status | Column Labels | ||
Row Labels | Female | Male | Grand Total |
Divorced | 21 | 17 | 38 |
Married/spouse absent | 5 | 9 | 14 |
Married/spouse present | 92 | 100 | 192 |
Never married/single | 93 | 129 | 222 |
Separated | 1 | 2 | 3 |
Widowed | 20 | 11 | 31 |
Grand Total | 232 | 268 | 500 |