10.4: Cross Classified Data
When more than one variable is observed, researchers are often interested in exploring whether two of the variables are associated. That is, does a particular value or range of values of one variable tend to occur along with a particular value or range of values of the other variable? As we have seen, there could be a direct relationship between the variables, but it could also be that both variables are related to some other common variables that have not been observed. In either case, knowledge of such an association can help researchers predict one variable from another or understand how different variables are associated with or affect social conditions.
If the frequency distributions of two variables are computed separately, we obtain no information about how the two variables behave in relation to one another. These types of relationships can still be studied using frequency distributions, as long as both variables are tracked simultaneously. To consider this idea in detail, we return to the example data given in Table 10.1. The frequency distribution for the race and ethnicity variable is given in Table 10.3. While the means of the debt variable were computed for each category of race and ethnicity, the frequency distribution for the debt variable on its own was given in Table 10.9. Note that Tables 10.3 and 10.9 do not give us the information we see with the means in Table 10.2, namely that African American alumni have more debt than individuals in the other categories. This is because Tables 10.3 and 10.9 consider the two variables separately and therefore contain no information about the simultaneous behavior of the variables.
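To see why separate frequency distributions cannot reveal an association, the minimal sketch below compares two hypothetical 2×2 tables of counts (the tables and the helper name `marginals` are illustrative assumptions, not data from this chapter). One table shows a perfect association and the other shows none, yet their row totals and column totals, which are exactly the two separate frequency distributions, are identical.

```python
def marginals(table):
    """Return (row totals, column totals) of a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    return row_totals, col_totals

# Hypothetical 2x2 tables of counts (rows: levels of X, columns: levels of Y).
strong_association = [[10, 0],
                      [0, 10]]   # every observation falls on the diagonal
no_association = [[5, 5],
                  [5, 5]]        # observations spread evenly over all pairs

# Both tables yield the same separate (marginal) frequency distributions.
print(marginals(strong_association))
print(marginals(no_association))
```

Because the printed marginals coincide, only the joint counts inside the table distinguish the two situations.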
To get information on how two variables behave simultaneously using frequency distributions, a researcher must construct what is called a cross-classified frequency table. It is based on the same idea as an ordinary frequency table, except that it contains frequencies for all possible pairs of values of the two variables that can occur within an observation. The relative frequencies and percentages are computed as detailed earlier: each frequency is divided by the total number of observations, and the percentage is the relative frequency multiplied by 100.
A cross-classified frequency distribution of two variables is a table that contains the frequency, relative frequency, and percentage of the number of times each pair of values of the two variables occurs together in an observation in the data.
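The definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming the raw data are available as a list of `(x, y)` pairs, one pair per observation; the function name `cross_classify` and the toy observations are hypothetical. Every combination of levels is listed, so pairs that never occur appear with a frequency of 0, as they do in the tables of this section.

```python
from collections import Counter
from itertools import product

def cross_classify(pairs, row_levels, col_levels):
    """Map every (row level, column level) pair to (frequency, relative frequency, percentage)."""
    n = len(pairs)                 # total number of observations
    counts = Counter(pairs)        # frequency of each observed (x, y) pair
    table = {}
    for r, c in product(row_levels, col_levels):
        f = counts[(r, c)]         # Counter returns 0 for pairs that never occur
        table[(r, c)] = (f, f / n, 100 * f / n)
    return table

# Hypothetical toy data: each tuple is one observation of (variable X, variable Y).
observations = [("a", "low"), ("a", "low"), ("b", "high"), ("a", "high")]
table = cross_classify(observations, ["a", "b"], ["low", "high"])
for (x, y), (freq, rel, pct) in table.items():
    print(x, y, freq, f"{rel:.2f}", f"{pct:.0f}%")
```

Listing the zero cells explicitly keeps the layout of the table fixed, which makes it easier to compare rows and columns.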
An example of a cross-classified frequency distribution is given in Table 10.14, which cross-classifies race and ethnicity with student debt at graduation for the data shown in Table 10.1. The columns of this table correspond to the different classifications of race and ethnicity, and the rows correspond to the amount of debt at graduation (in thousands of dollars), grouped using the classes defined earlier.
Table 10.14 Cross-classified frequency distribution of debt at graduation by race and ethnicity for the sample of 100 alumni. The first row of each entry is the observed frequency, the second row is the relative frequency, and the third row is the percentage.
| Debt (thousands of dollars) | AF | AS | HI | WH | Total |
|---|---|---|---|---|---|
| 0 to 9 | 4 | 4 | 9 | 21 | 38 |
| | 0.04 | 0.04 | 0.09 | 0.21 | 0.38 |
| | 4% | 4% | 9% | 21% | 38% |
| 10 to 19 | 1 | 0 | 3 | 6 | 10 |
| | 0.01 | 0.00 | 0.03 | 0.06 | 0.10 |
| | 1% | 0% | 3% | 6% | 10% |
| 20 to 29 | 1 | 0 | 8 | 12 | 21 |
| | 0.01 | 0.00 | 0.08 | 0.12 | 0.21 |
| | 1% | 0% | 8% | 12% | 21% |
| 30 to 39 | 1 | 1 | 1 | 11 | 14 |
| | 0.01 | 0.01 | 0.01 | 0.11 | 0.14 |
| | 1% | 1% | 1% | 11% | 14% |
| 40 to 49 | 1 | 0 | 1 | 8 | 10 |
| | 0.01 | 0.00 | 0.01 | 0.08 | 0.10 |
| | 1% | 0% | 1% | 8% | 10% |
| 50 to 59 | 3 | 0 | 0 | 3 | 6 |
| | 0.03 | 0.00 | 0.00 | 0.03 | 0.06 |
| | 3% | 0% | 0% | 3% | 6% |
| 60 to 69 | 0 | 0 | 0 | 1 | 1 |
| | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 |
| | 0% | 0% | 0% | 1% | 1% |
| Total | 11 | 5 | 22 | 62 | 100 |
| | 0.11 | 0.05 | 0.22 | 0.62 | 1.00 |
| | 11% | 5% | 22% | 62% | 100% |
While this table looks large and complicated, it is as simple to interpret as any frequency distribution if we keep the cross-classifications in mind. For example, the upper-left cell of this table corresponds to individuals in the study who are African American and whose debt at graduation was between 0 and 9 thousand dollars. The table reports that four individuals matched both characteristics, corresponding to 4% of the total number of observations. In the row for 20 to 29 under the column for Hispanic, we can observe that 8 individuals matched both characteristics, corresponding to 8% of the total number of observations. Note that this table also allows us to easily compare the frequency distributions of debt across the race and ethnicity categories, and vice versa. In this case, apart from the different number of individuals within each race and ethnicity classification, there does not seem to be a major difference in the frequency distributions of debt by race, except that African American students have a relatively large frequency in the debt class 50 to 59. This explains why African American individuals have the larger mean and median debt shown in Table 10.2.
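Because the race and ethnicity groups differ so much in size (11, 5, 22, and 62 alumni), one way to compare the debt distributions across groups is to divide each cell by its column total, giving the percentage of each group that falls in each debt class. The sketch below does this using the frequencies copied from Table 10.14; the helper names `column_totals` and `column_percentages` are illustrative, not part of the original text. For the 50 to 59 class it gives 3 of 11 African American alumni (about 27.3%) versus 3 of 62 White alumni (about 4.8%).

```python
# Frequencies copied from Table 10.14.
# Rows: debt classes (thousands of dollars); columns: AF, AS, HI, WH.
debt_classes = ["0 to 9", "10 to 19", "20 to 29", "30 to 39",
                "40 to 49", "50 to 59", "60 to 69"]
groups = ["AF", "AS", "HI", "WH"]
counts = [
    [4, 4, 9, 21],
    [1, 0, 3, 6],
    [1, 0, 8, 12],
    [1, 1, 1, 11],
    [1, 0, 1, 8],
    [3, 0, 0, 3],
    [0, 0, 0, 1],
]

def column_totals(table):
    """Total frequency in each column (here, each race and ethnicity group)."""
    return [sum(col) for col in zip(*table)]

def column_percentages(table):
    """Percentage of each column's total that falls in each row."""
    totals = column_totals(table)
    return [[100 * f / t for f, t in zip(row, totals)] for row in table]

pcts = column_percentages(counts)
print("debt     " + "".join(f"{g:>8}" for g in groups))
for label, row in zip(debt_classes, pcts):
    print(f"{label:<9}" + "".join(f"{p:7.1f}%" for p in row))
```

Each column of the printed table sums to 100%, so the groups can be compared on the same scale regardless of how many alumni they contain.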
Earlier we considered a research study in which twenty-five students who identified as female were randomly sampled from the graduating class at a small midwestern college. The researchers used eight questions about perceptions of gender discrimination, and the responses were added together to get a score measuring how much gender discrimination each student had experienced. The observed data are given in Table 10.4. Because all eight questions used by the measurement system are related to experiences with gender discrimination, we would expect the responses to be associated with one another. That is, an individual who has experienced gender discrimination would likely give higher scores to many of the questions, whereas an individual who has not experienced gender discrimination may tend to give lower scores to each of the questions. Therefore, we would expect that if an individual scores a question as a 5 on the Likert scale, they would tend to give higher scores to the other questions as well. To investigate whether this trend holds for the data in Table 10.4, we computed a cross-classified frequency distribution of the responses to the first and second questions. This frequency distribution is given in Table 10.15.
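Table 10.15 Cross-classified frequency distribution of the responses to the first two questions of the survey on gender discrimination. The first row of each entry is the observed frequency, the second row is the relative frequency, and the third row is the percentage.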
| Question 1 Response | Q2 = 1 | Q2 = 2 | Q2 = 3 | Q2 = 4 | Q2 = 5 | Total |
|---|---|---|---|---|---|---|
| 1 | 4 | 0 | 1 | 0 | 0 | 5 |
| | 0.16 | 0.00 | 0.04 | 0.00 | 0.00 | 0.20 |
| | 16% | 0% | 4% | 0% | 0% | 20% |
| 2 | 1 | 2 | 1 | 0 | 0 | 4 |
| | 0.04 | 0.08 | 0.04 | 0.00 | 0.00 | 0.16 |
| | 4% | 8% | 4% | 0% | 0% | 16% |
| 3 | 2 | 1 | 0 | 2 | 0 | 5 |
| | 0.08 | 0.04 | 0.00 | 0.08 | 0.00 | 0.20 |
| | 8% | 4% | 0% | 8% | 0% | 20% |
| 4 | 0 | 0 | 1 | 2 | 1 | 4 |
| | 0.00 | 0.00 | 0.04 | 0.08 | 0.04 | 0.16 |
| | 0% | 0% | 4% | 8% | 4% | 16% |
| 5 | 0 | 0 | 0 | 2 | 5 | 7 |
| | 0.00 | 0.00 | 0.00 | 0.08 | 0.20 | 0.28 |
| | 0% | 0% | 0% | 8% | 20% | 28% |
| Total | 7 | 3 | 3 | 6 | 6 | 25 |
| | 0.28 | 0.12 | 0.12 | 0.24 | 0.24 | 1.00 |
| | 28% | 12% | 12% | 24% | 24% | 100% |
First, note that all of the frequencies in this table are quite small. Even so, there is an interesting trend. Respondents who gave a score of 1 or 2 on the first question all gave a score of 1, 2, or 3 on the second question. Similarly, respondents who gave a score of 4 or 5 on the first question all gave a score of 3, 4, or 5 on the second question. Therefore, the response to the first question provides some information about what the response to the second question will be. In these data, those who gave a low score on the first question also tended to give a low score on the second question, and those who gave a high score on the first question also tended to give a high score on the second question.
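The trend described above can be checked directly from the cell counts. The sketch below uses the frequencies copied from Table 10.15 and a hypothetical helper, `q2_range_given_q1`, that reports the lowest and highest Question 2 responses observed among respondents whose Question 1 response lies in a given set. It also recomputes the row relative frequencies by dividing each row total by the 25 respondents.

```python
# Frequencies copied from Table 10.15.
# Rows: Question 1 response 1..5; columns: Question 2 response 1..5.
q_counts = [
    [4, 0, 1, 0, 0],
    [1, 2, 1, 0, 0],
    [2, 1, 0, 2, 0],
    [0, 0, 1, 2, 1],
    [0, 0, 0, 2, 5],
]
n = sum(sum(row) for row in q_counts)            # total respondents
row_rel = [sum(row) / n for row in q_counts]     # relative frequency of each Q1 response

def q2_range_given_q1(table, q1_values):
    """Smallest and largest Q2 responses observed among respondents whose Q1 is in q1_values."""
    observed = [q2 for q1 in q1_values for q2 in range(1, 6)
                if table[q1 - 1][q2 - 1] > 0]
    return min(observed), max(observed)

print(n, [round(r, 2) for r in row_rel])
print("Q1 in {1, 2}:", q2_range_given_q1(q_counts, [1, 2]))
print("Q1 in {4, 5}:", q2_range_given_q1(q_counts, [4, 5]))
```

With these counts the low group's Question 2 responses range from 1 to 3 and the high group's from 3 to 5, matching the description in the text.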

