10.5: Simpson's Paradox
Capital punishment has been controversial for many decades in the United States. One of the major concerns is that capital punishment may be unevenly applied to defendants based on factors such as race, ethnicity, and socioeconomic status. One study considered data on the race of the defendant and whether the death penalty was imposed for defendants from twenty Florida counties during 1976–1977 (Radelet 1981). The study examined the issue in response to new capital punishment statutes that were enacted after a Supreme Court ruling. The counties were selected randomly, with larger counties being more likely to be selected. Hence the data are based on a cluster sample, but without simple random sampling of the clusters. All cases with either African American or white defendants for which complete data were available were included in the study. A cross-classified frequency table of the race of the defendant and whether they received the death penalty is given in Table 10.16.
Table 10.16 Race of the defendant and whether the death penalty was imposed for defendants from twenty Florida counties during 1976–1977.
| Defendant | Death Penalty: Yes | Death Penalty: No | Total |
|---|---|---|---|
| White | 19 | 141 | 160 |
| | 0.058 | 0.433 | 0.491 |
| | 5.8% | 43.3% | 49.1% |
| African American | 17 | 149 | 166 |
| | 0.052 | 0.457 | 0.509 |
| | 5.2% | 45.7% | 50.9% |
| Total | 36 | 290 | 326 |
| | 0.110 | 0.890 | 1.000 |
| | 11.0% | 89.0% | 100% |
Looking at the data in Table 10.16, there does not appear to be any racial bias in the application of the death penalty. There are 160 white defendants in the study, and 19 of them, or \(19\div 160\times 100\%\approx 11.9\%\), received the death penalty. Similarly, there are 166 African American defendants in the study, and 17 of them, or \(17\div 166\times 100\%\approx 10.2\%\), received the death penalty. The percentages are close, and if anything, African American defendants appear to receive the death penalty at a lower rate than white defendants. This is a surprising result given the secondary evidence available on this subject, and it takes a more detailed analysis to see what is going on with this data.
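The row percentages in the paragraph above can be reproduced with a short Python sketch. The counts are those of Table 10.16; the dictionary layout and the helper name `death_penalty_rate` are illustrative choices, not part of the study.

```python
# Counts from Table 10.16: defendant race -> (death penalty yes, death penalty no)
counts = {
    "White": (19, 141),
    "African American": (17, 149),
}

def death_penalty_rate(yes, no):
    """Percentage of defendants in one row who received the death penalty."""
    return 100 * yes / (yes + no)

# Row percentages: each row is divided by its own total, not by 326.
rates = {race: death_penalty_rate(*cells) for race, cells in counts.items()}
for race, pct in rates.items():
    print(f"{race}: {pct:.1f}%")
```

Note that these are row (conditional) percentages, which differ from the cell percentages in Table 10.16, where every count is divided by the grand total of 326.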
Table 10.17 breaks the frequency distribution down further, classifying cases not only by the race of the defendant but also by the race of the victim. When this second classification is added, something very interesting is observed. First consider white defendants. When a white defendant was convicted of murdering a white victim, \(19\div 151\times 100\%\approx 12.6\%\) of the defendants received the death penalty. When a white defendant was convicted of murdering an African American victim, none of the nine defendants received the death penalty. Taken alone, these entries point to a strong bias in favor of white defendants who murdered African American victims, although only nine such cases were observed. Moving to African American defendants, those convicted of murdering a white victim received the death penalty \(11\div 63\times 100\%\approx 17.5\%\) of the time, which is higher than the rate for white defendants convicted of murdering white victims. Finally, African American defendants convicted of murdering African American victims received the death penalty \(6\div 103\times 100\%\approx 5.8\%\) of the time. These data show that African American defendants convicted of murdering white victims received the death penalty more often than any other combination of defendant and victim race, and that when the victim was African American, the defendant received the death penalty far less often.
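A similar Python sketch, using the counts of Table 10.17 and hypothetical variable names, computes the death-penalty rate for each combination of defendant and victim race and then compares the two defendant groups within each victim group.

```python
# Counts from Table 10.17: (defendant race, victim race) -> (yes, no)
counts = {
    ("White", "White"): (19, 132),
    ("White", "African American"): (0, 9),
    ("African American", "White"): (11, 52),
    ("African American", "African American"): (6, 97),
}

# Percentage receiving the death penalty within each defendant-victim combination.
rates = {key: 100 * yes / (yes + no) for key, (yes, no) in counts.items()}
for (defendant, victim), pct in rates.items():
    print(f"{defendant} defendant, {victim} victim: {pct:.1f}%")

# Within each victim group, compare African American and white defendants.
for victim in ("White", "African American"):
    aa_rate = rates[("African American", victim)]
    white_rate = rates[("White", victim)]
    print(f"{victim} victim: African American defendant rate higher? {aa_rate > white_rate}")
```

Within both victim groups the African American defendants have the higher rate, which is the opposite of the comparison suggested by Table 10.16.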
Table 10.17 Race of the defendant and whether the death penalty was imposed for defendants from twenty Florida counties during 1976–1977 broken down by race of the victim.
| Defendant | Victim | Death Penalty: Yes | Death Penalty: No | Total |
|---|---|---|---|---|
| White | White | 19 | 132 | 151 |
| | | 0.058 | 0.405 | 0.463 |
| | | 5.8% | 40.5% | 46.3% |
| White | African American | 0 | 9 | 9 |
| | | 0.000 | 0.028 | 0.028 |
| | | 0.0% | 2.8% | 2.8% |
| African American | White | 11 | 52 | 63 |
| | | 0.034 | 0.160 | 0.193 |
| | | 3.4% | 16.0% | 19.3% |
| African American | African American | 6 | 97 | 103 |
| | | 0.018 | 0.298 | 0.316 |
| | | 1.8% | 29.8% | 31.6% |
| Total | | 36 | 290 | 326 |
| | | 0.110 | 0.890 | 1.000 |
| | | 11.0% | 89.0% | 100% |
This example raises the obvious question: Why are the trends in Table 10.17 not visible in Table 10.16? Several things in Table 10.17 contribute, but the key is how each group of defendants is distributed across victim race. Most white defendants (151 of 160) were convicted of murdering white victims, the victim group with the higher death-penalty rates, while most African American defendants (103 of 166) were convicted of murdering African American victims, the victim group with the lower rates. As a result, the low rate at which African American defendants were given the death penalty for African American victims is combined with the high rate at which they were given the death penalty for white victims, and the combined rate of 10.2% ends up slightly below the 11.9% rate for white defendants. In effect, these two opposing trends cancel each other out. What the aggregated table covers up are two biases: one against African American defendants and one against African American victims.
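The combining step described above can be made explicit: each defendant group's overall rate is a weighted average of its two victim-specific rates, weighted by the share of that group's cases involving each victim race. The minimal Python sketch below (the function name `victim_mix_and_rate` is hypothetical) recovers the 11.9% and 10.2% figures of Table 10.16 from the counts of Table 10.17 and reports the weights, which differ sharply between the two groups.

```python
# Counts from Table 10.17: (defendant race, victim race) -> (yes, no)
counts = {
    ("White", "White"): (19, 132),
    ("White", "African American"): (0, 9),
    ("African American", "White"): (11, 52),
    ("African American", "African American"): (6, 97),
}
VICTIMS = ("White", "African American")

def victim_mix_and_rate(defendant):
    """Return (weights, overall_pct) for one defendant group.

    weights[v] is the share of the group's cases with victim race v, and
    overall_pct is the weighted average of the victim-specific rates."""
    n_group = sum(sum(counts[(defendant, v)]) for v in VICTIMS)
    weights = {}
    overall = 0.0
    for v in VICTIMS:
        yes, no = counts[(defendant, v)]
        n = yes + no
        weights[v] = n / n_group          # share of this group's cases
        overall += weights[v] * (yes / n)  # weight times stratum rate
    return weights, 100 * overall

for defendant in ("White", "African American"):
    weights, overall_pct = victim_mix_and_rate(defendant)
    mix = ", ".join(f"{v} victim {w:.1%}" for v, w in weights.items())
    print(f"{defendant} defendants: {mix}; overall rate {overall_pct:.1f}%")
```

Because the white defendants' weight sits almost entirely on the higher-rate white-victim cases, their overall rate is pulled up, while the African American defendants' weight sits mostly on the lower-rate African American-victim cases.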
The way opposing trends can combine so that each masks the other is an example of what is known in statistics as Simpson's Paradox. Anyone reading a research paper and looking at the results of cross-classified frequency distributions, or even a single frequency distribution, should be aware of this potential problem, particularly when the trend exhibited in a frequency distribution is unusual or contradicts common knowledge about an issue. It is, however, very difficult to tell whether a frequency distribution has this problem unless additional data are provided. The underlying problem is one of confounding variables, and just as we considered confounding when we considered empiricism, a critical evaluator of research must consider potential confounding variables when evaluating any frequency distribution.
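Once stratified counts are available, the reversal itself is easy to check. The following Python sketch assumes counts are stored as `(yes, no)` pairs keyed by `(group, stratum)`; `rate` and `simpson_reversal` are hypothetical helpers, and the check covers only one direction (group A higher in every stratum but lower after pooling). It cannot detect confounding by a variable that was never recorded, which is why additional data are needed.

```python
def rate(yes, no):
    """Proportion of cases with a 'yes' outcome."""
    return yes / (yes + no)

def simpson_reversal(counts, group_a, group_b, strata):
    """Return True if group_a has a higher rate than group_b in every
    stratum but a lower rate once the strata are pooled together."""
    higher_in_every_stratum = all(
        rate(*counts[(group_a, s)]) > rate(*counts[(group_b, s)]) for s in strata
    )
    # Pool each group's (yes, no) counts across strata, column by column.
    pooled_a = [sum(col) for col in zip(*(counts[(group_a, s)] for s in strata))]
    pooled_b = [sum(col) for col in zip(*(counts[(group_b, s)] for s in strata))]
    return higher_in_every_stratum and rate(*pooled_a) < rate(*pooled_b)

# Counts from Table 10.17: (defendant race, victim race) -> (yes, no)
counts = {
    ("White", "White"): (19, 132),
    ("White", "African American"): (0, 9),
    ("African American", "White"): (11, 52),
    ("African American", "African American"): (6, 97),
}
victims = ("White", "African American")

print(simpson_reversal(counts, "African American", "White", victims))
print(simpson_reversal(counts, "White", "African American", victims))
```

With the Florida data the first call reports a reversal: African American defendants have the higher rate within each victim group, yet the lower rate after pooling, exactly the pattern that separates Table 10.17 from Table 10.16.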

