16.4: Sri Lanka in 2010
For our second full example, let us examine the official returns from the 2010 presidential election in Sri Lanka. In fact, let us check to see if there is evidence of differential invalidation. The term "differential invalidation" refers to certain types of ballots having a higher probability of being invalidated. If this invalidation is a function of the ballot recipient, then the election is unfair.
Is there evidence of differential invalidation in the 2010 presidential election in Sri Lanka?
If ballots cast for Candidate X have a higher probability of being invalidated, then the election is unfair against Candidate X. Since we only have ballot counts at the electoral division level, such an unfair election would show itself by having a significant relationship between the invalidation rate at the division level and the support level for the candidate.
Thus, the dependent variable is the proportion of ballots declared invalid by the official counters. The independent variable is the level of support for Candidate X. Both are measured at the electoral division level. If the relationship between these two variables is statistically significant, then we have evidence of differential invalidation. If the slope is also negative, then the invalidation pattern works in Candidate X's favor.
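This setup can be sketched in a few lines of Python. The data below are simulated (the real column names in sri2010pres.csv may differ, and the chapter's actual analysis uses a generalized linear model rather than this simple fitted line); the point is only to show the dependent variable, the independent variable, and the sign of the slope.

```python
import numpy as np

# Simulated division-level data (illustrative only).
rng = np.random.default_rng(7)
n_div = 30
support = rng.uniform(0.2, 0.9, n_div)   # division-level support for Candidate X

# Build in differential invalidation: more support -> lower invalidation rate.
invalid_rate = 0.05 - 0.03 * support + rng.normal(0.0, 0.003, n_div)

# Dependent variable: invalidation rate; independent variable: support level.
b1, b0 = np.polyfit(support, invalid_rate, 1)  # slope, then intercept
# A statistically significant negative slope is evidence of differential
# invalidation that favors Candidate X.
```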
These previous paragraphs illustrate limitations of using aggregate data (measurement unit) when we want to draw conclusions about the individual (experimental unit). We must say that the results are "consistent with" our hypothesis about the individual. We can conclude there is "evidence of" our hypothesis. That is about all.
The ecological fallacy is a logical error that arises when characteristics observed at the group level are attributed to individuals within that group. Essentially, it involves assuming that what is true for a population is automatically true for every individual within it. In statistics, this means that findings based on aggregated data are not necessarily true for the individuals being aggregated.
- For example, we know that states with higher percentages of college graduates also tend to have lower rates of obesity. It would be fallacious to conclude that college graduates are less likely to be obese.
- An example from US elections: We know that states with a higher proportion of African-American residents tend to vote more Republican. It would be wrong to conclude that individual African-Americans tend to be Republican.
Theory Application
An applied statistician must be able to translate between the theory of the scientist and the theory of the statistician. This skill takes time and practice to develop. Remember to ask questions and to present results from multiple standpoints.
To illustrate this symbiosis between science and statistics, let's apply this theory to the 2010 Sri Lankan election. In 2010, Sri Lanka held a presidential election between incumbent Mahinda Rajapaksa and challenger General Sarath Fonseka. Both were instrumental in ending the Sri Lankan civil war between the Sinhalese and the Tamils.
Arguably, the two were not as effective as nature: the 2004 Boxing Day Indian Ocean earthquake caused a tsunami that swamped the Tamil navy, a blow from which the rebel Tamils never recovered.
As was expected for Sri Lanka at this time, there was violence on election day, intimidation throughout the campaign, and claims of fraud by both parties. To secure his victory, Rajapaksa had Fonseka arrested and imprisoned. The entire scene is explored by Ratnayake in That Blue Thing: An Engineer's Travel.
With that brief background, the data are located at: http://rur.kvasaheim.com/data/sri2010pres.csv
Load the data as usual, but do not attach it… yet. Examine the data. Look at the relationships in the data. Think of the meanings of the data. Become one with the data.
Always do this if you are exploring the data; you need to be as aware of the data as possible.
Arguably, this is the most under-appreciated part of statistical analysis. It is also the trickiest. As always, know the difference between exploratory and confirmatory analysis. If you are exploring the data and the relationships within it, then make sure you explore all of their nooks and crannies.
If, on the other hand, you are testing hypotheses about the data (confirmatory analysis), then you will not be doing any explorations. You will simply be testing for statistically significant relationships in your model. Always be aware of your purpose.
Doing this here, you will see several records identified as "postal" and several as "displaced." The former refer to votes sent in the mail. The latter refer to votes by people outside their division. At this point in Sri Lankan history, the country was still suffering the after-effects of a decades-long civil war. A large proportion of the population was displaced from their homelands… especially from the north and east of the country. The "displaced" ballots are the votes of these people.
Note that the number of displaced and postal votes is rather low. We probably should consider dropping these records for a couple of reasons. The most important reason is that the number of invalidated ballots in many of these records is 0, and zeroes tend to cause problems in analyses. As an illustration of this, calculate a 95% confidence interval for a proportion when x=0 and n=10. The standard Wald confidence interval gives a confidence interval from 0 to 0. This is one reason Agresti and Coull (1998) created a new confidence interval for proportion data.
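To see the contrast, here is a minimal Python sketch of both intervals for \(x = 0\) and \(n = 10\), using the usual \(z = 1.96\) for 95% confidence. The Agresti-Coull interval effectively adds \(z^2/2\) pseudo-successes and \(z^2/2\) pseudo-failures before computing a Wald-style interval.

```python
from math import sqrt

def wald_ci(x, n, z=1.96):
    """Standard Wald interval for a binomial proportion."""
    p = x / n
    half = z * sqrt(p * (1 - p) / n)
    return p - half, p + half

def agresti_coull_ci(x, n, z=1.96):
    """Agresti-Coull (1998) interval: inflate n by z^2 and x by z^2/2."""
    n_tilde = n + z**2
    p_tilde = (x + z**2 / 2) / n_tilde
    half = z * sqrt(p_tilde * (1 - p_tilde) / n_tilde)
    return max(0.0, p_tilde - half), min(1.0, p_tilde + half)

print(wald_ci(0, 10))           # (0.0, 0.0): degenerate, zero width
print(agresti_coull_ci(0, 10))  # roughly (0, 0.32): still informative
```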
A second reason is that it is unclear where these votes were counted and who counted them. If we are interested in checking how the government in each division affected the vote counting, then the Postal and Displaced votes muddy the argument.
Of course, it is best to create two models (as I do here): one fit with the displaced and postal votes and one fit without them. Remember, the goal of science is to better understand relationships. Creating and interpreting multiple models gives us greater insight into those relationships.
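As a sketch of this two-model comparison, with made-up numbers and hypothetical labels (the actual layout of sri2010pres.csv is not reproduced here), one might fit the slope on the full data and on the subset with the Postal and Displaced records removed, then compare:

```python
import numpy as np

# Illustrative stand-in for the division records (hypothetical values).
division = np.array(["A", "B", "C", "Postal", "Displaced"])
support = np.array([0.70, 0.55, 0.40, 0.60, 0.50])
invalid_rate = np.array([0.020, 0.028, 0.034, 0.000, 0.000])

# Model 2 drops the Postal and Displaced records.
keep = ~np.isin(division, ["Postal", "Displaced"])

slope_all, _ = np.polyfit(support, invalid_rate, 1)
slope_sub, _ = np.polyfit(support[keep], invalid_rate[keep], 1)
# If both slopes agree in sign and rough magnitude, the conclusion is robust
# to the decision of whether to include those records.
```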
For the record, there is not much of a difference between the two models (see Figures \(\PageIndex{2}\) and \(\PageIndex{3}\), below). The figure below is a graphic for the models fit on the data with the Postal and Displaced ballots removed; the next figure, for the models fit on the entire data set. Since the two graphics are very similar, we have more evidence that the underlying model does describe reality well. That is, we are much more confident in our conclusion that there is differential invalidation and that it benefited Mahinda Rajapaksa.
Interpretation
From the graphics and the results of the statistical analysis, we can conclude that there is a statistically significant relationship between the invalidation rate and the support level for Mahinda Rajapaksa (\(p \ll 0.0001\)). Furthermore, that relationship is negative. This means that those divisions that supported Rajapaksa more had a higher proportion of their votes counted.
Since the relationship is significant and negative, electoral theory lets us conclude that there is strong evidence of differential invalidation in the 2010 election, and that it helped Rajapaksa retain the presidency.
Did you remember to check if these models violate the assumptions? Checking for overdispersion leads us to use maximum quasi-likelihood as our estimation method. Checking model fit (runs test) tells us that these models are appropriate. Always check your models.
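One standard diagnostic for overdispersion in a binomial model is the Pearson statistic divided by its residual degrees of freedom; a value well above 1 indicates overdispersion and motivates quasi-likelihood estimation. A minimal sketch, on hypothetical counts and fitted probabilities:

```python
import numpy as np

def pearson_dispersion(x, n, p_hat, n_params):
    """Pearson X^2 / df for x invalid ballots out of n, with fitted
    probabilities p_hat from a model with n_params parameters."""
    resid2 = (x - n * p_hat) ** 2 / (n * p_hat * (1 - p_hat))
    return resid2.sum() / (len(x) - n_params)

# Hypothetical example: four divisions and their fitted invalidation rates.
x = np.array([30, 45, 12, 60])        # invalid ballots
n = np.array([1000, 1200, 800, 1500])  # total ballots
p_hat = np.array([0.025, 0.035, 0.020, 0.038])
disp = pearson_dispersion(x, n, p_hat, n_params=2)
# disp near 1: binomial variance is adequate; disp well above 1: use
# maximum quasi-likelihood, as in the analysis above.
```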


