
8.2: The South Sudanese Referendum

    This page presents a case study in electoral forensics, applying linear regression to a pressing political question: Was the 2011 South Sudanese independence referendum conducted fairly?

    We begin with a fundamental statistical test for electoral fairness: whether the way a ballot was marked is independent of its probability of being invalidated. You will transform vote-count data, perform a linear regression on logit-transformed proportions, and interpret the results as evidence for or against electoral fairness. You will also learn how to visualize this analysis by plotting the data points, a prediction curve, and confidence bands in the original proportion units. The workflow demonstrates how statistical modeling, paired with thoughtful visualization, can provide insight into complex real-world events.


    Electoral Forensics Background

    Figure: Location of the Republic of South Sudan (Source: Wikipedia)

    Free and fair elections are one of the requirements for a legitimate democratic system; furthermore, being a legitimate democratic State is necessary for some forms of external assistance. As such, many not-so-democratic States wish to appear democratic. They hold elections, but the elections are either fraudulent or the electoral system (rules governing the elections) is unfair.

    There are many definitions for fairness in an election, but they all contain the same requirement that a person's vote has the same probability of being counted as anyone else's. In other words, the probability of a vote being invalidated is independent of the characteristics of the person casting the vote — including who the vote was for. This aspect of fairness can actually be tested in elections where the number of invalidated votes is counted: If the proportion of the vote for a specific candidate or position is not independent of the proportion of the vote invalidated in the electoral division, then there is evidence against this assumption of fairness.

    The Research Question

    And so, with this background in elections and democracy, ...

    Research Question:

    Does the 2011 independence referendum in southern Sudan indicate an issue with fairness?

    Narrative Solution

    As one of the conditions of the 2005 Naivasha Agreement, which ended the civil war in Sudan, the South was allowed to vote on independence from the North. That referendum was held January 9–15, 2011. Official results stated that 98.83% of the South Sudanese voted against unity and in favor of independence.

    The xsd2011referendum data set contains the number of votes in favor of independence (Secession), the number of votes declared invalid (Invalid), and the total number of votes cast (Votes). Load it and save it into the xsd variable without attaching the data.

    ### Preamble
    
    library(KnoxStats)
    library(car)
    library(lmtest)
    
    xsd = read.csv("https://rur.kvasaheim.com/data/xsd2011referendum.csv")

    Because we need to determine if there is a relationship between the proportion of the vote for a specific side and the proportion of the vote invalidated in the electoral division, and because we just have vote counts, we need to create those proportions. The proportion of the vote for the candidate is the number of votes for the candidate divided by the number of valid votes. The invalidation rate is the number of invalid ballots divided by the number of cast ballots.

    xsd$Valid = xsd$Votes - xsd$Invalid
    xsd$pSec = xsd$Secession/xsd$Valid
    xsd$pInv = xsd$Invalid/xsd$Votes

    Once that is done, we need to transform these proportions using the logit transformation (why?), perform linear regression, and check for a (linear) relationship. If a relationship exists in the transformed variables, then a relationship exists in the untransformed variables. First, however, it is always a good idea to plot the variables to see if there is an obvious answer to the question. Figure \(\PageIndex{1}\) gives a plot of the proportion of the vote invalidated against the proportion of the vote in favor of independence.

    Figure \(\PageIndex{1}\): A scatterplot of the results of the 2011 referendum on independence for South Sudan. Note the apparent presence of a relationship between these two variables. As such, there appears to be evidence that the election was not fair for those voting against independence.

    The plot suggests a strong relationship between the two variables, which is evidence of an election that was not fair. Because of the direction of the slope, it appears that the areas voting most strongly in favor of independence had a much lower probability of having their votes rejected.

    As we are using the logit transform, we must drop any electoral division (here, county) that has zero invalid votes or zero votes in favor of secession. We need to do this because the domain of the logit function is \(p \in (0, 1)\).
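
    For reference, the standard definitions of the logit and its inverse, the logistic function, are written out below. This assumes that the logit and logistic functions used on this page compute these usual forms with the natural logarithm.

    \[ \operatorname{logit}(p) = \log\!\left(\frac{p}{1-p}\right), \qquad \operatorname{logistic}(x) = \frac{1}{1+e^{-x}} \]

    As \(p \to 0^{+}\) the logit tends to \(-\infty\), and as \(p \to 1^{-}\) it tends to \(+\infty\), so a proportion of exactly 0 or 1 has no finite logit. As a worked value, an invalidation rate of 0.02 gives \(\operatorname{logit}(0.02) = \log(0.02/0.98) \approx -3.89\).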

    Question:

    How will removing these counties affect the conclusions drawn?

    To do this easily in R, we can use the which function, which returns the indices of the entries that satisfy the given condition. Thus,

    dr = which(xsd$Invalid==0)
    

    returns a vector of values \(\{15, 19, 23, 24, 28, 46, 47, 49, 50, 57, 72, 73\}\).

    These numbers correspond to the counties that had zero invalid votes cast. Storing this vector in the variable dr (for "drop") allows us to remove those counties from any subsequent calculations. As such, our proportion calculations are:

    p.ind = xsd$Secession[-dr]/xsd$Votes[-dr]
    p.inv = xsd$Invalid[-dr]/xsd$Votes[-dr]
    

    The negative sign tells R to return all values in the vector other than these entries (that is, to drop these entries).
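
    The behavior of which and of negative indexing can be tried on a small example. The sketch below uses made-up counts (the vectors invalid and votes are hypothetical, not taken from the referendum file). It also shows one edge case worth knowing: if no entry matches the condition, negative indexing returns an empty vector rather than the full one, which a logical mask avoids.

```r
# Toy data (hypothetical counts, NOT from the referendum file)
invalid = c(12, 0, 7, 0, 3)
votes   = c(400, 250, 310, 90, 120)

# which() returns the positions where the condition is TRUE
dr = which(invalid == 0)

# Negative indices drop those positions from both vectors
kept = invalid[-dr] / votes[-dr]

# Edge case: when nothing matches, which() returns integer(0),
# and x[-integer(0)] is an EMPTY vector, not the full vector
none = which(invalid < 0)
n.left = length(invalid[-none])

# A logical mask gives the same result as -dr and avoids the edge case
keep = invalid > 0
safe = invalid[keep] / votes[keep]
```

    In the analysis on this page at least one county has zero invalid votes, so the -dr approach works as intended.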

    And so, the two lines to transform the dependent variable and fit the OLS model are

    l.inv = logit(p.inv)
    model.xsd = lm(l.inv ~ p.ind)
    

    The results of the linear regression on the transformed dependent variable are given in the table below. There is a very strong relationship between the proportion of the vote invalidated in the county and the proportion of the vote in favor of secession: Those counties with a greater proportion of people voting for independence also had a lower proportion of the vote invalidated. That there is a strong relationship between these two variables is troubling.

                                          Estimate   StdErr   t-value     p-value
    Constant term                           1.8978   0.7690     2.468      0.0155
    Proportion of Vote for Independence    -9.3991   0.8287   -11.342   << 0.0001
    

    Table: Results table for the South Sudan referendum. The results are in logit units. Note the high level of statistical significance in the effect of the proportion of the vote in favor of independence. This very strongly suggests a lack of fairness in the election.
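
    Because the estimates are in logit units, it can help to convert a few predictions back to proportions. The sketch below copies the two coefficients from the table and evaluates the fitted line at three support levels chosen only for illustration. It uses base R's plogis as the logistic function, which is an assumption about equivalence with the logistic function used elsewhere on this page.

```r
# Coefficients copied from the results table (logit units)
b0 = 1.8978
b1 = -9.3991

# Hypothetical support levels, chosen only for illustration
support = c(0.90, 0.95, 1.00)

# Linear predictor in logit units, then back-transformed to proportions
# plogis(x) is base R's logistic function, 1 / (1 + exp(-x))
eta  = b0 + b1 * support
rate = plogis(eta)

# Multiplicative change in the ODDS of invalidation for a
# 0.10 increase in support for independence
odds.mult = exp(b1 * 0.10)

print(round(data.frame(support, eta, rate), 5))
```

    Under this fitted line, the predicted invalidation rate falls from roughly 0.14% at 90% support to roughly 0.06% at 100% support.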

    To make this relationship more obvious, and to make our point stronger, we can plot the data, the prediction curve, and the 95% Working-Hotelling confidence bands on the same plot.

    Recall that what confidence intervals are to univariate data, confidence bands are to bivariate data.

    The Graphing Philosophy of R

    The philosophy behind base R graphics is to start with a fresh plot and paint successive layers on top of it. This makes it easy to create graphs that tell the story. To make the graph described above, we need to

    1. Plot the points (displayed in proportion units),
    2. Plot the prediction curve (displayed in proportion units, but calculated in logit units),
    3. Plot the 95% confidence bands (displayed in proportion units, but calculated in logit units).

    The first step has been done already (Figure \(\PageIndex{1}\)).

    The second step requires the repeated use of the predict function. First, to make things easier, let us define newX as a series of "proportion of vote in favor of independence" values for which we want to make predictions:

    newX = seq(0, 1, length=1e4)
    

    This creates a vector containing 10,000 values equally spaced between 0 and 1.

    With this, our predict statement will be

    l.pred = predict(model.xsd, newdata=data.frame(p.ind=newX), se.fit=TRUE)
    
    Note

    The se.fit=TRUE parameter, which calculates the standard error of the fit at that x-value, will be important for calculating the confidence bands. This is just a courtesy from R, as we know how to calculate this value from earlier in this book.
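
    As a reminder, and using notation that is assumed here rather than taken from this page, the standard error of the fitted value at \(x_0\) in simple linear regression with \(n\) observations has the familiar form

    \[ \operatorname{se}\!\left(\hat{y}(x_0)\right) = s\,\sqrt{\frac{1}{n} + \frac{(x_0-\bar{x})^2}{\sum_{i=1}^{n}(x_i-\bar{x})^2}}, \qquad s^2 = \frac{\mathrm{SSE}}{n-2} \]

    The \((x_0-\bar{x})^2\) term is the reason the standard error grows as \(x_0\) moves away from \(\bar{x}\).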

    Remember that these predictions are in logit units. To get them into level units, we just apply the logistic function to these point predictions:

    p.pred = logistic(l.pred$fit)
    

    The $fit selects only the fitted predictions from the l.pred variable. This is necessary as we are also using the se.fit=TRUE parameter.
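
    If the helper functions are not available, base R offers the same pair of transformations. The sketch below assumes that logit and logistic compute the standard functions, in which case qlogis and plogis from the stats package are drop-in equivalents for proportions strictly between 0 and 1. The proportions are illustrative only.

```r
# Base R equivalents (stats package, loaded by default):
#   qlogis(p) = log(p / (1 - p))    the logit
#   plogis(x) = 1 / (1 + exp(-x))   the logistic (inverse logit)
p = c(0.01, 0.25, 0.50, 0.99)   # illustrative proportions

x    = qlogis(p)   # to logit units
back = plogis(x)   # and back to proportions

round.trip = all.equal(back, p)   # TRUE: the functions are inverses
edges      = qlogis(c(0, 1))      # -Inf and Inf at the boundaries
```

    The infinite values at 0 and 1 are the reason counties with boundary proportions are dropped before the regression.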

    Now that we have the predictions in the original units, we merely paint them on the current plot (from Step 1):

    lines(newX, p.pred)
    

    The third step requires us to calculate the 95% confidence bands and paint them on the plot as well. For want of better estimates, let us use the Working-Hotelling bands. The call that calculates the 95% confidence bands is

    prwh = predictWH(model.xsd)

    Once again, we must back-transform these two variables using the logistic function. So, our final confidence bands are

    ucb = logistic(prwh$ucb)
    lcb = logistic(prwh$lcb)
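
    The Working-Hotelling bands can also be computed by hand from the output of predict. The sketch below uses simulated stand-in data (not the referendum data) and base R's plogis in place of logistic. Whether predictWH performs exactly this calculation is an assumption. The bands are the fitted values plus or minus W standard errors, where W is the square root of twice the 95th percentile of an F distribution with 2 and n - 2 degrees of freedom.

```r
set.seed(1)

# Simulated stand-in data (NOT the referendum data)
n = 60
x = runif(n, 0.5, 1)
y = 2 - 9 * x + rnorm(n, sd = 0.8)   # response already in logit units
fit = lm(y ~ x)

# Predictions and their standard errors on a grid of x-values
newX = seq(0, 1, length = 200)
pr = predict(fit, newdata = data.frame(x = newX), se.fit = TRUE)

# Working-Hotelling multiplier for a line with 2 coefficients
W = sqrt(2 * qf(0.95, df1 = 2, df2 = fit$df.residual))

# Bands computed in logit units, then back-transformed to proportions
ucb = plogis(pr$fit + W * pr$se.fit)
lcb = plogis(pr$fit - W * pr$se.fit)

# For comparison: the pointwise t multiplier is smaller than W
tcrit = qt(0.975, df = fit$df.residual)
```

    Because W exceeds the pointwise t multiplier, these bands are wider than pointwise confidence intervals, which is the price of covering the whole line simultaneously.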
    

    Finally, we paint this on the current plot with

    lines(newX, ucb, col="grey")
    lines(newX, lcb, col="grey")
    

    This gives Figure \(\PageIndex{2}\), below.

    Figure \(\PageIndex{2}\): A plot of the results of the South Sudan referendum. Included are the prediction line (in black) and the 95% confidence bands (in grey). Note that a horizontal line cannot fit between the confidence bands. This indicates a statistically significant relationship between the proportion of the votes invalidated and the proportion of the votes in favor of independence. This, in turn, supports the conclusion of an unfair election.

    Note that the predictions are curved in these units; they are straight in logit units. Also note the confidence bands are wider where the value of \(x\) is farther from \(\bar{x}\). Lastly, note that no horizontal line can fit between the two confidence bands. This illustrates that there is a statistically significant relationship between the two variables at the \(\alpha=0.05\) level.

    Question:

    Why does the fact that no horizontal line fits between the confidence bands indicate a statistically significant relationship between the two variables at the \(\alpha=0.05\) level?

    Note that the figure gives the same information as the regression table. The difference is that the graphic tells a clear story.

    Caution

    Graphs usually make the points more manifest.

    The Graphic's Code

    Here is the code I used to obtain Figure \(\PageIndex{2}\). There are some interesting things in it. Please pore over the code until you know what each line does.

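    # Graphical parameters: margins, exact axis limits, serif font,
    # horizontal tick labels, bold and enlarged axis titles, no clipping
    par(mar=c(4,4,0,0)+0.5)
    par(xaxs="i", yaxs="i")
    par(family="serif", las=1)
    par(font.lab=2, cex.lab=1.2)
    par(xpd=NA)
    
    # Start a fresh, empty plot and set its coordinate system
    plot.new()
    plot.window(xlim=c(0,1), ylim=c(0,0.1))
    
    # Axes and axis titles
    axis(1, at=0:10/10); axis(2)
    title(xlab="Support for Secession", line=2.5)
    title(ylab="Invalidation Rate", line=2.75)
    
    # Layer 1: the observed counties
    points(p.ind,p.inv, pch=21, bg="lavender")
    
    # Layer 2: the prediction curve; Layer 3: the 95% confidence bands
    lines(newX, p.pred)
    lines(newX, ucb, col="grey")
    lines(newX, lcb, col="grey")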

    This page titled 8.2: The South Sudanese Referendum is shared under a CC BY-NC-SA 4.0 license and was authored, remixed, and/or curated by Ole Forsberg.
