17.3: Overdispersion
Recall that one result of using the Poisson as our distribution of choice is that the residual variance and the expected value are assumed equal, because the probability mass function has \(a(\phi) = 1\). As in the case of the Binomial models, overdispersion means that \(a(\phi) > 1\).
In essence, this means \(\mathrm{V}[Y \mid X] > \mathrm{E}[Y \mid X]\). For a Poisson model (as for a Binomial model), the overdispersion measure equals the ratio of the residual deviance to the residual degrees of freedom. We use the residual quantities because overdispersion is a function of the model as well as of the data. For the Initiative Use model, the overdispersion factor is \(681.68 / 48 = 14.2\). In other words, the unexplained variance is 14.2 times what this model assumes.
Please revisit the section on Binomial overdispersion for one way of testing whether \(681.68\) really is evidence of significant overdispersion. If you construct the appropriate test, you will get an upper critical value of approximately \(69\). Since \(681.68 \gg 69\), there is substantial overdispersion in these data.
The code I used was qchisq( c(0.025,0.975), df=48 ).
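For concreteness, here is a minimal sketch of the whole check in R. The object name m2 is my own placeholder for the fitted Poisson model; judging from the glm.nb() call later in this section, that model was presumably fit with something like glm(inituse ~ pop90, data = vcr, subset = (ccode != 93), family = poisson(link = "log")).
resid.dev <- deviance(m2)                   # residual deviance, 681.68 here
df.resid  <- df.residual(m2)                # residual degrees of freedom, 48 here
resid.dev / df.resid                        # overdispersion factor, about 14.2
qchisq(c(0.025, 0.975), df = df.resid)      # upper critical value is about 69
resid.dev > qchisq(0.975, df = df.resid)    # TRUE: significant overdispersion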
Since overdispersion is a function of the model you are fitting to your data, the first solution is to determine if you are missing some important variables (or powers of variables). Frequently, modifying your linear predictor by adding appropriate variables will reduce the overdispersion.
Even though this is in many ways the most appropriate remedy, it carries a real danger: you may need to include so many variables and combinations of variables to eliminate the overdispersion that you over-fit the data; that is, you end up fitting the data rather than the data-generating process in which we are actually interested.
Thus, if you end up including too many variables before the overdispersion is treated, you may want to consider other options.
The first is to adjust the standard errors by hand. This frequently works acceptably, as the primary effect of overdispersion is to underestimate the standard errors. The second option is to fit the model using a different fitting technique, one that allows you to use the Poisson but also allows you to have a different relationship between the mean and variance. Maximum quasi-likelihood estimation (MQLE) is a common alternative to the usual maximum likelihood methods. Finally, you can fit the model using a different distribution, one that does not require the mean to equal the variance. The Negative Binomial is a common alternative to the Poisson.
Adjusting the Standard Errors
This first option adjusts the estimated standard errors to try to compensate for the overdispersion. Recall that the dispersion factor is the ratio of the residual variance to the expected variance. As the standard error is the square root of a variance, it would make sense that we could "fix" the overdispersion by multiplying the standard errors by the square root of the dispersion factor.
The table below presents the original standard error estimate along with the adjusted standard errors, z-values, and p-values. Note that the 1990 population was highly significant in the unadjusted model, but is not significant in the adjusted model (\(p=0.2578\)).
Parameter          Estimate         Original StdErr   Adjusted StdErr   z-value   p-value
Intercept          2.136            0.0814            0.307             6.960     << 0.0001
1990 Population    -0.00000007433   0.00000001743     0.00000006568     -1.131    0.2578
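To make the arithmetic concrete, here is a sketch of the adjustment using only the numbers in the table above; none of this requires refitting the model.
disp   <- 681.68 / 48                       # dispersion factor, about 14.2
est    <- c(2.136, -0.00000007433)          # Poisson coefficient estimates
se     <- c(0.0814, 0.00000001743)          # original Poisson standard errors
se.adj <- se * sqrt(disp)                   # inflate the standard errors
z.adj  <- est / se.adj                      # recompute the z-values
p.adj  <- 2 * pnorm(-abs(z.adj))            # recompute the two-sided p-values
cbind(est, se.adj, z.adj, p.adj)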
The strength of this method is that it is easy to perform. The drawback is that the correction is only approximate. When computation was expensive, this method was common; in the modern era of cheap computing, it rarely is. The next two methods are preferable in that their results are more statistically sound than this approximation.
Maximum Quasi-Likelihood Estimation
The maximum likelihood estimation method makes assumptions about the relationship between the mean and variance of the underlying distribution. For the Poisson distribution, that relationship is the identity function; that is, \(E[Y \mid X] = V[Y \mid X]\). The presence of overdispersion indicates that this relationship — or this value of \(a(\phi)\) — is incorrect. A different way of estimating the parameters is to use Maximum Quasi-Likelihood Estimation. This method allows for modeling different relationships between the expected value and variance for the distribution. It effectively includes an additional parameter for \(a(\phi)\).
The strength of using MQLE is that you can use the same distributions with which we are familiar, and the interpretation is identical. The weakness is that some statistical programs are not able to model using this method. Thankfully, R can. To model using MQLE in R, we prefix the distribution with the word quasi. Thus, we would use
glm(y ~ x, family = quasipoisson(link = "log"))
to fit this model. This command produces the results in the table below. Note that the coefficient estimates are the same as for the Poisson model. The difference is in the standard errors: they are increased. This increase causes our z-values to decrease, resulting in increased p-values.
Parameter          Estimate         Original StdErr   MQLE StdErr     z-value   p-value
Intercept          2.136            0.0814            0.345           6.103     << 0.0001
1990 Population    -0.00000007433   0.00000001743     0.00000007492   -0.992    0.3261
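Here is a sketch of that fit using the same data frame, variable names, and subset as the glm.nb() call later in this section; m2q is my own name for the fitted object.
m2q <- glm(inituse ~ pop90, data = vcr, subset = (ccode != 93),
           family = quasipoisson(link = "log"))
summary(m2q)               # same estimates as the Poisson fit, larger standard errors
summary(m2q)$dispersion    # the estimated a(phi); Pearson-based, so it differs somewhat from the deviance-based 14.2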
The only two distributions that have the MQLE option in R are the Poisson (quasipoisson) and the Binomial (quasibinomial). These are the only two common distributions for which \(a(\phi)\) is fixed at a specific numeric value, \(a(\phi)=1\). The rest have a value of \(a(\phi)\) that must be estimated from the data; for instance, the Gaussian has \(a(\phi) = \sigma^2\).
The Negative Binomial Family
In the Generalized Linear Model framework, you need to select an appropriate distribution that matches your dependent variable. If that variable is a count, then the sole requirement for that distribution is that its outcomes be discrete and non-negative. The Poisson is the usual choice, but it is not the only one. An alternative is the Negative Binomial. The Negative Binomial family allows the variance to exceed the mean, and so it accommodates overdispersion. It does this by assuming the rate parameter \(\lambda\) in the Poisson is distributed as a Gamma random variable. Specifically, it assumes
\begin{align}
Y \mid \mu, \theta\ &\sim NegBin(\mu, \theta) \\[2em]
\text{where} \qquad \qquad Y \mid W &\sim Pois(\mu W) \\[1em]
\text{with} \qquad \qquad\qquad W &\sim \frac{1}{\theta} Gamma(\theta)
\end{align}
This results in \(E[W] = 1\) and \(V[W]=1/\theta\).
With this formulation, it can be shown that \(E[Y] = \mu\), \(V[Y]=\mu + \mu^2/\theta\), and that the probability mass function for \(Y\) is
\[ f(y; \mu, \theta) = \frac{ \Gamma(\theta+y) }{ \Gamma(\theta)\, y!}\ \frac{ \mu^y\, \theta^\theta}{(\mu+\theta)^{\theta+y}} \]
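The mixture formulation is easy to verify by simulation. The sketch below uses values of mu and theta that I chose purely for illustration.
set.seed(370)
n     <- 1000000
mu    <- 5
theta <- 2
w <- rgamma(n, shape = theta, rate = theta)   # E[W] = 1, V[W] = 1/theta
y <- rpois(n, lambda = mu * w)                # Y | W ~ Pois(mu * W)
mean(y)                                       # close to mu = 5
var(y)                                        # close to mu + mu^2/theta = 17.5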
The strength of this formulation is that it can accommodate a much wider range of variance structures than the Poisson.
The drawback is that interpreting the results is a bit more difficult. However, since we make the computer do all the heavy lifting, this drawback is minor. It does, however, introduce a new set of possible error messages and parameters that you may have to interpret.
The other drawback is that the Negative Binomial distribution is not a member of the exponential family (unless \(\theta\) is known, which it is not). As such, it cannot be used within the GLM paradigm (strictly speaking). With that said, fitting a model using the Negative Binomial distribution is just as easy as it is for any of the previous distributions.
In R, you will have to load the MASS package to fit the Negative Binomial model, because the Negative Binomial is not a family option in glm; instead, MASS provides its own regression function, glm.nb. The options for glm.nb are similar to those for glm.
Thus, the command
m2n <- glm.nb(inituse ~ pop90, data = vcr, subset = (ccode != 93))
will perform the Negative Binomial regression analogous to the one in the MQLE section. The first thing to notice is that overdispersion is no longer an issue, so we can have more confidence in the parameter estimates (provided in the table below). The second thing to notice is that the effect of population is still not statistically significant, which agrees with our findings in the adjusted standard errors and MQLE sections. Finally, notice that an additional parameter is estimated (at the bottom of the output): Theta is the estimated value of \(\theta\) in the Gamma distribution above.
Parameter            Estimate         StdErr           z-value   p-value
Constant Term        2.2376           0.4835           4.63      << 0.0001
Population in 1990   -0.00000010091   0.000000081903   -1.23     0.2179
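If you want \(\theta\) directly rather than reading it off the printed output, it is stored in the fitted object; here is a sketch using the m2n object created above.
summary(m2n)      # coefficient table, plus Theta and its standard error at the bottom
m2n$theta         # the estimated theta from the Gamma mixing distribution
m2n$SE.theta      # its standard error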
This model estimates that Utah will have had approximately 7.9 initiatives during the 1990s. I leave it as an exercise to determine this.
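As a sketch of the mechanics (with the answer left blank so the exercise survives), you would plug Utah's 1990 population into the fitted equation or into predict(); utah.pop90 is a placeholder for a value you would look up or pull from the vcr data frame.
utah.pop90 <- NA                                  # fill in Utah's 1990 population
exp(2.2376 - 0.00000010091 * utah.pop90)          # by hand, using the estimates above
predict(m2n, newdata = data.frame(pop90 = utah.pop90), type = "response")   # using the model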
The direction of the coefficient estimate is still directly comparable to the other coefficient estimates we have examined. The magnitudes are also comparable, but only to the other log-linked models. Thus, this model tells us that there is a negative relationship between a state's population and its level of initiative use (although the relationship is not statistically significant).
That the directions are comparable is due to choosing a link function that is strictly increasing; that the magnitudes are comparable is due to the models sharing the same link (or transform) function.


