Beware of Software for Fixed Effects Negative Binomial Regression
June 8, 2012 By Paul Allison
If you’ve ever considered using Stata or LIMDEP to estimate a fixed effects negative binomial regression model for count data, you may want to think twice. Here’s the story:
For panel data with repeated measures, fixed effects regression models are attractive for their ability to control for unobserved variables that are constant over time. They accomplish this by introducing an additional parameter for each individual in the sample. Fixed effects models come in many forms depending on the type of outcome variable: linear models for quantitative outcomes, logistic models for dichotomous outcomes, and Poisson regression models for count data (Allison 2005, 2009).
Logistic and Poisson fixed effects models are often estimated by a method known as conditional maximum likelihood. In conditional likelihood, the “incidental parameters” for each individual are conditioned out of the likelihood function. Specifically, for each individual, the contribution to the likelihood function is conditioned on the sum of the repeated measures, thereby eliminating the individual-specific parameters from the likelihood function.
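To see why conditioning on the sum removes the individual-specific parameter, here is a minimal Python sketch for one individual in a Poisson model. It assumes a single covariate with a scalar coefficient and a log link, none of which is specified above. The names `conditional_loglik`, `alpha`, and `beta` are illustrative only. The conditional log-likelihood is computed as the joint Poisson log-likelihood minus the log-probability of the observed total, and the result should be the same whatever value `alpha` takes.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the Poisson probability of count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def conditional_loglik(y, x, beta, alpha):
    """Log-likelihood of one individual's counts, conditional on their sum.

    alpha is the individual-specific intercept (hypothetical name);
    it should cancel out of the result.
    """
    mus = [math.exp(alpha + beta * xt) for xt in x]
    joint = sum(poisson_logpmf(yt, mu) for yt, mu in zip(y, mus))
    # A sum of independent Poisson counts is Poisson with the summed mean.
    total = poisson_logpmf(sum(y), sum(mus))
    return joint - total

# Made-up data for one individual observed three times.
y = [2, 0, 5]
x = [0.1, -0.4, 1.2]

ll_a = conditional_loglik(y, x, beta=0.7, alpha=-1.0)
ll_b = conditional_loglik(y, x, beta=0.7, alpha=3.0)
print(ll_a, ll_b)  # the two values agree up to rounding error
```

Because the result depends only on `beta`, maximizing the sum of these terms over individuals never requires estimating the individual-specific parameters.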
Conditional likelihood has two advantages:
1. It can greatly reduce computer time and memory requirements because the individual-specific parameters don’t have to be estimated.
2. For logistic models, it eliminates what’s known as “incidental parameters bias,” which can be quite severe when the number of repeated measurements per individual is small. Incidental parameters bias does not occur with Poisson models, however.
The problem with Poisson regression models is that count data frequently suffer from overdispersion—the conditional variance is larger than the conditional mean. As a consequence, both standard errors and p-values are too low, sometimes way too low.
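A rough first check for overdispersion is to compare the sample variance of the counts with their mean. The sketch below uses made-up counts and a hypothetical helper named `dispersion_ratio`. It looks only at the marginal mean and variance, so it is a crude screen rather than a test of the conditional variance given covariates.

```python
import statistics

def dispersion_ratio(counts):
    """Sample variance divided by sample mean; about 1 for Poisson data."""
    return statistics.variance(counts) / statistics.mean(counts)

# Hypothetical counts, invented for illustration.
counts = [0, 0, 1, 0, 3, 0, 12, 2, 0, 7]
ratio = dispersion_ratio(counts)
print(ratio)  # values well above 1 suggest overdispersion
```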
An effective alternative is negative binomial regression, which generalizes the Poisson regression model by introducing a dispersion parameter. Most statistical software packages now have procedures for doing negative binomial regression. But can you do conditional maximum likelihood for a fixed effects negative binomial regression model? If so, how?
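To make the role of the dispersion parameter concrete, here is a short sketch of one common parameterization, often called NB2, in which the variance is `mu + alpha * mu**2`. The choice of NB2 is an assumption on my part, since the text above does not commit to a particular form, and the function names are illustrative. As `alpha` shrinks toward zero the negative binomial probabilities approach the Poisson ones.

```python
import math

def poisson_logpmf(y, mu):
    """Log of the Poisson probability of count y with mean mu."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)

def nb2_logpmf(y, mu, alpha):
    """Log of the NB2 probability of count y with mean mu, dispersion alpha > 0."""
    r = 1.0 / alpha  # gamma shape parameter implied by alpha
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))

def nb2_variance(mu, alpha):
    """Variance under NB2: the Poisson variance mu plus an extra alpha * mu^2."""
    return mu + alpha * mu ** 2

# With a tiny dispersion parameter the two log-probabilities nearly coincide.
gap = abs(nb2_logpmf(3, 2.0, 1e-6) - poisson_logpmf(3, 2.0))
print(gap)
print(nb2_variance(2.0, 0.5))  # larger than the Poisson variance of 2.0
```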
In 1984, Hausman, Hall and Griliches (hereafter HHG) proposed a conditional likelihood method for negative binomial regression that has been available in Stata and LIMDEP for several years. It has also been recently introduced as an experimental procedure in SAS called TCOUNTREG. Unfortunately, the HHG method does not qualify as a true fixed effects method because it does not control for unchanging covariates.
As I explained in a 2002 paper with Richard Waterman, the problem with the HHG negative binomial method is that it allows for individual-specific variation in the dispersion parameter rather than in the conditional mean. As a result, unlike other conditional likelihood methods, you can put time-invariant covariates into an HHG model and get non-zero coefficient estimates for those variables. And those coefficients will often be statistically significant. Guimarães (2008) and Greene (2005) reached the same conclusion about the HHG method.
What’s the solution? I know of three reasonable options. First, you can do unconditional estimation of a fixed effects negative binomial model simply by including dummy (indicator) variables for all individuals. That can be computationally demanding for conventional software if the number of individuals is large. However, LIMDEP has a method for unconditional fixed effects models that is extremely efficient computationally. What about “incidental parameters bias”? Although I’m not aware of any proof that unconditional negative binomial estimation yields consistent estimators, the simulations that Waterman and I reported in our 2002 paper are very encouraging on that score.
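As a small illustration of the first option, the sketch below builds the indicator columns that would be added to the covariates before handing the data to whatever negative binomial routine you use. The helper name `add_individual_dummies` and the data are hypothetical. The sketch creates one dummy per individual and assumes the model is then fit without a separate intercept; with an intercept, one dummy would have to be dropped.

```python
def add_individual_dummies(ids, covariates):
    """Append one 0/1 indicator column per individual to each covariate row."""
    levels = sorted(set(ids))
    position = {level: k for k, level in enumerate(levels)}
    rows = []
    for pid, row in zip(ids, covariates):
        dummies = [0] * len(levels)
        dummies[position[pid]] = 1  # mark the individual this row belongs to
        rows.append(list(row) + dummies)
    return levels, rows

# Made-up panel: three individuals, two observations each, one covariate.
ids = ["a", "a", "b", "b", "c", "c"]
covariates = [[0.5], [1.5], [2.0], [1.0], [0.0], [3.0]]

levels, design = add_individual_dummies(ids, covariates)
for row in design:
    print(row)
```

The number of added columns equals the number of individuals, which is why this approach becomes demanding in large samples.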
The second viable approach is to estimate a random effects negative binomial model with all the time-varying covariates expressed as deviations from the individual-specific means. That “hybrid method” is described in Chapter 4 of my book Fixed Effects Regression Methods for Longitudinal Data Using SAS. Since the hybrid method does not require the estimation of individual-specific parameters, there is no reason to expect that it would suffer from incidental parameters bias.
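The data preparation step for this approach can be sketched in a few lines of Python. The function name `hybrid_decompose` and the data are made up for illustration. The function returns the individual-specific means along with the deviations, on the assumption that you may also want to enter the means as covariates; that detail is not stated above. The random effects negative binomial model itself would then be fit in your package of choice.

```python
from collections import defaultdict

def hybrid_decompose(ids, x):
    """Split a time-varying covariate into a person mean and a deviation from it."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for pid, value in zip(ids, x):
        totals[pid] += value
        counts[pid] += 1
    means = {pid: totals[pid] / counts[pid] for pid in totals}
    person_mean = [means[pid] for pid in ids]
    deviation = [value - means[pid] for pid, value in zip(ids, x)]
    return person_mean, deviation

# Made-up panel: two individuals, two observations each.
ids = [1, 1, 2, 2]
x = [1.0, 3.0, 10.0, 14.0]

person_mean, deviation = hybrid_decompose(ids, x)
print(person_mean)
print(deviation)
```

By construction the deviations sum to zero within each individual, so they carry only within-individual variation.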
A third approach is the approximate conditional score method described in my 2002 paper with Waterman. That method appears to have good statistical properties, but it is not easily implemented in commercial software.
Allison, Paul D. (2005) Fixed Effects Regression Methods for Longitudinal Data Using SAS. Cary, NC: The SAS Institute.
Allison, Paul D. (2009) Fixed Effects Regression Models. Thousand Oaks, CA: Sage Publications.
Allison, Paul D. and Richard Waterman (2002) “Fixed effects negative binomial regression models.” In Ross M. Stolzenberg (ed.), Sociological Methodology 2002. Oxford: Basil Blackwell.
Greene, William (2005) “Functional form and heterogeneity in models for count data.” Foundations and Trends in Econometrics 1: 113–218.
Guimarães, P. (2008) “The fixed effects negative binomial model revisited.” Economics Letters 99: 63–66.
Hausman, Jerry, Bronwyn H. Hall and Zvi Griliches (1984) “Econometric models for count data with an application to the patents-R&D relationship.” Econometrica 52: 909–938.