Skip to content

Beware of Software for Fixed Effects Negative Binomial Regression

Paul Allison
June 8, 2012

If you’ve ever considered using Stata or LIMDEP to estimate a fixed effects negative binomial regression model for count data, you may want to think twice. Here’s the story:


For panel data with repeated measures, fixed effects regression models are attractive for their ability to control for unobserved variables that are constant over time. They accomplish this by introducing an additional parameter for each individual in the sample. Fixed effects models come in many forms depending on the type of outcome variable: linear models for quantitative outcomes, logistic models for dichotomous outcomes, and Poisson regression models for count data (Allison 2005, 2009).

Logistic and Poisson fixed effects models are often estimated by a method known as conditional maximum likelihood. In conditional likelihood, the “incidental parameters” for each individual are conditioned out of the likelihood function. Specifically, for each individual, the contribution to the likelihood function is conditioned on the sum of the repeated measures, thereby eliminating the individual-specific parameters from the likelihood function.


Conditional likelihood has two advantages: 1. Conditional likelihood can greatly reduce computer time and memory requirements because the individual-specific parameters don’t have to be estimated. 2. For logistic models, conditional likelihood eliminates what’s known as “incidental parameters bias,” which can be quite severe when the number of repeated measurements per individual is small. Incidental parameter bias does not occur with Poisson models, however.

The problem with Poisson regression models is that count data frequently suffer from overdispersion—the conditional variance is larger than the conditional mean. As a consequence, both standard errors and p-values are too low, sometimes way too low.

An effective alternative is negative binomial regression, which generalizes the Poisson regression model by introducing a dispersion parameter. Most statistical software packages now have procedures for doing negative binomial regression. But can you do conditional maximum likelihood for a fixed effects negative binomial regression model? If so, how?

The Problem

In 1984, Hausman, Hall and Griliches (hereafter HHG) proposed a conditional likelihood method for negative binomial regression that has been in available in Stata and LIMDEP for several years. It has also been recently introduced as an experimental procedure in SAS called TCOUNTREG. Unfortunately, the HHG method does not qualify as a true fixed effects method because it does not control for unchanging covariates.

As I explained in a 2002 paper with Richard Waterman, the problem with the HHG negative binomial method is that it allows for individual-specific variation in the dispersion parameter rather than in the conditional mean. As a result, unlike other conditional likelihood methods, you can put time-invariant covariates into an HHG model and get non-zero coefficient estimates for those variables. And those coefficients will often be statistically significant. Guimarães (2008) and Greene (2005) reached the same conclusion about the HHG method.


What’s the solution? I know of three reasonable options. First, you can do unconditional estimation of a fixed effects negative binomial model simply by including dummy (indicator) variables for all individuals. That can be computationally demanding for conventional software if the number of individuals is large. However, LIMDEP has a computational method for unconditional fixed effects models that is extremely computationally efficient. What about “incidental parameters bias”? Although I’m not aware of any proof that unconditional negative binomial estimation yields consistent estimators, the simulations that Waterman and I reported in our 2002 paper are very encouraging on that score.

The second viable approach is to estimate a random effects negative binomial model with all the time-varying covariates expressed as deviations from the individual-specific means. That “hybrid method” is described in Chapter 4 of my book Fixed Effects Regression Methods for Longitudinal Data Using SAS. Since the hybrid method does not require the estimation of individual-specific parameters, there is no reason to expect that it would suffer from incidental parameters bias.

A third approach is the approximate conditional score method described in my 2002 paper with Waterman. That method appears to have good statistical properties, but it is not easily implemented in commercial software.


Allison, Paul D. (2005) Fixed Effects Regression Methods for Longitudinal Data Using SAS. Cary, NC: The SAS Institute.

Allison, Paul D. (2009) Fixed Effects Regression Models. Thousand Oaks, CA: Sage Publications.

Allison, Paul D. and Richard Waterman (2002) “Fixed effects negative binomial regression models.” In Ross M. Stolzenberg (ed.), Socological Methodology 2002. Oxford: Basil Blackwell.

Greene, William (2005) “Functional form and heterogeneity in models for count data.” Foundations and Trends in Econometrics 1: 113–218

Guimarães, P., (2008), “The fixed effects negative binomial model revisited.” Economics Letters, 99: 63­-66.

Hausman, Jerry, Hall, Bronwyn H. and Griliches, Zvi (1984.) “Econometric models for count data with an application to the patents-R&D relationship.” Econometrica, 52: 909-938.



  1. Do you know of any software for negative binomial regression that allows for proper handling of data from complex survey samples? SUDAAN has a procedure for poisson regression, but none for negative binomial regression (yet).

  2. I did community fixed effects negative binomial. And some of observations are dropped because of only one observation per group (community) and some are dropped because of a group has all zero outcomes. Do you have any explanation for this? Thank you.

    1. In your case, fixed effects only uses within-community variation. If there’s only one observation in a community or if everyone has a value of 0, there is no variation. This is typical of fixed effects. Assuming that the same regression coefficients apply to everyone, this should not cause any bias. But the loss of cases may increase standard errors.

  3. I need to check if the results of my study are consistant when I use a zero inflated negative binomial instead of negative binomial using STATA. I used firm dummy variables to control for fixed effects in both model. But Zero inflated model doesn’t converge as I have year dummy variables as well. I need to keep exatly the same RHS variables to compare the results.
    I appreciate if you have any suggestion.

  4. I am doing a longitudinal study with a Poisson distribution (with overdispersion of zeros) with weights and complex sampling. I was told that proc loglink in SUDAAN is not ideal for Poisson distributions because of overdispersion, proc glimmix in SAS doesn’t account for the complex design and proc svy STATA is good for the negative binomial regression but cannot do my study longitudinally. What would you consider most appropriate?

    1. PROC GLIMMIX can handle a lot of the survey sampling issues by defining appropriate random effects for the study design characteristics. In Stata, I’d use the nbreg command with the svy prefix and robust standard errors to deal with the longitudinal dependency.

  5. Dear Paul,
    thanks for your contributions.I try to follow your hybrid approach as described in your book. However, I need to include interaction effects in the model too. Regarding the interacting variables, should I then add the product of the mean-variables in addition to the product of the deviations from the individual-specific means? I would be very grateful for any suggestions here. Kind regards, Anja

    1. Everything should be done as if the interaction were just another variable. That is, first create the product variable, then compute its mean and the deviations from that mean, both of which go into the model.

  6. Dear Paul,
    Thanks for sharing your knowledge. I try to use your hybrid approach. my sample represent several firms in different years, so I wonder whether the mean of a certain continuous variable for example firm size is the mean for the whole sample , or should I calculate the mean of firm-size each year? thanks

    egen mean_size= mean(size)
    bysort year: egen mean_size= mean(size)

      1. Paul, thanks for all the great responses and for sharing your thoughts.

        Would your advice about which mean to use be the same if Nahla had a multicategory dummy variable for year in the model (i.e., i.year in Stata)?

      2. Thanks for all your kind responses to all the questions you are getting. I’m just seeking for some clarification here
        When you say to use “the mean across years for each firm” does that mean that for every firm and for every year, one should calculate the ‘year-specific firm mean’ and then withdraw this value from the observation to use the hybrid approach? Additionally, should we still use year dummies as well then?

        So in stata the code would look like this
        sort grantyear
        egen m_size_y = mean(size), by(firm year)
        gen hybrid_size= size – m_size_y

        When doing this, do the resulting coefficients resemble the unconditional fixed effects estimators or the conditional ones?
        Thanks in advance!

        1. No, for each firm, you calculate the mean of the predictor over all years. Then subtract that from the original variable.

  7. Hello Paul:
    Thank you for this article. I also read your paper.

    I do not have panel data, but I have pooled cross sectional data across three years

    I am using stata nbreg with svy prefix for the analysis. Fixed effects for the two time periods and some interactions terms

    This is a viable methods, and do I have a reason to expect biased parameters?

    Thank you in advance

  8. I am dealing with panel data and selected a ZINB using STATA.

    The regression output exhibits the number of observations but not the number of firms (is the panel variable) considered.
    Is there any command hat allows to know the number of firms indeed considered in the regression?

    1. The zinb command is not designed for panel data, so it won’t show the number of firms. If you ran the same regression in xtpoisson it would would tell you the number of firms actually used.

  9. Dear Dr. Paul,

    How can we compare between the NB random-effects regression and the the NB fixed-effects one (the hybrid method)? The Hausman test wont work because the variables are different(i.e. the random-effects uses raw numbers, while the fixed- effect uses deviation from the mean).

    Many thanks

    1. One of the big attractions of the hybrid method is that it’s straightforward to get a test of the fixed effects vs. random effects. Just test whether the coefficients for the deviation variables are the same as those for the corresponding mean variables. In Stata use a test command; in SAS use a TEST statement or a CONTRAST statement.

  10. Hello Paul. I am doing panel data analysis and the endogenous variable is the count data (number of products) So the first stage is the FE negative binomial regression while the second stage is the usual FE regression. How can I correct the standard errors then in the second stage? Or is there a canned routine to do these two stages together? Thanks!!

    1. Sorry, but I don’t understand exactly what you’re proposing to do. Why do you need two stages? What statistical package(s) are you using?

  11. Thank you for your excellent work on panel analysis, fixed effects, and issues with STATA’s conditional fixed effects estimation for count models.
    I have 2 questions:
    1. In your Sage book, you include comparisons of the hybrid, xtgee (pa) model and xtnbreg. I have panel data (with evidence of overdispersion) at the country level and have done cic and qic diagnostics and found that (using STATA) xtgee, corr(indep) models are preferred over xtgee, corr(ar1), despite the fact that I have monthly data over 10 years for 5 countries (nt=600). What do you think of including 4 country-dummies in an xtgee, family(nbinomial) link(log) corr(indep) robust SE model?
    2. Another solution is suggested by Hilbe’s 2012 Negative Binomial textbook (Ch 14): first, obtain estimates from (pooled) nbreg to get an estimate of the overdispersion parameter (e.g. 2.19), and then specify this same parameter in a xtgee, family(nb 2.19) in what he calls the “nb 2” model. What do you think of this solution? Thanks!

    1. 1. I’m generally skeptical of the cic and qic statistics as a way of determining the proper correlation structure. But with only 5 countries, I think your proposal is OK, with a couple of exceptions: (a) if you’re doing corr(indep), why not just use nbreg? (b) robust standard errors are likely to be inaccurate with t>>n.
      2. Haven’t read Hilbe’s ch. 14, but I guess I don’t see the rationale for not just estimating the overdispersion in the xtgee model.

      1. Thanks for the amazingly quick and insightful reply!

        1. I am inclined to use an xtgee model to take the panel
        structure into account (rather than the pooled nbreg),
        –I get similar effects, for all except one parameter though!
        2. I agree with you about the using the nbreg parameter
        in xtgee proposed by Hilbe. Seems tricky.
        3. I think I will also have to hunker down and do the comparison
        with my proposed xtgee solution and your hybrid model–I will
        probably learn something from it, despite all the typing. I have
        used it before and it was useful!

        1. 1. When you specify corr(ind), gee is not taking the panel structure into account.
          3. I don’t think the hybrid approach is really appropriate with only 5 level-two units.

          1. Thanks – I agree. The corr(ar1) seems more appropriate.
            I appreciate your advice on the hybrid approach also.

  12. Dear Paul,
    Thank you for your work in this area — I have found it very useful. I have a question about running a negative binomial estimator with two high dimensional fixed effects (i.e. fixed effects for individual street segments, and fixed effects for census tract*quarters). Is it possible to use the hybrid method with two sets of fixed effects? If not, do you have other suggestions?

  13. Hi;
    I have a panel data that has zero-inflated property. I want to estimate a panel count zero-inflated poison or NB. I know that stata has command for zinb for cross section data but I am not sure that it has any command for Panel data zero-inflation poison or NB. WOuld you please let me know is there any command that I can use in stata for panel version of count data with zero-inflated feature.

    1. No, there is no command for panel data with ZINB in Stata. But I argue in my post that the conventional NB can often do fine for data sets with a large fraction of zeros. For a random effects model, check out the new menbreg command.

  14. Hi Paul,
    Thanks for the very helpful post, and your wonderful work in this area. I’m very new to this approach (and statistics in general) and would really appreciate a little guidance.

    I have run a negative binomial regression model on a pooled cross sectional civil war data set which includes information on 130 conflicts in 80 states that last between 1 and 13 years. I have been told that i need to run fixed effects (for the states). The observations are civil conflict years – and there can be more than one conflict going on within a states. The data is unbalanced as the conflicts last different lengths.

    Should I be using the xtnbreg command in stata? If so with re or fe?

    Any advice would be warmly welcomed.
    Thanks in advance.

    1. As I note in my post, I am not a fan of xtnbreg, either for fixed or random effects. I would use nbreg, treating state as a factor variable. Keep in mind that this will effectively exclude all states with only one year. Actually, you’re better off excluding such states beforehand because their coefficients for the factor variable will not converge.

  15. Hi Paul,
    I am estimating a rate model with about 3000+ airports, each with 15 years of (annual) data. The dependent variable is the number of accidents and the exposure variable is the number of operations. The explanatory variables include: whether an air traffic control tower is operating (dummy), share of commercial air traffic, general aviation traffic, etc.

    The only effect that I am interested in quantifying is the air traffic tower effect (controlling for observed and unobserved factors).

    I ran xtnbreg (before being pointed to your post) and the Hausman test which indicated that fe was the best model.

    Reading the alternative solutions, they seem to pose a problem given that I have 3000+ individuals.

    Is there a “glass half full” interpretation of xtnbreg fe for my application?

    1. I really can’t approve the use of xtnbreg for either fixed or random effects models. Instead, I recommend using menbreg together with the hybrid (within-between) method described in my books on fixed effects.

  16. Hi Paul,
    As I understand your post,the problem with the HHG method is its ability to control for variables whose values do not change over time. Does this mean the HHG method does not pose a problem for data sets with no time-invariate variables?
    Thank you in advance!

    1. No. The primary reason people use fixed effects methods is to control for time-invariant variables, both observed and unobserved. But the HHG method doesn’t do that.

  17. Hi Paul
    Thank you for this interesting post. Here is another question for you.

    I am benchmarking the quality of care of 100.000 COPD patients among app. 2000 family physicians. I have 1 observation per patient.
    I regard this as multilevel data (patients nested within physicians).

    My outcome variable is binary so i was thinking of using a fixed effect logit model with :
    dummies for physicians as the parameters of interest (i.e. 2000-1 dummies)
    risk adjustment variables at patient level (severity of COPD, socio-economic variables ect.)

    Should I worry about the incidental parameters problem in this case, or is my model legitimate? I was planning on using the estimated model to calculate “predictive margins” to get a risk adjusted benchmarking score for each physician.

    1. It’s well known that doing fixed effects logistic by dummy variables yields upwardly biased estimates of coefficients (the incidental parameters problems). But the bias declines as the number of patients per physician increases. With an average of 50 patients per, you might be OK. The standard alternative is conditional logistic regression, but that would make it difficult/impossible to get the benchmarking scores that you desire. The hybrid (between/within) method that I describe in my books might work well in this situation.

  18. Dear Dr. Allison,
    How are you?
    I am estimating the deterrent effect of executions on homicides by using the state-level panel data between January 1, 1979 and December 31, 1998 (7,305 days). The unit of analysis in this panel data is the state-day and the total number of cases is 372,555. The dependent variable is the daily number of homicides (count data) and the main independent variables of interest include a set of 29 daily dummy variables with 14 days before and after an execution and the execution day; a set of 29 newspaper dummy variables; and a set of 29 TV dummy variables. The explanatory variables include: year, month, day of the week, national holiday, and state (dummy), and state and prison populations (numerical), etc.
    I ran xtnbreg in Stata 14.1 and the Hausman test to examine which model is more appropriate, fixed effect- or random effect- models. Unfortunately, I found that both FE and RE models fail to converge.
    Reading the alternative solutions, either (1) using the nbreg with individual dummies or (2) using menbreg, the hybrid-model-strategy, would be possible to measure the deterrent effect of executions on homicides. If this is the case, which one would be better alternative, (1) or (2)? Also, do I need to pay attention on the random-effect negative binomial regression?

  19. Dear Dr. Allison,

    Thank you for all of your work.

    I am using unconditional fixed effects NB2. I have 125 country fixed effects and 41 year fixed effects. Should I use robust clustered standard errors? OR, should I scale my standard errors using square root of Pearson X2-based dispersion? I am not sure if scaled standard errors will account for with-in group correlation. If I can get your opinion on that it would help me a lot.

    Thank you,


      1. Thank you for reply,

        The overall sample size is 2024 observations, with 1867 degrees of freedom. The Pearson scale factor is 1.215997.

        1. OK. The fixed effects already adjust for the clustering of observations. I’d adjust the standard errors as you suggest, but with one difference. I’d use the deviance chi-square rather than the Pearson. Our simulations showed that that worked better.

          1. Dear Mr. Allison,
            I am working with a count panel dataset and am facing issues you described in Allison & Waterman (2002). Thank you for providing solutions to this issue. When trying to implement it, I struggled to use deviance chi-square errors. How does it work with Stata?

            Thank you very much!

  20. Hi Dr. Allison,

    This is my first foray into negative binomial regression, and I would appreciate your thoughts. My data include 1,393 observations and 175 groups. As I understand it, I have three options to properly introduce FE (time and school). I’ve attempted the first two you presented, but want to make sure I did so correctly. First, I used the following command in Stata:
    nbreg DV IV i.year, vce(robust)

    For the second option, I took the value of a given IV in year t for institution i and subtracted the mean of the IV for institution i across all years. Then I used:
    xtnbreg DV IV i.year, re

    In most cases, the direction of the coefficients remained the same, but the standard errors were smaller using the -xtnbreg- command and I lost statistical significance in several instances.

    Ultimately, (a) from what I’ve described did I run these correctly, and (b) would you recommend using the second method, assuming everything was calculated correctly?

    Thank you again for your time.

    1. The first method is fine. And that’s the one I prefer, in most cases. The second method is good, in principle, but not with the xtnbreg command, which has a very strange parameterization of the re model. Instead, use menbreg. I’m a little surprised that you LOST statistical significance with the second method if the standard errors were smaller.

      1. Thank you for the quick response. Just to explore the -menbreg- route, I understand I should use the mean-adjusted IVs as described above in option 2, correct? Then use:
        menbreg DV IV, vce(robust) || school || year

        Is that correct?

        Once more, thanks for your assistance and continued monitoring of this forum; it’s greatly appreciated.

  21. Hi Dr. Allison,

    I am using panel data for 250 counties over 30 years where the dependent variable is a corner solution outcome i.e. it is a mixture of zeros as well as continuous non-count data. I have estimated a tobit model, but would like to also estimate a negative binomial regression, since any violation of heteroskedasticity (despite use of vce(cluster county)) in Tobit can lead to inconsistent parameter estimates due to the scaling factor, while the negative binomial is robust to this, and also accounts for overdispersion

    My question is can I make use of nbreg for the type of DV I have?
    In effect my Stata command would be: nbreg DV IV i.year i.county, vce(cluster county). Is this correct?

    Thank you in advance for your help

    1. Well, this is sort of an “off label” use of the NB since it’s really designed for discrete data rather than continuous. But it might be OK. The key thing is that you’re estimating a model in which the expected value of the dependent variable is a log-linear function of the predictors. Not sure I’d go with using both county fixed effects and robust standard errors clustered at the county level. That’s a common recommendation in econometrics, but I think it tends to overcorrect for clustering. I’d probably go with vce(robust).

        1. I agree with this post by William Gould. Near the end of the post, he says flat out that the NB seems to do as well as the Poisson, but without the constraint that the conditional mean has to be the same as the conditional variance. So, yes, go ahead and use the NB.

  22. Dear Dr. Allison,

    I am using panel data of 1215 firms over 18 years. The dependent variable is the number of acquisitions. I am conducting similar analyses for European data as did Arikan & Stulz (2016) for US data, whom you also helped. They write “we estimate regressions with random effects using the generalized equation estimation method”. I fail to understand how to implement this in Stata.

    This is how I did it after reading your work on hybrid (within-between) methods:

    xtgee acquisitions age age2 [firm-level deviations] [firm-level means], vce(robust) family(nbinomial) corr(exchangeable)

    Following are my questions:

    1- How can I find the correct dispersion parameter in family(nbinomal)?
    2- How do I determine the appropriate correlation structure for this type of application of -xtgee-?

    I’d be really grateful for any advice!

    1. Your Stata command seems appropriate. The exchangeable structure is probably too restrictive, however. I would try ar17. The default dispersion parameter will be estimated, and that should be fine. You may also want to include year, either as a quantitative or factor variable.

      1. Thank you for the quick and helpful response!

        I had included a qualitative variable to capture calendar year effects, just forgot to mention it here. Thank you!

        Regarding the correlation structure: I have tried ar17 – convergence cannot be achieved for that. Every firm that has 16 or less observations is excluded, which reduces the total number of observations by 90%.

        My description of the panel dataset was not accurate enough I guess. The 1215 firms that are included go public from 1995 to 2013 and their acquisitions are included from 1996 to 2015. Hence, I do not have data over 18 years for all 1215 firms, but rather the time period covered per firm depends on when it went public. This is why so many observations are excluded using ar17. I hope that makes sense.

        Do you have an alternative suggestion regarding the corr structure?

        Again, really grateful for your advice, professor!

  23. Dear Prof. Allison.
    Thank you for taking the time to respond.
    I am modelling the openings of manufacturing plants (dependent variable) with regard to various macroeconomic indicators, e.g gdp, labor cost, and the like. I have a set of 20 countries and my observations go back in 15 years. Please not that I have indications of overdispersion and a big number of zeros (>50% of observations).
    I would use the xtnbreg command in STATA until I read your article.
    My understanding regarding one of your suggestions is that nbreg combined with a dummy for countries ( is equal to xtnbreg, but gives a better estimate due to the problem you describe. Is my understanding correct? Also, does this mean that the absence of an xtzinb in STATA can be substituted by zinb combined with a dummy for country? Thank you again in advance for your response.

    1. Yes, nbreg with dummies for countries is preferable to xtnbreg with fixed effects. However, I would not necessarily recommend zinb with dummies. As I explain in my post, the fact that you have a lot of zeroes in not a sufficient justification for using a zero inflated model.

        1. Including the country dummies corrects for over-time dependence in the observations. However, it may not be sufficient. I recommend also using robust standard errors that cluster for country.

  24. Hi Dr. Allison,

    You suggest a variance correction for unconditional negative binomial fixed effects models on page 259 of your 2002 paper with Richard Waterman. Can/should that variance correction be used with robust or clustered standard errors?

    I appreciate your time!

    Thank you,


    1. Good question but I’m not sure of the answer. It’s possible that robust standard errors would substitute for the SE correction that we proposed.

  25. Dr.Allison,
    Can i ask what’s the stata command for fixed and random factor in negative binomial regression nbreg. It seems the the panel does have this function. Millions of thanks.

    1. menbreg will estimate random effects negative binomial. xtnbreg claims to do both fixed and random effects, but I don’t like the way it does either.

  26. Dear Prof. Allison,
    I am trying to use the method that you suggest as the “hybrid method” (i.e. “estimate a negative binomial model with all the time-varying covariates expressed as deviations from the individual-specific means”).

    I am not sure, however if I should also express the DV in terms of deviation from individual-specific mean.

    Some texts report that the DV shall also be expressed as a deviation from the individual mean.
    However, if I am doing it (subtracting the individual-specific mean from the value of the DV), I am ending up with negative and non integer values of the DV that cannot be used in negative binomial regressions.

    How would you suggest to handle it?
    Shall I refrain from expressing the DV as a deviation from individual-specific mean?

    Thank you very much in advance.


    1. In the hybrid method, the dependent variable should NOT be expressed as deviations from the mean.

  27. Dear Prof. Allison,

    Firstly, let me thank you for the excellent work of yours in the field.

    I just would like to confirm if I understood everything that has been explained both in this webpage as well as in the book you cited.

    I want to measure the effects of an exogenous shock on the ownership patterns of a given good that is a count variable. My dataset consists in an unbalanced panel spanning over 30 years.

    My first instinct was to estimate the effects of the shock through a negative binomial with fixed effects, something to the effect of:

    xtnbreg DV x1 x2 i.shock, fe

    However, after reading your work, I learned that xtnbreg does not deliver fixed effects to satisfaction. You propose three solutions to the problem and my question concerns these solutions since I am not sure if I completely understood how the specification changes.

    If I decide to run individual dummies (id) to capture the fixed effect, should I run it following which way?

    xtnbreg DV x1 x2 i.shock, re (I)


    nbreg DV x1 x2 i.shock (II) ?

    The same question applies to the hybird model. After demeaning all variables, should I then run a -xtbreg, fe- with all demeaned variables or should I run a -nbreg- with demeaned variables plus

    I would like to thank you in advance for any further input.



    1. For the dummy variable method, use nbreg.
      For the hybrid method, use menbreg. This command estimates mixed (random effects) negative binomial models.
      In my view, xtnbreg should never be used.

      1. Thanks for your help, Prof. Allison.

        In the hybrid method, how should one interpret the coefficients? Because it seems counter-intuitive to provide the reader with two coefficients representing the same variable with different interpretations. Is the effect a combination of both? But then what if one of the terms is not statistically significant?


        1. In they hybrid method, one coefficient describes the within-individual effect of a variable, the other describes the between-individual effect. I generally focus on the former and ignore the latter. The within-individual effect controls for all time-invariant variables. The between-individual coefficient does not.

  28. Hi Professor,

    Thank you for the advice and guidance on statistical methods.
    (1) Please, regarding the fixed effects for count data, I’ve seen other scholars suggest using xtpoisson with the fixed effects (i.e. xtpoisson dv iv1 iv2, fe vce(robust)), especially from Richard Williams ( personal communication) and Joao Santos Silva. Please, do you think this command can replace the xtnbreg (which you advise us against).

    (2) Also, is there any text where you run the stata commands for the fixed effects for count data (including the nbeg)? I do not understand most of the examples most of your commenters used because I am in sociology. I am a bit unclear about what individual dummies would mean in the command, if it implies using the id number as part of the command or having dummy variables for all the independent variables we want to use. I would really appreciate a stata example from you, as a sociologist.

    (3) I ran your sas commands on negbin that you have in the chapter 4 of your 2005 sas book and the results I got did not look like a fixed effects. It did not have the large r-square that fixed effects are known for neither did it recognize the longitudinal structure of the data.


    1. 1. Although fixed effects Poisson should be approximately unbiased, it’s not efficient (in the technical sense). And my experience is that results often differ dramatically from those of the negative binomial.
      2. See my “little green book,” Fixed Effects Regression Models (Sage).
      3. Are you referring to the results in Output 4.11? Those were produced by PROC GENMOD which doesn’t report an R-square. However, you can get one by regressing observed values on predicted values of the dependent variable, and that yields an R-square of .97.

      1. Thank you for your response Professor.

        I saw the commands in the appendix of your little green book. Please, I just wanted to confirm the command that represents the nbreg you suggest:
        nbreg part_w i.partystrg i.interest discuss i.ideology info id [iweight = full_wght_]

        I just discovered that I lost a huge number of respondents (up to 1,000 out of 3310) whenever I ran the xtpoisson command, unlike If I applied the `xtreg’ command. Please, is there a reason for that?

        I spoke to Prof Joao Santos Silva about your approach and he said: `As for efficiency (in the technical sense), none of them will be efficient unless the distribution is correctly specified; in my view, it is extremely unlikely that either of them is exactly correctly specified and without further information I have no reason to believe that the NB is superior to Poisson…Poisson regression with fixed effects is more robust than the NB and it is much easier to estimate (as you have found out); moreover, adding dummies to the NB model will not be a valid approach unless you have a lot of observations in each group.’

        The Professor cited this paper as a justification for using Poisson FE in place of Nbreg: Wooldrigde (1999). Distribution-free estimation of some nonlinear panel data models, Journal of Econometrics, 90 (1), 77-97.

        Also, Please Sir, I know that this may seem like an unrelated topic, in what format should one impute the missing values of panel data? In the wide format (where each variable waves have their own column) or in the long format (which is normally used from the `xtset id wave’ command)?

        Thank you.


        1. 1. Your command has id as a predictor. Assuming that variable is the unique id number for each individual, you would want to include it as so that Stata will create a dummy variable for each value of id (less one).

          2. The reason xtpoisson loses cases is that individuals who have a count of 0 for all their observations on the dependent variable do not contribute anything to the likelihood function, and are deleted from the analysis. nbreg won’t explicitly do this, but it does the equivalent.

          3. Efficiency may not seem like a big deal, but I often find dramatic differences between poisson and NB. I have more trust in the latter.

          4. Wide format is the way to go.

          1. Thank you for the reply.

            I am attempting to run the analyses on SAS given it may take me a century to get my results on Stata (i.e. if I am lucky).

            Please, I ran a chained multiple imputation to take advantage of most of my data and I am confused about how to use your NBREG FE individual dummy method to run my analyses. My DV is (political) participation, the ‘id’ is the identification number, `wave’ refers to the periods or waves of data and my commands I tried are:

            proc genmod data=polpartfixed;
            by _Imputation_;
            class wave id;
            model participation = informationsource ideoleft ideoright afffairly affvery someinterest veryinterest polatt id / DIST=NB SCALE=0; RUN;
            ods output ParameterEstimates=gmparms

            proc print data=gmparms (obs=8);
            var _Imputation_ Parameter Estimate StdErr;
            title ‘GENMOD Model Coefficients (First Two Imputations)’;

            proc print data=gmcovb (obs=8);
            var _Imputation_ RowName Prm1 Prm2 Prm3;
            title ‘GENMOD Covariance Matrices (First Two Imputations)’;

            proc mianalyze parms=gmparms covb=gmcovb parminfo=gmpinfo;
            modeleffects Intercept informationsource ideoleft ideoright afffairly affvery someinterest veryinterest polatt id;

            Everytime I run these commands, I then to get error messages. Please, I would like to know if these are the correct commands?

            Also, is there any reliable resource on how to run panel analyses with mi analyze? I will like to run a HLM analysis and Growth curve with multiple imputation data on SAS.

            Thank you and looking forward to your reply.


          2. A couple of points: Don’t use SCALE=0. And in the list of variables for PROC MIANALYZE, don’t include ID. Apart from that I’d have to see the LOG window output. You can send it to me.

  29. Dear Dr. Paul,

    In my research paper I used a two-step estimation in which my dependent variable is employment counts model included an estimated failure rate from a discrete hazard model:

    First stage estimation: discrete hazard model
    Second stage estimation: negative binomial regression

    My questions:
    1) Is it essential that both steps be estimated using the same variance correction?
    2) Among variance correction procedures {correction by deviance statistic; bootstrapping; OPG (Outer Product Gradient; OIM (observed information matrix)} which one do you suggest to use in my two step estimation?

    Thank you

    1. 1) No.
      2) Among those methods, I would feel most confident about bootstrapping. But of course, it’s also the most computationally demanding.

  30. Dear Dr. Allison,
    Thank you very much for your help on this topic.

    Could I confirm something about the nbreg with dummy variables method?

    1) Here, is it right to say we would include state dummies to account for cross-sectional heterogeneity and the Stata command ‘i.year’ to control for time fixed effects?

    2) As well, before conducting the regression, is it still necessary to use ‘xtset’ and ‘sort’ to let Stata know that panel data is being used?

    I am a little confused as to how to use nbreg in a way that still recognises and utilizes the data as a panel regression as opposed to pooled OLS.

    Thanks very much,

  31. Dear Prof. Alisson,

    I am doing my masters in political science investigating the relationship between mercenaries and civil deaths (count).

    my data is overdispearsed and there are 62 % zeroes. I have panel data with 102 countries for 28 years.

    Using R, how would you run a regression two-way fe in the best way?


    1. Here’s my quick take, which might change if I knew more about your data and research question. I’d start with a mixed negative binomial regression. In R, this could be done with the glmer.nb function in the MASS package. In that framework, I would estimate a between-within model to handle the country fixed effects. For the time fixed effects, you could just include 27 dummies. Don’t worry about over-dispersion. The negative binomial model is designed to handle over-dispersion.

  32. Dear Prof.Allison,

    Thank you for this very useful information. I’m using menbreg for my hybrid method analysis as described in Sage book on fixed effect.

    I have two questions for you:

    1) Is there any way to graph between-effect results?
    2) I’m conducting gender-separate analyses. Are there any formal tests of equality for the coefficients in men vs women?

    Thank you!



    1. 1. There’s probably a way to do it in Stata, but I’m not sufficiently adept at Stata graphics to tell you how to do it.
      2. If you’re running the models separately, you should also run a combined model that includes a dummy variable for gender. You can than construct a likelihood ratio chi-quare by using the log-likelihoods from each of those three models, as follows. Let Lm be the log-likelihood for men, Lw the log-likelihood for women, and Lc be the combined log-likelihood. The formula is -2(Lm+Lw-Lc). That will be a chi-square statistic with degrees of freedom equal to the number of coefficients being compared. Alternatively, you can run a combined model with interactions between gender and each of you predictors.

Leave a Reply

Your email address will not be published. Required fields are marked *