Don’t Put Lagged Dependent Variables in Mixed Models

Paul Allison
June 2, 2015

When estimating regression models for longitudinal panel data, many researchers include a lagged value of the dependent variable as a predictor. It’s easy to understand why. In most situations, one of the best predictors of what happens at time t is what happened at time t-1.

This can work well for some kinds of models, but not for mixed models, otherwise known as random effects models or multilevel models.  Nowadays, mixed modeling is probably the most popular approach to longitudinal data analysis. But including a lagged dependent variable in a mixed model usually leads to severe bias.

In economics, models with lagged dependent variables are known as dynamic panel data models.  Economists have known for many years that lagged dependent variables can cause major estimation problems, but researchers in other disciplines are often unaware of these issues.


The basic argument is pretty straightforward.  Let y_it be the value of the dependent variable for individual i at time t.  Here’s a random intercepts model (the simplest mixed model) that includes a lagged value of the dependent variable, as well as a set of predictor variables represented by the vector x_it:

y_it = b0 + b1*y_i(t-1) + b2*x_it + u_i + e_it

The random intercept u_i represents the combined effect on y of all unobserved variables that do not change over time. It is typically assumed to be normally distributed with a mean of 0, constant variance, and independent of the other variables on the right-hand side.

That’s where the problem lies. Because the model applies to all time points, u_i has a direct effect on y_i(t-1): the same equation, written for time t-1, has u_i on its right-hand side.  But if u_i affects y_i(t-1), it can’t also be statistically independent of y_i(t-1). The violation of this assumption can bias both the coefficient for the lagged dependent variable (usually too large) and the coefficients for other variables (usually too small).
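
The argument can be checked on simulated data. What follows is a minimal sketch, not part of the wages analysis: the sample size, the seed, the true lag coefficient of 0.4, and the unit variances are all assumptions chosen for illustration.

```stata
* Simulated illustration only; every number below is an assumption
clear
set seed 20150602
* 500 hypothetical individuals, each with a time-invariant intercept u
set obs 500
generate id = _n
generate u = rnormal()
* Seven time points per individual
expand 7
bysort id: generate t = _n
xtset id t

* Idiosyncratic error e, drawn separately for every record
generate e = rnormal()
* Start each series near its long-run level u/(1 - b1)
generate y = u/(1 - 0.4) + e if t == 1
* Build later values recursively with a true lag coefficient b1 = 0.4
bysort id (t): replace y = 0.4*y[_n-1] + u + e if t > 1

* Correlation between the random intercept and the lagged outcome
correlate u L.y

* Random-intercepts estimate of the lag coefficient, to compare with 0.4
xtreg y L.y, re
```

If the reasoning above holds, the correlation between u and the lagged outcome should be clearly positive, and the xtreg estimate of the lag coefficient should come out above the true value of 0.4.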

Later I’ll discuss some solutions to this problem, but first let’s consider an example. I use the wages data set that is available on this website. It contains information on annual wages of 595 people for seven consecutive years. The data are in “long form”, so there’s a total of 4,165 records in the data set. I use Stata for the examples because there are good Stata commands for solving the problem.

Using the xtreg command, let’s first estimate a random intercepts model for lwage (log of wage) with the dependent variable lagged by one year, along with the time variable t and two predictors that do not change over time: ed (years of education) and fem (1 for female, 0 for male).

Here’s the Stata code:

use "", clear
xtset id t
xtreg lwage L.lwage ed fem t

The xtset command tells Stata that this is a “cross-section time-series” data set with identification numbers for persons stored in the variable id and a time variable t that ranges from 1 to 7.  The xtreg command fits a random-intercepts model by default, with lwage as the dependent variable and the subsequent four variables as predictors.  L.lwage specifies the one-year lag of lwage.
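
To see what the lag operator does, it can help to list a few records. This is a small sketch that assumes the wages data are in memory and that xtset id t has already been run; lwage_lag is a hypothetical variable name introduced only for illustration.

```stata
* Show the first eight records: L.lwage is missing at t = 1 for each person
list id t lwage L.lwage in 1/8, sepby(id)

* An equivalent specification that stores the lag in its own variable
* (lwage_lag is a hypothetical name, not part of the data set)
generate lwage_lag = L.lwage
xtreg lwage lwage_lag ed fem t
```

Because the lag is undefined at the first time point, each person contributes six usable records rather than seven, so the model is fitted to 3,570 of the 4,165 records.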

Here’s the output:

       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       lwage |
         L1. |   .8747517   .0085886   101.85   0.000     .8579183    .8915851
          ed |   .0108335   .0011933     9.08   0.000     .0084947    .0131724
         fem |    -.06705    .010187    -6.58   0.000    -.0870162   -.0470839
           t |   .0071965   .0019309     3.73   0.000     .0034119    .0109811
       _cons |   .7624068   .0491383    15.52   0.000     .6660974    .8587161

When the dependent variable is logged and the coefficients are small, multiplying them by 100 gives approximate percentage changes in the dependent variable. So this model says that each additional year of schooling is associated with a 1 percent increase in wages and females make about 6 percent less than males.  Each additional year is associated with about a 0.7 percent increase in wages. All these effects are dominated by the lagged effect of wages on itself, which amounts to approximately a 0.9 percent increase in this year’s wages for a 1 percent increase in last year’s wages.
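
The "multiply by 100" rule is an approximation. As a quick check, the exact percentage change implied by a coefficient b in a log-outcome model is 100(exp(b)-1), which can be computed directly from the coefficients in the output above:

```stata
* Exact percentage changes implied by the xtreg coefficients shown above
* ed: one additional year of schooling
display 100*(exp(.0108335) - 1)
* fem: females compared with males
display 100*(exp(-.06705) - 1)
* t: one additional year
display 100*(exp(.0071965) - 1)
```

For coefficients this small, the exact values should land close to the rounded figures quoted in the text. The gap widens as coefficients grow in magnitude.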

As I explained above, the lagged dependent variable gives us strong reasons to be skeptical of these estimates. Economists have developed a variety of methods for solving the problem, most of them relying on some form of instrumental variable (IV) analysis. For a discussion of how to implement IV methods for lagged dependent variables in Stata, see pp. 274-278 in Rabe-Hesketh and Skrondal (2012).

Personally, I prefer the maximum likelihood approach pioneered by Bhargava and Sargan (1983) which incorporates all the restrictions implied by the model in an optimally efficient way. Their method has recently been implemented by Kripfganz (2016) in a Stata command called xtdpdqml. This unwieldy set of letters stands for “cross-section time-series dynamic panel data estimation by quasi-maximum likelihood.”

Here’s how to apply xtdpdqml to the wage data:

xtset id t
xtdpdqml lwage ed fem t, re initval(0.1 0.1 0.2 0.5)

The re option specifies a random effects (random intercepts) model.  By default, the command includes the lag-1 dependent variable as a predictor.  The initval option sets the starting values for the four variance parameters that are part of the model.  Here is the output:

       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
       lwage |
         L1. |   .4142827   .0230843    17.95   0.000     .3690383     .459527
          ed |   .0403258   .0031841    12.66   0.000     .0340851    .0465666
         fem |  -.2852665   .0271688   -10.50   0.000    -.3385164   -.2320166
           t |   .0533413   .0027533    19.37   0.000     .0479449    .0587378
       _cons |    3.25368   .1304816    24.94   0.000      2.99794    3.509419

Results are markedly different from those produced above by xtreg.  The coefficient of the lagged dependent variable is greatly reduced, while the others show substantial increases in magnitude. An additional year of schooling now produces a 4 percent increase in wages rather than 1 percent. Blacks now make 8 percent less than non-blacks rather than 1 percent less. And females make 24 percent less than males (calculated as 100(exp(-.28)-1)), compared to 6 percent less. The annual increase in wages is 5 percent instead of 0.7 percent.

So doing it right can make a big difference.  Unfortunately, xtdpdqml has a lot of limitations. For example, it can’t handle missing data except by listwise deletion. With Richard Williams and Enrique Moral-Benito, I have been developing a new Stata command, xtdpdml, that removes many of these limitations. (Note that the only difference in the names for the two commands is the q in the middle.) It’s not quite ready for release, but we expect it out by the end of 2015.

To estimate a model for the wage data with xtdpdml, use

xtset id t
xtdpdml lwage, inv(ed fem blk) errorinv

The inv option is for time-invariant variables.  The errorinv option forces the error variance to be the same at all points in time. Like xtdpdqml, this command automatically includes a one-period lag of the dependent variable. Unlike xtdpdqml, xtdpdml can include longer lags and/or multiple lags.

Here is the output:

             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
lwage2       |
      lwage1 |   .4088803   .0229742    17.80   0.000     .3638517     .453909
          ed |   .0406719   .0032025    12.70   0.000     .0343951    .0469486
         fem |  -.2878266    .027345   -10.53   0.000    -.3414218   -.2342315

Results are very similar to those for xtdpdqml. They are slightly different because xtdpdml always treats time as a categorical variable, but time was a quantitative variable in the earlier model for xtdpdqml.

If you’re not a Stata user, you can accomplish the same thing with any linear structural equation modeling software, as explained in Allison et al. (2017). As a matter of fact, the xtdpdml command is just a front-end to the sem command in Stata. But it’s a lot more tedious and error-prone to set up the equations yourself.  That’s why we wrote the command.

By the way, although I’ve emphasized random effects models in this post, the same problem occurs in standard fixed-effects models. You can’t put a lagged dependent variable on the right-hand side. Both xtdpdqml and xtdpdml can handle this situation also.

If you’d like to learn more about dynamic panel data models, check out my course on Longitudinal Data Analysis Using SEM.


Allison, Paul D., Richard Williams and Enrique Moral-Benito (2017) “Maximum likelihood for cross-lagged panel models with fixed effects.” Socius 3: 1-17.

Bhargava, A. and J. D. Sargan (1983) “Estimating dynamic random effects models from panel data covering short time periods.” Econometrica 51 (6): 1635-1659.

Kripfganz, S. (2016) “Quasi-maximum likelihood estimation of linear dynamic short-T panel-data models.” Stata Journal 16 (4): 1013-1038.

Rabe-Hesketh, Sophia, and Anders Skrondal  (2012) Multilevel and Longitudinal Modeling Using Stata. Volume 1: Continuous Responses. Third Edition. StataCorp LP.



  1. Dear Paul,

    Thank you for another interesting post on your blog.

    I guess that the problem with lagged dependent variables in mixed models is underexposed in health sciences as well. I’ve seen examples where generalized estimating equations (GEE) models – i.e. models that resemble the random intercepts model that you describe – are used in cases with lagged dependent variables, where one supposes an “appropriate correlation structure” to solve the problem. Reading this post made me wonder if this is correct.

    I found that applying a GEE model with an independent correlation structure exactly replicates the results of your xtreg command.
    . xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(independent)
    Changing the correlation structure to exchangeable or to autoregressive has only a small effect on the resulting parameters. The command for the exchangeable correlation structure is:
    . xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(exchangeable)
    while the command for the autoregressive correlation structure is:
    . xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(ar)

    Neither the GEE model with an exchangeable correlation structure nor the GEE model with an autoregressive correlation structure comes close to the results of the xtdpdqml procedure that you showed in your post. Both are in fact similar to the results from the xtreg command. This suggests that the term “autoregressive” may be misleading.

    I managed to replicate the results of your new Stata command xtdpdml, using Kripfganz’s commands

    . quietly tab t, gen(t)
    . xtdpdqml lwage ed fem t3 t4 t5 t6 t7, re initval(0.1 0.1 0.2 0.5)

    and guess that this may be due to the absence of missing data in this particular case. I’d like to end with two questions:

    1. Does including both lagged and contemporaneous independent variables, as in extending your equation to

    y_it = b0 + b1*y_i(t-1) + b2*x_it + b3*x_i(t-1) + u_i + e_it

    cause any additional problems? I guess not, but am curious about your view on this.

    2. How does this xtdpdml approach relate to the SEM approach assuming random effects?

    Good luck in finalizing the xtdpdml procedure.

    Kind regards, Adriaan Hoogendoorn

    1. I agree that GEE is likely to suffer the same problems with lagged dependent variables as mixed models. Regarding your questions:

      1. I don’t see any special problems with other lagged predictors, unless those predictors are “predetermined”, meaning that they depend on earlier values of the dependent variable. The xtdpdml command can allow for predetermined variables.

      2. The xtdpdml command is a shell for the sem command. For time-invariant predictors, it estimates random effects models. However, for time-varying predictors, it produces fixed-effects estimates.

  2. Hi Paul,
    Thanks for this post. Your xtdpdml command sounds exciting. I look forward to seeing it in action!

    Currently, I am playing around with multiply imputed datasets in Stata, and therefore using commands that often begin with ‘mi estimate, cmdok: xtdpd’. In general, many of the post-estimation routines such as the Sargan or autocorrelation tests have not been optimized for the mi case. Comparing nested models is also not straightforward in the mi case–unless I am missing something (?).

    Given your vast expertise in the area, will xtdpdml be mi friendly?

    Thanks again! I always enjoy reading your posts.


    1. Hi Michael. We haven’t even thought about multiple imputation with this command, in part because FIML is so easy to use instead. I can imagine that there might be some difficulty because the command initially reformats the data from long to wide. It can optionally use wide data as input, however.

  3. If I follow your explanation, the bias comes from assuming that the random effect is the same at times 2,3,etc. It seems to me the problem goes away when there are just 2 time points, since there’s no explicit random effect at time 1.

    The case with 2 time points is important in education, where teacher value-added is often estimated using 2 time points: this year and last year.

      1. Sorry I wasn’t clear. I was thinking of the situation with students nested in school. The model I had in mind was this:
        y_ij2 = b0 + b1*y_i1 + … + u_j + e_i
        where y_i1,y_ij2 are the scores of student i in years 1 and 2, u_j is a school random effect, and e_i is a student random residual.

        With only 2 years, we don’t have to assume the random effect u_j is the same in year 1 as in year 2, so is there still a bias?

        1. Interesting question. You don’t HAVE to assume that u_j is the same in the two years but, at the least, you would expect the school effects to be highly correlated over time. Consequently, you would expect y_i1 to be correlated with u_j, leading to biased estimates with standard mixed model software.

          1. Interesting. So how would you recommend accounting for the correlation among observations from the same school?

          2. Just do a fixed effects model with schools as the fixed effects. Alternatively, express the time 1 measure as deviations from school means.
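
          One way to read these two suggestions is sketched below. All variable names are hypothetical: y2 is the year-2 score, y1 the year-1 score, and school the school identifier.

```stata
* Hypothetical variable names: y2, y1, school
* Option 1: school fixed effects, absorbing one intercept per school
areg y2 y1, absorb(school)

* Option 2: express the year-1 score as a deviation from its school mean,
* then fit the random-intercepts model with the deviation as the predictor
bysort school: egen y1_schmean = mean(y1)
generate y1_dev = y1 - y1_schmean
mixed y2 y1_dev || school:
```

          The deviation score has a mean of zero within every school, so it carries no between-school variation that could be correlated with the school effect.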

  4. Dear Paul,

    Thanks for your reply.
    In applying Kripfganz’s procedure, I noticed that in some cases it drops observations: “Note: groups are dropped due to gaps or insufficient number of observations”.
    Is this one of the problems with handling missing data that you refer to?

    Kind regards, Adriaan

  5. Dear Paul,

    Very interesting post. Looking forward to your xtdpdml command as well!

    I had one question which was whether the problems you describe with including a lagged dependent variable in a fixed effects model still hold with very large T (I am currently working with n=32, T=566)? I understand that with large T Nickell bias diminishes, but are there other potential estimation problems?

    All the best,


    1. 1. xtdpdml won’t work if T approaches or exceeds n.
      2. My post was about random effects rather than fixed effects models.
      3. I don’t have enough knowledge/experience with these kinds of situations to comment further.

  6. Dear Paul,

    Thanks so much for your work on this. Any suggestions on what to use for model selection between random and fixed effects? Is it correct that the standard Hausman test is insufficient for this estimator?

    Looking forward to seeing the final release!


    1. I would do a likelihood ratio test of random effects versus fixed effects. Both can be easily estimated with our new xtdpdml command for Stata.

  7. Dear Paul – thank you very much for the post.

    Are there any routines that can handle AR(3) models (vs. AR(1))? My understanding is that both xtdpdqml and xtdpdml cannot handle longer lag structure.


    1. xtdpdml can handle any lag structure for measured predictors, but it does not allow autoregressive structures on the disturbances.

  8. Great post. Could you also elaborate on how your newly developed command (xtdpdml) handles missing data better than the xtdpdqml?

    1. xtdpdqml simply drops any records with missing data. xtdpdml optionally handles missing data by full information maximum likelihood, which preserves all cases under the assumption that the data are missing at random and that the variables with missing data have a multivariate normal distribution.
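
    A hedged sketch of how this might be requested for the wages model from the post. The method(mlmv) option is the sem option for maximum likelihood with missing values; that xtdpdml accepts it is an assumption based on its appearance in another reader's command in these comments, so check the command's help file before relying on it.

```stata
* Assumes the wages data are in memory and xtdpdml is installed
xtset id t
* method(mlmv): maximum likelihood with missing values (passed through to sem)
xtdpdml lwage, inv(ed fem blk) errorinv method(mlmv)
```

    The wages data have no missing values, so this particular run would only confirm that the option is accepted. The benefit shows up in panels with gaps.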

  9. Very nice and understandable post!

    Does the bias only occur with individual fixed effects? Say we would include time fixed effects (or country fixed effects in a different model where the level of observation still is at level i) instead of individual fixed effects, would that also be an issue when the lagged dependent variable is included in the regression?

  10. Dear Professor Allison,

    I am conducting a study for a Master’s in Economics on the impact of elections on the macroeconomy. My model
    contains a lagged dependent variable, so I have been advised to use dynamic maximum likelihood.
    I have come across the command (xtdpdml) that you and your colleagues developed.
    I am trying to implement it but am facing the challenge of some latent variable
    being requested by Stata. The specific error is as follows:

    xtdpdml gdpg, inv(election ex wgdp fo lto lgoe ltot) tfix
    latent variable gdpg2 not found;
    ‘gdpg2’ specifies a latent variable.
    For ‘gdpg2’ to be valid, ‘gdpg2’ must begin with a capital letter.


    xtdpdml Gdpg, inv(election ex wgdp fo lto lgoe ltot) tfix
    model not identified;
    no paths from latent variable Gdpg26 to observed variables

    I would be grateful for your assistance in solving this problem.

    1. How many individuals and how many time points? Also, try recoding your time variable to t=1,2,3,… etc. The tfix option doesn’t always work.
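
    The recoding suggested here can be sketched as follows. The names country and year are hypothetical stand-ins for the panel identifier and the original time variable.

```stata
* Hypothetical names: country (panel id) and year (original time variable)
* group() numbers the distinct values of year as 1, 2, 3, ... in sorted order
egen t = group(year)
xtset country t
```

    Note that group() closes up any gaps in the original time variable, so a one-unit lag in t is a true one-period lag only when the original periods are evenly spaced.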

      1. Dear Professor Allison,

        I have the same problem as TMapp. After I
        run the command “xtdpdml TRevenue pc1 pc2 Trade Agriculture Inf Grants Natural Education2 logdensity Urban Pop65 Laborf icrg_qog fh_ipolity2 Stability left center election , pred(logGDPP TDS) inv(ingles frances aleman escandinavo) method(mlmv) ylags(1 2) errorinv”

        I get the error message:

        model not identified;
        no paths from latent variable TRevenue70 to observed variables

        Thank you in advance for your attention.

        1. The problem is that several of your predictor variables have names that begin with capital letters. The sem command (which is the engine for xtdpdml) presumes that variables that start with capital letters are latent variables, which totally confuses it. Change your variable names to have only lower case letters.
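
        A short sketch of the renaming step. The single-variable example uses a name from the command quoted above; the new lower-case name is simply its lower-cased form.

```stata
* Convert every variable name in memory to lower case
rename *, lower

* Or rename one variable at a time
rename TRevenue trevenue
```

        The first form will stop with an error if two names differ only by case, since they would collide after conversion. After renaming, the xtdpdml command has to be rewritten with the new lower-case names.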

  11. Dear Paul,

    I am getting a similar error as TMapp.

    xtdpdml HW xvar1 xvar2 , errorinv
    model not identified;
    no paths from latent variable HW13 to observed variables

    It might help if I describe the data and variables:
    y, x1 and x2 are time-varying. I have 48,494 observations (individuals*t) and about 5500 individuals. Time is coded from (t) 1 to 13. Data is in long format.
    What might be an issue: the data is not balanced and the time-sequence has gaps in it. However, if I eliminate both the unbalanced sequences and the individuals with gaps, as well as those not starting with t1 I still get the same error message.

    Do you happen to have any idea what is going on?


  12. Dear Paul,

    Why not just include AR(1) residuals directly in the model? In other words:

    mixed lwage ed fem t || id:, residuals(ar1, t(t))


    1. That can be useful, but many researchers actually want to estimate and control for the effect of a lagged dependent variable.

  13. Hi Paul,

    This is really interesting. Now, I am using xtmixed (Stata) to run a relatively simple cross-classified (firms within both industries and countries) multilevel (longitudinal) model.

    The model (simplified) looks something like:

    xtmixed Dvar IVfirm IVfirm IVindustry IVcountry || _all: R.concode || sic3: || concode:, var

    Now, I need to instrument the country level IV – how would I do this with your approach in stata?

  14. Hi Paul,
    Thank you for a very clear and informative post. I am however having trouble accessing the xtdpdml command in my Stata software, is it available for the STATA MP 13 and STATA 14 version?
    Hanna Lindström

  15. Thanks a lot Prof. Paul for the very enlightening post. Other than on your sample dataset, each time I attempt to use the xtdpdml command, I always get the message “convergence not achieved”. What implication does this message have for my results, and how can I resolve it? Thanks.

  16. Thank you so much for writing this program.
    I could not find anything in the SJ article and I was wondering if you had any suggestions about how to apply (longitudinal) weights in any way. Thank you.

      1. Thank you for your response.

        I was also wondering what you would suggest if my ideal model would need to account for selectivity. Specifically, I am modelling occupational status, and I would like to account for selection into employment. In a xtreg with lagged DV I would estimate a first stage probit for the probability into employment and then use the Inverse Mills Ratio from the probit in my xtreg model predicting occupational status.

        What would you suggest if I want to use xtdpdml accounting for selection? Thanks in advance.

        1. xtreg does not have any special features to account for selection. But keep in mind that when you are estimating a fixed effects model (the default in xtdpdml), selection issues are much less important. That’s because you are only using within-individual variation to estimate the model. Selection is primarily a between-person confounder.

  17. Hi Paul,

    Thanks for the post – very interesting. I’m a psychology researcher and I have never seen this issue discussed in the quantitative psych literature. I am curious about your thoughts on analyzing data from a research design I use often. I conduct a lot of daily diary studies where I collect participant data every day for 3 weeks straight. I often run longitudinal mixed effect models (days nested within persons) where I use yesterday’s IV to predict today’s DV. In these mixed effect models I almost always control for yesterday’s (i.e., lagged) DV to show that my IV can predict change over time. I will person-mean (aka group-mean) center both yesterday’s IV and yesterday’s (i.e., lagged) DV so only the time-varying components of each remain. Is the lagged DV as a predictor still a problem in this case? I was wondering if it’s only a problem for time-invariant predictors.

    Would love to hear any thoughts you have,

    1. Even if you group-mean center the variables, putting a lagged DV in your model will still lead to severe bias.

      1. Hi Paul,

        I was hoping to follow up on this question, because my intuition was similarly that a Level 1 centered (i.e., cluster, group, person-mean centered) lagged predictor would not include the effect of ui (i.e., the random intercept, or the effect of all time-invariant unobserved variables on y). Although centering the lagged DV is not identical to estimating a random intercept and removing it, in many practical situations it will be very close (e.g., balanced data). What am I (or we) missing here? And, is there literature that you could point to that references the issue with centered data?

        Thank you for considering this follow up question.


        1. Good question, but I don’t have a confident answer. I strongly suspect that there will still be bias, but I’m not sure how much.

      2. Dear Dr. Allison,

        I would like to follow up on this response of yours and hope you can help clarify the issue. Can we not use a correlated random effects model with a lagged DV? My thinking is that the estimates of the group-mean centered lagged DV might be biased, but the within-group lagged DV should not be, since including the group-mean takes away all unobserved heterogeneity. Thus, if the interest is in the within-group estimates of other independent variables, but there is a need to control for dynamic effects, would this approach be acceptable?


        1. It’s hard to tell without more detailed analysis, but I still think the answer is no. There’s still unobserved heterogeneity in the dependent variable, so the effect of the lagged variable will be biased. And if that’s biased, any other coefficients might be biased.

  18. Hi Paul,

    Thanks very much for this informative post, and the related resources found on the web. I’m in psychological research and I agree with David that such discussion is rare in psychology.

    I have a set of data collected via ESM/EMA (experience sampling method/ ecological momentary assessment) protocol, so that I collected data from participants repeatedly & randomly 8 times a day across 2 weeks. Then I tried to use time-series mixed model to see if the IV (e.g. stress) predicts DV (e.g. abdominal pain) across time, yet only within the same day. After studying xtdpdml, it seems that it’s not appropriate for my dataset because:

    1. No. of measurement > No. of participants
    The study had 57 participants (30 controls and 27 patients). Each of them contributed up to 114 measurements.

    2. No carry-over effect overnight
    Since xtdpdml automatically looks for the DV at the earlier timepoint as the LDV, it seems that I can’t manually tell the program to do so only if the measurements were taken within the same day. Previously, I’d create a column for the LDV, as a new IV, copy the value from the previous timepoint if it comes from the same day, and treat it as a missing value if it comes from the previous day.

    I’d love to hear what you think about the issue. Thanks a lot!
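
    The manual same-day lag described in point 2 can be sketched in one line. The variable names are hypothetical: id is the participant, day the study day, beep the order of the measurement within the day, and pain the dependent variable.

```stata
* Hypothetical names: id, day, beep, pain
* Within each participant-day, copy the previous measurement;
* the first measurement of each day has no predecessor and is set to missing
bysort id day (beep): generate pain_lag = pain[_n-1]
```

    This only builds the lagged variable. It does not address the bias from putting that variable into a mixed model, which is the subject of the post.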


  19. Thanks for this post. I am trying to figure out whether it is appropriate to use your xtdpdml command for my problem, which is this:

    I have 10 years of daily data on the volume of trades in 350 stocks. I am trying to test whether the volume of trades in stocks generally increases significantly on a day when the stock goes ex-div. (Each stock goes ex-div about twice a year but on different days.)

    So my y(it) variable is the (log of) volume of trades in stock i on day t.

    But I will include a single one-zero dummy for days when the stock goes ex-div. The inclusion of this dummy doesn’t invalidate the procedure does it?

    And there is no problem with the number of periods greatly outnumbering the number of companies is there?

    I very much appreciate any help you can provide.

    1. Sorry, but your number of time points is way too big for xtdpdml to handle, especially with more periods than companies.

  20. You mention that “the same problem occurs in standard fixed-effects models.” To clarify, if I have a binary outcome variable and run a dynamic fixed-effects logit model (xtlogit or clogit), should I expect the coefficients to suffer from the bias you describe here? Thank you.

  21. Dear Paul,

    Thank you for this interesting post. I would kindly ask for advice.

    I am running analyses on a panel dataset with country-quarterly data. The DV is the number of ICOs in each country/quarter and I believe it is correlated with previous quarters.

    I am thinking of using xtgee with f1.DV (one year forward) so that every variable is in the previous year, family (nbinomial) link corr(ar 1).

    My question is: considering that I am using corr (ar1), is it correct to include the lag of the DV as control?

    Thank you in advance for your time,


      1. Dear Paul,

        Thank you for your prompt reply. In trying to solve the above problem I have come across your paper on xtdpdml.

        I am trying to run the code but every time I get some errors. For example I almost always get

        model not identified;
        no paths from latent variable NumberofICOperquarter20 to observed variables

        I get the same error even if I have a simple code such as
        xtset countryid timeid
        xtdpdml DV IV
        where both are time varying.

        I was wondering whether you have any suggestion?

        Thank you,


        1. Most likely it’s because the variable NumberofICOperquarter20 has a name that begins with an upper case letter. xtdpdml treats such variables as latent variables. Rename it.

  22. What about a static panel data model with a lagged independent variable? (By definition this is not a dynamic panel.) For example, a previous decision will influence the present decision and so on. Does a lagged independent variable also violate strict exogeneity? Why and how?

    Are there any methods I can use? Is there any literature on this problem?

  23. Dear Professor Allison

    This is an incredibly illuminating post, with implications for a large amount of research in my field (epidemiology).

    In my case, I am trying to estimate the effect of a binary exposure (‘moved’) on a binary outcome (‘y’) from three waves of a survey (N=1,200) with fixed effects and a lagged dependent variable (but not cross-lagged). The primary effect of interest is the association between the exposure and outcomes measured at the same time.

    I have attempted to implement your method in R using the dpm:: package that replicates your xtdpdml as a front-end to lavaan:: (SEM model below this comment).

    lavaan:: fits probit models for categorical endogenous variables, and I am struggling to produce easily-interpreted effects (e.g. marginal effects) from the output. Given the potential for your methods in medical research, it would be great if there was a straight-forward way of producing easily interpreted effects for categorical outcomes, such as RRs, marginal probabilities, etc.

    So my questions are:

    1. Can xtdpdml in STATA implement logit (or even modified-Poisson) regression in the SEM?
    2. Is it sensible to talk about marginal effects from regressions fitted in SEM?
    3. If the answer to 1. is ‘no’, and 2. is ‘yes’: do you have any advice for producing marginal effects from probits in SEM?


    ## Main regressions

    y.2 ~ ex1 * moved.2 + p1 * y.1
    y.3 ~ ex1 * moved.3 + p1 * y.2
    y.4 ~ ex1 * moved.4 + p1 * y.3

    ## Alpha latent variable (random intercept)

    alpha =~ 1 * y.2 + 1 * y.3 + 1 * y.4

    ## Alpha free to covary with observed variables (fixed effects)

    alpha ~~ moved.2 + moved.3 + moved.4 + y.1

    ## Exogenous (time varying and invariant) predictors covariances

    moved.2 ~~ y.1
    moved.3 ~~ moved.2 + y.1
    moved.4 ~~ moved.2 + moved.3 + y.1

    ## Holding DV error variance constant for each wave

    y.2 ~~ v*y.2
    y.3 ~~ v*y.3
    y.4 ~~ v*y.4

    ## Let DV intercepts vary across waves

    y.2 ~ 1
    y.3 ~ 1
    y.4 ~ 1

    1. Here are answers:

      1. Can xtdpdml in STATA implement logit (or even modified-Poisson) regression in the SEM?

      2. Is it sensible to talk about marginal effects from regressions fitted in SEM?

      3. If the answer to 1. is ‘no’, and 2. is ‘yes’: do you have any advice for producing marginal effects from probits in SEM?

  24. Dear Paul,

    You write:

    “The random intercept ui represents the combined effect on y of all unobserved variables … and independent of the other variables on the right-hand side.”

    and “By the way, although I’ve emphasized random effects models in this post, the same problem occurs in standard fixed-effects models.”

    However, to my knowledge, pooled OLS and random effects models require that the (composite) error term be uncorrelated with the vector of regressors, but this is not required for the -fe- specification, where the panel-wise error is actually correlated with the vector of regressors.

    Hence, in fixed effects model, the violation of the assumption “independent of the other variables on the right-hand side” doesn’t exist, does it?


    1. There’s still a problem with the FE specification, although it’s a little more subtle. Wooldridge explains it in his textbook on panel data.
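The subtlety with fixed effects can be illustrated by simulation. What follows is a minimal Python sketch, not Wooldridge's own treatment. It assumes a bare-bones model yit = b1*yi(t-1) + ui + eit with b1 = 0.5, no other predictors, and only five usable waves, and it applies the usual within (demeaning) transformation. All function and variable names are hypothetical. With so few waves, the demeaned lag is correlated with the demeaned error (often called Nickell bias in econometrics), so the estimate should land well below 0.5.

```python
import random


def within_estimate(n=2000, t=5, b1=0.5, burn_in=50, seed=2024):
    """Fixed-effects (within) estimate of b1 in y_it = b1*y_i(t-1) + u_i + e_it.

    Assumed setup for illustration only: standard normal u_i and e_it,
    no covariates, and a burn-in so each series starts near stationarity.
    """
    rng = random.Random(seed)
    num = 0.0  # sum of demeaned lag times demeaned outcome
    den = 0.0  # sum of squared demeaned lag
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # time-invariant individual effect
        y = 0.0
        for _ in range(burn_in):
            y = b1 * y + u + rng.gauss(0.0, 1.0)
        lags, nows = [], []
        for _ in range(t):
            y_new = b1 * y + u + rng.gauss(0.0, 1.0)
            lags.append(y)
            nows.append(y_new)
            y = y_new
        # Demeaning within each individual removes u_i from the equation.
        lag_bar = sum(lags) / t
        now_bar = sum(nows) / t
        for y_lag, y_now in zip(lags, nows):
            num += (y_lag - lag_bar) * (y_now - now_bar)
            den += (y_lag - lag_bar) ** 2
    return num / den


b1_fe = within_estimate()
print(f"true b1 = 0.5, within estimate = {b1_fe:.3f}")
```

Raising `t` in this sketch should shrink the gap, which is consistent with the bias being mainly a short-panel problem.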

  25. What if we’re interested in building a purely predictive mixed model: either a linear regression, GLM, or a machine learning technique? In this case I assume lagged response variables are less of an issue since our ultimate aim is to minimize MSE regardless of bias in coefficient estimates.

  26. Dr. Allison, does the caution about lagged dependent variables apply to count models using within-between with the menbreg command in Stata? Like you recommended in your 2012 post, I am only focusing on fixed effects (the dm_ variables) not random (the m_ variables). If so, what would I use instead of menbreg? Thank you for your help.

    1. Yes, you should not put a lagged dependent variable in an menbreg model, even for the between-within method. In my course on Longitudinal Data Analysis Using SEM, I show how to properly estimate such a model using the gsem command. But it’s not straightforward. Trivedi shows a way to do it with the gmm command, but I’ve never tried it.

      1. Thank you, Dr. Allison.

        Do you have the estimation method that you describe using the gsem command written anywhere, since I am not able to take your course at this time?

        Also, in my situation (count data using within-between and menbreg), would it work to make all the independent variables leading (one year prior, for example) and keeping the dependent variable without a lead or lag?

        Thank you for your help.

        1. Unfortunately, I don’t have anything written about this. As for your proposed method, I doubt that this would appropriately solve the problem.

  27. Thanks to you, Enrique, and Rich for working on this and implementing it in Stata, Paul. Question: What about the two-time point situation where the nesting factor is not the individual but the group they belong to (e.g., school)? Most people would think of this as a type of cross-sectional multilevel model but since they have a pre score at t1 they want to control for it. Does your advice to not include a lagged DV in the model hold for this scenario?

  28. Dear Paul,

    I have a panel study with only two study waves (t=1, t=2) where I want to investigate whether blood pressure (bp) affects cognition (cog), and I wanted to “adjust for baseline cognition”.

    xtset id t
    xtdpdml cog, predetermined(L.bp)

    I tried xtdpdml but got an error message:

    “T value is too small given the lags specified
    For example if xlag = 2 then T must equal at least 4”

    Do I need more waves?

    1. Yes, you need more waves. In this kind of model, you can’t have both a lagged predictor and fixed effects (the default in xtdpdml).

  29. Paul, thanks for this post. You write that “Because the model applies to all time points, ui has a direct effect on yi(t-1).”
    However, in psychological data, ui undoubtedly has an effect on many time-varying predictors at Level 1. So this should make all multilevel models problematic if there are predictors at Level 1. Am I understanding you correctly?

    1. Well, yes, u(i) could be affecting many time-varying predictors. But, as with any regression, we make the heroic assumption that it’s not. And the data can’t prove us wrong. But u(i) is necessarily correlated with yi(t-1). So we know for sure that the model violates a key assumption.

      1. Thanks. Are you by chance familiar with the Bell and Jones (2014) paper Explaining Fixed Effects: Random Effects Modeling of Time-Series Cross-Sectional and Panel Data?
        I found it really helpful for understanding why to include groups means at level 2 and group-center (whenever sensible) at level 1. It has to do with u(i) among other things.
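The claim that u(i) is necessarily correlated with yi(t-1) can be checked by simulation. The Python sketch below is only an illustration under stated assumptions: the model is yit = b1*yi(t-1) + ui + eit with b1 = 0.5 and standard normal ui and eit, and pooled OLS stands in for the mixed model because it relies on the same independence assumption. All names are hypothetical.

```python
import random


def simulate_panel(n=2000, t=6, b1=0.5, burn_in=50, seed=12345):
    """Return (u_i, y_lag, y_now) triples from y_it = b1*y_i(t-1) + u_i + e_it."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unobserved, time-invariant effect
        y = 0.0
        for _ in range(burn_in):  # let the series settle before recording
            y = b1 * y + u + rng.gauss(0.0, 1.0)
        for _ in range(t):
            y_new = b1 * y + u + rng.gauss(0.0, 1.0)
            rows.append((u, y, y_new))
            y = y_new
    return rows


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


rows = simulate_panel()
u = [r[0] for r in rows]
y_lag = [r[1] for r in rows]
y_now = [r[2] for r in rows]

# Correlation between the random intercept and the lagged outcome.
corr_u_ylag = cov(u, y_lag) / (cov(u, u) * cov(y_lag, y_lag)) ** 0.5

# Pooled OLS slope of y_it on y_i(t-1), which ignores that correlation.
b1_pooled = cov(y_lag, y_now) / cov(y_lag, y_lag)

print(f"corr(u_i, y_i(t-1)) = {corr_u_ylag:.3f}")
print(f"true b1 = 0.5, pooled OLS estimate = {b1_pooled:.3f}")
```

In this setup the correlation is strongly positive by construction, and the coefficient on the lagged outcome is pushed upward, matching the direction of bias described in the post.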

  30. Dear prof. Allison, dear Paul:

    Many thanks for this blog entry and for your continued commitment to addressing people’s queries on the use of xtdpdml. I have one of my own. Together with a colleague, I am working on a balanced panel dataset on policy agendas, with 40 time periods and 17 groups, for a total of N=680. In the late stages of the review process, we have been in direct discussions with the editors about the model. To cut a long story short, our previous models were (rightly) considered biased because of the lagged dependent variable. We are leaning towards using xtdpdml, which would address the problem. However, our dependent variable is expressed in fractional terms, in that it varies between 0 and 100. Would xtdpdml still be appropriate, or what would you consider the best alternative?

    many thanks for your answer and for your time,
    Best regards,
    Auste Vaznonyte & Francesco Nicoli

  31. Paul, thank you for the post and for your continued effort to educate. I have been facing exactly the same issue with the use of a lagged predictor in panel data. Is there anything in SAS, Python, or R that mimics xtdpdqml/xtdpdml? If not, is the source code available?

    Thanks in advance for your time.


    1. Yes, in fact, there is a very nice “clone” of xtdpdml in an R package called dpm by Jacob Long. It’s not yet on CRAN but you can download it from github with the code:


      A tutorial can be found at

  32. Paul, thanks for your informative page. Finding this post a while back set me up on a path to begin to model our longitudinal data in much more responsible and sophisticated ways. It’s been quite useful for my entire team.

    I’m now doing some other forms of multilevel analyses where I’m realizing that similar concerns may be in play, and I wanted to learn more. So, I was wondering: could you point me to a scholarly resource that outlines in more detail the two primary claims about MLM/HLM that form the basis of your recommendations:

    1. That MLM assumes independence between random terms and other terms on the RHS
    2. That a lack of such independence biases parameter estimates in characteristic ways, (typically inflating the non-independent fixed effects while deflating others)

    I haven’t yet been able to dig anything up that speaks to this directly, so anything you have would be greatly appreciated.

    One other related thing, if I may. What do you think would be a suitable diagnostic test for the presence of such non-independence in an already-fitted model? Could you, say, extract the coefficients for an estimated random effect and look for remaining correlations against fixed-effects variables also in the model?

    1. I presume you’re referring to one of the options in xtdpdqml. I didn’t write this command, so I can’t say much about it. But the help file says that the initval option is “seldom used.”

  33. Thank you very much for this insightful information and the command! Would it be possible to do a system equation (seemingly unrelated regression) with FE or RE and a lagged dependent variable? I guess that the SEM command will become extremely long and it will be computationally very difficult to solve, or is there hope? Best, Christian

  34. Dear Professor PAUL ALLISON
    Could I include lagged independent variables in a mixed logistic regression?
    Kind Regards

  35. Dear Professor,

    Thank you for this insightful post. I am trying to run a random effects model with xtdpdml. I have weekly data for just one year for week number 7 to 23 each month with gaps; the number of groups is 2925.

    However, when I run the model, I am getting the following error
    variable case_rates6 not found;
    Perhaps you meant ‘case_rates6’ to specify a latent variable.
    For ‘case_rates6’ to be a valid latent variable specification,
    ‘case_rates6’ must begin with a capital letter.

    Here case_rates is the dependent variable. I converted all the variable names in the dataset to lowercase, but that did not solve the issue. I believe the code is trying to find week 6 data. Can you please help? Thanks.

    1. Unfortunately, xtdpdml cannot handle data with gaps in the time variable. That variable needs to have values 1,2,3,4,…, etc., with no gaps. So try recoding the data with no gaps.

      That should work fine if you don’t have any lagged variables, either dependent or independent. But if you do have lagged variables, then some lags will apply to longer intervals (between gaps) and others to shorter. It would be implausible that the lagged effects would be the same for different intervals between observations. An ad hoc solution would be to add interactions between the lagged predictor and a variable capturing the length of the gap.
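The recoding and the gap-length variable described above can be sketched as follows. This is an illustration in Python rather than Stata, and it assumes the gaps are the same for every group, so a single mapping from observed week numbers to consecutive integers is enough. The function names and the example weeks are hypothetical.

```python
def recode_weeks(weeks):
    """Map observed week numbers to consecutive integers 1, 2, 3, ..."""
    observed = sorted(set(weeks))
    return {week: position for position, week in enumerate(observed, start=1)}


def gap_lengths(weeks):
    """For each observed week, the distance (in weeks) from the previous one.

    The first observed week has no predecessor, so its gap is None.
    """
    observed = sorted(set(weeks))
    gaps = {observed[0]: None}
    for previous, current in zip(observed, observed[1:]):
        gaps[current] = current - previous
    return gaps


# Hypothetical example: weeks 10, 11, 14 and 15 were never observed.
weeks = [7, 8, 9, 12, 13, 16]
t_map = recode_weeks(weeks)
gap_map = gap_lengths(weeks)

for week in weeks:
    print(week, t_map[week], gap_map[week])
```

The recoded value would serve as the gap-free time variable, and the gap length is the quantity one could interact with a lagged predictor, as in the ad hoc solution mentioned above.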

  36. Hello Professor Allison,
    Thank you for sharing your knowledge with us.
    Could you please tell me if the lagged independent variable used in the equation below is appropriate and if it can be interpreted? It was used to control for autocorrelation in the equation but I’m not sure if this is a statistical issue.

    log(casesit) = a + Xig + tempit + precit + windit + testsit + casesi(t-1) + countyi + dayt + εit

    1. If you’re estimating a mixed model, this would not be appropriate. It will cause bias in all your coefficient estimates.
