Don’t Put Lagged Dependent Variables in Mixed Models

June 2, 2015 By Paul Allison

Allison PicWhen estimating regression models for longitudinal panel data, many researchers include a lagged value of the dependent variable as a predictor. It’s easy to understand why. In most situations, one of the best predictors of what happens at time t is what happened at time t-1. 

This can work well for some kinds of models, but not for mixed models, otherwise known as a random effects models or multilevel models.  Nowadays, mixed modeling is probably the most popular approach to longitudinal data analysis. But including a lagged dependent variable in a mixed model usually leads to severe bias.

In economics, models with lagged dependent variables are known as dynamic panel data models.  Economists have known for many years that lagged dependent variables can cause major estimation problems, but researchers in other disciplines are often unaware of these issues.

The basic argument is pretty straightforward.  Let yit be the value of the dependent variable for individual i at time t.  Here’s a random intercepts model (the simplest mixed model) that includes a lagged value of the dependent variable, as well as a set of predictor variables represented by the vector xit:

            yit = b0 + b1yi(t-1) + b2xit +  ui + eit

The random intercept ui represents the combined effect on y of all unobserved variables that do not change over time. It is typically assumed to be normally distributed with a mean of 0, constant variance, and independent of the other variables on the right-hand side.

That’s where the problem lies. Because the model applies to all time points, i has a direct effect on yi(t-1).  But if i affects yi(t-1), it can’t also be statistically independent of yi(t-1). The violation of this assumption can bias both the coefficient for the lagged dependent variable (usually too large) and the coefficients for other variables (usually too small).

Later I’ll discuss some solutions to this problem, but first let’s consider an example. I use the wages data set that is available on this website. It contains information on annual wages of 595 people for seven consecutive years. The data are in “long form”, so there’s a total of 4,165 records in the data set. I use Stata for the examples because there are good Stata commands for solving the problem.

Using the xtreg command, let’s first estimate a random intercepts model for lwage (log of wage) with the dependent variable lagged by one year, along with two predictors that do not change over time: ed (years of education) and fem (1 for female, 0 for male). 

Here’s the Stata code:

use “https://statisticalhorizons.com/wp-content/uploads/wages.dta”, clear
xtset id t
xtreg lwage L.lwage ed fem t

The xtset command tells Stata that this is a “cross-section time-series” data set with identification numbers for persons stored in the variable id and a time variable t that ranges from 1 to 7.  The xtreg command fits a random-intercepts model by default, with lwage as the dependent variable and the subsequent four variables as predictors.  L.lwage specifies the one-year lag of lwage

Here’s the output:

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |   .8747517   .0085886   101.85   0.000     .8579183    .8915851
          ed |   .0108335   .0011933     9.08   0.000     .0084947    .0131724
         fem |    -.06705    .010187    -6.58   0.000    -.0870162   -.0470839
           t |   .0071965   .0019309     3.73   0.000     .0034119    .0109811
       _cons |   .7624068   .0491383    15.52   0.000     .6660974    .8587161
-------------+----------------------------------------------------------------
 

When the dependent variable is logged and the coefficients are small, multiplying them by 100 gives approximate percentage changes in the dependent variable. So this model says that each additional year of schooling is associated with a 1 percent increase in wages and females make about 6 percent less than males.  Each additional year is associated with about a 0.7 percent increase in wages. All these effects are dominated by the lagged effect of wages on itself, which amounts to approximately a 0.9 percent increase in this year’s wages for a 1 percent increase in last year’s wages. 

As I explained above, the lagged dependent variable gives us strong reasons to be skeptical of these estimates. Economists have developed a variety of methods for solving the problem, most of them relying on some form of instrumental variable (IV) analysis. For a discussion of how to implement IV methods for lagged dependent variables in Stata, see pp. 274-278 in Rabe-Hesketh and Skrondal (2012).

Personally, I prefer the maximum likelihood approach pioneered by Bhargava and Sargan (1983) which incorporates all the restrictions implied by the model in an optimally efficient way. Their method has recently been implemented by Kripfganz (2015) in a Stata command called xtdpdqml. This unwieldy set of letters stands for “cross-section time-series dynamic panel data estimation by quasi-maximum likelihood.”

Here’s how to apply xtdpdqml to the wage data:

xtset id t
xtdpdqml lwage ed fem t, re initval(0.1 0.1 0.2 0.5)

The re option specifies a random effects (random intercepts) model.  By default, the command includes the lag-1 dependent variable as a predictor.  The initval option sets the starting values for the four variance parameters that are part of the model.  Here is the output:

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       lwage |
         L1. |   .4142827   .0230843    17.95   0.000     .3690383     .459527
          ed |   .0403258   .0031841    12.66   0.000     .0340851    .0465666
         fem |  -.2852665   .0271688   -10.50   0.000    -.3385164   -.2320166
           t |   .0533413   .0027533    19.37   0.000     .0479449    .0587378
       _cons |    3.25368   .1304816    24.94   0.000      2.99794    3.509419
------------------------------------------------------------------------------

Results are markedly different from those produced above by xtreg.  The coefficient of the lagged dependent variable is greatly reduced, while the others show substantial increases in magnitude. An additional year of schooling now produces a 4 percent increase in wages rather than 1 percent. Blacks now make 8 percent less than non-blacks rather than 1 percent less. And females make 24 percent less (calculated as 100(exp(-.28)-1) than males compared to 6 percent less. The annual increase in wages is 5 percent instead of 1 percent.

So doing it right can make a big difference.  Unfortunately, xtdpdqml has a lot of limitations. For example, it can’t handle missing data except by listwise deletion. With Richard Williams and Enrique Moral-Benito, I have been developing a new Stata command, xtdpdml, that removes many of these limitations. (Note that the only difference in the names for the two commands is the q in the middle). It’s not quite ready for release, but we expect it out by the end of 2015.

To estimate a model for the wage data with xtdpdml, use

xtset id t
xtdpdml lwage, inv(ed fem blk) errorinv

The inv option is for time-invariant variables.  The errorinv option forces the error variance to be the same at all points in time. Like xtdpdqml, this command automatically includes a 1-time unit lag of the dependent variable. Unlike xtdpdqml, xtdpdml can include longer lags and/or multiple lags.

Here is the output:

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
lwage2       |
      lwage1 |   .4088803   .0229742    17.80   0.000     .3638517     .453909
          ed |   .0406719   .0032025    12.70   0.000     .0343951    .0469486
         fem |  -.2878266    .027345   -10.53   0.000    -.3414218   -.2342315
------------------------------------------------------------------------------

Results are very similar to those for xtdpdqml. They are slightly different because xtdpdml always treats time as a categorical variable, but time was a quantitative variable in the earlier model for xtdpdqml.

If you’re not a Stata user, you can accomplish the same thing with any linear structural equation modeling software, as explained in my unpublished paper.  As a matter of fact, the xtdpdml command is just a front-end to the sem command in Stata. But it’s a lot more tedious and error-prone to set up the equations yourself.  That’s why we wrote the command.  

By the way, although I’ve emphasized random effects models in this post, the same problem occurs in standard fixed-effects models. You can’t put a lagged dependent variable on the right-hand side. Both xtdpdqml and xtdpdml can handle this situation also. 

If you’d like to learn more about dynamic panel data models, check out my 2-day course on Longitudinal Data Analysis Using SEM. It will be offered again October 16-17, 2015, in Los Angeles. 

References

Bhargava, A. and J. D. Sargan (1983) “Estimating dynamic random effects models from panel data covering short time periods.” Econometrica 51 (6): 1635-1659.

Rabe-Hesketh, Sophia, and Anders Skrondal  (2012) Multilevel and Longitudinal Modeling Using Stata. Volume 1: Continuous Responses. Third Edition. StataCorp LP. 


41 Responses

  1. Adriaan Hoogendoorn says:

    Dear Paul,

    Thank you for another interesting post on your blog.

    I guess that the problem with lagged dependent variables in mixed models is underexposed in health sciences as well. I’ve seen examples where generalized estimating equations (GEE) models – i.e. models that resemble the random intercepts model that you describe – are used in cases with lagged dependent variables, where one supposes an “appropriate correlation structure” to solve the problem. Reading this post made me wonder if this is correct.

    I found that applying a GEE model with an independent correlation structure exactly replicates the results of your xtreg command.
    . xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(independent)
    Changing the correlation structure to exchangeable or to autoregressive has only a small effect on the resulting parameters. The command for the exchangeable correlation structure is:
    . xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(exchangeable)
    while the command for the autoregressive correlation structure is:
    . xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(ar)

    Neither the GEE model with an exchangeable correlation structure, nor the GEE model with an autoregressive correlation structure come close to the results of the xtdpdqml procedure that you showed in your post. Both are in fact similar to the results from the xtreg command. This suggests that the term “autoregressive” may be misleading.

    I managed to replicate the results of your new Stata command xtdpdml, using Kripfganz’s commands

    . quietly tab t, gen(t)
    . xtdpdqml lwage ed fem t3 t4 t5 t6 t7, re initval(0.1 0.1 0.2 0.5)

    and guess that this may be due to the absence of missing data in this particular case. I’d like to end with two questions:

    1. Does the use of taking both lagged and contemporaneous independent variables, as in extending your equation into

    yit = b0 + b1yi(t-1) + b2xit + b3xi(t-1) + ui + eit

    cause any additional problems? I guess not, but am curious of your view on this.

    2. How does this xtdpdml approach relate to the SEM approach assuming random effects?

    Good luck in finalizing the xtdpdml procedure.

    Kind regards, Adriaan Hoogendoorn

    • Paul Allison says:

      I agree that GEE is likely to suffer the same problems with lagged dependent variables as mixed models. Regarding your questions:

      1. I don’t see any special problems with other lagged predictors, unless those predictors are “predetermined”, meaning that they depend on earlier values of the dependent variable. The xtdpdml command can allow for predetermined variables.

      2. The xtdpdml command is a shell for the sem command. For time-invariant predictors, it estimates random effects models. However, for time-varying predictors, it produces fixed-effects estimates.

  2. Michael Zyphur says:

    Hi Paul,
    Thanks for this post. Your xtdpdml command sounds exciting. I look forward to seeing it in action!

    Currently, I am playing around with multiply imputed datasets in Stata, and therefore using commands that often begin with ‘mi estimate, cmdok: xtdpd’. In general, many of the post-estimation routines such as the Sargan or autocorrelation tests have not been optimized for the mi case. Comparing nested models is also not straightforward in the mi case–unless I am missing something (?).

    Given your vast expertise in the area, will xtdpdml be mi friendly?

    Thanks again! I always enjoy reading your posts.

    Mike

    • Paul Allison says:

      Hi Michael. We haven’t even thought about multiple imputation with this command, in part because FIML is so easy to use instead. I can imagine that there might be some difficulty because the command initially reformats the data from long to wide. It can optionally use wide data as input, however.

  3. Paul von Hippel says:

    If I follow your explanation, the bias comes from assuming that the random effect is the same at times 2,3,etc. It seems to me the problem goes away when there are just 2 time points, since there’s no explicit random effect at time 1.

    The case with 2 time points is important in education, where teacher value-added is often estimated using 2 time points: this year and last year.

    • Paul Allison says:

      Yes, but then you’d just be estimating a standard linear regression with OLS, right?

      • Paul von Hippel says:

        Sorry I wasn’t clear. I was thinking of the situation with students nested in school. The model I had in mind was this:
        y_ij2 = b0 + b1*y_i1 + … + u_j + e_i
        where y_i1,y_ij2 are the scores of student i in years 1 and 2, u_j is a school random effect, and e_i is a student random residual.

        With only 2 years, we don’t have to assume the random effect u_j is the same in year 1 as in year 2, so is there still a bias?

        • Paul Allison says:

          Interesting question. You don’t HAVE to assume that u_j is the same in the two years but, at the least, you would expect the school effects to be highly correlated over time. Consequently, you would expect y_i1 to be correlated with u_j, leading to biased estimates with standard mixed model software.

          • Paul von Hippel says:

            Interesting. So how would you recommend accounting for the correlation among observations from the same school?

          • Paul Allison says:

            Just do a fixed effects model with schools as the fixed effects. Alternatively, express the time 1 measure as deviations from school means.

  4. Adriaan Hoogendoorn says:

    Dear Paul,

    Thanks for your reply.
    In applying Kripfganz’s procedure, I noticed that in some cases it drops observations: “Note: groups are dropped due to gaps or insufficient number of observations”.
    Is this one of the problems with handling missing data that you refer to?

    Kind regards, Adriaan

  5. Matt says:

    Dear Paul,

    Very interesting post. Looking forward to your xtdpdml command as well!

    I had one question which was whether the problems you describe with including a lagged dependent variable in a fixed effects model still hold with very large T (I am currently working with n=32, T=566)? I understand that with large T Nickell bias diminishes, but are there other potential estimation problems?

    All the best,

    Matt

    • Paul Allison says:

      1. xtdpdml won’t work if T approaches or exceeds n.
      2. My post was about random effects rather than fixed effects models.
      3. I don’t have enough knowledge/experience with these kinds of situations to comment further.

  6. Micah says:

    Dear Paul,

    Thanks so much for your work on this. Any suggestions on what to use for model selection between random and fixed effects? Is it correct that the standard Hausman test is insufficient for this estimator?

    Looking forward to seeing the final release!

    Thanks!

    • Paul Allison says:

      I would do a likelihood ratio test of random effects versus fixed effects. Both can be easily estimated with our new xtdpdml command for stata.

  7. Daniel says:

    Dear Paul – thank you very much for the post.

    Are there any routines that can handle AR(3) models (vs. AR(1))? My understanding is that both xtdpdqml and xtdpdml cannot handle longer lag structure.

    Sincerely,
    Daniel

    • Paul Allison says:

      xtdpdml can handle any lag structure for measured predictors, but it does not allow autoregressive structures on the disturbances.

  8. sina says:

    Great post. Could you also elaborate on how your newly developed command (xtdpdml) handles missing data better than the xtdpdqml?

    • Paul Allison says:

      xtdpdqml simply drops any records with missing data
      xtdpdml optionally handles missing data by full information maximum likelihood, which preserves all cases under the assumption that the data are missing at random and that the variables with missing data have a multivariate normal distribution

  9. Andreas Kam says:

    Very nice and understandable post!

    Does the bias only occur with individual fixed effects? Say we would include time fixed effects (or country fixed effects in a different model where the level of observation still is at level i) instead of individual fixed effects, would that also be an issue when the lagged dependent variable is included in the regression?

  10. Tmapp says:

    Dear Professor Allison,

    I am conducting a study for Master in Economics on the impact of elections on the macroeconomy. My model
    contains a lagged dependent variable and so i have been advised to use dynamic maximum likelihood.
    I have come accross the command (xtdpdml) that your colleagues and you developed.
    I am trying to implement it but facing this challenged of some latent variable
    being requested by stata. The specific error is as follows:

    xtdpdml gdpg, inv(election ex wgdp fo lto lgoe ltot) tfix
    latent variable gdpg2 not found;
    ‘gdpg2’ specifies a latent variable.
    For ‘gdpg2’ to be valid, ‘gdpg2’ must begin with a capital letter.

    or

    xtdpdml Gdpg, inv(election ex wgdp fo lto lgoe ltot) tfix
    model not identified;
    no paths from latent variable Gdpg26 to observed variables

    I would be grateful for your assistance in solving this problem.

    • Paul Allison says:

      How many individuals and how many time points? Also, try recoding your time variable to t=1,2,3,… etc. The tfix option doesn’t always work.

  11. Christian says:

    Dear Paul,

    I am getting a similar error as TMapp.

    xtdpdml HW xvar1 xvar2 , errorinv
    >
    model not identified;
    no paths from latent variable HW13 to observed variables
    <

    It might help if I describe the data and variables:
    y, x1 and x2 are time-varying. I have 48,494 observations (invidivduals*t) and about 5500 individuals. Time is coded from (t) 1 to 13. Data is in long format.
    What might be an issue: the data is not balanced and the time-sequence has gaps in it. However, if I eliminate both the unbalanced sequences and the individuals with gaps, as well as those not starting with t1 I still get the same error message.

    Do you happen to have any idea what is going on?

    Best,
    Christian

  12. Wolfgang says:

    Dear Paul,

    Why not just include AR(1) residuals directly in the model? In other words:

    mixed lwage ed fem t || id:, residuals(ar1, t(t))

    Best,
    Wolfgang

    • Paul Allison says:

      That can be useful, but many researchers actually want to estimate and control for the effect of a lagged dependent variable.

  13. Bo Nielsen says:

    Hi Paul,

    this is really interesting. Now, I am using xtmixed (Stata) to run a relatively simply cross-classified (firms within both industries and countries) multilevel (longitudinal) model.

    The model (simplified) looks something like:

    xtmixed Dvar IVfirm IVfirm IVindustry IVcountry || _all: R.concode || sic3: || concode:, var

    Now, I need to instrument the country level IV – how would I do this with your approach in stata?

  14. Hanna Lindström says:

    Hi Paul,
    Thank you for a very clear and informative post. I am however having trouble accessing the xtdpdml command in my Stata software, is it available for the STATA MP 13 and STATA 14 version?
    Regards
    Hanna Lindström

  15. Adewale Abideen says:

    Thanks a lot Prof. Paul for the very enlightening post. Other than on your sample dataset, each time I attemt to use the xtdpdml command, I always get the message “convergence not achieved”. What implication does this message have on my result and how can I resolve it? Thanks.

  16. Anna Manzoni says:

    Thank you so much for writing this program.
    I could not find anything in the SJ article and I was wondering if you had any suggestions about how to apply (longitudinal) weights in any way. Thank you.

    • Paul Allison says:

      Sorry, but xtdpdml currently does not allow weights.

      • Anna Manzoni says:

        Thank you for your response.

        I was also wondering what you would suggest if my ideal model would need to account for selectivity. Specifically, I am modelling occupational status, and I would like to account for selection into employment. In a xtreg with lagged DV I would estimate a first stage probit for the probability into employment and then use the Inverse Mills Ratio from the probit in my xtreg model predicting occupational status.

        What would you suggest if I want to use xtdpdml accounting for selection? Thanks in advance.

        • Paul Allison says:

          xtreg does not have any special features to account for selection. But keep in mind that when you are estimating a fixed effects model (the default in xtdpdml), selection issues are much less important. That’s because you are only using within-individual variation to estimate the model. Selection is primarily a between-person confounder.

  17. David Disabato says:

    Hi Paul,

    Thanks for the post – very interesting. I’m a psychology researcher and I have never seen this issue discussed in the quantitative psych literature. I was curious your thoughts on analyzing data from a common research design I do. I conduct a lot of daily diary studies where I collect participant data every day for 3 weeks straight. I often run longitudinal mixed effect models (days nested within persons) where I use yesterday’s IV to predict today’s DV. In these mixed effect models I almost always control for yesterday’s (i.e., lagged) DV to show that my IV can predict change over time. I will person-mean (aka group-mean) center both yesterday’s IV and yesterday’s (i.e., lagged) DV so only the time-varying components of each remain. Is the lagged DV as a predictor still a problem in this case? I was wondering if it’s only a problem for time-invariant predictors.

    Would love to hear any thoughts you have,
    David

    • Paul Allison says:

      Even if you group-mean center the variables, putting a lagged DV in your model will still lead to severe bias.

  18. Yawen Chan says:

    Hi Paul,

    Thanks very much for this informative post, and the related resources found on the web. I’m in psychological research and I agree with David that such discussion is rare in psychology.

    I have a set of data collected via ESM/EMA (experience sampling method/ ecological momentary assessment) protocol, so that I collected data from participants repeatedly & randomly 8 times a day across 2 weeks. Then I tried to use time-series mixed model to see if the IV (e.g. stress) predicts DV (e.g. abdominal pain) across time, yet only within the same day. After studying xtdpdml, it seems that it’s not appropriate for my dataset because:

    1. No. of measurement > No. of participants
    The study had 57 participants (30 controls and 27 patients). Each of them contributed up to 114 measurements.

    2. No carry-over effect overnight
    Since xtdpdml automatically look for the DV at the earlier timepoint as LDV, it seems that I can’t manually tell the program to only do so if the measurements were taken within the same day. Previously, I’d create a column for LDV, as a new IV, copy the values from previous timepoint if it comes from the same day, and treat it as missing value if it comes from the previous day.

    I’d love to hear what you think about the issue. Thanks a lot!

    Yawen

Leave a Reply