## Don’t Put Lagged Dependent Variables in Mixed Models

##### June 2, 2015 By Paul Allison

When estimating regression models for longitudinal panel data, many researchers include a lagged value of the dependent variable as a predictor. It’s easy to understand why. In most situations, one of the best predictors of what happens at time *t* is what happened at time *t*-1.

This can work well for some kinds of models, but not for mixed models, otherwise known as a random effects models or multilevel models. Nowadays, mixed modeling is probably the most popular approach to longitudinal data analysis. But including a lagged dependent variable in a mixed model usually leads to severe bias.

In economics, models with lagged dependent variables are known as *dynamic panel data* models. Economists have known for many years that lagged dependent variables can cause major estimation problems, but researchers in other disciplines are often unaware of these issues.

The basic argument is pretty straightforward. Let *y _{it}* be the value of the dependent variable for individual

*i*at time

*t*. Here’s a random intercepts model (the simplest mixed model) that includes a lagged value of the dependent variable, as well as a set of predictor variables represented by the vector

*x*

_{it}: *y _{it}* =

*b*

_{0}+

*b*

_{1}

*y*

_{i}_{(t-1)}+

*b*

_{2}

*x*+

_{it}*u*+

_{i}*e*

_{it}The random intercept *u _{i}* represents the combined effect on

*y*of all unobserved variables that do not change over time. It is typically assumed to be normally distributed with a mean of 0, constant variance, and

*independent of the other variables on the right-hand side*.

That’s where the problem lies. Because the model applies to all time points, *u _{i}* has a direct effect on

*y*

_{i}_{(t-1)}. But if

*u*affects

_{i}*y*

_{i}_{(t-1)}, it can’t also be statistically independent of

*y*

_{i}_{(t-1)}. The violation of this assumption can bias both the coefficient for the lagged dependent variable (usually too large) and the coefficients for other variables (usually too small).

Later I’ll discuss some solutions to this problem, but first let’s consider an example. I use the **wages** data set that is available on this website. It contains information on annual wages of 595 people for seven consecutive years. The data are in “long form”, so there’s a total of 4,165 records in the data set. I use Stata for the examples because there are good Stata commands for solving the problem.

Using the **xtreg** command, let’s first estimate a random intercepts model for **lwage **(log of wage) with the dependent variable lagged by one year, along with two predictors that do not change over time: **ed** (years of education) and **fem** (1 for female, 0 for male).

Here’s the Stata code:

**use “https://statisticalhorizons.com/wp-content/uploads/wages.dta”, clear**** xtset id t**** xtreg lwage L.lwage ed fem t**

The **xtset** command tells Stata that this is a “cross-section time-series” data set with identification numbers for persons stored in the variable **id** and a time variable **t **that ranges from 1 to 7. The **xtreg** command fits a random-intercepts model by default, with **lwage** as the dependent variable and the subsequent four variables as predictors. **L.lwage** specifies the one-year lag of **lwage**.

Here’s the output:

------------------------------------------------------------------------------

lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

lwage |

L1. | .8747517 .0085886 101.85 0.000 .8579183 .8915851

ed | .0108335 .0011933 9.08 0.000 .0084947 .0131724

fem | -.06705 .010187 -6.58 0.000 -.0870162 -.0470839

t | .0071965 .0019309 3.73 0.000 .0034119 .0109811

_cons | .7624068 .0491383 15.52 0.000 .6660974 .8587161

-------------+----------------------------------------------------------------

When the dependent variable is logged and the coefficients are small, multiplying them by 100 gives approximate percentage changes in the dependent variable. So this model says that each additional year of schooling is associated with a 1 percent increase in wages and females make about 6 percent less than males. Each additional year is associated with about a 0.7 percent increase in wages. All these effects are dominated by the lagged effect of wages on itself, which amounts to approximately a 0.9 percent increase in this year’s wages for a 1 percent increase in last year’s wages.

As I explained above, the lagged dependent variable gives us strong reasons to be skeptical of these estimates. Economists have developed a variety of methods for solving the problem, most of them relying on some form of instrumental variable (IV) analysis. For a discussion of how to implement IV methods for lagged dependent variables in Stata, see pp. 274-278 in Rabe-Hesketh and Skrondal (2012).

Personally, I prefer the maximum likelihood approach pioneered by Bhargava and Sargan (1983) which incorporates all the restrictions implied by the model in an optimally efficient way. Their method has recently been implemented by Kripfganz (2015) in a Stata command called **xtdpdqml**. This unwieldy set of letters stands for “cross-section time-series dynamic panel data estimation by quasi-maximum likelihood.”

Here’s how to apply **xtdpdqml** to the wage data:

**xtset id t****xtdpdqml lwage ed fem t, re initval(0.1 0.1 0.2 0.5)**

The **re** option specifies a random effects (random intercepts) model. By default, the command includes the lag-1 dependent variable as a predictor. The **initval** option sets the starting values for the four variance parameters that are part of the model. Here is the output:

------------------------------------------------------------------------------

lwage | Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

lwage |

L1. | .4142827 .0230843 17.95 0.000 .3690383 .459527

ed | .0403258 .0031841 12.66 0.000 .0340851 .0465666

fem | -.2852665 .0271688 -10.50 0.000 -.3385164 -.2320166

t | .0533413 .0027533 19.37 0.000 .0479449 .0587378

_cons | 3.25368 .1304816 24.94 0.000 2.99794 3.509419

------------------------------------------------------------------------------

Results are markedly different from those produced above by **xtreg**. The coefficient of the lagged dependent variable is greatly reduced, while the others show substantial increases in magnitude. An additional year of schooling now produces a 4 percent increase in wages rather than 1 percent. Blacks now make 8 percent less than non-blacks rather than 1 percent less. And females make 24 percent less (calculated as 100(exp(-.28)-1) than males compared to 6 percent less. The annual increase in wages is 5 percent instead of 1 percent.

So doing it right can make a big difference. Unfortunately, **xtdpdqml** has a lot of limitations. For example, it can’t handle missing data except by listwise deletion. With Richard Williams and Enrique Moral-Benito, I have been developing a new Stata command, **xtdpdml**, that removes many of these limitations. (Note that the only difference in the names for the two commands is the **q** in the middle). It’s not quite ready for release, but we expect it out by the end of 2015.

To estimate a model for the wage data with **xtdpdml**, use

**xtset id t**** xtdpdml lwage, inv(ed fem blk) errorinv**

The **inv** option is for time-invariant variables. The **errorinv** option forces the error variance to be the same at all points in time. Like **xtdpdqml**, this command automatically includes a 1-time unit lag of the dependent variable. Unlike **xtdpdqml**, **xtdpdml** can include longer lags and/or multiple lags.

Here is the output:

------------------------------------------------------------------------------

| Coef. Std. Err. z P>|z| [95% Conf. Interval]

-------------+----------------------------------------------------------------

lwage2 |

lwage1 | .4088803 .0229742 17.80 0.000 .3638517 .453909

ed | .0406719 .0032025 12.70 0.000 .0343951 .0469486

fem | -.2878266 .027345 -10.53 0.000 -.3414218 -.2342315

------------------------------------------------------------------------------

Results are very similar to those for **xtdpdqml**. They are slightly different because **xtdpdml **always treats time as a categorical variable, but time was a quantitative variable in the earlier model for **xtdpdqml.**

If you’re not a Stata user, you can accomplish the same thing with any linear structural equation modeling software, as explained in my unpublished paper. As a matter of fact, the **xtdpdml** command is just a front-end to the **sem** command in Stata. But it’s a lot more tedious and error-prone to set up the equations yourself. That’s why we wrote the command.

By the way, although I’ve emphasized random effects models in this post, the same problem occurs in standard fixed-effects models. You can’t put a lagged dependent variable on the right-hand side. Both **xtdpdqml** and **xtdpdml **can handle this situation also.

If you’d like to learn more about dynamic panel data models, check out my 2-day course on Longitudinal Data Analysis Using SEM. It will be offered again October 16-17, 2015, in Los Angeles.

__References__

Bhargava, A. and J. D. Sargan (1983) “Estimating dynamic random effects models from panel data covering short time periods.” *Econometrica* 51 (6): 1635-1659.

Rabe-Hesketh, Sophia, and Anders Skrondal (2012) *Multilevel and Longitudinal Modeling Using Stata. Volume 1: Continuous Responses*. Third Edition. StataCorp LP.

Dear Paul,

Thank you for another interesting post on your blog.

I guess that the problem with lagged dependent variables in mixed models is underexposed in health sciences as well. I’ve seen examples where generalized estimating equations (GEE) models – i.e. models that resemble the random intercepts model that you describe – are used in cases with lagged dependent variables, where one supposes an “appropriate correlation structure” to solve the problem. Reading this post made me wonder if this is correct.

I found that applying a GEE model with an independent correlation structure exactly replicates the results of your xtreg command.

. xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(independent)

Changing the correlation structure to exchangeable or to autoregressive has only a small effect on the resulting parameters. The command for the exchangeable correlation structure is:

. xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(exchangeable)

while the command for the autoregressive correlation structure is:

. xtgee lwage L.lwage ed fem t , fam(gauss) link(iden) i(id) t(t) corr(ar)

Neither the GEE model with an exchangeable correlation structure, nor the GEE model with an autoregressive correlation structure come close to the results of the xtdpdqml procedure that you showed in your post. Both are in fact similar to the results from the xtreg command. This suggests that the term “autoregressive” may be misleading.

I managed to replicate the results of your new Stata command xtdpdml, using Kripfganz’s commands

. quietly tab t, gen(t)

. xtdpdqml lwage ed fem t3 t4 t5 t6 t7, re initval(0.1 0.1 0.2 0.5)

and guess that this may be due to the absence of missing data in this particular case. I’d like to end with two questions:

1. Does the use of taking both lagged and contemporaneous independent variables, as in extending your equation into

yit = b0 + b1yi(t-1) + b2xit + b3xi(t-1) + ui + eit

cause any additional problems? I guess not, but am curious of your view on this.

2. How does this xtdpdml approach relate to the SEM approach assuming random effects?

Good luck in finalizing the xtdpdml procedure.

Kind regards, Adriaan Hoogendoorn

I agree that GEE is likely to suffer the same problems with lagged dependent variables as mixed models. Regarding your questions:

1. I don’t see any special problems with other lagged predictors, unless those predictors are “predetermined”, meaning that they depend on earlier values of the dependent variable. The xtdpdml command can allow for predetermined variables.

2. The xtdpdml command is a shell for the sem command. For time-invariant predictors, it estimates random effects models. However, for time-varying predictors, it produces fixed-effects estimates.

Hi Paul,

Thanks for this post. Your xtdpdml command sounds exciting. I look forward to seeing it in action!

Currently, I am playing around with multiply imputed datasets in Stata, and therefore using commands that often begin with ‘mi estimate, cmdok: xtdpd’. In general, many of the post-estimation routines such as the Sargan or autocorrelation tests have not been optimized for the mi case. Comparing nested models is also not straightforward in the mi case–unless I am missing something (?).

Given your vast expertise in the area, will xtdpdml be mi friendly?

Thanks again! I always enjoy reading your posts.

Mike

Hi Michael. We haven’t even thought about multiple imputation with this command, in part because FIML is so easy to use instead. I can imagine that there might be some difficulty because the command initially reformats the data from long to wide. It can optionally use wide data as input, however.

If I follow your explanation, the bias comes from assuming that the random effect is the same at times 2,3,etc. It seems to me the problem goes away when there are just 2 time points, since there’s no explicit random effect at time 1.

The case with 2 time points is important in education, where teacher value-added is often estimated using 2 time points: this year and last year.

Yes, but then you’d just be estimating a standard linear regression with OLS, right?

Sorry I wasn’t clear. I was thinking of the situation with students nested in school. The model I had in mind was this:

y_ij2 = b0 + b1*y_i1 + … + u_j + e_i

where y_i1,y_ij2 are the scores of student i in years 1 and 2, u_j is a school random effect, and e_i is a student random residual.

With only 2 years, we don’t have to assume the random effect u_j is the same in year 1 as in year 2, so is there still a bias?

Interesting question. You don’t HAVE to assume that u_j is the same in the two years but, at the least, you would expect the school effects to be highly correlated over time. Consequently, you would expect y_i1 to be correlated with u_j, leading to biased estimates with standard mixed model software.

Interesting. So how would you recommend accounting for the correlation among observations from the same school?

Just do a fixed effects model with schools as the fixed effects. Alternatively, express the time 1 measure as deviations from school means.

Dear Paul,

Thanks for your reply.

In applying Kripfganz’s procedure, I noticed that in some cases it drops observations: “Note: groups are dropped due to gaps or insufficient number of observations”.

Is this one of the problems with handling missing data that you refer to?

Kind regards, Adriaan

Yes.

Dear Paul,

Very interesting post. Looking forward to your xtdpdml command as well!

I had one question which was whether the problems you describe with including a lagged dependent variable in a fixed effects model still hold with very large T (I am currently working with n=32, T=566)? I understand that with large T Nickell bias diminishes, but are there other potential estimation problems?

All the best,

Matt

1. xtdpdml won’t work if T approaches or exceeds n.

2. My post was about random effects rather than fixed effects models.

3. I don’t have enough knowledge/experience with these kinds of situations to comment further.

Dear Paul,

Thanks so much for your work on this. Any suggestions on what to use for model selection between random and fixed effects? Is it correct that the standard Hausman test is insufficient for this estimator?

Looking forward to seeing the final release!

Thanks!

I would do a likelihood ratio test of random effects versus fixed effects. Both can be easily estimated with our new xtdpdml command for stata.

Dear Paul – thank you very much for the post.

Are there any routines that can handle AR(3) models (vs. AR(1))? My understanding is that both xtdpdqml and xtdpdml cannot handle longer lag structure.

Sincerely,

Daniel

xtdpdml can handle any lag structure for measured predictors, but it does not allow autoregressive structures on the disturbances.

Great post. Could you also elaborate on how your newly developed command (xtdpdml) handles missing data better than the xtdpdqml?

xtdpdqml simply drops any records with missing data

xtdpdml optionally handles missing data by full information maximum likelihood, which preserves all cases under the assumption that the data are missing at random and that the variables with missing data have a multivariate normal distribution

Very nice and understandable post!

Does the bias only occur with individual fixed effects? Say we would include time fixed effects (or country fixed effects in a different model where the level of observation still is at level i) instead of individual fixed effects, would that also be an issue when the lagged dependent variable is included in the regression?

Time fixed effects or country fixed effects shouldn’t cause a problem.

Dear Professor Allison,

I am conducting a study for Master in Economics on the impact of elections on the macroeconomy. My model

contains a lagged dependent variable and so i have been advised to use dynamic maximum likelihood.

I have come accross the command (xtdpdml) that your colleagues and you developed.

I am trying to implement it but facing this challenged of some latent variable

being requested by stata. The specific error is as follows:

xtdpdml gdpg, inv(election ex wgdp fo lto lgoe ltot) tfix

latent variable gdpg2 not found;

‘gdpg2’ specifies a latent variable.

For ‘gdpg2’ to be valid, ‘gdpg2’ must begin with a capital letter.

or

xtdpdml Gdpg, inv(election ex wgdp fo lto lgoe ltot) tfix

model not identified;

no paths from latent variable Gdpg26 to observed variables

I would be grateful for your assistance in solving this problem.

How many individuals and how many time points? Also, try recoding your time variable to t=1,2,3,… etc. The tfix option doesn’t always work.

Dear proffesor Allison

I have the same Tmapp’s problem. After I

run the equation “xtdpdml TRevenue pc1 pc2 Trade Agriculture Inf Grants Natural Education2 logdensity Urban Pop65 Laborf icrg_qog fh_ipolity2 Stability left center election , pred(logGDPP TDS) inv(ingles frances aleman escandinavo) method(mlmv) ylags(1 2) errorinv”

I get the error advise:

model not identified;

no paths from latent variable TRevenue70 to observed variables

I thank in advance your attention

The problem is that several of your predictor variables have names that begin with capital letters. The sem command (which is the engine for xtdpdml) presumes that variables that start with capital letters are latent variables, which totally confuses it. Change your variable names to have only lower case letters.

Dear Paul,

I am getting a similar error as TMapp.

xtdpdml HW xvar1 xvar2 , errorinv

>

model not identified;

no paths from latent variable HW13 to observed variables

<

It might help if I describe the data and variables:

y, x1 and x2 are time-varying. I have 48,494 observations (invidivduals*t) and about 5500 individuals. Time is coded from (t) 1 to 13. Data is in long format.

What might be an issue: the data is not balanced and the time-sequence has gaps in it. However, if I eliminate both the unbalanced sequences and the individuals with gaps, as well as those not starting with t1 I still get the same error message.

Do you happen to have any idea what is going on?

Best,

Christian

If you’ll send me the data set, I’ll check it out: allison@statisticalhorizons.com

Dear Paul,

Why not just include AR(1) residuals directly in the model? In other words:

mixed lwage ed fem t || id:, residuals(ar1, t(t))

Best,

Wolfgang

That can be useful, but many researchers actually want to estimate and control for the effect of a lagged dependent variable.

Hi Paul,

this is really interesting. Now, I am using xtmixed (Stata) to run a relatively simply cross-classified (firms within both industries and countries) multilevel (longitudinal) model.

The model (simplified) looks something like:

xtmixed Dvar IVfirm IVfirm IVindustry IVcountry || _all: R.concode || sic3: || concode:, var

Now, I need to instrument the country level IV – how would I do this with your approach in stata?

Sorry, but I don’t have a solution with the ML approach.

Hi Paul,

Thank you for a very clear and informative post. I am however having trouble accessing the xtdpdml command in my Stata software, is it available for the STATA MP 13 and STATA 14 version?

Regards

Hanna Lindström

Yes, it works for Stata 13 and Stata 14. You can get more information at https://www3.nd.edu/~rwilliam/dynamic/

Thanks a lot Prof. Paul for the very enlightening post. Other than on your sample dataset, each time I attemt to use the xtdpdml command, I always get the message “convergence not achieved”. What implication does this message have on my result and how can I resolve it? Thanks.

Thank you so much for writing this program.

I could not find anything in the SJ article and I was wondering if you had any suggestions about how to apply (longitudinal) weights in any way. Thank you.

Sorry, but xtdpdml currently does not allow weights.

Thank you for your response.

I was also wondering what you would suggest if my ideal model would need to account for selectivity. Specifically, I am modelling occupational status, and I would like to account for selection into employment. In a xtreg with lagged DV I would estimate a first stage probit for the probability into employment and then use the Inverse Mills Ratio from the probit in my xtreg model predicting occupational status.

What would you suggest if I want to use xtdpdml accounting for selection? Thanks in advance.

xtreg does not have any special features to account for selection. But keep in mind that when you are estimating a fixed effects model (the default in xtdpdml), selection issues are much less important. That’s because you are only using within-individual variation to estimate the model. Selection is primarily a between-person confounder.

Hi Paul,

Thanks for the post – very interesting. I’m a psychology researcher and I have never seen this issue discussed in the quantitative psych literature. I was curious your thoughts on analyzing data from a common research design I do. I conduct a lot of daily diary studies where I collect participant data every day for 3 weeks straight. I often run longitudinal mixed effect models (days nested within persons) where I use yesterday’s IV to predict today’s DV. In these mixed effect models I almost always control for yesterday’s (i.e., lagged) DV to show that my IV can predict change over time. I will person-mean (aka group-mean) center both yesterday’s IV and yesterday’s (i.e., lagged) DV so only the time-varying components of each remain. Is the lagged DV as a predictor still a problem in this case? I was wondering if it’s only a problem for time-invariant predictors.

Would love to hear any thoughts you have,

David

Even if you group-mean center the variables, putting a lagged DV in your model will still lead to severe bias.

Hi Paul,

Thanks very much for this informative post, and the related resources found on the web. I’m in psychological research and I agree with David that such discussion is rare in psychology.

I have a set of data collected via ESM/EMA (experience sampling method/ ecological momentary assessment) protocol, so that I collected data from participants repeatedly & randomly 8 times a day across 2 weeks. Then I tried to use time-series mixed model to see if the IV (e.g. stress) predicts DV (e.g. abdominal pain) across time, yet only within the same day. After studying xtdpdml, it seems that it’s not appropriate for my dataset because:

1. No. of measurement > No. of participants

The study had 57 participants (30 controls and 27 patients). Each of them contributed up to 114 measurements.

2. No carry-over effect overnight

Since xtdpdml automatically look for the DV at the earlier timepoint as LDV, it seems that I can’t manually tell the program to only do so if the measurements were taken within the same day. Previously, I’d create a column for LDV, as a new IV, copy the values from previous timepoint if it comes from the same day, and treat it as missing value if it comes from the previous day.

I’d love to hear what you think about the issue. Thanks a lot!

Yawen

xtdpdml is definitely not appropriate for the kind of data that you are working with.