This is a follow-up to last month’s post, in which I considered the use of panel data to answer questions about causal ordering: does x cause y or does y cause x? In the interim, I’ve done many more simulations to compare the two competing methods, Arellano-Bond and ML-SEM, and I’m going to report some key results here. If you want all the details, read my recent paper by clicking here. If you’d like to learn how to use these methods, check out my new seminar titled Longitudinal Data Analysis Using SEM.
Quick review: The basic approach is to assume a cross-lagged linear model, with y at time t affected by both x and y at time t-1, and x at time t also affected by both lagged variables. The equations are
yit = b1xi(t-1) + b2yi(t-1) + ci + eit
xit = a1xi(t-1) + a2yi(t-1) + fi + dit
for i = 1,…, n, and t = 1,…, T.
The terms ci and fi represent individual-specific unobserved heterogeneity in both x and y. They are treated as “fixed effects”, thereby allowing one to control for all unchanging characteristics of the individuals, a key factor in arguing for a causal interpretation of the coefficients. Finally, eit and dit are assumed to represent pure random noise, independent of any variables measured at earlier time points. Additional exogenous variables could also be added to these equations.
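As a concrete illustration, here is a minimal Python sketch of how data from this cross-lagged model might be generated. This is my own illustrative code, not the code used in the paper; the parameter values, error variances, and the treatment of the initial time point are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_panel(n=400, T=5, b1=0.5, b2=0.5, a1=0.5, a2=-0.5, rho_cf=0.5):
    """Generate data from the cross-lagged model with correlated
    individual fixed effects c and f (illustrative parameter values)."""
    # correlated fixed effects c and f, each with unit variance
    cov = [[1.0, rho_cf], [rho_cf, 1.0]]
    c, f = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    y = np.empty((n, T))
    x = np.empty((n, T))
    # initial conditions correlated with the fixed effects (an assumption;
    # the model itself is silent about how the process starts)
    y[:, 0] = c + rng.standard_normal(n)
    x[:, 0] = f + rng.standard_normal(n)
    for t in range(1, T):
        y[:, t] = b1 * x[:, t-1] + b2 * y[:, t-1] + c + rng.standard_normal(n)
        x[:, t] = a1 * x[:, t-1] + a2 * y[:, t-1] + f + rng.standard_normal(n)
    return y, x

y, x = simulate_panel()
```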
Conventional estimation methods are biased because of the lagged dependent variable and because of the reciprocal relationship between the two variables. The most popular solution is the Arellano-Bond (A-B) method (or one of its cousins), but I have previously argued for the use of maximum likelihood (ML) as implemented in structural equation modeling (SEM) software.
Last month I presented very preliminary simulation results showing that ML-SEM had substantially lower mean-squared error (MSE) than A-B under a few conditions. Since then I’ve done simulations for 31 different sets of parameter values and data configurations. For each condition, I generated 1,000 samples, ran the two methods on each sample, and then calculated bias, mean squared error, and coverage for confidence intervals. Since the two equations are symmetrical, the focus is on the coefficients in the first equation, b1 for the effect of x on y, and b2 for the effect of y on itself.
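For reference, the per-condition summaries described above (bias, MSE, and coverage across replications) can be computed along these lines. This is a hedged sketch: the function name and its inputs are hypothetical, not the code actually used for the simulations.

```python
import numpy as np

def summarize(estimates, std_errors, true_value, z=1.96):
    """Bias, mean squared error, and 95% CI coverage across replications."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(std_errors, dtype=float)
    bias = est.mean() - true_value
    mse = np.mean((est - true_value) ** 2)  # = sampling variance + squared bias
    # a nominal 95% CI covers the truth when |estimate - truth| <= 1.96*SE
    coverage = np.mean(np.abs(est - true_value) <= z * se)
    return {"bias": bias, "mse": mse, "coverage": coverage}

# toy inputs: four replications of an estimator of a true value of 0.50
result = summarize([0.48, 0.52, 0.50, 0.50], [0.05, 0.05, 0.05, 0.05], 0.50)
```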
The simulations for ML were done with PROC CALIS in SAS. I originally started with the sem command in Stata, but it had a lot of convergence problems for the smaller sample sizes. The A-B simulations were done in Stata with the xtabond command. I tried PROC PANEL in SAS, but couldn’t find any combination of options that produced approximately unbiased estimates.
Here are some of the things I’ve learned:
Under every condition, ML showed little bias and quite accurate confidence interval coverage. That means that about 95% of the nominal 95% confidence intervals included the true value.
Except under “extreme” conditions, A-B also had little bias and reasonably accurate confidence interval coverage.
However, compared with A-B, ML-SEM always showed less bias and smaller sampling variance. My standard of comparison is relative efficiency, which is the ratio of MSE for ML to MSE for A-B. (MSE is the sum of the sampling variance and the squared bias.) Across the 31 conditions, relative efficiency of the two estimators ranged from .02 to .96, with a median of .50. To translate: if the relative efficiency is .50, you would need twice as large a sample to get the same accuracy with A-B as with ML.
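The sample-size interpretation follows from the fact that MSE shrinks roughly in proportion to 1/n. A quick sketch of the arithmetic (my own illustration, not from the paper):

```python
def relative_efficiency(mse_ml, mse_ab):
    """MSE ratio as defined in the text: MSE(ML) / MSE(A-B)."""
    return mse_ml / mse_ab

def ab_sample_size_multiplier(rel_eff):
    """Because MSE is roughly proportional to 1/n, A-B needs a sample
    about 1/rel_eff times as large to match ML's accuracy."""
    return 1.0 / rel_eff

# the median condition in the simulations had relative efficiency .50
mult = ab_sample_size_multiplier(relative_efficiency(0.01, 0.02))
```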
Relative efficiency of the two estimators is strongly affected by the value of the parameter b2, the effect of yt-1 on yt. As b2 gets close to 1, the A-B estimators for both b1 and b2 become badly biased (toward 0), and the sampling variance increases, which is consistent with previous literature on the A-B estimator. For ML, on the other hand, bias and variance are rather insensitive to the value of b2. Here are the numbers:
[Table: relative efficiency of the estimators of b1 and b2, by the value of b2]
Relative efficiency is strongly affected by the number of time points, but in the opposite direction for the two coefficients. Thus, relative efficiency for b1 increases almost linearly as the number of time points goes from 3 to 10. But for b2, relative efficiency is highest at T=3, declines markedly for T=4 and T=5, and then remains stable.
[Table: relative efficiency of the estimators of b1 and b2, by the number of time points]
Relative efficiency is also strongly affected by the ratio of the variance of ci (the fixed effect) to the variance of eit (the pure random error). In the next table, I hold constant the variance of c and vary the standard deviation of e.
[Table: relative efficiency of the estimators of b1 and b2, by the standard deviation of e]
Relative efficiency is not strongly affected by:
The value of b1
The correlation between ci and fi, the two fixed-effects variables.
Because ML is based on the assumption of multivariate normality, one might suspect that A-B would do better than ML if the distributions were not normal. To check that out, I generated all the variables using a 2-df chi-square variable, which is highly skewed to the right. ML still did great in this situation, and was still about twice as efficient as A-B.
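For the curious, skewed errors of this sort can be produced by centering chi-square(2) draws, along these lines (an illustrative sketch, not the simulation code itself):

```python
import numpy as np

rng = np.random.default_rng(42)

# A chi-square variable with 2 df has mean 2; subtracting 2 gives
# mean-zero draws that remain strongly skewed to the right.
e = rng.chisquare(df=2, size=100_000) - 2

# sample skewness; the theoretical skewness of chi-square(2) is 2
skewness = np.mean(e**3) / np.std(e) ** 3
```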
In sum, ML-SEM outperforms A-B in every situation studied, by a very substantial margin.
Does x cause y or does y cause x? Virtually everyone agrees that cross-sectional data are of no use in answering this question. The ideal, of course, would be to do two randomized experiments, one examining the effect of x on y, and the other focused on the reverse effect. Absent this, most social scientists would say that some kind of longitudinal data ought to do the trick. But what kinds of data are needed, and how should they be analyzed?
In this post, I review some earlier work I’ve done on these questions, and I report new simulation results comparing the Arellano-Bond method with maximum likelihood (ML) using structural equation modeling (SEM) software. Arellano-Bond is hugely popular among economists, but not widely known in other disciplines. ML with SEM is a method that I’ve been advocating for almost 15 years (Allison 2000, 2005a, 2005b, 2009). Long story short: ML rules.
I focus on panel data in which we observe yit and xit for i =1,…, n and t =1,…, T. The proposed linear model allows for reciprocal, lagged effects of these two variables on each other:
yit = b1xi(t-1) + b2yi(t-1) + ci + eit
xit = a1xi(t-1) + a2yi(t-1) + fi + dit
The terms ci and fi represent individual-specific unobserved heterogeneity in both x and y. They are treated as “fixed effects”, thereby allowing one to control for all unchanging characteristics of the individuals, a key factor in arguing for a causal interpretation of the coefficients. Finally, eit and dit are assumed to represent pure random noise, independent of any variables measured at earlier time points.
If all the assumptions are met, b1 can be interpreted as the causal effect of x on y, and a2 can be interpreted as the causal effect of y on x. This model can be elaborated in various ways to include, for example, other predictor variables, different lags, and coefficients that change over time.
Estimation of the model is not straightforward for reasons that are well known in the econometric literature. First, the presence of a lagged dependent variable as a predictor in each equation means that conventional fixed effects methods yield biased estimates of the coefficients under almost any condition. But even if the lagged dependent variables were excluded from the equations, the error term in each equation would still be correlated with all future values of both x and y. For example, e2 -> y2 -> x3. So, again, conventional fixed effects will produce biased coefficients.
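To see the first problem concretely, here is an illustrative Python sketch (my own, not from the paper) of the well-known downward bias of the conventional within estimator when a lagged dependent variable is a predictor. The cross-lagged x terms are dropped just to isolate this one bias.

```python
import numpy as np

rng = np.random.default_rng(7)

# Simulate a simple dynamic panel: y_it = b2*y_i(t-1) + c_i + e_it
n, T, b2 = 2000, 5, 0.5
c = rng.standard_normal(n)
y = np.empty((n, T + 1))
y[:, 0] = c + rng.standard_normal(n)
for t in range(1, T + 1):
    y[:, t] = b2 * y[:, t-1] + c + rng.standard_normal(n)

# Conventional fixed effects: demean within individuals, then OLS.
y_now, y_lag = y[:, 1:], y[:, :-1]
yd = y_now - y_now.mean(axis=1, keepdims=True)
xd = y_lag - y_lag.mean(axis=1, keepdims=True)
b2_within = (xd * yd).sum() / (xd**2).sum()  # noticeably below the true 0.5
```

With a small number of time points, the demeaned lag is correlated with the demeaned error, so the estimate falls well short of the true coefficient even in large samples.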
Arellano and Bond (1991) solved these problems by using earlier lagged values of x and y as instrumental variables and by applying a generalized method of moments (GMM) estimator. Several software packages now implement this method, including SAS, Stata, LIMDEP, and the plm package for R.
My solution to the problems has been to estimate each equation separately by ML using any SEM package (e.g., LISREL, Mplus, PROC CALIS in SAS, or sem in Stata). Two “tricks” are necessary. Focusing on the first equation, fixed effects are accommodated by allowing c to be correlated with all measurements of x (as well as the initial measurement of y). Second, the error term e is allowed to be correlated with all future measurements of x. Analogous methods are used to estimate the second equation. For details, see the SEM chapters in my 2005 and 2009 books.
In my 2005 paper, I presented simulation evidence that the ML-SEM method produces approximately unbiased estimates of the coefficients under a variety of conditions. For years, I’ve been promising to do a head-to-head comparison of ML with Arellano-Bond, but I’ve just now gotten around to doing it.
What I’m going to report here are some very preliminary but dramatic results. The model used to generate the data was one in which x has a positive effect on y, but y has a negative effect on x:
yit = .5xi(t-1) + .5yi(t-1) + ci + eit
xit = .5xi(t-1) – .5yi(t-1) + fi + dit
All variables have normal distributions, c has a positive correlation with x, f has a positive correlation with y, and c and f are positively correlated with each other. The baseline model had 5 time points (T=5), with sample sizes of 50, 100, 400, and 1600. Then, keeping the sample size at 400, I examined T= 4, and 10. For each condition I did 1000 replications.
I focus here on the coefficient for the effect of x on y in the first equation. For each condition, I calculated the mean squared error (MSE), which is the variance of the estimator plus its squared bias. There was little bias in either estimator, so the MSE primarily reflects sampling variance.
Here are the preliminary results:
[Table: Mean Squared Error for Two Estimators]
The last column, relative efficiency, is the ratio of the MSE for ML to the MSE for A-B. With 5 time points, A-B is only about half as efficient as ML-SEM, for any sample size. But the number of time points has a dramatic effect. A-B is only 29% efficient for T=4 but 79% efficient for T=10.
The next steps are to vary such things as the magnitudes of the coefficients, the variances of the error terms, and the correlations between c and f with each other and with the predictor variables.
Besides its efficiency advantage, the ML-SEM framework makes it easier than A-B to accomplish several things:
Handle missing data by full information maximum likelihood (FIML).
Relax various constraints, such as constant error variance or constant coefficients.
Construct a likelihood ratio test comparing fixed vs. random effects, the equivalent of the Hausman test, which not infrequently breaks down.
Add an autoregressive structure to the time-specific error components.
Before concluding, I must mention that Hsiao et al. (2002) also did a simulation study to compare ML with a variety of other estimators for the panel model, including A-B. However, their approach to ML was very different from mine, and it has not been implemented in any commercial software packages. Hsiao et al. found that ML did better with respect to both bias and efficiency than any of the other estimators, under almost all conditions. Nevertheless, the differences between ML and A-B were much smaller than those reported here.
If you’re reading this post, you should definitely read next month’s follow-up by clicking here.
Arellano, M. and S. Bond (1991) “Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations.” The Review of Economic Studies 58: 277-297.
Hsiao, Cheng, M. Hashem Pesaran, and A. Kamil Tahmiscioglu (2002) “Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods.” Journal of Econometrics 109: 107-150.