Using Panel Data to Infer Causal Direction: ML vs. Arellano-Bond
November 3, 2014 By Paul Allison
Does x cause y or does y cause x? Virtually everyone agrees that cross-sectional data are of no use in answering this question. The ideal, of course, would be to do two randomized experiments, one examining the effect of x on y, and the other focused on the reverse effect. Absent this, most social scientists would say that some kind of longitudinal data ought to do the trick. But what kinds of data are needed and how should they be analyzed?
In this post, I review some earlier work I’ve done on these questions, and I report new simulation results comparing the Arellano-Bond method with maximum likelihood (ML) using structural equation modeling (SEM) software. Arellano-Bond is hugely popular among economists, but not widely known in other disciplines. ML with SEM is a method that I’ve been advocating for almost 15 years (Allison 2000, 2005a, 2005b, 2009). Long story short: ML rules.
I focus on panel data in which we observe $y_{it}$ and $x_{it}$ for $i = 1, \ldots, n$ and $t = 1, \ldots, T$. The proposed linear model allows for reciprocal, lagged effects of these two variables on each other:
$$y_{it} = b_1 x_{i(t-1)} + b_2 y_{i(t-1)} + c_i + e_{it}$$
$$x_{it} = a_1 x_{i(t-1)} + a_2 y_{i(t-1)} + f_i + d_{it}$$
The terms $c_i$ and $f_i$ represent individual-specific unobserved heterogeneity in $y$ and $x$, respectively. They are treated as “fixed effects,” thereby allowing one to control for all unchanging characteristics of the individuals, a key factor in arguing for a causal interpretation of the coefficients. Finally, $e_{it}$ and $d_{it}$ are assumed to be pure random noise, independent of any variables measured at earlier time points.
If all the assumptions are met, $b_1$ can be interpreted as the causal effect of $x$ on $y$, and $a_2$ can be interpreted as the causal effect of $y$ on $x$. This model can be elaborated in various ways to include, for example, other predictor variables, different lags, and coefficients that change over time.
Estimation of the model is not straightforward, for reasons that are well known in the econometric literature. First, the presence of a lagged dependent variable as a predictor in each equation means that conventional fixed effects methods yield biased estimates of the coefficients under almost any condition. But even if the lagged dependent variables were excluded from the equations, the error term in each equation would still be correlated with all future values of both $x$ and $y$. For example, $e_{i2} \rightarrow y_{i2} \rightarrow x_{i3}$. So, again, conventional fixed effects would produce biased coefficients.
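To see the first problem concretely, here is a small Python sketch (not from the original post) that simulates the simplest special case, $y_{it} = \rho y_{i(t-1)} + c_i + e_{it}$ with no $x$ at all, and applies the conventional within (fixed effects) estimator. The sample size, $\rho = .5$, the five estimation periods, and the stationary starting values are illustrative assumptions. Even with thousands of individuals, the within estimate lands well below the true value when there are only a few time points (the downward bias econometricians usually attribute to Nickell).

```python
import numpy as np

def simulate_ar1_fixed_effects(n, T, rho, rng):
    """y_it = rho*y_i(t-1) + c_i + e_it, with c_i and e_it independent N(0, 1).

    Returns an (n, T+1) array; column 0 is the starting value, drawn from the
    stationary distribution of each unit's series (an assumption for this sketch).
    """
    c = rng.normal(size=n)                      # unit-specific fixed effects
    y = np.empty((n, T + 1))
    y[:, 0] = c / (1 - rho) + rng.normal(size=n) / np.sqrt(1 - rho**2)
    for t in range(1, T + 1):
        y[:, t] = rho * y[:, t - 1] + c + rng.normal(size=n)
    return y

def within_estimate(y):
    """Conventional fixed effects (within) OLS slope of y_t on y_(t-1)."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    # Demean each person's series over the estimation periods to sweep out c_i.
    lag_dm = y_lag - y_lag.mean(axis=1, keepdims=True)
    cur_dm = y_cur - y_cur.mean(axis=1, keepdims=True)
    return float((lag_dm * cur_dm).sum() / (lag_dm ** 2).sum())

rng = np.random.default_rng(0)
y = simulate_ar1_fixed_effects(n=5000, T=5, rho=0.5, rng=rng)
rho_fe = within_estimate(y)
print(f"true rho = 0.5, within estimate = {rho_fe:.3f}")
```

Increasing the number of individuals does not fix this; only more time points shrink the bias.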
Arellano and Bond (1991) solved these problems by using earlier lagged values of x and y as instrumental variables and by applying a generalized method of moments (GMM) estimator. Several software packages now implement this method, including SAS, Stata, LIMDEP, and the plm package for R.
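The core of the Arellano-Bond idea for the $y$ equation can be sketched in Python as a one-step difference GMM estimator: first-difference the equation to remove $c_i$, then use levels of $x$ and $y$ dated two or more periods before each differenced equation as instruments. This is a simplified illustration under stated assumptions, not the implementation in any of the packages named above. It omits robust standard errors, two-step weighting, and specification tests, and using lags 2 and earlier of both variables is one common instrument choice among several. The helper `simulate_panel` and its choices for the distribution of $c_i$, $f_i$, and the first wave are hypothetical; its coefficients default to the values used in the simulation described later in the post.

```python
import numpy as np

def simulate_panel(n, T, rng, b1=0.5, b2=0.5, a1=0.5, a2=-0.5):
    """Simulate the two-equation model with fixed effects c (in y) and f (in x).

    Assumed here: c and f are N(0, 1) with correlation .5, the first wave of x and y
    loads on them, and all errors are independent N(0, 1).
    """
    c, f = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.5], [0.5, 1.0]], size=n).T
    x, y = np.empty((n, T)), np.empty((n, T))
    x[:, 0] = f + 0.5 * c + rng.normal(size=n)
    y[:, 0] = c + 0.5 * f + rng.normal(size=n)
    for t in range(1, T):
        y[:, t] = b1 * x[:, t - 1] + b2 * y[:, t - 1] + c + rng.normal(size=n)
        x[:, t] = a1 * x[:, t - 1] + a2 * y[:, t - 1] + f + rng.normal(size=n)
    return y, x

def difference_gmm_y(y, x):
    """One-step difference GMM for (b1, b2) in y_it = b1*x_i(t-1) + b2*y_i(t-1) + c_i + e_it.

    y, x: (n, T) arrays of levels; columns are time points 0..T-1.
    """
    n, T = y.shape
    K = T - 2                          # differenced equations for t = 2..T-1
    m = (T - 2) * (T - 1)              # total instrument columns (block-diagonal layout)
    # Covariance pattern of first-differenced iid errors: 2 on the diagonal, -1 beside it.
    H = 2 * np.eye(K) - np.eye(K, k=1) - np.eye(K, k=-1)
    ZHZ, ZX, Zy = np.zeros((m, m)), np.zeros((m, 2)), np.zeros(m)
    for i in range(n):
        Z = np.zeros((K, m))
        col = 0
        for row, t in enumerate(range(2, T)):
            inst = np.concatenate([y[i, : t - 1], x[i, : t - 1]])  # levels dated 0..t-2
            Z[row, col : col + inst.size] = inst
            col += inst.size
        dy = y[i, 2:] - y[i, 1:-1]                          # delta y_t
        dX = np.column_stack([x[i, 1:-1] - x[i, :-2],       # delta x_(t-1)
                              y[i, 1:-1] - y[i, :-2]])      # delta y_(t-1)
        ZHZ += Z.T @ H @ Z
        ZX += Z.T @ dX
        Zy += Z.T @ dy
    W = np.linalg.pinv(ZHZ)            # one-step weight matrix
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)

rng = np.random.default_rng(2014)
y, x = simulate_panel(n=2000, T=5, rng=rng)
b1_hat, b2_hat = difference_gmm_y(y, x)
print(f"b1 = {b1_hat:.3f}, b2 = {b2_hat:.3f}  (true values .5 and .5)")
```

The $x$ equation would be handled the same way with the roles of $x$ and $y$ swapped.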
My solution to these problems has been to estimate each equation separately by ML using any SEM package (e.g., LISREL, Mplus, PROC CALIS in SAS, or sem in Stata). Two “tricks” are necessary. Focusing on the first equation, the first trick accommodates fixed effects by allowing $c$ to be correlated with all measurements of $x$ (as well as the initial measurement of $y$). The second allows the error term $e$ to be correlated with all future measurements of $x$. Analogous methods are used to estimate the second equation. For details, see the SEM chapters in my 2005 and 2009 books.
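To make the two tricks concrete, the Python function below writes out lavaan-style model syntax for the first equation. It is a hypothetical helper, not code from the post or the books: the variable names `y1`...`yT` and `x1`...`xT`, the latent name `alpha` (playing the role of $c$), and the equality labels `b1`, `b2`, and `v` (time-invariant coefficients and error variance) are all assumptions. The generated syntax has not been run through any SEM program, and packages differ in how they treat covariances involving exogenous observed variables, so adjustments may be needed.

```python
def ml_sem_y_equation(T: int) -> str:
    """Return lavaan-style syntax for the y equation of the ML-SEM dynamic panel model.

    Hypothetical variable names: y1..yT and x1..xT are the observed waves.
    """
    ys = [f"y{t}" for t in range(2, T + 1)]
    xs = [f"x{t}" for t in range(1, T + 1)]
    lines = []
    # Fixed effect c as a latent variable with loadings fixed at 1 on y2..yT.
    lines.append("alpha =~ " + " + ".join(f"1*{v}" for v in ys))
    # Lagged regressions; shared labels b1, b2 hold the coefficients constant over time.
    for t in range(2, T + 1):
        lines.append(f"y{t} ~ b1*x{t - 1} + b2*y{t - 1}")
    # Trick 1: the fixed effect may covary with every x and with the initial y.
    lines.append("alpha ~~ " + " + ".join(["y1"] + xs))
    # Trick 2: the error of y_t may covary with x measured after t.
    for t in range(2, T):
        lines.append(f"y{t} ~~ " + " + ".join(f"x{s}" for s in range(t + 1, T + 1)))
    # Constant error variance across waves via the shared label v.
    for t in range(2, T + 1):
        lines.append(f"y{t} ~~ v*y{t}")
    return "\n".join(lines)

print(ml_sem_y_equation(5))
```

Dropping the labels `b1`, `b2`, or `v` is one way to relax the constant-coefficient or constant-variance constraints.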
In my 2005 paper, I presented simulation evidence that the ML-SEM method produces approximately unbiased estimates of the coefficients under a variety of conditions. For years, I’ve been promising to do a head-to-head comparison of ML with Arellano-Bond, but I’ve just now gotten around to doing it.
What I’m going to report here are some very preliminary but dramatic results. The model used to generate the data was one in which x has a positive effect on y, but y has a negative effect on x:
$$y_{it} = .5\,x_{i(t-1)} + .5\,y_{i(t-1)} + c_i + e_{it}$$
$$x_{it} = .5\,x_{i(t-1)} - .5\,y_{i(t-1)} + f_i + d_{it}$$
All variables have normal distributions, $c$ has a positive correlation with $x$, $f$ has a positive correlation with $y$, and $c$ and $f$ are positively correlated with each other. The baseline model had 5 time points (T = 5), with sample sizes of 50, 100, 400, and 1600. Then, keeping the sample size at 400, I examined T = 4 and T = 10. For each condition I did 1000 replications.
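As an aside not discussed in the post, one can check that this design is dynamically stable, so the simulated series do not explode as T grows. A minimal Python sketch, assuming the lag-coefficient matrix is written for the vector $(y, x)$: its eigenvalues are $.5 \pm .5i$, whose common modulus $\sqrt{.5} \approx .707$ is below 1.

```python
import numpy as np

# Lag-coefficient matrix acting on (y, x) in the simulation design:
#   y_t =  .5*y_(t-1) + .5*x_(t-1) + c + e
#   x_t = -.5*y_(t-1) + .5*x_(t-1) + f + d
A = np.array([[0.5, 0.5],
              [-0.5, 0.5]])
moduli = np.abs(np.linalg.eigvals(A))
# Both eigenvalues are .5 +/- .5i, so both moduli equal sqrt(.5), about .707 (< 1: stable).
print(moduli)
```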
I focus here on the coefficient for the effect of x on y in the first equation. For each condition, I calculated the mean squared error (MSE), which is the variance of the estimator plus its squared bias. There was little bias in either estimator, so the MSE primarily reflects sampling variance.
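The decomposition MSE = variance + bias² holds exactly when the variance is computed with a divisor equal to the number of replications. A small Python sketch with simulated, purely hypothetical estimates (not the results reported here) illustrates it:

```python
import numpy as np

def mse_decomposition(estimates, true_value):
    """Return (mse, variance, bias) for replicated estimates of one parameter."""
    est = np.asarray(estimates, dtype=float)
    bias = est.mean() - true_value
    variance = est.var()                        # ddof=0: divide by the number of replications
    mse = np.mean((est - true_value) ** 2)      # average squared error around the true value
    return mse, variance, bias

# Hypothetical estimates from 1000 replications of an estimator whose target is .5
rng = np.random.default_rng(42)
fake_estimates = rng.normal(loc=0.51, scale=0.03, size=1000)
mse, variance, bias = mse_decomposition(fake_estimates, true_value=0.5)
print(f"MSE = {mse:.6f}, variance + bias^2 = {variance + bias**2:.6f}")
```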
Here are the preliminary results:
Mean Squared Error for Two Estimators

| Condition | ML-SEM | Arellano-Bond | Relative efficiency |
|---|---|---|---|
| N=50, T=5 | .0057128 | .0110352 | .5176833 |
| N=100, T=5 | .0027484 | .0058433 | .4703557 |
| N=400, T=5 | .0006348 | .0014961 | .4242679 |
| N=1600, T=5 | .0001556 | .0003682 | .4226466 |
| N=400, T=4 | .0011632 | .0039785 | .2923685 |
| N=400, T=10 | .0001978 | .0002503 | .7902897 |
The last column, relative efficiency, is the ratio of the MSE for ML to the MSE for A-B. With 5 time points, A-B is only about half as efficient as ML-SEM at every sample size (relative efficiency between .42 and .52). But the number of time points has a dramatic effect: A-B is only 29% efficient for T = 4 but 79% efficient for T = 10.
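The relative-efficiency column can be recomputed from the two MSE columns of the table. Because the MSEs are shown rounded to seven decimals, this Python check reproduces the published ratios only to about four decimal places.

```python
# (condition, MSE for ML-SEM, MSE for Arellano-Bond), copied from the table
mse_table = [
    ("N=50, T=5", 0.0057128, 0.0110352),
    ("N=100, T=5", 0.0027484, 0.0058433),
    ("N=400, T=5", 0.0006348, 0.0014961),
    ("N=1600, T=5", 0.0001556, 0.0003682),
    ("N=400, T=4", 0.0011632, 0.0039785),
    ("N=400, T=10", 0.0001978, 0.0002503),
]

# Relative efficiency of A-B = MSE(ML-SEM) / MSE(A-B); values below 1 favor ML-SEM.
rel_eff = {cond: ml / ab for cond, ml, ab in mse_table}
for cond, ratio in rel_eff.items():
    print(f"{cond:<12} {ratio:.4f}")
```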
The next steps are to vary such things as the magnitudes of the coefficients, the variances of the error terms, and the correlations of c and f with each other and with the predictor variables.
Besides its efficiency advantage, the ML-SEM framework makes it easier than A-B to accomplish several things:
- Handle missing data by FIML.
- Relax various constraints, such as constant error variance or constant coefficients.
- Construct a likelihood ratio test comparing fixed vs. random effects, the equivalent of the Hausman test, which not infrequently breaks down.
- Add an autoregressive structure to the time-specific error components.
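For the likelihood ratio test in the list above, the arithmetic is standard: twice the difference in maximized log-likelihoods between the fixed effects model (in which $c$ may covary with the $x$'s) and the random effects model (those covariances fixed at zero), referred to a chi-square distribution with degrees of freedom equal to the number of constrained covariances. A minimal stdlib-only Python sketch follows; the log-likelihood values and degrees of freedom in the usage lines are hypothetical placeholders for what an SEM program would report.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square(df) variate, for integer df >= 1."""
    if x <= 0:
        return 1.0
    # Closed forms for df = 1 and df = 2 ...
    if df % 2 == 1:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    else:
        q, k = math.exp(-x / 2), 2
    # ... then the recurrence Q(x; k+2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1).
    while k < df:
        q += math.exp((k / 2) * math.log(x / 2) - x / 2 - math.lgamma(k / 2 + 1))
        k += 2
    return q

def lr_test(loglik_fixed: float, loglik_random: float, df: int):
    """LR test of random effects (restricted) against fixed effects (unrestricted)."""
    stat = 2.0 * (loglik_fixed - loglik_random)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods and df, for illustration only
stat, p = lr_test(loglik_fixed=-2480.3, loglik_random=-2491.9, df=5)
print(f"LR = {stat:.1f}, p = {p:.4f}")
```

Any statistics library's chi-square survival function would do the same job; the recurrence just avoids a dependency.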
Before concluding, I must mention that Hsiao et al. (2002) also did a simulation study to compare ML with a variety of other estimators for the panel model, including A-B. However, their approach to ML was very different from mine, and it has not been implemented in any commercial software package. Hsiao et al. found that ML did better than any of the other estimators with respect to both bias and efficiency, under almost all conditions. Nevertheless, the differences between ML and A-B were much smaller than those reported here.
If you’re reading this post, you should definitely read next month’s follow-up post.
To learn more about these and other methods for panel data, check out my seminars, Longitudinal Data Analysis Using SAS and Longitudinal Data Analysis Using Stata. Both will be offered in the spring of 2015. Plus, I am offering a new, more advanced seminar titled Longitudinal Data Analysis Using SEM in Fort Myers, Florida, January 23-24.
References
Allison, Paul D. (2000) “Inferring Causal Order from Panel Data.” Paper presented at the Ninth International Conference on Panel Data, June 22, Geneva, Switzerland.
Allison, Paul D. (2005a) “Causal Inference with Panel Data.” Paper presented at the Annual Meeting of the American Sociological Association, August, Philadelphia, PA.
Allison, Paul D. (2005b) Fixed Effects Regression Methods for Longitudinal Data Using SAS. Cary, NC: SAS Institute.
Allison, Paul D. (2009) Fixed Effects Regression Models. Thousand Oaks, CA: Sage Publications.
Arellano, M. and S. Bond (1991) “Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations.” The Review of Economic Studies 58: 277-297.
Hsiao, Cheng, M. Hashem Pesaran, and A. Kamil Tahmiscioglu (2002) “Maximum likelihood estimation of fixed effects dynamic panel data models covering short time periods.” Journal of Econometrics 109: 107-150.
Thanks for this, Paul. I’ve been slowly collecting your and others’ work on this topic, including the econometricians’, and it’s surprising how easily many of their seemingly complex models can be specified in SEM software. Extensions of a wide variety of kinds are also possible depending upon the capabilities of the SEM program used (e.g., Mplus). I look forward to seeing more of your work in print. FYI, as you surely know very well, all of their instrumental variable methods can be estimated extremely easily in SEM simply with a modified kind of mediation model wherein X and Y residuals are allowed to covary (excluding covariance among the instrument[s]). Simultaneous causality models are also possible with instruments for both X and Y, while still allowing for residual covariance to inhibit bias in estimates of causal effects. I am surprised that economists don’t recognize how simplistic most of their models are from an SEM perspective, but their culture is so very insular that it makes sense. In my Mplus workshops, which are attended by 100+ people each year, the economists are always the ones who are most impressed with SEM methods because they are the ones who know least about them. Thanks again!
Mike Zyphur
Business & Economics
University of Melbourne
Thanks Mike. Couldn’t agree more.
Dear Professor Allison, that’s a very interesting topic. Thank you for that post. Searching through your publications, I found one SAS code example of a dynamic panel model estimated by ML.
I was wondering, however, whether you have example code for a full implementation of a model equivalent to Arellano-Bond, but with SEM. Preferably, would you happen to have any example in Stata or Mplus?
Best I can offer is the paper at http://statisticalhorizons.com/wp-content/uploads/ML-DynamicPanel-1SP.pdf