## Asymmetric Fixed Effects Models for Panel Data

##### October 12, 2018 By Paul Allison

Standard methods for the analysis of panel data depend on an assumption of directional symmetry that most researchers don’t even think about. Specifically, these methods assume that if a one-unit increase in variable *X* produces a change of *B* units in variable *Y*, then a one-unit *decrease* in *X* will result in a change of –*B* units in *Y*.

Does that make sense? Probably not for most applications. For example, is it plausible that the increase in happiness when a person gets married is exactly matched by the decrease in happiness when a person gets divorced? Or that a $10K increase in income has the same effect on savings as a $10K decrease (in the opposite direction)?

In this post, I’m going to show you how to relax that assumption. I’ll do it for the simplest situation where there are only two time points. But I’ve also written a more detailed paper covering the multi-period situation.

Here’s the example with two-period data. The data set has 581 children who were studied in 1990 and 1992 as part of the National Longitudinal Survey of Youth. I’ll use three variables that were measured at each of the two time points:

**anti** antisocial behavior, measured with a scale from 0 to 6.

**self** self-esteem, measured with a scale ranging from 6 to 24.

**pov** poverty status of family, coded 1 for family in poverty, otherwise 0.

You can download the data here.

The goal is to estimate the causal effects of **self** and **pov** on **anti**. I’ll focus on fixed effects methods (Allison 2005, 2009) because they are ideal for studying the effects of increases or decreases over time. They also have the remarkable ability to control for all time-invariant confounders.

For two-period data, there are several equivalent ways to estimate a fixed effects model. The difference score method is the one that’s the most straightforward for allowing directional asymmetry. It works like this: for each variable, subtract the time 1 value from the time 2 value to create a difference score. Then, just estimate an ordinary linear regression with the difference scores.
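As a quick check on the equivalence claim, here is a minimal sketch on made-up data (the values and the variable names `y1`, `y2`, `x1`, `x2` are purely illustrative and are not from the NLSY file). It compares the difference score regression with a conventional fixed effects regression that includes a period dummy; with two balanced periods the two slope estimates should coincide.

```stata
* Toy two-period data in wide format (hypothetical values)
clear
input id y1 y2 x1 x2
1 2 3 10 12
2 1 1 15 14
3 4 2  9 13
4 0 1 20 18
5 3 5 11 11
6 2 0 14 17
end

* Method 1: regress the difference scores
generate diff_y = y2 - y1
generate diff_x = x2 - x1
regress diff_y diff_x
scalar b_diff = _b[diff_x]

* Method 2: reshape to long format and fit a fixed effects model
* with a dummy for the second period
drop diff_y diff_x
reshape long y x, i(id) j(time)
xtset id time
xtreg y x i.time, fe
scalar b_fe = _b[x]

* The two slope estimates should agree up to rounding error
display "difference scores: " b_diff "   xtreg, fe: " b_fe
```

The intercept of the difference score regression plays the role of the period dummy in the long-format model.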

Here is Stata code for a standard symmetrical model:

    use nlsy.dta, clear
    generate antidiff=anti92-anti90
    generate selfdiff=self92-self90
    generate povdiff=pov92-pov90
    regress antidiff selfdiff povdiff

You’ll find equivalent SAS code at the end of this post.

And here are the results:

    ------------------------------------------------------------------------------
        antidiff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
        selfdiff |  -.0391292   .0136396    -2.87   0.004    -.0659185     -.01234
         povdiff |   .1969039   .1326352     1.48   0.138    -.0636018    .4574096
           _cons |   .0403031   .0533833     0.75   0.451    -.0645458    .1451521
    ------------------------------------------------------------------------------

Self-esteem has a highly significant negative effect on antisocial behavior. Specifically, for each 1-unit increase in self-esteem, antisocial behavior goes down by .039 units. But that also means that for each 1-unit decrease in self-esteem, antisocial behavior goes up by .039 units. Poverty has a positive (but non-significant) effect on antisocial behavior. Children who move into poverty have an estimated increase in antisocial behavior of .197. But children who move out of poverty have an estimated decrease in antisocial behavior of .197.

How can we relax the constraint that these effects have to be the same in both directions? York and Light (2017) showed the way. What’s needed is to decompose the difference score for each predictor variable into its positive and negative components. Specifically, if *D* is a difference score for variable *X*, create a new variable *D*^{+} which equals *D* if *D* is greater than 0, otherwise 0. And create a second variable *D*^{–} which equals –*D* if *D* is less than 0, otherwise 0.
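Written as a regression equation, the decomposition amounts to the following. This is a sketch in notation introduced here for illustration: $\Delta Y$ is the difference score for the outcome, $\beta^{+}$ and $\beta^{-}$ are the coefficients on the two components, and $\varepsilon$ is the error term.

$$
\Delta Y = \mu + \beta^{+} D^{+} + \beta^{-} D^{-} + \varepsilon, \qquad D = D^{+} - D^{-}.
$$

Setting $\beta^{-} = -\beta^{+}$ gives $\beta^{+}(D^{+} - D^{-}) = \beta^{+} D$, so the standard symmetric model is the special case in which the two coefficients are equal in magnitude and opposite in sign.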

Here’s how to create these variables in Stata:

    generate selfpos=selfdiff*(selfdiff>0)
    generate selfneg=-selfdiff*(selfdiff<0)
    generate povpos=povdiff*(povdiff>0)
    generate povneg=-povdiff*(povdiff<0)

The inequalities in parentheses are logical expressions that have a value of 1 if the inequality is true and 0 if the inequality is false.
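To see what these expressions produce, here is a small sketch on a made-up difference score `d` (a hypothetical variable, not part of the NLSY data), including a zero and a missing value.

```stata
* Toy difference scores: an increase, a decrease, no change, and a missing value
clear
input d
3
-2
0
.
end

generate dpos = d*(d>0)     // equals d when d is positive, otherwise 0
generate dneg = -d*(d<0)    // equals -d when d is negative, otherwise 0
list

* The two components are non-negative and recombine into the original score
generate check = dpos - dneg
list d check
```

For the four rows, `dpos` is 3, 0, 0, and missing, while `dneg` is 0, 2, 0, and missing. A missing difference score stays missing in both components, because any arithmetic involving a missing value yields a missing value in Stata, so those cases drop out of the regression just as they would in the symmetric model.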

Now just regress **antidiff** on all four of these variables,

    regress antidiff selfpos selfneg povpos povneg

which produces the following table:

    --------------------------------------------------------------------------
    antidiff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    ---------+----------------------------------------------------------------
     selfpos |  -.0048386   .0251504    -0.19   0.848    -.0542362    .0445589
     selfneg |   .0743077    .025658     2.90   0.004      .023913    .1247024
      povpos |   .2502064   .2003789     1.25   0.212     -.143356    .6437688
      povneg |   -.126328   .1923669    -0.66   0.512    -.5041541    .2514981
       _cons |  -.0749517    .086383    -0.87   0.386    -.2446157    .0947123
    --------------------------------------------------------------------------

What does this tell us? A 1-unit increase in self-esteem lowers antisocial behavior by .005 units (an effect that is far from statistically significant). A 1-unit *decrease* in self-esteem increases antisocial behavior by .074 units (highly significant). So it looks like decreases in self-esteem have a big effect, while increases have little impact. Note that the magnitude of the original estimate, .039, is about midway between the magnitudes of these two estimates.

Are the two effects significantly different? We can test that with the command

    test selfpos=-selfneg

which yields a *p*-value of .10, not quite statistically significant.
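The same hypothesis can also be expressed as an estimate: under symmetry the two coefficients sum to zero, so `lincom` can report that sum with a standard error and confidence interval. The sketch below uses simulated data; the variable names, the seed, and the coefficient values used to generate the outcome are all assumptions made for illustration, not results from the NLSY data.

```stata
* Simulated difference scores (illustrative only)
clear
set seed 12345
set obs 200
generate xdiff = round(rnormal(0, 3))
generate xpos = xdiff*(xdiff>0)
generate xneg = -xdiff*(xdiff<0)

* Assumed data-generating values: increases matter little, decreases matter more
generate ydiff = -0.01*xpos + 0.08*xneg + rnormal()

regress ydiff xpos xneg

* Wald test of symmetry, as in the post
test xpos = -xneg
scalar F_wald = r(F)

* Equivalent estimate: the sum of the two coefficients, which is 0 under symmetry
lincom xpos + xneg
scalar t_lincom = r(estimate)/r(se)

* With a single restriction, the F statistic is the square of the t statistic
display "F from test: " F_wald "   t^2 from lincom: " t_lincom^2
```

The advantage of `lincom` is that it shows how large the asymmetry is estimated to be, not just whether it is statistically significant.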

Neither of the two poverty coefficients is statistically significant, although they are in the expected direction: moving into poverty increases antisocial behavior while moving out of poverty reduces it, but by about half the magnitude. These two effects are definitely not significantly different.
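If the question is whether symmetry holds for all predictors at once, the restrictions can be tested jointly. The following sketch, again on simulated data with hypothetical variable names (`adiff`, `bdiff`, and their components), also shows that the joint test matches the usual F comparison of the symmetric model against the asymmetric one, since the former is nested in the latter.

```stata
* Simulated data: one scale-like predictor and one predictor taking values -1, 0, 1
clear
set seed 2018
set obs 300
generate adiff = round(rnormal(0, 2))
generate bdiff = floor(3*runiform()) - 1
foreach v in a b {
    generate `v'pos = `v'diff*(`v'diff>0)
    generate `v'neg = -`v'diff*(`v'diff<0)
}
* Outcome generated under symmetry (assumed values)
generate ydiff = -0.04*adiff + 0.2*bdiff + rnormal()

* Restricted (symmetric) model
regress ydiff adiff bdiff
scalar rss_sym = e(rss)

* Unrestricted (asymmetric) model
regress ydiff apos aneg bpos bneg
scalar rss_asym = e(rss)
scalar df_asym = e(df_r)

* Joint Wald test of both symmetry restrictions
test (apos = -aneg) (bpos = -bneg)
scalar F_joint = r(F)

* The same statistic from the residual sums of squares of the nested models
scalar F_nested = ((rss_sym - rss_asym)/2) / (rss_asym/df_asym)
display "joint test: " F_joint "   nested-model F: " F_nested
```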

So that’s basically it for two-period data. When there are three or more periods, you have to create multiple records for each individual. Each record contains difference scores for adjacent periods. When estimating the regression model, you need to allow for negative correlations between adjacent records by using generalized least squares.
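A rough sketch of the record-construction step might look like this in Stata. It uses made-up long-format data with hypothetical variables `id`, `time`, `y`, and `x`, and it covers only the data setup; the generalized least squares estimation is not shown here.

```stata
* Toy long-format panel: 2 individuals, 3 periods each (hypothetical values)
clear
input id time y x
1 1 2 10
1 2 3 12
1 3 1 11
2 1 0 15
2 2 1 15
2 3 2 13
end

* Declare the panel structure so the D. (first difference) operator is available
xtset id time

* Difference scores for adjacent periods; the first period of each person is missing
generate ydiff = D.y
generate xdiff = D.x

* Positive and negative components, exactly as in the two-period case
generate xpos = xdiff*(xdiff>0)
generate xneg = -xdiff*(xdiff<0)

* Keep one record per pair of adjacent periods
drop if missing(ydiff)
list id time ydiff xdiff xpos xneg, sepby(id)
```

With three periods, each individual contributes two records, one for each pair of adjacent periods.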

I discuss all these options in a paper that can be found here. That paper also presents a data generating model that justifies the asymmetric first difference method. The data generating model can be extended to allow for the estimation of asymmetric logistic regression models, which can’t be estimated with difference scores.

If you want to learn more about fixed effects methods, see my two books on this topic: *Fixed Effects Regression Models* and *Fixed Effects Regression Methods for Longitudinal Data Using SAS*. Or take one of my 2-day seminars on longitudinal data analysis.

References:

Allison, Paul D. *Fixed effects regression models*. Vol. 160. SAGE publications, 2009.

Allison, Paul D. *Fixed effects regression methods for longitudinal data using SAS*. SAS Institute, 2005.

York, Richard, and Ryan Light. “Directional asymmetry in sociological analyses.” *Socius* 3 (2017): 1-13.

SAS Program

    /* The NLSY data set can be downloaded at statisticalhorizons.com/resources/data-sets */
    data nlsydiff;
      set my.nlsy;
      antidiff=anti2-anti1;
      selfdiff=self2-self1;
      povdiff=pov2-pov1;
    proc reg data=nlsydiff;
      model antidiff=selfdiff povdiff;
    run;
    data nlsydiff;
      set nlsydiff;
      selfpos=selfdiff*(selfdiff>0);
      selfneg=-selfdiff*(selfdiff<0);
      povpos=povdiff*(povdiff>0);
      povneg=-povdiff*(povdiff<0);
    proc reg data=nlsydiff;
      model antidiff=selfpos selfneg povpos povneg;
      test selfpos=-selfneg;
      test povpos=-povneg;
    run;

A question.

Suppose the goal is just to estimate the causal effect of self on anti, so I include pov as a control but have no interest in it as a predictor. Do I still have to do all of the above for both self and pov, or can I just run the following:

use nlsy.dta, clear

generate antidiff=anti92-anti90

generate selfdiff=self92-self90

generate povdiff=pov92-pov90

regress antidiff selfdiff povdiff

generate selfpos=selfdiff*(selfdiff>0)

generate selfneg=-selfdiff*(selfdiff<0)

regress antidiff selfpos selfneg povdiff

and then interpret the coefficient on selfpos and selfneg in terms of how it relates to selfdiff from earlier and only consider povdiff as a control?

Put simply, must I generate positives and negatives for every control if I'm only interested in asymmetric effects for my coefficient of interest (here self)?

Thanks, James


If variables are only entered as controls, there’s no requirement to decompose them into positive and negative components. However, if those controls really do have different effects in different directions, decomposing their effects could improve their performance as controls.

Hi Paul,

Thank you for your response. I was also interested in your use of a Wald test. When, as you demonstrate for self-esteem and poverty, the effects of the positive and negative components in the asymmetric model are not significantly different, does this provide evidence to support using the standard symmetric model instead?

Thanks,

James

Yes, it does.

Thank you.

A final question: why would the above method cause the number of observations to increase after generating positives and negatives for controls? I’ve applied the above in a panel dataset to learn more about how it works. I’ve noticed that a conditional logit with controls decomposed into positive and negative components has more observations, and apparently more individuals based on clustering at the id, than one without. Surely no new individuals can be added to the analysis, so where do these new observations come from?

Best,

James

I can’t think of any reason why this would happen.