Asymmetric Fixed Effects Models for Panel Data

Paul Allison
October 12, 2018

Standard methods for the analysis of panel data depend on an assumption of directional symmetry that most researchers don’t even think about. Specifically, these methods assume that if a one-unit increase in variable X produces a change of B units in variable Y, then a one-unit decrease in X will result in a change of –B units in Y.

Does that make sense? Probably not for most applications. For example, is it plausible that the increase in happiness when a person gets married is exactly matched by the decrease in happiness when a person gets divorced? Or that a $10K increase in income has the same effect on savings as a $10K decrease (in the opposite direction)?

In this post, I’m going to show you how to relax that assumption. I’ll do it for the simplest situation where there are only two time points. But I’ve also written a more detailed paper covering the multi-period situation.

Here’s the example with two-period data. The data set has 581 children who were studied in 1990 and 1992 as part of the National Longitudinal Survey of Youth.  I’ll use three variables that were measured at each of the two time points:

anti     antisocial behavior, measured with a scale from 0 to 6.
self      self-esteem, measured with a scale ranging from 6 to 24.
pov      poverty status of family, coded 1 for family in poverty, otherwise 0.

You can download the data here.

The goal is to estimate the causal effects of self and pov on anti. I’ll focus on fixed effects methods (Allison 2005, 2009) because they are ideal for studying the effects of increases or decreases over time. They also have the remarkable ability to control for all time-invariant confounders.

For two-period data, there are several equivalent ways to estimate a fixed effects model. The difference score method is the one that’s the most straightforward for allowing directional asymmetry. It works like this: for each variable, subtract the time 1 value from the time 2 value to create a difference score. Then, just estimate an ordinary linear regression with the difference scores.

Here is Stata code for a standard symmetrical model:

use nlsy.dta, clear
generate antidiff=anti92-anti90
generate selfdiff=self92-self90
generate povdiff=pov92-pov90
regress antidiff selfdiff povdiff

You’ll find equivalent SAS code at the end of this post.

And here are the results:

------------------------------------------------------------------------------
    antidiff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    selfdiff |  -.0391292   .0136396    -2.87   0.004    -.0659185     -.01234
     povdiff |   .1969039   .1326352     1.48   0.138    -.0636018    .4574096
       _cons |   .0403031   .0533833     0.75   0.451    -.0645458    .1451521
------------------------------------------------------------------------------

Self-esteem has a highly significant negative effect on antisocial behavior. Specifically, for each 1-unit increase in self-esteem, antisocial behavior goes down by .039 units. But that also means that for each 1-unit decrease in self-esteem, antisocial behavior goes up by .039 units. Poverty has a positive (but non-significant) effect on antisocial behavior. Children who move into poverty have an estimated increase in antisocial behavior of .197. But children who move out of poverty have an estimated decrease in antisocial behavior of .197.
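The claim that the difference score method is one of several equivalent fixed effects estimators can be checked directly. The following is a minimal sketch, assuming nlsy.dta holds the six variables used above in wide form; childid is a hypothetical identifier created only for this illustration. Note that the block reloads the data, so run it separately from the other commands in this post.

```stata
* Sketch: two-period fixed effects via xtreg, for comparison with the
* difference-score regression. childid is a hypothetical identifier.
use nlsy.dta, clear
keep anti90 anti92 self90 self92 pov90 pov92
generate long childid = _n

* One record per child per year (year takes the values 90 and 92)
reshape long anti self pov, i(childid) j(year)
xtset childid year

* The year dummy plays the role of the intercept in the difference regression
xtreg anti self pov i.year, fe
```

With complete two-period data, the coefficients for self and pov should match those for selfdiff and povdiff in the table above.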

How can we relax the constraint that these effects have to be the same in both directions?  York and Light (2017) showed the way. What’s needed is to decompose the difference score for each predictor variable into its positive and negative components. Specifically, if D is a difference score for variable X, create a new variable D+ which equals D if D is greater than 0, otherwise 0. And create a second variable D– which equals –D if D is less than 0, otherwise 0.

Here’s how to create these variables in Stata:

generate selfpos=selfdiff*(selfdiff>0)
generate selfneg=-selfdiff*(selfdiff<0)
generate povpos=povdiff*(povdiff>0)
generate povneg=-povdiff*(povdiff<0)

The inequalities in parentheses are logical expressions that have a value of 1 if the inequality is true and 0 if the inequality is false.
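To see what the decomposition does, here is a small sketch on made-up values (the four typed-in difference scores are purely illustrative, not taken from the NLSY data). A change of 3 becomes (3, 0), a change of –2 becomes (0, 2), and no change becomes (0, 0). In Stata a missing value counts as larger than any number, so the expression selfdiff>0 is true when selfdiff is missing, but the product is still missing, so both components come out missing. The block clears the data in memory, so run it in a separate session.

```stata
* Sketch with made-up difference scores, including one missing value
clear
input selfdiff
 3
-2
 0
 .
end

generate selfpos = selfdiff*(selfdiff>0)
generate selfneg = -selfdiff*(selfdiff<0)
list, noobs

* The two components rebuild the original difference score
assert selfpos - selfneg == selfdiff if !missing(selfdiff)
* At most one component is nonzero in any row
assert selfpos == 0 | selfneg == 0 if !missing(selfdiff)
* A missing difference score yields missing components
assert missing(selfpos) & missing(selfneg) if missing(selfdiff)
```

Because selfdiff equals selfpos minus selfneg, the symmetric model is the special case of the asymmetric model in which the two coefficients are equal in size and opposite in sign.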

Now just regress antidiff on all four of these variables,

regress antidiff selfpos selfneg povpos povneg

which produces the following table:

--------------------------------------------------------------------------
antidiff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------+----------------------------------------------------------------
 selfpos |  -.0048386   .0251504    -0.19   0.848    -.0542362    .0445589
 selfneg |   .0743077    .025658     2.90   0.004      .023913    .1247024
  povpos |   .2502064   .2003789     1.25   0.212     -.143356    .6437688
  povneg |   -.126328   .1923669    -0.66   0.512    -.5041541    .2514981
   _cons |  -.0749517    .086383    -0.87   0.386    -.2446157    .0947123
--------------------------------------------------------------------------

What does this tell us?  A 1-unit increase in self-esteem lowers antisocial behavior by .005 units (an effect that is far from statistically significant). A 1-unit decrease in self-esteem increases antisocial behavior by .074 units (highly significant). So it looks like decreases in self-esteem have a big effect, while increases have little impact. Note that the original estimate of -.039 was about midway between these two estimates.

Are the two effects significantly different? We can test that with the command

test selfpos=-selfneg

which yields a p-value of .10, not quite statistically significant.
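The hypothesis selfpos=-selfneg is the same as saying that the sum of the two coefficients is zero, so the comparison can also be written as a linear combination. This sketch reruns the asymmetric regression from the data file and then uses lincom, which reports the sum (here simply the two self-esteem coefficients in the table added together) along with a standard error and confidence interval. It reloads nlsy.dta, so run it on its own.

```stata
* Sketch: the asymmetry in the self-esteem effects as a linear combination
use nlsy.dta, clear
generate antidiff = anti92 - anti90
generate selfdiff = self92 - self90
generate povdiff  = pov92 - pov90
generate selfpos = selfdiff*(selfdiff>0)
generate selfneg = -selfdiff*(selfdiff<0)
generate povpos  = povdiff*(povdiff>0)
generate povneg  = -povdiff*(povdiff<0)

quietly regress antidiff selfpos selfneg povpos povneg

* Zero under symmetry; the p-value matches that of test selfpos=-selfneg
lincom selfpos + selfneg
```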

Neither of the two poverty coefficients is statistically significant, although they are in the expected direction: moving into poverty increases antisocial behavior while moving out of poverty reduces it, but by about half the magnitude. These two effects are definitely not significantly different.
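The corresponding Stata test for poverty mirrors the one in the SAS program at the end of this post. As a sketch, the block below also adds a joint test of both restrictions at once, which compares the asymmetric model with the symmetric one; the joint test is an illustration added here rather than something reported above. The loops are just a compact way to build the same variables as before.

```stata
* Sketch: tests of symmetry for poverty, and for both predictors jointly
use nlsy.dta, clear
foreach v in anti self pov {
    generate `v'diff = `v'92 - `v'90
}
foreach v in self pov {
    generate `v'pos = `v'diff*(`v'diff>0)
    generate `v'neg = -`v'diff*(`v'diff<0)
}

quietly regress antidiff selfpos selfneg povpos povneg

* Symmetry of the poverty effects alone
test povpos=-povneg

* Both symmetry restrictions together (2 degrees of freedom)
test (selfpos=-selfneg) (povpos=-povneg)
```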

So that’s basically it for two-period data. When there are three or more periods, you have to create multiple records for each individual. Each record contains difference scores for adjacent periods. When estimating the regression model, you need to allow for negative correlations between adjacent records by using generalized least squares.
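Here is a rough sketch of that multi-period setup. Everything in it is an assumption made for illustration: the data are simulated, the names id, time, x, and y are hypothetical, and xtgee with an unstructured working correlation is used as one GLS-type estimator that allows adjacent records to be correlated. It is not necessarily the estimator used in the paper.

```stata
* Sketch on simulated data: 300 individuals observed at 4 time points
clear
set seed 2018
set obs 300
generate id = _n
expand 4
bysort id: generate time = _n
generate x = rnormal()
generate y = 0.5*x + rnormal()

* Difference scores for adjacent periods (missing at the first time point)
xtset id time
generate ydiff = D.y
generate xdiff = D.x
generate xpos = xdiff*(xdiff>0)
generate xneg = -xdiff*(xdiff<0)

* Three difference records per individual, correlated within id
xtgee ydiff xpos xneg, family(gaussian) link(identity) corr(unstructured)
```

A simpler alternative is ordinary regress with vce(cluster id), which keeps the OLS coefficients and only corrects the standard errors for the dependence among records.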

I discuss all these options in an article that can be found here. That paper also presents a data generating model that justifies the asymmetric first difference method. The data generating model can be extended to allow for the estimation of asymmetric logistic regression models, which can’t be estimated with difference scores.

If you want to learn more about fixed effects methods, see my two books on this topic:  Fixed Effects Regression Models and Fixed Effects Regression Methods for Longitudinal Data Using SAS.

References:

Allison, Paul D. Fixed effects regression models. Vol. 160. SAGE publications, 2009.
Allison, Paul D. Fixed effects regression methods for longitudinal data using SAS. SAS Institute, 2005.
York, Richard, and Ryan Light. “Directional asymmetry in sociological analyses.” Socius 3 (2017): 1-13.


SAS Program

/* The NLSY data set can be downloaded at statisticalhorizons.com/resources/data-sets */
/* Difference scores and the standard symmetric model */
data nlsydiff;
 set my.nlsy;
 antidiff=anti2-anti1;
 selfdiff=self2-self1;
 povdiff=pov2-pov1;
run;
proc reg data=nlsydiff;
  model antidiff=selfdiff povdiff;
run;
/* Positive and negative components for the asymmetric model */
data nlsydiff;
 set nlsydiff;
 selfpos=selfdiff*(selfdiff>0);
 selfneg=-selfdiff*(selfdiff<0);
 povpos=povdiff*(povdiff>0);
 povneg=-povdiff*(povdiff<0);
run;
/* Asymmetric model, with tests of symmetry for each predictor */
proc reg data=nlsydiff;
  model antidiff=selfpos selfneg povpos povneg;
  test selfpos=-selfneg;
  test povpos=-povneg;
run;

Comments

  1. Hi,

    I was wondering how the asymmetric models generalize to other nonlinear cases like poisson or negbin models (using Stata)? And how would one proceed with ordinal dependent variable?

    Best,

    Tomi

    1. In my original paper there’s a section called “An Asymmetric Logistic Model for Multiperiod Dichotomous Data”. A similar approach would be used for Poisson, using a conditional Poisson model for the fixed effects. And, in principle, it could also be used for negative binomial or ordered logit. The problem is that you first need a method for estimating a conditional fixed effects model for these kinds of outcomes. Neither ordered logit nor negative binomial has a conditional likelihood. For negative binomial, I’ve shown that just using dummy variables for the fixed effects will work well. And here’s a reference for how to do it for ordered logit.

      1. Thank you for your reply. I would like to ask an additional question. If I would like to use only the asymmetric first-difference model (not the equivalent of the mean deviation etc. model) and focus only on the immediate effects, how should I proceed when the dependent variable is binary or ordinal? As you stated in the paper, a differenced binary variable will produce a three-category variable (-1, 0, 1), so a logistic model would not be suitable. For a binary original variable (-1, 0, 1 when differenced), would an ordinal or multinomial model work? And if so, would the model(s) be like (using Stata): ologit povdiff selfpos selfneg antipos antineg OR mlogit povdiff selfpos selfneg antipos antineg? For an ordinal variable, I would assume an ordinal logistic model would be suitable?

        Best,

        Tomi

        1. To the best of my knowledge, there’s no good way to do this by differencing the outcome. You need to cumulate the positive and negative changes on the predictors, as explained in my article.

  2. Dear Paul: Not sure you’ll see this years after the initial post, but I found your entry as I was looking for literature on a slightly related “asymmetry” scenario. In current research, I have a key variable that reports a level 2 Gini Coefficient for all population increases observed across level 1 units from T1 to T2. I’d like to run a FE model across my level 2 units using the Gini Coefficient as a difference score. Specifically, the T2 values would be the observed Gini Coefficient, while the T1 values would all be defined as 0.

    In some sense, the “asymmetry” here is similar to what you write because all values are constrained to be greater than zero. But I was wondering if you would recommend against this approach for any other obvious reasons, or if you could point to relevant literature/discussions. I’d be grateful. Many thanks in advance,

    Samuel

    1. Hi Samuel:

      If all T1 values are defined as 0, then you don’t really have a difference score and fixed effects doesn’t make sense.
      On the other hand, it sounds like your Gini coefficient is itself defined on changes from T1 to T2. So it might be reasonable to regress the T2 Gini on difference scores for the predictors (although that wouldn’t have the mathematical foundation of the classic FE model). And, of course, that’s effectively what you’re proposing.

  3. When I read the paper you mentioned in this blog, I was confused as to why the cumulative effect of x on y is considered when studying more than 2 periods. What is the benefit of studying the cumulative number of positive changes and the cumulative number of negative changes, rather than just decomposing the difference score for each predictor variable into its positive and negative components and using these in the analysis, as you do in this blog?

    Best,

    Jonathan

    1. In the blog, I worked with difference scores as the outcome. In that case, you can just decompose the predictor into positive and negative changes, and you don’t have to worry about cumulating. But if you work with the original variable as your outcome, it turns out that cumulation is necessary to get the same answer as the difference score method. That’s because the level of a variable at time t depends on the past history of its changes in one direction or the other. When you use difference scores, that history cancels out.

      There are two principal advantages to working with levels rather than differences: 1. You may lose fewer cases when there is missing data, and 2. the model can be easily extended to categorical dependent variables.

      1. Thank you for your response.

        I’m still not sure I understand.

        I take an example from your data:

        /*

        id year spousediff spouse spousepos spouseneg spousecumpos spousecumneg
        906 1 0 0 0 0 0 0
        906 2 1 1 1 0 1 0
        906 3 -1 0 0 1 1 1
        906 4 0 0 0 0 1 1
        906 5 1 1 1 0 2 1

        */

        clogit pov mother spousecumpos spousecumneg inschoolcumpos ///
        inschoolcumneg hourscumpos hourscumneg i.year, group(id) robust

        My question is:

        Is the cumulative approach taken so that the coefficient on spousecumpos will reflect the consequences of all the unique changes in the predictor from 0 -> 1 on the probability of the outcome being true? So, this ensures the coefficient reports a cumulative effect of the predictor on the outcome that is a consequence of all the unique times the predictor went from not true to true and doesn’t accidentally double count those times it was still true from an earlier period?

        Thank you

        1. The cumulative approach is taken so that, for any time t, we have a count of the number of previous occasions on which a person has changed from being unmarried to married, and a count of the number of previous occasions on which a person has changed from being married to unmarried. The outcome Y(t) is assumed to be a function of those two counts. Thus, each marriage is presumed to have changed Y(t) by b units, and each marital dissolution is presumed to have changed Y(t) by c units (most likely in the opposite direction).
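            A minimal sketch of how such counts can be built in Stata, using the spouse values for id 906 from the example above as typed-in data. The first period has no previous value, so its change is set to 0 here. The variable names follow the example; the construction itself is an illustration and may differ from the code used for the article.

            ```stata
            * Sketch: cumulative counts of positive and negative changes
            clear
            input id year spouse
            906 1 0
            906 2 1
            906 3 0
            906 4 0
            906 5 1
            end

            xtset id year
            generate spousediff = D.spouse
            * No prior period at the first wave, so treat the change as 0
            replace spousediff = 0 if missing(spousediff)

            generate spousepos = spousediff*(spousediff>0)
            generate spouseneg = -spousediff*(spousediff<0)

            * Running totals within person, in time order
            bysort id (year): generate spousecumpos = sum(spousepos)
            bysort id (year): generate spousecumneg = sum(spouseneg)
            list, noobs sepby(id)
            ```

            By year 5 this person has two changes into marriage and one change out of it, so spousecumpos is 2 and spousecumneg is 1.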

          1. Thank you,

            However, I wonder though, how applicable this is to 3 period data with binary predictors?

            For t=1, in which case Xit-1 is not observed, Xit+ and Xit- are both set to 0. Thus if an individual can only change once to positive and once to negative, what is the relevance of the cumulative approach here?

            i.e. in a fixed effects logistic regression they will only have time to go from 0 -> 1 -> 0 or 0 -> 1 -> 1.

            “Thus, Z+ is the accumulation up to time t of all previous positive changes in X, and Z– is the accumulation of all previous negative changes in X. When X is a dummy variable, Z+ is just the number of previous changes from 0 to 1, and Z– is the number of previous changes from 1 to 0. For example, Z+ might be the number of previous marriages, and Z– might be the number of previous divorces.”

  4. A question.

    Suppose the goal were just to estimate the causal effect of self on anti, so that I include pov as a control but have no interest in it as a predictor. Do I still have to do all of the above for both self and pov, or can I just create the following:

    use nlsy.dta, clear
    generate antidiff=anti92-anti90
    generate selfdiff=self92-self90
    generate povdiff=pov92-pov90

    regress antidiff selfdiff povdiff

    generate selfpos=selfdiff*(selfdiff>0)
    generate selfneg=-selfdiff*(selfdiff<0)

    regress antidiff selfpos selfneg povdiff

    and then interpret the coefficient on selfpos and selfneg in terms of how it relates to selfdiff from earlier and only consider povdiff as a control?

    Put simply, must I generate positives and negatives for every control if I'm only interested in asymmetric effects for my coefficient of interest (here self)?

    Thanks, James

    Apologies to post twice, I entered old email information the first time.

    1. If variables are only entered as controls, there’s no requirement to decompose them into positive and negative components. However, if those controls really do have different effects in different directions, decomposing their effects could improve their performance as controls.

      1. Hi Paul,

        Thank you for your response. I was also interested in your use of a Wald test. When, as you demonstrate for self-esteem and poverty, the two effects are not significantly different for the positive and negative components of the asymmetric model, does this provide evidence to support using the standard symmetric model instead?

        Thanks,

        James

          1. Thank you.

            A final question: why would the above method cause the number of observations to increase after generating positives and negatives for controls? I’ve applied the above in a panel dataset to learn more about how it works. I’ve noticed that a conditional logit with controls decomposed into positive and negative components has more observations, and apparently more individuals based on clustering at the id, than without. Surely no new individuals can be added to the analysis, so where do these new observations come from?

            Best,

            James
