Asymmetric Fixed Effects Models for Panel Data

October 12, 2018

Standard methods for the analysis of panel data depend on an assumption of directional symmetry that most researchers don’t even think about. Specifically, these methods assume that if a one-unit increase in variable X produces a change of B units in variable Y, then a one-unit decrease in X will result in a change of –B units in Y.

Does that make sense? Probably not for most applications. For example, is it plausible that the increase in happiness when a person gets married is exactly matched by the decrease in happiness when a person gets divorced? Or that a $10K increase in income has the same effect on savings as a $10K decrease (in the opposite direction)?


In this post, I’m going to show you how to relax that assumption. I’ll do it for the simplest situation where there are only two time points. But I’ve also written a more detailed paper covering the multi-period situation.

Here’s the example with two-period data. The data set has 581 children who were studied in 1990 and 1992 as part of the National Longitudinal Survey of Youth. I’ll use three variables that were measured at each of the two time points:

anti    antisocial behavior, measured with a scale from 0 to 6.
self    self-esteem, measured with a scale ranging from 6 to 24.
pov     poverty status of family, coded 1 for family in poverty, otherwise 0.

You can download the data here.

The goal is to estimate the causal effects of self and pov on anti. I’ll focus on fixed effects methods (Allison 2005, 2009) because they are ideal for studying the effects of increases or decreases over time. They also have the remarkable ability to control for all time-invariant confounders.

For two-period data, there are several equivalent ways to estimate a fixed effects model. The difference score method is the one that’s the most straightforward for allowing directional asymmetry. It works like this: for each variable, subtract the time 1 value from the time 2 value to create a difference score. Then, just estimate an ordinary linear regression with the difference scores.
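To make the two-period equivalence concrete, here is a minimal Python sketch (a translation for illustration, not the author's Stata code) on simulated data. It compares the slope from regressing difference scores with an intercept, as `regress` fits by default, against the person-mean-deviation ("within") fixed effects estimator that also includes a period-2 dummy; the difference-score intercept plays the role of that period effect. The simulated values, seed, and function names are all hypothetical.

```python
import random


def slope(x, y):
    """Least-squares slope of y on x with no intercept (inputs already centered)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)


def first_difference_slope(x1, x2, y1, y2):
    """Regress (y2 - y1) on (x2 - x1) with an intercept."""
    dx = [b - a for a, b in zip(x1, x2)]
    dy = [b - a for a, b in zip(y1, y2)]
    mx, my = sum(dx) / len(dx), sum(dy) / len(dy)
    return slope([d - mx for d in dx], [d - my for d in dy])


def within_slope(x1, x2, y1, y2):
    """Fixed effects via deviations from person means, plus a period-2 dummy.

    The dummy is partialled out first (Frisch-Waugh-Lovell), then the
    residualized y is regressed on the residualized x.
    """
    xs, ys, ts = [], [], []
    for a, b, c, d in zip(x1, x2, y1, y2):
        mx, my = (a + b) / 2, (c + d) / 2
        xs += [a - mx, b - mx]
        ys += [c - my, d - my]
        ts += [-0.5, 0.5]  # period-2 dummy (0, 1) minus its person mean
    gx, gy = slope(ts, xs), slope(ts, ys)
    rx = [x - gx * t for x, t in zip(xs, ts)]
    ry = [y - gy * t for y, t in zip(ys, ts)]
    return slope(rx, ry)


random.seed(1)
alpha = [random.gauss(0, 2) for _ in range(300)]  # hypothetical person effects
x1 = [a + random.gauss(0, 1) for a in alpha]      # x correlated with alpha
x2 = [a + random.gauss(0, 1) for a in alpha]
y1 = [a + 0.5 * x + random.gauss(0, 1) for a, x in zip(alpha, x1)]
y2 = [a + 0.3 + 0.5 * x + random.gauss(0, 1) for a, x in zip(alpha, x2)]

fd = first_difference_slope(x1, x2, y1, y2)
fe = within_slope(x1, x2, y1, y2)
print(f"first difference: {fd:.6f}  within: {fe:.6f}")
```

The two slopes agree to floating-point precision. With three or more periods, plain OLS on differences and the within estimator generally stop coinciding, which is one reason the multi-period case is handled differently.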

Here is Stata code for a standard symmetrical model:

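use nlsy.dta, clear
* Difference scores: 1992 value minus 1990 value
generate antidiff=anti92-anti90
generate selfdiff=self92-self90
generate povdiff=pov92-pov90
* OLS on the difference scores gives the two-period fixed effects estimates
regress antidiff selfdiff povdiff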

You’ll find equivalent SAS code at the end of this post.

And here are the results:

------------------------------------------------------------------------------
    antidiff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    selfdiff |  -.0391292   .0136396    -2.87   0.004    -.0659185     -.01234
     povdiff |   .1969039   .1326352     1.48   0.138    -.0636018    .4574096
       _cons |   .0403031   .0533833     0.75   0.451    -.0645458    .1451521
------------------------------------------------------------------------------

Self-esteem has a highly significant negative effect on antisocial behavior. Specifically, for each 1-unit increase in self-esteem, antisocial behavior goes down by .039 units. But that also means that for each 1-unit decrease in self-esteem, antisocial behavior goes up by .039 units. Poverty has a positive (but non-significant) effect on antisocial behavior. Children who move into poverty have an estimated increase in antisocial behavior of .197. But children who move out of poverty have an estimated decrease in antisocial behavior of .197.
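As a worked check, here is the fitted symmetric equation from the table above (coefficients rounded to four decimals) and the effects it implies for one-unit changes in each direction. Because the model is linear in the difference scores, reversing the sign of a change simply reverses the sign of its predicted effect:

```latex
\begin{aligned}
\widehat{\Delta\text{anti}} &= 0.0403 - 0.0391\,\Delta\text{self} + 0.1969\,\Delta\text{pov} \\
\Delta\text{self} = +1 &\;\Rightarrow\; -0.0391, \qquad \Delta\text{self} = -1 \;\Rightarrow\; +0.0391 \\
\Delta\text{pov} = +1 \text{ (into poverty)} &\;\Rightarrow\; +0.1969, \qquad \Delta\text{pov} = -1 \text{ (out of poverty)} \;\Rightarrow\; -0.1969
\end{aligned}
```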

How can we relax the constraint that these effects have to be the same in both directions? York and Light (2017) showed the way. What’s needed is to decompose the difference score for each predictor variable into its positive and negative components. Specifically, if D is a difference score for variable X, create a new variable D⁺ which equals D if D is greater than 0, otherwise 0. And create a second variable D⁻ which equals –D if D is less than 0, otherwise 0.
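In symbols (a restatement of the definitions just given, with a, b, b⁺, and b⁻ as generic regression coefficients), the symmetric model is the special case of the asymmetric one in which b⁺ = −b⁻. That is the restriction checked by the `test selfpos=-selfneg` command used later in the post:

```latex
\begin{aligned}
D^{+} &= \max(D, 0), \qquad D^{-} = \max(-D, 0), \qquad D = D^{+} - D^{-} \\
\Delta Y &= a + b^{+} D^{+} + b^{-} D^{-} + \varepsilon \quad \text{(asymmetric model)} \\
\Delta Y &= a + b\,D + \varepsilon = a + b\,D^{+} - b\,D^{-} + \varepsilon
  \quad \text{(symmetric model: } b^{+} = b,\ b^{-} = -b\text{)}
\end{aligned}
```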

Here’s how to create these variables in Stata:

generate selfpos=selfdiff*(selfdiff>0)
generate selfneg=-selfdiff*(selfdiff<0)
generate povpos=povdiff*(povdiff>0)
generate povneg=-povdiff*(povdiff<0)

The inequalities in parentheses are logical expressions that have a value of 1 if the inequality is true and 0 if the inequality is false.
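For readers working outside Stata, the same decomposition can be sketched in Python, where `True` and `False` also behave as 1 and 0 in arithmetic. The function name `decompose` is a hypothetical choice for illustration.

```python
def decompose(d):
    """Return (positive part, magnitude of negative part) of a difference score d.

    Mirrors selfpos = selfdiff*(selfdiff>0) and selfneg = -selfdiff*(selfdiff<0).
    """
    pos = d * (d > 0)   # d when d > 0, otherwise 0
    neg = -d * (d < 0)  # -d when d < 0, otherwise 0
    return pos, neg


for d in (-3, -1, 0, 2, 5):
    print(d, decompose(d))
```

Every difference satisfies D = D⁺ − D⁻ with both parts non-negative and at most one of them nonzero, so the decomposition loses no information; it only lets the two directions get separate coefficients.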

Now just regress antidiff on all four of these variables,

regress antidiff selfpos selfneg povpos povneg

which produces the following table:

------------------------------------------------------------------------------
    antidiff |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     selfpos |  -.0048386   .0251504    -0.19   0.848    -.0542362    .0445589
     selfneg |   .0743077    .025658     2.90   0.004      .023913    .1247024
      povpos |   .2502064   .2003789     1.25   0.212     -.143356    .6437688
      povneg |   -.126328   .1923669    -0.66   0.512    -.5041541    .2514981
       _cons |  -.0749517    .086383    -0.87   0.386    -.2446157    .0947123
------------------------------------------------------------------------------

What does this tell us? A 1-unit increase in self-esteem lowers antisocial behavior by .005 units (an effect that is far from statistically significant). A 1-unit decrease in self-esteem increases antisocial behavior by .074 units (highly significant). So it looks like decreases in self-esteem have a big effect, while increases have little impact. Note that the original estimate of -.039 was about midway between these two estimates.
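Why did the symmetric estimate land near the midpoint? Under the simplifying assumption that changes in X are distributed symmetrically around zero, cov(D, D⁺) = var(D)/2 and cov(D, D⁻) = −var(D)/2, so in large samples the symmetric slope is (b⁺ − b⁻)/2, the average of b⁺ and −b⁻. Here (−.0048 − .0743)/2 ≈ −.040, close to −.039. The Python sketch below checks this on simulated data, borrowing the two self-esteem estimates as hypothetical "true" effects. The small `ols` helper, the normal distribution for the changes, and the noise level are assumptions of the sketch, not features of the NLSY data.

```python
import random


def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y.

    X is a list of rows (include a column of 1s for an intercept). Solved by
    Gauss-Jordan elimination with partial pivoting; adequate for a handful of
    well-conditioned columns, not a general-purpose regression routine.
    """
    k = len(X[0])
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * v for r, v in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda row: abs(A[row][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


random.seed(2018)
B_POS, B_NEG = -0.005, 0.074  # hypothetical true effects of D+ and D-
dx = [random.gauss(0, 2) for _ in range(5000)]  # changes symmetric around 0
pos = [d * (d > 0) for d in dx]
neg = [-d * (d < 0) for d in dx]
dy = [B_POS * p + B_NEG * q + random.gauss(0, 0.5) for p, q in zip(pos, neg)]

_, b_pos, b_neg = ols([[1.0, p, q] for p, q in zip(pos, neg)], dy)  # asymmetric
_, b_sym = ols([[1.0, d] for d in dx], dy)                           # symmetric
print(f"asymmetric: b+ = {b_pos:.4f}, b- = {b_neg:.4f}")
print(f"symmetric:  b  = {b_sym:.4f}  vs (b+ - b-)/2 = {(B_POS - B_NEG) / 2:.4f}")
```

If the changes were skewed (say, many more decreases than increases), the symmetric slope would be pulled toward whichever direction dominates rather than sitting at the midpoint.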

Are the two effects significantly different? We can test that with the command

test selfpos=-selfneg

which yields a p-value of .10, not quite statistically significant.
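The `test` command performs a Wald test of the linear restriction b⁺ + b⁻ = 0. A minimal Python sketch of the arithmetic follows. It needs the covariance between the two estimates, which the coefficient table does not print, so the inputs in the usage line are purely hypothetical. As an assumption of this sketch, the p-value uses a standard normal reference distribution, whereas Stata's `test` after `regress` reports an F(1, residual df) statistic, so small-sample p-values will differ slightly.

```python
import math


def wald_sum_zero(b1, b2, v11, v22, v12):
    """Wald test of H0: b1 + b2 = 0, i.e. b1 = -b2.

    v11 and v22 are the sampling variances of b1 and b2; v12 is their covariance.
    Returns (chi-square statistic with 1 df, two-sided normal-approximation p-value).
    """
    se = math.sqrt(v11 + v22 + 2 * v12)  # standard error of b1 + b2
    z = (b1 + b2) / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z * z, p


# Hypothetical inputs, not taken from the NLSY output
stat, p = wald_sum_zero(0.20, -0.10, 0.01, 0.01, 0.0)
print(f"Wald = {stat:.3f}, p = {p:.3f}")
```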

Neither of the two poverty coefficients is statistically significant, although both are in the expected direction: moving into poverty increases antisocial behavior, while moving out of poverty reduces it by only about half as much (.126 versus .250). These two effects are definitely not significantly different.

So that’s basically it for two-period data. When there are three or more periods, you have to create multiple records for each individual. Each record contains difference scores for adjacent periods. When estimating the regression model, you need to allow for negative correlations between adjacent records by using generalized least squares.
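A minimal Python sketch of the record-building step, assuming a hypothetical panel stored as {id: [(x, y) for each period in order]} with no gaps. It produces one record per adjacent pair of periods, with the change in x already split into its positive and negative parts. The GLS estimation that allows for correlation between a person's adjacent records is not shown.

```python
def adjacent_differences(panel):
    """One record per person and adjacent pair of periods.

    panel maps a person id to a list of (x, y) tuples ordered by period 1..T.
    Each record is (id, t, y_t - y_{t-1}, positive part of x change,
    magnitude of negative part of x change), for t = 2..T.
    """
    records = []
    for pid, waves in panel.items():
        for t in range(1, len(waves)):
            dx = waves[t][0] - waves[t - 1][0]
            dy = waves[t][1] - waves[t - 1][1]
            records.append((pid, t + 1, dy, max(dx, 0), max(-dx, 0)))
    return records


# Hypothetical three-period data for two people: (x, y) at periods 1, 2, 3
panel = {"A": [(10, 2), (12, 1), (11, 3)], "B": [(8, 0), (8, 0), (9, 1)]}
for rec in adjacent_differences(panel):
    print(rec)
```

A person observed T times contributes T − 1 records, and adjacent records from the same person share an error term from the period they have in common, which is what the GLS step has to handle.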

I discuss all these options in an article that can be found here. That paper also presents a data generating model that justifies the asymmetric first difference method. The data generating model can be extended to allow for the estimation of asymmetric logistic regression models, which can’t be estimated with difference scores.
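The levels-based formulation mentioned here (and discussed in the comments below) works with running totals of a predictor's changes: Z⁺ accumulates all previous positive changes in X, and Z⁻ accumulates all previous negative changes. A minimal Python sketch, assuming a single person's X series with no missing waves; the 0/1 series is hypothetical (think of marital status at five waves).

```python
from itertools import accumulate


def cumulate_changes(x):
    """Running totals of positive (Z+) and negative (Z-) changes in x.

    Both totals start at 0 in the first period, where no earlier change is observed.
    """
    d = [0] + [b - a for a, b in zip(x, x[1:])]
    z_pos = list(accumulate(max(v, 0) for v in d))
    z_neg = list(accumulate(max(-v, 0) for v in d))
    return z_pos, z_neg


x = [0, 1, 0, 0, 1]  # hypothetical: not married, married, not, not, married
z_pos, z_neg = cumulate_changes(x)
print(z_pos, z_neg)  # [0, 1, 1, 1, 2] [0, 0, 1, 1, 1]

# Differencing the running totals recovers each period's positive and negative
# change, so a levels model in Z+ and Z- implies the asymmetric difference model.
print([b - a for a, b in zip(z_pos, z_pos[1:])])  # [1, 0, 0, 1]
print([b - a for a, b in zip(z_neg, z_neg[1:])])  # [0, 1, 0, 0]
```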

If you want to learn more about fixed effects methods, see my two books on this topic: Fixed Effects Regression Models and Fixed Effects Regression Methods for Longitudinal Data Using SAS.

References:

Allison, Paul D. Fixed Effects Regression Methods for Longitudinal Data Using SAS. SAS Institute, 2005.
Allison, Paul D. Fixed Effects Regression Models. Vol. 160. SAGE Publications, 2009.
York, Richard, and Ryan Light. “Directional Asymmetry in Sociological Analyses.” Socius 3 (2017): 1–13.


SAS Program

/* The NLSY data set can be downloaded at statisticalhorizons.com/resources/data-sets */
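data nlsydiff;
 set my.nlsy;
 /* Difference scores: wave 2 minus wave 1 */
 antidiff=anti2-anti1;
 selfdiff=self2-self1;
 povdiff=pov2-pov1;
run;
/* Standard symmetric fixed effects model */
proc reg data=nlsydiff;
  model antidiff=selfdiff povdiff;
run;
data nlsydiff;
 set nlsydiff;
 /* Split each difference into its positive and negative parts */
 selfpos=selfdiff*(selfdiff>0);
 selfneg=-selfdiff*(selfdiff<0);
 povpos=povdiff*(povdiff>0);
 povneg=-povdiff*(povdiff<0);
run;
/* Asymmetric model, with tests that the two directions have equal-magnitude effects */
proc reg data=nlsydiff;
  model antidiff=selfpos selfneg povpos povneg;
  test selfpos=-selfneg;
  test povpos=-povneg;
run;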

Ethan says:

April 27, 2025 at 2:58 pm

Seven years have passed. Can we estimate your asymmetric FE model on multiperiod data (outcome is continuous) in Stata now? In your Socius paper’s supplementary materials, you said Table 8 (showing the output of the asymmetric FE model on multiperiod data) cannot be estimated using Stata. Thanks!

1. Paul Allison says:
  
  April 28, 2025 at 12:24 pm
  
  You can certainly estimate an asymmetric FE model for multiperiod data with any recent version of Stata. It’s just that in the version I was using at that time, you could not impose the constraint that would make the correlation between adjacent error terms equal to -0.5. The reason that constraint was desirable was to make the results for symmetric models exactly equal to conventional FE. But there’s nothing wrong with leaving those error correlations unconstrained. You just might lose some efficiency.
  
  Is it possible to impose those constraints in the latest version of Stata? I don’t know. I checked the documentation for the mixed command in Stata 19.5, but I didn’t see any obvious way to do it. You might want to check with the people at Stata.
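Where the −0.5 comes from, sketched under the assumption that the errors in the levels equation are independent over time with constant variance σ² (the standard assumption behind conventional fixed effects):

```latex
\begin{aligned}
\Delta e_{it} &= e_{it} - e_{i,t-1} \\
\operatorname{Var}(\Delta e_{it}) &= \sigma^2 + \sigma^2 = 2\sigma^2 \\
\operatorname{Cov}(\Delta e_{it}, \Delta e_{i,t+1})
  &= \operatorname{Cov}(e_{it} - e_{i,t-1},\; e_{i,t+1} - e_{it}) = -\operatorname{Var}(e_{it}) = -\sigma^2 \\
\operatorname{Corr}(\Delta e_{it}, \Delta e_{i,t+1}) &= \frac{-\sigma^2}{2\sigma^2} = -0.5
\end{aligned}
```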
  
Jessica Mongilio says:

December 16, 2024 at 8:05 pm

Hi Paul,

Thank you for the explanation about asymmetric fixed effects; it’s something I had recently been thinking about but didn’t know there was already published literature on! I do have a question about the difference variables: Is there any reason that a categorical variable could not be used to indicate 0) negative change, 1) no change, and 2) positive change? My concern is that the two binary indicators reference “all else,” but that might not be the comparison of interest (i.e., negative change vs. either no change or positive change), while a categorical variable would allow for a comparison of either negative change vs. no change or positive change vs. no change, for example. Does this method still allow for asymmetry, or must the predictors be split into binary indicators of change?

Thanks, Jessica

1. Paul Allison says:
  
  December 27, 2024 at 3:55 pm
  
  I don’t understand the difference between what you’re proposing and what my method implies. In any case, you do need two binary variables. Yes, they will each be comparisons with no change, but you can also test and compare the coefficients for those two variables.
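For a 0/1 predictor, the difference score can only be −1, 0, or 1, so the two variables from the post are exactly the indicator (dummy) coding of a three-category change variable with "no change" as the omitted reference category. A small Python sketch, with hypothetical helper names, checks that the two codings coincide:

```python
def change_category(d):
    """Label a change in a 0/1 variable: 'neg' (1 -> 0), 'none', or 'pos' (0 -> 1)."""
    return "pos" if d > 0 else "neg" if d < 0 else "none"


def dummies(category, reference="none"):
    """Indicator coding of a three-level factor, omitting the reference level."""
    return {lev: int(category == lev) for lev in ("neg", "none", "pos") if lev != reference}


for d in (-1, 0, 1):
    pos, neg = d * (d > 0), -d * (d < 0)  # decomposition used in the post
    print(d, {"neg": neg, "pos": pos}, dummies(change_category(d)))
```

Each coefficient is then a comparison with no change, and a Wald test of b⁺ = −b⁻ still compares the two directions.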
  
Tomi says:

November 8, 2024 at 12:32 pm

Hi,

I was wondering how the asymmetric models generalize to other nonlinear cases like Poisson or negative binomial (negbin) models (using Stata)? And how would one proceed with an ordinal dependent variable?

Best,

Tomi

1. Paul Allison says:
  
  November 10, 2024 at 9:57 pm
  
  In my original paper there’s a section called “An Asymmetric Logistic Model for Multiperiod Dichotomous Data”. A similar approach would be used for Poisson, using a conditional Poisson model for the fixed effects. And, in principle, it could also be used for negative binomial or ordered logit. The problem is that you first need a method for estimating a conditional fixed effects model for these kinds of outcomes. Neither ordered logit nor negative binomial has a conditional likelihood. For negative binomial, I’ve shown that just using dummy variables for the fixed effects will work well. And here’s a reference for how to do it for ordered logit.
  
  1. Tomi says:
    
    November 11, 2024 at 11:08 am
    
    Thank you for your reply. I would like to ask an additional question. If I want to use only the asymmetric first-difference model (not the equivalent mean-deviation model, etc.) and focus only on the immediate effects, how should I proceed when the dependent variable is binary or ordinal? As you stated in the paper, a differenced binary variable produces a three-category variable (-1, 0, 1), so a logistic model would not be suitable. For a binary original variable (differenced to -1, 0, 1), would an ordinal or multinomial model work? And if so, would the model(s) be (using Stata): ologit povdiff selfpos selfneg antipos antineg OR mlogit povdiff selfpos selfneg antipos antineg? For an ordinal variable, I would assume an ordinal logistic model would be suitable?
    
    Best,
    
    Tomi
    
    1. Paul Allison says:
      
      November 18, 2024 at 1:50 pm
      
      To the best of my knowledge, there’s no good way to do this by differencing the outcome. You need to cumulate the positive and negative changes on the predictors, as explained in my article.
      
Samuel says:

January 26, 2024 at 6:19 pm

Dear Paul: Not sure you’ll see this years after the initial post, but I found your entry as I was looking for literature on a slightly related “asymmetry” scenario. In current research, I have a key variable that reports a level 2 Gini Coefficient for all population increases observed across level 1 units from T1 to T2. I’d like to run a FE model across my level 2 units using the Gini Coefficient as a difference score. Specifically, the T2 values would be the observed Gini Coefficient, while the T1 values would all be defined as 0.

In some sense, the “asymmetry” here is similar to what you write because all values are constrained to be greater than zero. But I was wondering if you would recommend against this approach for any other obvious reasons, or if you could point to relevant literature/discussions. I’d be grateful. Many thanks in advance,

Samuel

1. Paul Allison says:
  
  January 29, 2024 at 2:14 pm
  
  Hi Samuel:
  
  If all T1 values are defined as 0, then you don’t really have a difference score and fixed effects doesn’t make sense.
  On the other hand, it sounds like your Gini coefficient is itself defined on changes from T1 to T2. So it might be reasonable to regress the T2 Gini on difference scores for the predictors (although that wouldn’t have the mathematical foundation of the classic FE model). And, of course, that’s effectively what you’re proposing.
  
  1. Samuel says:
    
    February 8, 2024 at 7:56 pm
    
    Thank you, Paul! Very helpful and much appreciated.
    
Jonathan says:

February 25, 2021 at 12:02 pm

When I read the paper you mentioned in this blog, I was confused about why the cumulative effect of x on y is considered when studying more than 2 periods. What is the benefit of studying the cumulative number of positive changes and the cumulative number of negative changes, rather than just decomposing the difference score for each predictor variable into its positive and negative components and using these in the analysis, as you do in this blog?

Best,

Jonathan

1. Paul Allison says:
  
  February 25, 2021 at 12:10 pm
  
  In the blog, I worked with difference scores as the outcome. In that case, you can just decompose the predictor into positive and negative changes, and you don’t have to worry about cumulating. But if you work with the original variable as your outcome, it turns out that cumulation is necessary to get the same answer as the difference score method. That’s because the level of a variable at time t depends on the past history of its changes in one direction or the other. When you use difference scores, that history cancels out.
  
  There are two principal advantages to working with levels rather than differences: (1) you may lose fewer cases when there is missing data, and (2) the model can easily be extended to categorical dependent variables.
  
  1. Jonathan says:
    
    February 25, 2021 at 2:19 pm
    
    Thank you for your response.
    
    I’m still not sure I understand.
    
    I take an example from your data:
    
    /*
    
    id year spousediff spouse spousepos spouseneg spousecumpos spousecumneg
    906 1 0 0 0 0 0 0
    906 2 1 1 1 0 1 0
    906 3 -1 0 0 1 1 1
    906 4 0 0 0 0 1 1
    906 5 1 1 1 0 2 1
    
    */
    
    clogit pov mother spousecumpos spousecumneg inschoolcumpos ///
    inschoolcumneg hourscumpos hourscumneg i.year, group(id) robust
    
    My question is:
    
    Is the cumulative approach taken so that the coefficient on spousecumpos will reflect the consequences of all the unique changes in the predictor from 0 -> 1 on the probability of the outcome being true? In other words, does this ensure that the coefficient reports a cumulative effect of the predictor on the outcome that is a consequence of all the unique times the predictor went from not true to true, and doesn’t accidentally double count the periods in which it was still true from an earlier change?
    
    Thank you
    
    1. Paul Allison says:
      
      February 25, 2021 at 3:21 pm
      
      The cumulative approach is taken so that, for any time t, we have a count of the number of previous occasions on which a person has changed from being unmarried to married, and a count of the number of previous occasions on which a person has changed from being married to unmarried. The outcome Y(t) is assumed to be a function of those two counts. Thus, each marriage is presumed to have changed Y(t) by b units, and each marital dissolution is presumed to have changed Y(t) by c units (most likely in the opposite direction).
      
      1. Jonathan says:
        
        February 25, 2021 at 3:28 pm
        
        Thank you,
        
        However, I wonder how applicable this is to 3-period data with binary predictors.
        
        For t = 1, in which case X(i,t-1) is not observed, X+(i,t) and X-(i,t) are both set to 0. Thus, if an individual can only change once to positive and once to negative, what is the relevance of the cumulative approach here?
        
        i.e. in a fixed effects logistic regression they will only have time to go from 0 -> 1 -> 0 or 0 -> 1 -> 1.
        
        “Thus, Z+ is the accumulation up to time t of all previous positive changes in X, and Z– is the accumulation of all previous negative changes in X. When X is a dummy variable, Z+ is just the number of previous changes from 0 to 1, and Z– is the number of previous changes from 1 to 0. For example, Z+ might be the number of previous marriages, and Z– might be the number of previous divorces.”
      2. Paul Allison says:
        
        March 1, 2021 at 8:51 am
        
        What you say is correct, but it still all works out.
James Moore says:

October 1, 2020 at 3:17 pm

A question.

Suppose the goal is just to estimate the causal effect of self on anti, so I include pov as a control but have no interest in it as a predictor. Do I still have to do all of the above for both self and pov, or can I just do the following:

use nlsy.dta, clear
generate antidiff=anti92-anti90
generate selfdiff=self92-self90
generate povdiff=pov92-pov90

regress antidiff selfdiff povdiff

generate selfpos=selfdiff*(selfdiff>0)
generate selfneg=-selfdiff*(selfdiff<0)

regress antidiff selfpos selfneg povdiff

and then interpret the coefficient on selfpos and selfneg in terms of how it relates to selfdiff from earlier and only consider povdiff as a control?

Put simply, must I generate positives and negatives for every control if I'm only interested in asymmetric effects for my coefficient of interest (here self)?

Thanks, James

Apologies for posting twice; I entered old email information the first time.

1. Paul Allison says:
  
  October 2, 2020 at 7:52 am
  
  If variables are only entered as controls, there’s no requirement to decompose them into positive and negative components. However, if those controls really do have different effects in different directions, decomposing their effects could improve their performance as controls.
  
  1. James Moore says:
    
    October 6, 2020 at 2:51 pm
    
    Hi Paul,
    
    Thank you for your response. I was also interested in your use of a Wald test. When, as you demonstrate for self-esteem and poverty, the effects of the positive and negative components in the asymmetric model are not significantly different, does this provide evidence to support using the standard symmetric model instead?
    
    Thanks,
    
    James
    
    1. Paul Allison says:
      
      October 7, 2020 at 12:16 pm
      
      Yes, it does.
      
      1. James Moore says:
        
        October 14, 2020 at 10:27 am
        
        Thank you.
        
        A final question: why would the above method cause the number of observations to increase after generating positives and negatives for controls? I’ve applied the above in a panel dataset to learn more about how it works. I’ve noticed that a conditional logit with controls decomposed into positive and negative components has more observations, and apparently more individuals based on clustering at the id, than without. Surely no new individuals can be added to the analysis, so where do these new observations come from?
        
        Best,
        
        James
      2. Paul Allison says:
        
        October 19, 2020 at 8:17 am
        
        I can’t think of any reason why this would happen.
