## Problems with the Hybrid Method

##### September 2, 2014 By Paul Allison

For several years now, I’ve been promoting something I called the “hybrid method” as a way of analyzing longitudinal and other forms of clustered data. My books *Fixed Effects Regression Methods for Longitudinal Data Using SAS *(2005) and *Fixed Effects Regression Models *(2009) both devoted quite a few pages to this methodology. However, recent research has led me to be a little more cautious about this approach, especially for logistic regression models.

Like other fixed effects methods, the hybrid method provides a way of controlling for *all* cluster-level covariates, whether observed or not. It is sometimes called the “poor man’s conditional likelihood” (Neuhaus and McCulloch 2006) or the “between-within (BW) method” (Sjölander et al. 2013). Actually, I like that second name so much that I’m going to start using it in all future teaching and writing.

First introduced by Neuhaus and Kalbfleisch (1998), the BW method embeds a fixed effects estimator within the framework of a random effects (mixed) model, thereby enabling one to reap several benefits of both approaches. It’s especially attractive for models like ordinal logistic regression or negative binomial regression where more standard fixed effects methods (like conditional likelihood) are not available.

I’ll review the BW method in a moment, but first let me tell you why I’m writing this post. Despite the many attractions of the method, I’ve long been bothered by the fact that there was no proof that it did what was claimed—that is, provide consistent (and, hence, approximately unbiased) estimates of the causal parameters of interest. For linear models, the BW method produces exactly the same results as the standard fixed effects method, so that’s reassuring. But for logistic regression, the BW estimates are not identical to those produced by conditional likelihood (the gold standard)—close, but no cigar.

I’m glad to report that there’s been some real progress on this front in recent years, but not all of it is encouraging. Goetgeluk and Vansteelandt (2008) proved that the BW method produces consistent estimates in the linear case—something that was already pretty apparent. However, they also showed that the BW method could yield *biased* estimates for some nonlinear models, including logistic and Poisson regression. Fortunately, they also observed that the biases were typically small in most practical settings.

Brumback et al. (2010) also showed by simulation that the BW method, when applied to binary logistic regression, could produce inconsistent estimates of the within-cluster causal parameters, although, again, the biases were small. But in a later paper, Brumback et al. (2012) presented a simulation in which the downward bias was as high as 45%.

So, does that mean that we should no longer use the BW method for logistic regression? I don’t think so. To explain that, I need to get a little more technical. Let *Y _{ij }*be a binary (0,1) dependent variable for the

*j*th individual in the

*i*th cluster. Let

*X*be a column vector of predictor variables, again, for individual

_{ij }*i*in cluster

*j*. (If you’re not comfortable with vectors, just think of

*X*as a single variable). We postulate the model

logit(Pr(*Y _{ij}* = 1|

*X*)) =

_{ij}, α_{i}*μ*+

*βX*(1)

_{ij}+ α_{i}where *β* is a row vector of coefficients, and *α _{i}* is an unobserved “effect” of being in cluster

*i*. It can be thought of as representing the combined effects of all cluster-level variables that are not included in the model.

How can we consistently estimate *β*? Well, we could estimate a generalized linear mixed model, treating *α* as a normally distributed random variable with mean 0 and constant variance. This can be done in SAS with PROC GLIMMIX or in Stata with the **melogit** command. But standard implementations assume that *α* is independent of *X*, which means that we aren’t really *controlling* for *α* as a potential confounder. If *α* is actually correlated with *X*, you’ll get biased estimates of *β *using this method, and the bias could be severe.

A second approach is to use conditional likelihood, which is known to produce consistent estimates of *β* even if *α* is correlated with *X*. You can do this with the **clogit** command in Stata or with PROC LOGISTIC using the STRATA statement.

A third approach is the BW method, which says to decompose *X* into a within-cluster component and a between-cluster component. Letting *X _{i}* be a vector of cluster-specific means (sorry to put the bar underneath), we estimate the following generalized linear mixed model

logit(Pr(*Y _{ij}* = 1|

*X*,

_{ij}*X*)) =

_{i}, α_{i}*μ + β*(

_{W}*X*) +

_{ij}–*X*_{i}*β*

_{B}*+*

*X*_{i }*α*(2)

_{i}using standard software like GLIMMIX or **melogit**.

It’s long been believed that the estimate of *β _{W}*, because it depends only on within-cluster variation, controls for confounding with

*α*and produces consistent estimates of

*β*in equation (1). We now know that that’s not true, at least not in general. But there is one condition in which it

*is*true. Specifically, it’s true when

*α*in (1) is a linear function of

_{i}*X*, plus a random error that is independent of

_{i}*X*(Brumback et al. 2010)

_{ij}_{. }It’s reasonable to suppose, then, that the performance of the BW method for logistic regression depends, at least in part, on how closely this linearity assumption is satisfied. I’ll make use of that idea shortly.

Why not just do conditional likelihood, which we know will give consistent estimates under more general conditions? First, there are lots of non-linear models for which conditional likelihood is just not possible. Those models include binary probit, binary complementary log-log, cumulative (ordered) logit, and negative binomial regression.

Second, even for binary logit, there are many attractions of the BW method that conditional likelihood can’t provide: random slopes, a test for confounding, the ability to include cluster-level covariates, and the ability to allow for higher-levels of clustering.

As previously mentioned, all the available evidence suggests that the bias of the BW method is small in most situations. The simulation by Brumback et al. that yielded 45% downward bias was very extreme: the cluster-level effect was highly correlated with the *x*’s, it had an dominating effect on the outcome, and it depended on the *maximum* of the *x*’s across individuals, not their mean. Also, there were only two individuals per cluster, which tends to magnify any other sources of bias.

Here’s a modest proposal for how to reduce any biases that may occur. First, it’s easy to show that the following equation is the exact equivalent of model (2):

logit(Pr(*Y _{ij}* = 1|

*X*,

_{ij}*X*)) =

_{i}, α_{i}*μ + β*

_{W}*X*+ (

_{ij}*β*–

_{B}*β*)

_{W}*+*

*X*_{i }*α*(3)

_{i}This tells us that as long as the means are in the equation, you don’t have to express *X _{ij}* as deviations from those means.

To check or allow for nonlinearity, I suggest adding polynomial functions of the means to equation (3), such as

logit(Pr(*Y _{ij}* = 1|

*X*,

_{ij}*X*)) =

_{i}, α_{i}*μ + β*

_{W}*X*+

_{ij}*θ*

_{1}

*+*

*X*_{i }*θ*

_{2}

*X*_{i}^{2}+

*θ*

_{3}

*X*_{i}^{3}+

*α*(4)

_{i }You could even include other cluster-level functions of the *X _{ij}*s, such as their standard deviation. If the additional terms are not statistically significant, and if the estimate of

*β*doesn’t change much, then you can feel more confident that bias is not an issue.

_{W}To sum up, the BW/hybrid method is not quite as good as I once thought, at least not for non-linear models. But it’s certainly better than doing conventional random effects/mixed models, and usually comes pretty close to the mark. And, lastly, there may be ways to improve it.

**References**

Allison, PD (2005) *Fixed Effects Regression Methods for Longitudinal Data Using SAS. *Cary, NC: The SAS Institute.

Allison, PD (2009) *Fixed Effects Regression Models*. Thousand Oaks, CA: Sage Publications.

Brumback BA, AB Dailey, LC Brumback, MD Livingston, Z He (2010) “Adjusting for confounding by cluster using generalized linear mixed models.” *Statistics and Probability Letters* 80:1650–1654.

Brumback BA, HW Zheng, and AB Dailey (2012) “Adjusting for confounding by neighborhood using generalized linear mixed models and complex survey data.” *Statistics in Medicine *32: 1313–1324.

Goetgeluk, S and S Vansteelandt (2008) “Conditional generalized estimating equations for the analysis of clustered and longitudinal data.” *Biometrics* 64: 772-780.

Neuhaus, JM and JD Kalbfleisch (1998) “Between- and within-cluster covariate effects in the analysis of clustered data.” *Biometrics *54: (2) 638-645.

Neuhaus, JM and CE McCulloch (2006) “Separating between- and within-cluster covariate effects by using conditional and partitioning methods.” *Journal of the Royal Statistical Society Series B* 68: 859–872.

Sjölander, A, P Lichtenstein, H Larsson and Y Pawitan (2013) “Between–within models for survival analysis.” *Statistics in Medicine* 32: 3067–3076

Thank you for an excellent post. In your Fixed Effects Regression Models book you indicate that the hybrid models in the context of survival outcomes did not work so well in some simulations you performed. What types of hybrid survival models did you evaluate? Was it similar to the models Sjölander et al. used in their paper, or was it different? Their model does seem to be somewhat atypical in how the between and within effects are decomposed (i.e., they simply use the group mean in the model and interpret that as the between effect).

Thanks,

Guy

Frankly, I don’t recall under what conditions the hybrid model didn’t work well for survival models. I haven’t had a chance to study the Sjolander paper carefully, but there’s no problem with just putting in the group mean, as I explain toward the end of the blog post. In that case, however, the coefficient for the group mean is the DIFFERENCE between the within and between effects.

It might not directly relate to your post. I read your SAGE green book on fixed effect model. I read the hybrid model as well from chapter 2. I understand that the trick is to make mean and sd of independent variables. But how if some of the independent variables are dummies (categorical variables) that vary within the panel?

Thank you

Dummy variables get treated just like any other variable–subtract the individual-specific mean to get a deviation score. Then include both the mean and deviation score in the model (but not the standard deviation). Despite this transformation, the interpretation of their coefficients is the same as if they were dummy variables.

Thank you Prof. Allison for your reply. I could apply the hybrid model on dummies now.

How about interaction effects? How to specify between-interaction effects and within-interaction effects? Thank you.

Interactions should be treated as just another variable. Let x and z be the two time-dependent variables. Before doing any transformations, create the product of x and z. Calculate the individual specific mean of xz–that’s your between component. Then subtract that mean from xz. That’s your within component.

What if you are interested in examining the interaction between a time-varying and a time-invariant predictor? Can you just include the cross-products in the model (e.g., deviationtimevarying x time-invariant meantimevarying x time-invariant)? Or do you similarly need to first create the product and then calculate the specific mean and deviation?

First create the product, then the means and deviations of that product.