For several years now, I’ve been promoting something I called the “hybrid method” as a way of analyzing longitudinal and other forms of clustered data. My books Fixed Effects Regression Methods for Longitudinal Data Using SAS (2005) and Fixed Effects Regression Models (2009) both devoted quite a few pages to this methodology. However, recent research has led me to be a little more cautious about this approach, especially for logistic regression models.
Like other fixed effects methods, the hybrid method provides a way of controlling for all cluster-level covariates, whether observed or not. It is sometimes called the “poor man’s conditional likelihood” (Neuhaus and McCulloch 2006) or the “between-within (BW) method” (Sjölander et al. 2013). Actually, I like that second name so much that I’m going to start using it in all future teaching and writing.
First introduced by Neuhaus and Kalbfleisch (1998), the BW method embeds a fixed effects estimator within the framework of a random effects (mixed) model, thereby enabling one to reap several benefits of both approaches. It’s especially attractive for models like ordinal logistic regression or negative binomial regression where more standard fixed effects methods (like conditional likelihood) are not available.
I’ll review the BW method in a moment, but first let me tell you why I’m writing this post. Despite the many attractions of the method, I’ve long been bothered by the fact that there was no proof that it did what was claimed—that is, provide consistent (and, hence, approximately unbiased) estimates of the causal parameters of interest. For linear models, the BW method produces exactly the same results as the standard fixed effects method, so that’s reassuring. But for logistic regression, the BW estimates are not identical to those produced by conditional likelihood (the gold standard)—close, but no cigar.
I’m glad to report that there’s been some real progress on this front in recent years, but not all of it is encouraging. Goetgeluk and Vansteelandt (2008) proved that the BW method produces consistent estimates in the linear case—something that was already pretty apparent. However, they also showed that the BW method could yield biased estimates for some nonlinear models, including logistic and Poisson regression. Fortunately, they also observed that the biases were typically small in most practical settings.
Brumback et al. (2010) also showed by simulation that the BW method, when applied to binary logistic regression, could produce inconsistent estimates of the within-cluster causal parameters, although, again, the biases were small. But in a later paper, Brumback et al. (2012) presented a simulation in which the downward bias was as high as 45%.
So, does that mean that we should no longer use the BW method for logistic regression? I don’t think so. To explain that, I need to get a little more technical. Let Yij be a binary (0,1) dependent variable for the jth individual in the ith cluster. Let Xij be a column vector of predictor variables, again, for individual i in cluster j. (If you’re not comfortable with vectors, just think of X as a single variable). We postulate the model
logit(Pr(Yij = 1| Xij, αi)) = μ + βXij + αi (1)
where β is a row vector of coefficients, and αi is an unobserved “effect” of being in cluster i. It can be thought of as representing the combined effects of all cluster-level variables that are not included in the model.
How can we consistently estimate β? Well, we could estimate a generalized linear mixed model, treating α as a normally distributed random variable with mean 0 and constant variance. This can be done in SAS with PROC GLIMMIX or in Stata with the melogit command. But standard implementations assume that α is independent of X, which means that we aren’t really controlling for α as a potential confounder. If α is actually correlated with X, you’ll get biased estimates of β using this method, and the bias could be severe.
A second approach is to use conditional likelihood, which is known to produce consistent estimates of β even if α is correlated with X. You can do this with the clogit command in Stata or with PROC LOGISTIC using the STRATA statement.
A third approach is the BW method, which says to decompose X into a within-cluster component and a between-cluster component. Letting Xi be a vector of cluster-specific means (sorry to put the bar underneath), we estimate the following generalized linear mixed model
logit(Pr(Yij = 1| Xij, Xi, αi)) = μ + βW(Xij–Xi) + βBXi + αi (2)
using standard software like GLIMMIX or melogit.
It’s long been believed that the estimate of βW, because it depends only on within-cluster variation, controls for confounding with α and produces consistent estimates of β in equation (1). We now know that that’s not true, at least not in general. But there is one condition in which it is true. Specifically, it’s true when αi in (1) is a linear function of Xi, plus a random error that is independent of Xij (Brumback et al. 2010). It’s reasonable to suppose, then, that the performance of the BW method for logistic regression depends, at least in part, on how closely this linearity assumption is satisfied. I’ll make use of that idea shortly.
Why not just do conditional likelihood, which we know will give consistent estimates under more general conditions? First, there are lots of non-linear models for which conditional likelihood is just not possible. Those models include binary probit, binary complementary log-log, cumulative (ordered) logit, and negative binomial regression.
Second, even for binary logit, there are many attractions of the BW method that conditional likelihood can’t provide: random slopes, a test for confounding, the ability to include cluster-level covariates, and the ability to allow for higher-levels of clustering.
As previously mentioned, all the available evidence suggests that the bias of the BW method is small in most situations. The simulation by Brumback et al. that yielded 45% downward bias was very extreme: the cluster-level effect was highly correlated with the x’s, it had an dominating effect on the outcome, and it depended on the maximum of the x’s across individuals, not their mean. Also, there were only two individuals per cluster, which tends to magnify any other sources of bias.
Here’s a modest proposal for how to reduce any biases that may occur. First, it’s easy to show that the following equation is the exact equivalent of model (2):
logit(Pr(Yij = 1| Xij, Xi, αi)) = μ + βWXij + (βB–βW)Xi + αi (3)
This tells us that as long as the means are in the equation, you don’t have to express Xij as deviations from those means.
To check or allow for nonlinearity, I suggest adding polynomial functions of the means to equation (3), such as
logit(Pr(Yij = 1| Xij, Xi, αi)) = μ + βWXij + θ1Xi + θ2Xi2 +θ3Xi3 + αi (4)
You could even include other cluster-level functions of the Xijs, such as their standard deviation. If the additional terms are not statistically significant, and if the estimate of βW doesn’t change much, then you can feel more confident that bias is not an issue.
To sum up, the BW/hybrid method is not quite as good as I once thought, at least not for non-linear models. But it’s certainly better than doing conventional random effects/mixed models, and usually comes pretty close to the mark. And, lastly, there may be ways to improve it.
Allison, PD (2005) Fixed Effects Regression Methods for Longitudinal Data Using SAS. Cary, NC: The SAS Institute.
Allison, PD (2009) Fixed Effects Regression Models. Thousand Oaks, CA: Sage Publications.
Brumback BA, AB Dailey, LC Brumback, MD Livingston, Z He (2010) “Adjusting for confounding by cluster using generalized linear mixed models.” Statistics and Probability Letters 80:1650–1654.
Brumback BA, HW Zheng, and AB Dailey (2012) “Adjusting for confounding by neighborhood using generalized linear mixed models and complex survey data.” Statistics in Medicine 32: 1313–1324.
Goetgeluk, S and S Vansteelandt (2008) “Conditional generalized estimating equations for the analysis of clustered and longitudinal data.” Biometrics 64: 772-780.
Neuhaus, JM and JD Kalbfleisch (1998) “Between- and within-cluster covariate effects in the analysis of clustered data.” Biometrics 54: (2) 638-645.
Neuhaus, JM and CE McCulloch (2006) “Separating between- and within-cluster covariate effects by using conditional and partitioning methods.” Journal of the Royal Statistical Society Series B 68: 859–872.
Sjölander, A, P Lichtenstein, H Larsson and Y Pawitan (2013) “Between–within models for survival analysis.” Statistics in Medicine 32: 3067–3076