For Causal Analysis of Competing Risks, Don’t Use Fine & Gray’s Subdistribution Method
March 24, 2018 By Paul Allison
Competing risks are common in the analysis of event time data. The classic example is death, with distinctions among different kinds of death: if you die of a heart attack, you can’t then die of cancer or suicide. But examples also abound in other fields. A marriage can end either by divorce or by the death of a spouse, but not both. If a loan is terminated by prepayment, it is no longer at risk of a default.
In this post, I argue that one of the more popular methods for regression analysis of competing risks—the analysis of subdistribution hazards, introduced by Fine and Gray (1999)—is not valid for causal inference. In fact, it can produce completely misleading results. Instead, I recommend the analysis of causespecific hazards, a longstanding and easily implemented method.
Let’s review that classic method first. The estimation of causespecific hazard functions (Kalbfleish and Prentice 2002) can be accomplished with standard methods for single kinds of events. You simply treat all competing events as though the individual were right censored at the time the competing event occurs. For example, if you want to study the effect of obesity on the risk of death due to heart disease, just estimate a Cox proportional hazards model in which all causes of death other than heart disease are treated as censoring. Or if you want to estimate the effect income on divorce, estimate a Cox model in which spousal death is treated as censoring.
By contrast, the method of Fine and Gray (1999) does not treat competing events in the same way as censored observations. Based on cumulative incidence functions, their method estimates a proportional hazards model for something they call the subdistribution hazard.
The definition of the subdistribution hazard is similar to that for a causespecific hazard, with one key difference: the causespecific hazard removes an individual from the risk set when any type of event occurs; the subdistribution hazard removes an individual from the risk set when an event of the focal type occurs or when the individual is truly censored. However, when a competing event occurs, the individual remains in the risk set. Fine and Gray acknowledge that this is “unnatural” because, in fact, those who experience competing events are no longer actually at risk of the focal event. But it’s necessary in order to get a model that correctly predicts cumulative incidence functions. (More on that later).
According to Google Scholar, the Fine and Gray paper has been cited more than 5,000 times. It is now widely available in most major software packages, including Stata (with the stcrreg command), SAS (with the EVENTCODE option in PROC PHREG) and R (with the ‘cmprsk’ package). In some fields, it has become the standard method for analyzing competing risks. In the minds of many researchers, it is the only proper way to analyze competing risks.
But there’s one big problem: the subdistribution method doesn’t isolate distinct causal effects on the competing risks. In fact, it confounds them in predictable and alarming ways. Specifically, any variable that increases the causespecific risk of event A will appear to decrease the subdistribution hazard for event B. Why? Because whenever a type A event occurs, it eliminates the possibility that a type B event will happen.
Here’s a simple simulation that demonstrates this phenomenon. I generated 10,000 observations on each of two event types, labeled A and B. Event times for type A were generated by a Weibull regression model (a parametric version of the proportional hazards model). The only predictor was variable X, which had a standard normal distribution.
Event times for type B were generated by an identical Weibull regression model in which the only predictor was variable Z, also standard normal. X and Z had a correlation of .50.
If event A occurred before event B, the event time for B was not observed. Similarly, if B occurred before A, the event time for A was not observed. Any event times greater than 8 were treated as right censored. SAS code for generating the data and performing the analysis is appended below.
In the resulting data set, there were 4350 type A events, 4277 type B events, and 1391 truly censored observations (neither A nor B was observed). Given the data generating process, any attempt to estimate causal parameters should find no effect of X on B and no effect of Z on A.
In Table 1 below, I show the results from applying the two different methods: causespecific Cox regression models with competing events treated as censoring; and subdistribution proportional hazards models. For type A, the causespecific estimate of .494 for the effect of X is close to the true value of .500 and highly significant. The coefficient of .008 for Z is close to the true value of 0 and far from statistically significant. Overall, pretty good performance.
The subdistribution estimates for type A, on the other hand, are clearly unsatisfactory. The coefficient for X appears to be biased downward by about 10%. The coefficient for Z (.216) is far from the true value of 0, and highly significant. Thus, the subdistribution estimates would lead one to conclude that an increase in Z reduced the risk of event A. What it actually did is reduce the likelihood that type A events would be observed because it increased the risk of event B.
The results for event B in the lower panel of Table 1 are the mirror image of those for event A. For both X and Z, the causespecific estimates are close to the true values. The subdistribution estimates are biased, and would lead to incorrect causal inferences.
Table 1. Results from Two Methods for Estimating Competing Risks Models, NonInformative Censoring.
CauseSpecific Hazards  Subdistribution Hazards  
Type A  Estimate  S.E.  p  Estimate  S.E.  p 
x  .490  .018  <.0001  .420  .018  <.0001 
z  .009  .018  .601  .214  .017  <.0001 
Type B 






x  .021  .018  .248  .187  .017  <.0001 
z  .488  .018  <.0001  .433  .018  <.001 
So why would anyone seriously consider the subdistribution method? Well, there’s one big problem with the causespecific hazards approach. Virtually all methods based on causespecific hazards implicitly assume that censoring is noninformative. Roughly speaking, that means that if an individual is censored at a particular point in time, that fact tells you nothing about the individual’s risk of the event.
If the censoring times are determined by the study design (as when all event times beyond a certain calendar time are censored), that’s not usually an issue. But if censoring times are not under the control of the investigator, the censoring may be informative. And that can lead to bias.
If competing risks are treated as a form of censoring, then we need to be concerned about whether that censoring is informative or noninformative. How might competing events be informative? If a spouse dies, does that tell us anything about the risk that the couple would have divorced? Maybe, but probably not. Does the fact that a person dies of heart disease tell us anything about that person’s risk of dying of cancer? Maybe so. That would definitely be the case if cancer and heart disease had common risk factors, and those factors were not included in the regression model.
Unfortunately, there’s no way to test the assumption that censoring is noninformative (Tsiatis 1975). And even if there were, there’s no good method available for estimating causal effects when censoring is informative.
By contrast, the subdistribution hazard method does not explicitly assume that competing risks are noninformative. And that has been one of its major attractions. However, as I now show, the subdistribution method does no better (and actually somewhat worse) than the causespecific method when competing events are informative.
In the previous simulation, the assumption of noninformative censoring was satisfied by the data generating process. To model informative censoring, I added an unobserved common risk factor to the regression equations for the two event times. This was simply a normally distributed random variable with a standard deviation of 2. This “random intercept” induced a correlation of .28 between the uncensored event times for type A and type B. I then reestimated the models, with results shown in Table 2 below.
The message of Table 2 is this: Yes, informative censoring leads to causespecific estimates that are biased. And unlike in the previous table, the causespecific estimates might lead one to conclude, incorrectly, that Z affects the risk for event A and that X affects the risk for event B. But the subdistribution estimates are also biased. And, with one minor exception, the biases are worse for the subdistribution method than for the causespecific method.
Table 2. Results from Two Methods for Estimating Competing Risks Models, Informative Censoring.
 CauseSpecific Hazards  Subdistribution Hazards  
Type A  Estimate  S.E.  p  Estimate  S.E.  p 
x  .347  .019  <.0001  .349  .019  <.0001 
z  .067  .019  .0005  .185  .019  <.0001 
Type B 






x  .102  .019  <.0001  .212  .019  <.0001 
z  .372  .019  <.0001  .368  .019  <.0001 
To repeat my earlier question: Why would anyone ever seriously consider using the subdistribution method? To be fair to Fine and Gray, they never claimed that subdistribution regression would accurately estimate causal parameters. Instead, they introduced the method as a way to model the impact of covariates on the cumulative incidence functions. These functions are often preferable to KaplanMeier estimates of the survivor function because they do a better job of describing the empirical distribution of events rather than some hypothetical distribution that would apply only in the absence of competing risks.
Cumulative incidence functions are particularly useful for prediction. Suppose you have a cohort of newly diagnosed cancer patients. Based on the experience of earlier patients, you want to predict in five years what percentage will have died of cancer, what percentage will have died of other causes, and what percentage will still be alive. Cumulative incidence functions will give you that information. KaplanMeier survivor functions will not.
The Fine and Gray method provides a way to introduce covariate information into those predictions, potentially making them more accurate for individual patients. It’s important to note, however, that one can also calculate cumulative incidence functions based on causespecific hazard functions. Most commercial packages for Cox regression don’t have that capability, but there are downloadable SAS macros that will accomplish the task.
In sum, the subdistribution method may be useful for generating predicted probabilities that individuals will be in particular states at particular times. But it is not useful for estimating causal effects of covariates on the risks that different kinds of events will occur. For that task, the analysis of causespecific hazards is the way to go. Unfortunately, both methods are vulnerable to competing events that are informative for each other. The only effective way to deal with that problem is to estimate causespecific hazard models that include common risk factors as covariates.
SAS Code for Simulation
data finegraytest; do i=1 to 10000; *Generate a common risk factor. To include it in the model, change 0 to 1; comrisk=0*rannor(0); *Generate x and z, bivariate standard normal with r=.5; x=rannor(0); z=.5*x + sqrt(1.25)*rannor(0); *Generate w with Weibull distribution depending on x; logw=2 + .75*x + 1.5*log(ranexp(0)) + 2*comrisk; w=exp(logw); *Generate y with Weibull distribution depending on z; logy=2 + .75*z + 1.5*log(ranexp(0)) + 2*comrisk; y=exp(logy); *Allow events to censor each other; if y>w then do; type=1; t=w; end; else if w>y then do; type=2; t=y; end; *Censor all event times at 8; if t>8 then do; type=0; t=8; end; output; end; run; proc freq; table type; run; /*Estimate causespecific hazard regressions */ *Model for type1 event; proc phreg data=finegraytest; model t*type(0 1)=x z; run; *Model for type2 event; proc phreg data=finegraytest; model t*type(0 2)=x z; run; /*Estimate subdistributions hazard regressions */ *Model for type1 event; proc phreg data=finegraytest; model t*type(0)=x z / eventcode=1; run; *Model for type2 event; proc phreg data=finegraytest; model t*type(0)=x z / eventcode=2; run;
Very clear and informative, as always. I have seen commonly use of Fine Gray in “causal questions” with weird results, even in man avenues (main stream journals). Then the authors tried to explain the unexpected results, without realising it is, actually, artificial.
Prof. Paul Allison, another concern is the use of Fine & Gray analysing randomised controlled trials. In my view, when you randomise an intervention, we are concerned about the causal effect of this intervention. Ok, predictions might be of interest in a different perspective in RCTs, but not for testing efficacy. However, the majority of trials use Fine&Gray… in this article discussed by Austin/Fine (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5347914/), which although discussed both methods, I had the feeling that they are suggesting subdistribution models for the primary outcome analysis. What do you think? Thank you.
Thanks for bringing this article to my attention. Yes, they are suggesting subdistribution models for RCTs. That can be useful if all you care about is the incidence of events. But as I argue in my post, a treatment can reduce the incidence of the target event by increasing the incidence of competing events. Thus, you can reduce the incidence of cancer deaths by increasing the incidence of deaths from heroin overdose. That doesn’t mean, however, that the treatment actually affects the causal mechanism that produces the target event. So I basically disagree with Austin & Fine.
Thank you for your reply, Prof Paul.
I agree that subdistribution hazards models provide some information for incidence or prediction, but all readers and investigators must be aware that artefacts may occur, or, about what actually the intervention is doing. This might be used in the future as a clear quote “That doesn’t mean, however, that the treatment actually affects the causal mechanism that produces the target event”.
Thank you,
Otavio
Useful related paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5489237/
Thanks. This is a good article.
Here is another article that suggests maybe you shouldn’t interpret any hazard ratio whatsoever causally (or etiologically or whatever other euphemism for “causally”): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3653612/
Good article. But there are potential problems with interpreting any estimand causally, even in a randomized experiment.
Dear Professor,
I would like to quote this discussion. Is there a reference for this?
thanks,
John
I haven’t published anything on this, but see the article linked in one of the comments above. Or you can just cite this blog post, as explained in http://www.easybib.com/guides/howtociteablogmlaapachicago/
Thanks Paul for a very informative article. I always felt that cumulative incidence was somewhat overused and misinterpreted. I am interested that you say that the data generation mechanism in your example leads to uniformative censoring. As an aside, my intuition is that the censoring would be informative. Z predicts for event B, and Z is correlated with X wich predicts for event A, thus patients who experience event B (censoring event) are also at higher risk of event A. I created a simulator in R which I think supports this. Happy to share it with you.
Thanks!
The assumption that competing risks are noninformative is conditional on the covariates. If you don’t control for covariates, the competing risks may very well be informative. That’s why the most important way to increase the plausibility of the assumption is to include common causes of all event types.
Nice post! For competing events, could I use the cause specific cox model with stablized inverse probability weighting (SIPW) for competing events?Basically, a logistic model was performed on competing events with the same covariates in Cox model. The predicted probability was computed.
SIPW = overall probability of competing events / predicted probability
Do you think this will somewhat attenuate the bias due to informative censoring from competing events?
This is an interesting idea, but I have no idea if it would have any benefit. Have you seen any literature on this?