Competing risks are common in the analysis of event time data. The classic example is death, with distinctions among different kinds of death: if you die of a heart attack, you can’t then die of cancer or suicide. But examples also abound in other fields. A marriage can end either by divorce or by the death of a spouse, but not both. If a loan is terminated by pre-payment, it is no longer at risk of a default.

In this post, I argue that one of the more popular methods for regression analysis of competing risks—the analysis of subdistribution hazards, introduced by Fine and Gray (1999)—is not valid for causal inference. In fact, it can produce completely misleading results. Instead, I recommend the analysis of cause-specific hazards, a long-standing and easily implemented method.


Let’s review that classic method first. The estimation of cause-specific hazard functions (Kalbfleisch and Prentice 2002) can be accomplished with standard methods for single kinds of events. You simply treat all competing events as though the individual were right censored at the time the competing event occurs. For example, if you want to study the effect of obesity on the risk of death due to heart disease, just estimate a Cox proportional hazards model in which all causes of death other than heart disease are treated as censoring. Or if you want to estimate the effect of income on divorce, estimate a Cox model in which spousal death is treated as censoring.
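In practice, this is just a recoding of the event indicator before fitting a standard Cox model. Here is a minimal Python sketch of that recoding step (my own illustration, with hypothetical variable names; the post’s own code, in SAS, appears at the end):

```python
# Hypothetical records: (time, cause), where cause is the observed event type
# or None for a truly censored observation.
records = [(2.1, "heart"), (3.5, "cancer"), (8.0, None), (1.2, "heart"), (5.7, "other")]

def recode_for_cause(records, focal):
    # For a cause-specific analysis of `focal`, the event indicator is 1 only
    # for the focal cause; competing events and true censoring both become 0.
    return [(t, 1 if cause == focal else 0) for t, cause in records]

print(recode_for_cause(records, "heart"))
```

The resulting (time, indicator) pairs can then be passed to any standard Cox regression routine.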

By contrast, the method of Fine and Gray (1999) does not treat competing events in the same way as censored observations. Based on cumulative incidence functions, their method estimates a proportional hazards model for something they call the subdistribution hazard.

The definition of the subdistribution hazard is similar to that for a cause-specific hazard, with one key difference: the cause-specific hazard removes an individual from the risk set when any type of event occurs; the subdistribution hazard removes an individual from the risk set when an event of the focal type occurs or when the individual is truly censored. However, when a *competing* event occurs, the individual remains in the risk set. Fine and Gray acknowledge that this is “unnatural” because, in fact, those who experience competing events are no longer actually at risk of the focal event. But it’s necessary in order to get a model that correctly predicts cumulative incidence functions. (More on that later).
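In symbols (my notation, not Fine and Gray’s): if T is the time of the first event and J its type, the two hazards for event type j differ only in the risk set they condition on:

```latex
\lambda_j^{\mathrm{cs}}(t) = \lim_{\Delta \to 0}
  \frac{\Pr\bigl(t \le T < t+\Delta,\; J = j \mid T \ge t\bigr)}{\Delta}

\lambda_j^{\mathrm{sd}}(t) = \lim_{\Delta \to 0}
  \frac{\Pr\bigl(t \le T < t+\Delta,\; J = j \mid T \ge t
        \ \text{or}\ (T < t \ \text{and}\ J \ne j)\bigr)}{\Delta}
```

The second conditioning event is the “unnatural” one: it keeps individuals who have already failed from a competing cause in the denominator.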

According to Google Scholar, the Fine and Gray paper has been cited more than 5,000 times. It is now available in most major software packages, including Stata (with the **stcrreg** command), SAS (with the EVENTCODE option in PROC PHREG) and R (with the ‘cmprsk’ package). In some fields, it has become the standard method for analyzing competing risks. In the minds of many researchers, it is the *only* proper way to analyze competing risks.

But there’s one big problem: the subdistribution method doesn’t isolate distinct causal effects on the competing risks. In fact, it confounds them in predictable and alarming ways. Specifically, any variable that increases the cause-specific risk of event A will appear to *decrease* the subdistribution hazard for event B. Why? Because whenever a type A event occurs, it eliminates the possibility that a type B event will happen.

Here’s a simple simulation that demonstrates this phenomenon. I generated 10,000 observations, each with potential event times of two types, labeled A and B. Event times for type A were generated by a Weibull regression model (a parametric version of the proportional hazards model). The only predictor was variable *X*, which had a standard normal distribution.

Event times for type B were generated by an identical Weibull regression model in which the only predictor was variable *Z*, also standard normal. *X* and *Z* had a correlation of .50.

If event A occurred before event B, the event time for B was not observed. Similarly, if B occurred before A, the event time for A was not observed. Any event times greater than 8 were treated as right censored. SAS code for generating the data and performing the analysis is appended below.
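For readers who don’t use SAS, here is a rough Python equivalent of that data-generating step (a sketch of my own; the parameter values mirror the SAS code at the end of the post):

```python
import math
import random

random.seed(42)

def simulate(n=10000, rho=0.5, common_sd=0.0):
    """Sketch of the post's data-generating process. Setting common_sd=2
    adds the shared 'random intercept' used for the informative-censoring
    variant (Table 2)."""
    data = []
    for _ in range(n):
        u = random.gauss(0, common_sd) if common_sd else 0.0  # shared frailty
        x = random.gauss(0, 1)
        z = rho * x + math.sqrt(1 - rho ** 2) * random.gauss(0, 1)
        # Weibull in AFT form: log T = 2 - .75*covariate + 1.5*log(E), E ~ Exp(1),
        # so the hazard-scale coefficient is .75 / 1.5 = .50.
        w = math.exp(2 - 0.75 * x + 1.5 * math.log(random.expovariate(1)) + u)
        y = math.exp(2 - 0.75 * z + 1.5 * math.log(random.expovariate(1)) + u)
        t = min(w, y, 8.0)          # events censor each other; cap at 8
        if t == 8.0:
            etype = 0               # truly censored
        elif w < y:
            etype = 1               # type A observed first
        else:
            etype = 2               # type B observed first
        data.append((t, etype, x, z))
    return data

data = simulate()
counts = {k: sum(1 for r in data if r[1] == k) for k in (0, 1, 2)}
print(counts)  # roughly 14% censored, ~43% each of types A and B
```

Fitting the Cox and Fine-Gray models themselves requires a survival analysis library, which I omit here.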

In the resulting data set, there were 4350 type A events, 4277 type B events, and 1391 truly censored observations (neither A nor B was observed). Given the data generating process, any attempt to estimate causal parameters should find no effect of *X* on B and no effect of *Z* on A.

In Table 1 below, I show the results from applying the two different methods: cause-specific Cox regression models with competing events treated as censoring, and subdistribution proportional hazards models. For type A, the cause-specific estimate of .490 for the effect of *X* is close to the true value of .500 and highly significant. The coefficient of -.009 for *Z* is close to the true value of 0 and far from statistically significant. Overall, pretty good performance.

The subdistribution estimates for type A, on the other hand, are clearly unsatisfactory. The coefficient for *X* appears to be biased downward by about 15%. The coefficient for *Z* (-.214) is far from the true value of 0, and highly significant. Thus, the subdistribution estimates would lead one to conclude that an increase in *Z* reduced the risk of event A. What it actually did is reduce the likelihood that type A events would be *observed* because it increased the risk of event B.

The results for event B in the lower panel of Table 1 are the mirror image of those for event A. For both *X* and *Z*, the cause-specific estimates are close to the true values. The subdistribution estimates are biased, and would lead to incorrect causal inferences.

**Table 1. Results from Two Methods for Estimating Competing Risks Models, Non-Informative Censoring.**

| | Cause-Specific Hazards | | | Subdistribution Hazards | | |
|---|---|---|---|---|---|---|
| **Type A** | Estimate | S.E. | p | Estimate | S.E. | p |
| *x* | .490 | .018 | <.0001 | .420 | .018 | <.0001 |
| *z* | -.009 | .018 | .601 | -.214 | .017 | <.0001 |
| **Type B** | | | | | | |
| *x* | .021 | .018 | .248 | -.187 | .017 | <.0001 |
| *z* | .488 | .018 | <.0001 | .433 | .018 | <.0001 |

So why would anyone seriously consider the subdistribution method? Well, there’s one big problem with the cause-specific hazards approach. Virtually all methods based on cause-specific hazards implicitly assume that censoring is non-informative. Roughly speaking, that means that if an individual is censored at a particular point in time, that fact tells you nothing about the individual’s risk of the event.

If the censoring times are determined by the study design (as when all event times beyond a certain calendar time are censored), that’s not usually an issue. But if censoring times are not under the control of the investigator, the censoring may be informative. And that can lead to bias.

If competing risks are treated as a form of censoring, then we need to be concerned about whether that censoring is informative or non-informative. How might competing events be informative? If a spouse dies, does that tell us anything about the risk that the couple would have divorced? Maybe, but probably not. Does the fact that a person dies of heart disease tell us anything about that person’s risk of dying of cancer? Maybe so. That would definitely be the case if cancer and heart disease had common risk factors, and those factors were not included in the regression model.

Unfortunately, there’s no way to test the assumption that censoring is non-informative (Tsiatis 1975). And even if there were, there’s no good method available for estimating causal effects when censoring is informative.

By contrast, the subdistribution hazard method does not explicitly assume that competing risks are non-informative. And that has been one of its major attractions. However, as I now show, the subdistribution method does no better (and actually somewhat worse) than the cause-specific method when competing events are informative.

In the previous simulation, the assumption of non-informative censoring was satisfied by the data generating process. To model informative censoring, I added an unobserved common risk factor to the regression equations for the two event times. This was simply a normally distributed random variable with a standard deviation of 2. This “random intercept” induced a correlation of .28 between the uncensored event times for type A and type B. I then re-estimated the models, with results shown in Table 2 below.

The message of Table 2 is this: Yes, informative censoring leads to cause-specific estimates that are biased. And unlike in the previous table, the cause-specific estimates might lead one to conclude, incorrectly, that *Z* affects the risk for event A and that *X* affects the risk for event B. But the subdistribution estimates are also biased. And, with one minor exception, the biases are worse for the subdistribution method than for the cause-specific method.

**Table 2. Results from Two Methods for Estimating Competing Risks Models, Informative Censoring.**

| | Cause-Specific Hazards | | | Subdistribution Hazards | | |
|---|---|---|---|---|---|---|
| **Type A** | Estimate | S.E. | p | Estimate | S.E. | p |
| *x* | .347 | .019 | <.0001 | .349 | .019 | <.0001 |
| *z* | -.067 | .019 | .0005 | -.185 | .019 | <.0001 |
| **Type B** | | | | | | |
| *x* | -.102 | .019 | <.0001 | -.212 | .019 | <.0001 |
| *z* | .372 | .019 | <.0001 | .368 | .019 | <.0001 |

To repeat my earlier question: Why would anyone ever seriously consider using the subdistribution method? To be fair to Fine and Gray, they never claimed that subdistribution regression would accurately estimate causal parameters. Instead, they introduced the method as a way to model the impact of covariates on the cumulative incidence functions. These functions are often preferable to Kaplan-Meier estimates of the survivor function because they do a better job of describing the empirical distribution of events rather than some hypothetical distribution that would apply only in the absence of competing risks.

Cumulative incidence functions are particularly useful for prediction. Suppose you have a cohort of newly diagnosed cancer patients. Based on the experience of earlier patients, you want to predict in five years what percentage will have died of cancer, what percentage will have died of other causes, and what percentage will still be alive. Cumulative incidence functions will give you that information. Kaplan-Meier survivor functions will not.

The Fine and Gray method provides a way to introduce covariate information into those predictions, potentially making them more accurate for individual patients. It’s important to note, however, that one can also calculate cumulative incidence functions based on cause-specific hazard functions. Most commercial packages for Cox regression don’t have that capability, but there are downloadable SAS macros that will accomplish the task.
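For the nonparametric case, the computation is simple enough to sketch directly. Here is a minimal Python illustration of the standard cumulative incidence estimator (my own sketch, not one of those macros):

```python
def cumulative_incidence(records, cause):
    """Nonparametric cumulative incidence of `cause` from (time, type) pairs,
    where type 0 means true censoring. At each distinct event time t_i the
    increment is S(t_i-) * d_ji / n_i, with S the overall Kaplan-Meier
    survivor function, d_ji the number of type-j events at t_i, and n_i the
    number still at risk."""
    times = sorted(set(t for t, e in records if e != 0))
    surv = 1.0   # overall survival just before the current event time
    cif = 0.0
    out = []
    for ti in times:
        at_risk = sum(1 for t, e in records if t >= ti)
        d_all = sum(1 for t, e in records if t == ti and e != 0)
        d_j = sum(1 for t, e in records if t == ti and e == cause)
        cif += surv * d_j / at_risk
        surv *= 1 - d_all / at_risk
        out.append((ti, cif))
    return out

# Tiny worked example: one type 1 event, one type 2 event, one censoring,
# then a final type 1 event.
example = [(1, 1), (2, 2), (3, 0), (4, 1)]
print(cumulative_incidence(example, 1))   # incidence of type 1 over time
```

Note that the cause-specific incidences sum, with the censoring appropriately handled, to 1 minus the overall survivor function, which is what makes them preferable to Kaplan-Meier curves for prediction.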

In sum, the subdistribution method may be useful for generating predicted probabilities that individuals will be in particular states at particular times. But it is not useful for estimating causal effects of covariates on the risks that different kinds of events will occur. For that task, the analysis of cause-specific hazards is the way to go. Unfortunately, both methods are vulnerable to competing events that are informative for each other. The only effective way to deal with that problem is to estimate cause-specific hazard models that include common risk factors as covariates.

__SAS Code for Simulation__

```sas
data finegraytest;
  do i=1 to 10000;
    *Generate a common risk factor. To include it in the model, change 0 to 1;
    comrisk=0*rannor(0);
    *Generate x and z, bivariate standard normal with r=.5;
    x=rannor(0);
    z=.5*x + sqrt(1-.25)*rannor(0);
    *Generate w with Weibull distribution depending on x;
    logw=2 + -.75*x + 1.5*log(ranexp(0)) + 2*comrisk;
    w=exp(logw);
    *Generate y with Weibull distribution depending on z;
    logy=2 + -.75*z + 1.5*log(ranexp(0)) + 2*comrisk;
    y=exp(logy);
    *Allow events to censor each other;
    if y>w then do; type=1; t=w; end;
    else if w>y then do; type=2; t=y; end;
    *Censor all event times at 8;
    if t>8 then do; type=0; t=8; end;
    output;
  end;
run;

proc freq; table type; run;

/* Estimate cause-specific hazard regressions */
*Model for type 1 event;
proc phreg data=finegraytest;
  model t*type(0 2)=x z;
run;
*Model for type 2 event;
proc phreg data=finegraytest;
  model t*type(0 1)=x z;
run;

/* Estimate subdistribution hazard regressions */
*Model for type 1 event;
proc phreg data=finegraytest;
  model t*type(0)=x z / eventcode=1;
run;
*Model for type 2 event;
proc phreg data=finegraytest;
  model t*type(0)=x z / eventcode=2;
run;
```

## Comments

Very clear and informative, as always. I have often seen Fine-Gray used for “causal questions” with weird results, even in major venues (mainstream journals). Then the authors try to explain the unexpected results without realising that they are, actually, artefacts.

Prof. Paul Allison, another concern is the use of Fine & Gray for analysing randomised controlled trials. In my view, when we randomise an intervention, we are concerned with the causal effect of that intervention. Predictions might be of interest from a different perspective in RCTs, but not for testing efficacy. Yet the majority of trials use Fine & Gray. In this article by Austin and Fine (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5347914/), which discusses both methods, I had the feeling that they were suggesting subdistribution models for the primary outcome analysis. What do you think? Thank you.

Thanks for bringing this article to my attention. Yes, they are suggesting subdistribution models for RCTs. That can be useful if all you care about is the incidence of events. But as I argue in my post, a treatment can reduce the incidence of the target event by increasing the incidence of competing events. Thus, you can reduce the incidence of cancer deaths by increasing the incidence of deaths from heroin overdose. That doesn’t mean, however, that the treatment actually affects the causal mechanism that produces the target event. So I basically disagree with Austin & Fine.

Thank you for your reply, Prof Paul.

I agree that subdistribution hazards models provide some information for incidence or prediction, but all readers and investigators must be aware that artefacts may occur, and of what the intervention is actually doing. This might be quoted in the future: “That doesn’t mean, however, that the treatment actually affects the causal mechanism that produces the target event.”

Thank you,

Otavio

Useful related paper: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5489237/

Thanks. This is a good article.

Here is another article that suggests maybe you shouldn’t interpret any hazard ratio whatsoever causally (or etiologically or whatever other euphemism for “causally”): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3653612/

Good article. But there are potential problems with interpreting any estimand causally, even in a randomized experiment.

Dear Professor,

I would like to quote this discussion. Is there a reference for this?

thanks,

John

I haven’t published anything on this, but see the article linked in one of the comments above. Or you can just cite this blog post, as explained in http://www.easybib.com/guides/how-to-cite-a-blog-mla-apa-chicago/

Thanks Paul for a very informative article. I always felt that cumulative incidence was somewhat overused and misinterpreted. I am interested that you say that the data generation mechanism in your example leads to non-informative censoring. My intuition is that the censoring would be informative: Z predicts event B, and Z is correlated with X, which predicts event A, so patients who experience event B (the censoring event) are also at higher risk of event A. I created a simulator in R which I think supports this. Happy to share it with you.

Thanks!

The assumption that competing risks are non-informative is conditional on the covariates. If you don’t control for covariates, the competing risks may very well be informative. That’s why the most important way to increase the plausibility of the assumption is to include common causes of all event types.

Nice post! For competing events, could I use the cause-specific Cox model with stabilized inverse probability weighting (SIPW) for competing events? Basically, a logistic model is fit for the competing events with the same covariates as in the Cox model, and the predicted probability is computed.

SIPW = overall probability of competing events / predicted probability

Do you think this will somewhat attenuate the bias due to informative censoring from competing events?

This is an interesting idea, but I have no idea if it would have any benefit. Have you seen any literature on this?

Dr. Allison, I read your post with strong interest. I wonder, do you have source code for cause-specific models in Stata or R? Another question: what if the two events are not exactly competing, say diabetes and all-cause mortality? Since you can pass through the death state with or without diabetes, is this considered a competing risk framework?

Well, all-cause mortality is definitely a competing risk for diabetes. If you die, you can’t then get diabetes. The implication is that when modeling the event of a diabetes diagnosis, you need to be concerned about the possibility that prior death may constitute informative censoring. That is, those who die may have been at higher risk of developing diabetes than those who did not die. The best way to deal with this possibility is to include in the model common risk factors for death and diabetes. On the other hand, diabetes is not a competing risk for death. In modeling death, you would probably want to include diabetes as a time-dependent covariate.

Unfortunately, I don’t have Stata or R code available for this particular example.

Hello Dr. Allison,

Thanks for the compelling and clear article about the interpretation of CIFs. I have a question about your usage of the term “non-informative”. The way you used it (” Roughly speaking, that means that if an individual is censored at a particular point in time, that fact tells you nothing about the individual’s risk of the event.”) seems, to my current limited understanding, to describe independence of the censoring time and event time distributions. In contrast, others (e.g., https://stats.stackexchange.com/questions/22497/problem-with-informative-censoring/22605#22605) have defined non-informative censoring as meaning that the censoring time and event time distributions share no parameters. Any clarification would be greatly appreciated.

Thanks,

Eric

It’s been a long time since I studied this, but I learned that independent censoring was a special case of non-informative censoring. Parameter distinctness was just presumed in both cases. I certainly don’t think that you’d want to call the censoring non-informative if you ONLY had parameter distinctness and yet the event times and censoring times were highly correlated. In my view, that would be highly misleading.

It’s a lot like missing data. Missing at random plus parameter distinctness is necessary for the missing data mechanism to be ignorable. You’ve got to have both. But of the two, missing at random is far more likely to be violated.

Hi Dr. Allison,

Thank you for the informative article. I’ve seen the use of empirical cumulative distribution function. I wonder if it is applicable in the survival analysis setting, and if it has anything to do with cumulative hazard function or cumulative incidence function (in the presence of competing risks). Thank you!

If there’s no censoring and no competing risks, the empirical cumulative distribution function is identical to the cumulative incidence function. (And both are equal to 1 minus the survivor function.) Otherwise, they are not the same. Here’s an article on the empirical cumulative distribution function: https://en.wikipedia.org/wiki/Empirical_distribution_function

Thank you Dr Allison. I think that, in view of the recent finding that the sum of all probabilities can be >1 for the Fine-Gray model (https://onlinelibrary.wiley.com/doi/full/10.1002/sim.9023), its advantage for individual risk prediction may be even smaller. However, I wonder: is it the case that the Cox model can never produce such an effect (sum of probabilities for all competing risks >1 when each probability is estimated from its own cause-specific Cox model)? Would you suggest any alternative to either of the approaches? Thank you

This is an important result that I was not aware of. It further calls into question the value of the Fine-Gray method. Unfortunately, I think that the Cox model can produce the same problematic result.

Dear Prof. Allison,

I agree with your main points and I think in fact the situation might be more alarming than what your writing suggests. Allow me to give my own take on the subject. For simplicity I will ignore the possibility of true censoring.

The empirical data for a competing risks model is a pair (T,I) of random objects, with T the observed failure time and I in {1,…n} the failure type. Those are often (Tsiatis, 1975; David, 1976) regarded as arising from an n-tuple of random times (T_1,…,T_n) by setting T = min_i T_i and I = argmin_i T_i. I think it is important to realize that the random times T_i here are typically not times of anything that happens in the real world, but they are defined in terms of counterfactual worlds — T_i is the time of an i-th type failure in a counterfactual reality in which some hypothetical intervention eliminated all other risks of failure, but without affecting directly the i-th risk. Given that the various T_i arise from events that exist in distinct counterfactual realities, one might wonder what the joint distribution of (T_1,…,T_n) means. A possible answer is that we may regard (T_1,…,T_n) to be independent upon conditioning on an idealized sufficiently large set of covariates X_max and then we can define the unconditional joint distribution of (T_1,…,T_n) by averaging over X_max. The (T_1,…,T_n) could then also not be independent upon conditioning only on the subset X of the covariates X_max that is actually included by the researcher in the model.

It should be noted that, without assumptions on the joint distribution of (T_1,…,T_n), such distribution is not identifiable from the real world data (T,I,X).

Given this set up, we have the following possible definitions of a hazard function (following the standard terminology from David, 1976):

(a) the i-th net hazard function, lambda_i^net, which is the hazard function for the counterfactual random time T_i, i.e., lambda_i^net(t) = P(T_i in [t,t+dt] | T_i >= t)/dt;

(b) the i-th crude hazard function, lambda_i^crude, which is defined from the real world data (T,I) as lambda_i^crude(t) = P(T in [t,t+dt] and I=i | T >= t)/dt;

(c) the i-th subdistribution hazard, lambda_i^sub, which is also defined from the real world data (T,I) as lambda_i^sub(t) = P(T in [t,t+dt] and I=i | T >= t or (T < t and I is not i))/dt. This is the same as the hazard function associated to the random time T*_i which equals T if I=i and equals +infinity if I is not i. Note that the cdf of T*_i is precisely the cumulative incidence function for failures of type i.

The hazards lambda_i^net and lambda_i^crude are equal if the counterfactual times (T_1,…,T_n) are independent (and their versions lambda_i^net|X, lambda_i^crude|X conditional on some set of covariates X are equal if (T_1,…,T_n) are conditionally independent on X, of course). I think both lambda_i^net and lambda_i^crude are reasonable measures of the risk of type i failure among individuals that are at risk. Now lambda_i^sub is a completely different animal and I don’t think it has a very natural interpretation — it includes in the denominator individuals that have already failed due to a failure type distinct from i and that are therefore not at risk. It has some weird properties, for instance, it most often decreases to zero as t–>+infinity, although there is no real decrease in the risk of type i failure — it is just an artificial decrease due to the fact that individuals that have already failed keep piling up in the denominator as time passes.

Fine & Gray’s method correctly estimates the parameters involved in a model of lambda_i^sub|X, as long as the model is correctly specified. Thus the method is not wrong in the sense that it estimates what it is supposed to estimate. The only problem is that the quantities it is supposed to estimate — subdistribution hazard ratios — are perhaps not worth estimating, because they don’t have clear interpretations, or at the very least don’t have the interpretation that most researchers expect.

The net hazard lambda_i^net|X is not estimable without assumptions (that cannot be empirically verified directly) on the conditional joint distribution of (T_1,…,T_n)|X. Finally, the parameters involved in a model of lambda_i^crude|X can be easily estimated using conventional survival analysis techniques like Cox regression — one just has to treat failures that are not of type i as right censoring times. It is important to note that no assumption of independence of (T_1,…,T_n)|X is required here. Such assumptions are required to establish that lambda_i^net|X = lambda_i^crude|X, but if you are really interested in lambda_i^crude|X then the conventional methods always work (provided the other assumptions of the models are satisfied, of course).

To conclude the exposition, let me give another example of a simulation in which Fine & Gray gives nonsensical results. Consider a control group (X=0) in which type 1 failure has a constant hazard of 1 and type 2 failure has a constant hazard of 2, i.e., sample T_1 and T_2 independently from exp(1) and exp(2), respectively. Now consider a treated group (X=1) in which type 1 failure has a constant hazard of 3 and type 2 failure has a constant hazard of 6, i.e., sample T_1 and T_2 independently from exp(3) and exp(6), respectively. Now set T = min{T_1,T_2} and I=1 if T_1<T_2. Note that the treatment multiplies the failure rates of both types by 3 so you would expect a decent estimation method to yield an HR around 3. Yet, if you run Fine & Gray and estimate the coefficient of X, you will see that its exponential (i.e., the subdistribution HR) for any of the failure types has a point estimate around 1.2.

What is going on? It's simple: in this example you can easily compute explicitly the subdistribution hazard functions for the control and treated groups as functions of the time t and also compute their quotient. What you get is that the quotient starts with value 3 at t=0 but then it quickly drops and tends to zero as t increases. So the proportional subdistribution hazards assumption is violated — though the true hazards are proportional — and the method estimates some average over t of the time dependent subdistribution HR. The contribution of the small times pulls the result towards 3 and the contribution of the larger times pulls the result towards zero and you end up with something in the middle like 1.2.
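The commenter’s computation is easy to verify numerically. A short Python sketch (my own addition, not part of the original comment) of the true subdistribution hazards in this example:

```python
import math

def subdist_hazard(t, a, b):
    """True subdistribution hazard for cause 1 when the cause-specific
    hazards are constant at a (cause 1) and b (cause 2). Here
    CIF_1(t) = a/(a+b) * (1 - exp(-(a+b)*t)) and the subdensity of
    cause 1 is f_1(t) = a * exp(-(a+b)*t)."""
    f1 = a * math.exp(-(a + b) * t)
    cif1 = a / (a + b) * (1 - math.exp(-(a + b) * t))
    return f1 / (1 - cif1)

# Control group: cause-specific hazards (1, 2); treated group: (3, 6).
# The true cause-specific HR is 3 at every t, but the subdistribution
# hazard ratio starts at 3 and collapses toward zero.
for t in (0.0, 0.25, 0.5, 1.0, 2.0):
    ratio = subdist_hazard(t, 3, 6) / subdist_hazard(t, 1, 2)
    print(f"t={t:.2f}  subdistribution HR = {ratio:.4f}")
```

This makes concrete why the fitted Fine-Gray coefficient ends up as a time-averaged compromise around 1.2.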

This makes good sense. You should write a paper on this.