Logistic Regression for Rare Events

February 13, 2012

Prompted by a 2001 article by King and Zeng, many researchers worry about whether they can legitimately use conventional logistic regression for data in which events are rare. Although King and Zeng accurately described the problem and proposed an appropriate solution, there are still a lot of misconceptions about this issue.

The problem is not specifically the rarity of events, but rather the possibility of a small number of cases on the rarer of the two outcomes. If you have a sample size of 1000 but only 20 events, you have a problem. If you have a sample size of 10,000 with 200 events, you may be OK. If your sample has 100,000 cases with 2000 events, you’re golden.


There’s nothing wrong with the logistic model in such cases. The problem is that maximum likelihood estimation of the logistic model is well known to suffer from small-sample bias. And the degree of bias is strongly dependent on the number of cases in the less frequent of the two categories. So even with a sample size of 100,000, if there are only 20 events in the sample, you may have substantial bias.

What’s the solution? King and Zeng proposed an alternative estimation method to reduce the bias. Their method is very similar to another method, known as penalized likelihood, that is more widely available in commercial software. Also called the Firth method, after its inventor, penalized likelihood is a general approach to reducing small-sample bias in maximum likelihood estimation. In the case of logistic regression, penalized likelihood also has the attraction of producing finite, consistent estimates of regression parameters when the maximum likelihood estimates do not even exist because of complete or quasi-complete separation.
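
For readers who want to see what the penalty does, here is a minimal sketch in plain Python. It rests on two standard facts about Firth's method: the penalized log-likelihood is log L(β) + ½ log|I(β)|, where I(β) is the Fisher information, and for logistic regression this changes the score contribution of each case from x_i(y_i − p_i) to x_i(y_i − p_i + h_i(½ − p_i)), where h_i is the i-th diagonal element of the hat matrix. The function names `solve` and `firth_logit` are made up for this illustration. This is not the code behind the FIRTH option in SAS, firthlogit in Stata, or logistf in R, and it omits step-halving, standard errors, and profile-likelihood confidence intervals.

```python
import math


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy, A is not modified
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= f * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][j] * x[j] for j in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def firth_logit(X, y, max_iter=200, tol=1e-8):
    """Firth-penalized logistic regression (illustrative sketch only).

    X: list of rows, each starting with 1.0 for the intercept.
    y: list of 0/1 outcomes.
    Returns the list of penalized coefficient estimates.
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        # Fitted probabilities and logistic weights p(1 - p)
        p = [1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, row)))) for row in X]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information I = X' W X
        info = [[sum(w[i] * X[i][a] * X[i][b] for i in range(n)) for b in range(k)]
                for a in range(k)]
        # Hat-matrix diagonals h_i = w_i * x_i' I^{-1} x_i
        h = [w[i] * sum(xv * zv for xv, zv in zip(X[i], solve(info, X[i])))
             for i in range(n)]
        # Firth's modified score: X'(y - p + h * (1/2 - p))
        score = [sum(X[i][a] * (y[i] - p[i] + h[i] * (0.5 - p[i])) for i in range(n))
                 for a in range(k)]
        # Newton-type update using the unpenalized information matrix
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Toy data with complete separation: y = 1 exactly when x > 0.
X = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
y = [0, 0, 1, 1]
print(firth_logit(X, y))
```

In this toy data set the outcome is completely separated by x, so conventional maximum likelihood would push the slope toward infinity. The penalized iteration should settle on a finite slope instead.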

Unlike exact logistic regression (another estimation method for small samples but one that can be very computationally intensive), penalized likelihood takes almost no additional computing time compared to conventional maximum likelihood. In fact, a case could be made for always using penalized likelihood rather than conventional maximum likelihood for logistic regression, regardless of the sample size. Does anyone have a counter-argument? If so, I’d like to hear it.

Reference:
Gary King and Langche Zeng. “Logistic Regression in Rare Events Data.” Political Analysis 9 (2001): 137-163.

Lila says:

January 30, 2025 at 11:26 am

Hi Dr. Allison,
I am a PhD student in management, and I am currently analyzing a dataset where my dependent variable is binary. However, my dataset presents a severe class imbalance:
1. The majority class (0) has over 12,000 cases.
2. The minority class (1) has only 21 cases.
3. The data is cross-sectional.
4. Many control variables are binary (dummy) variables (0/1).
Given this extreme imbalance, I initially used Firth logistic regression to address the rare-event bias and potential separation issues. The residual degrees of freedom (DF) after running Firth logistic regression is quite large (over 12,000).
Would you recommend proceeding with Firth logistic regression, or would exact logistic regression be a better alternative?
Thank you in advance!

1. Paul Allison says:
  
  February 4, 2025 at 8:18 pm
  
  I think exact logistic would be a better alternative.
  
  1. Lila says:
    
    February 6, 2025 at 11:17 am
    
    Hi, Dr. Allison, I still have a question. When I used exact logistic regression, many of the dummy variables were omitted in Stata. For example, “0.drs != 1 predicts failure perfectly;
    0.drs omitted and 191 obs not used.”
    That resulted in the drop of 1000 cases in a sample with 13000 cases. But in the Firth logistic regression, there is no such problem. What other approaches should I take to deal with this issue?
    
    1. Paul Allison says:
      
      February 10, 2025 at 1:36 pm
      
      Are you using the exlogistic command? What you’re describing shouldn’t happen with that command.
      
      1. Lila says:
        
        February 11, 2025 at 9:57 am
        
        Hi, Dr. Allison. I used the logistic command, sorry for the confusion. However, since my total sample is now over 70000, and the minority class has 130 cases, when I used the exlogistic command, it showed “exceeded memory limit of 100.0M bytes” even when I added the memory() option. Should I switch to firthlogit? Thank you so much for your help.
      2. Paul Allison says:
        
        February 17, 2025 at 1:21 pm
        
        Yes, I would go with firthlogit.
Sachin says:

December 6, 2024 at 8:42 pm

Hi Dr. Allison,
Thank you for this helpful note. I want to request your advice.

I have a situation with a sample of 1.1 million observations, in which the binary predictor of interest occurs only 6% of the time (i.e., 6% of the data have X=1). The rare event occurs with probability 0.000168 when X=0 and probability 0.000244 when X=1. There are two additional complexities. One, I have repeated events. Two, ideally I would like to control for six categorical covariates as well.

My questions:
1. Is this a rare events situation?
2. In SAS, Proc Logistic has the Firth option but does not allow repeated measures. And Proc Genmod has repeated measures but does not allow the Firth option. How can I handle both?

Many thanks for your advice.

1. Paul Allison says:
  
  December 11, 2024 at 2:37 pm
  
  Yes, your outcome events are rare, but as I said in my post, what matters is the number of events not the percentage. Seems like you have about 200 events, which should be sufficient for conventional ML with one treatment predictor and 6 covariates. So, given that your events are repeated, I’d probably go with GENMOD. That said, if any of your covariates are also rare events, you could potentially run into separation problems with those. Try it and see.
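
The figure of about 200 events can be checked from the numbers in the question. A quick sketch, assuming the two quoted probabilities are per-observation event rates (the variable names are illustrative):

```python
n_obs = 1_100_000      # total observations reported in the question
share_x1 = 0.06        # share of observations with X = 1
p_event_x0 = 0.000168  # event probability when X = 0
p_event_x1 = 0.000244  # event probability when X = 1

# Expected number of events in each group and overall
events_x0 = n_obs * (1 - share_x1) * p_event_x0  # about 174
events_x1 = n_obs * share_x1 * p_event_x1        # about 16
total_events = events_x0 + events_x1             # about 190

print(round(events_x0), round(events_x1), round(total_events))
```

Under these assumptions only about 16 of the events fall in the X = 1 group, which may be worth keeping in mind when judging how precisely the effect of X can be estimated.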
  
Paula says:

October 16, 2024 at 4:49 pm

Hi, Dr. Allison, I would be grateful if you could weigh in a bit on my issue. This article here sort of directs people to balanced data. (https://doi.org/10.1016/j.ecolind.2017.10.030). However, my sample is so large that I may get away with it. I am using census data, completed data is over 45 million observations, IV (0=91%, 1=9%), after subsetting by Age>15, I get an N=32 million with IV (1=92.3%, 0=6.7%). Should I consider my initial Logit results sufficiently robust, or should I go through a different route? My geographical variables resulted in such high significance that my advisor wants me to try a Multilevel Logistic, but that would still leave me with doubts about the unbalanced IV. I am using R. Thank you!

1. Paul Allison says:
  
  October 21, 2024 at 7:44 pm
  
  With this many observations, there is no need to be concerned about the fact that your event is rare. However, the large sample size also means that even effects of trivial magnitude may have very low p-values. So be suspicious of your geographical effects, and evaluate them by the magnitude of the odds ratios (or some other measure of effect size).
  
Gustav Sørensen says:

September 15, 2024 at 7:09 pm

Dear Dr. Allison,

I hope this message finds you well.

I would greatly appreciate your input on a problem regarding chronic disease analysis using national data.

We are working with a dataset of approximately 4 million individuals, which includes socioeconomic variables such as age, educational level, and occupation. The dataset also contains information on 16 chronic diseases, where each individual either has (1) or does not have (0) each condition.

We define a “condition portfolio” as a unique combination of these chronic diseases, and there are over 6,000 distinct portfolios. I need to perform logistic regression to estimate the effects for each portfolio.

However, some portfolios are extremely rare; some are observed in only 1 or a handful of individuals. I am tasked with determining a lower threshold for the number of individuals in a portfolio before conducting logistic regression for the portfolios.

I have reviewed literature on events per variable (EPV) and power curves, but I am still uncertain about how best to set this lower limit in the context of such a large dataset with rare occurrences.

Could you offer any guidance or advice on how to approach this?

Thank you for your time and expertise.

Kind regards,
Gustav Sørensen, Student

1. Paul Allison says:
  
  September 16, 2024 at 7:37 pm
  
  It sounds like your outcome variable is whether or not a person has a particular portfolio. Is that right? Ignoring the rarity problem for a moment, is this a realistic research question? Who is going to interpret 6,000 regression models, and how would they make sense of the results? If you’re going to do it, the ideal would be to estimate a multinomial logit model for all 6,000 portfolios. But I don’t know of any software that could handle something like that. The alternative is to estimate a separate regression model for each portfolio, but there’s a particular way that should be done. To make the results equivalent to a multinomial logit model, each binary model should be a comparison between the target category and a reference category, which would most reasonably be the most common portfolio. All the other portfolios would be excluded from this regression.
  
  If you really want to go this route, here’s my recommendation for dealing with the rarity problem. Do a tabulation of your portfolios from the most common to the least common. I’m guessing that less than 20% of your portfolios will contain 80% of the cases. Just stick with those, comparing each with the most common category.
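
The tabulation step described above could be sketched as follows. This is only an illustration: the function name `portfolios_covering`, the 80% default, and the toy labels are assumptions, not part of any package.

```python
from collections import Counter


def portfolios_covering(labels, share=0.80):
    """Return the most common labels, in descending order of frequency,
    that together cover at least `share` of all cases."""
    counts = Counter(labels).most_common()  # sorted from most to least common
    total = sum(c for _, c in counts)
    kept, running = [], 0
    for label, c in counts:
        if running >= share * total:
            break
        kept.append(label)
        running += c
    return kept


# Toy example: one label per individual
labels = ["A"] * 50 + ["B"] * 35 + ["C"] * 10 + ["D"] * 5
kept = portfolios_covering(labels)
reference, targets = kept[0], kept[1:]
print(reference, targets)
```

Here the first label returned is the most common portfolio, which would serve as the reference category. Each remaining label would get its own binary model, fitted only on cases belonging to that label or to the reference.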
  
Eunkwang Seo says:

September 12, 2024 at 9:13 pm

Dear Dr. Allison,

Thank you very much for this page. Since you highlighted the small-sample bias in maximum likelihood estimation, I am wondering if this issue is less problematic when we use a linear probability model, or OLS (instead of logit or probit), for a binary dependent variable. Additionally, I am not sure if we should take into account the number of covariates in the regression model. For example, my state-level staggered DID regression analysis has 130,000 observations and 400 binary-outcome events with 50 state fixed effects and 150 year-industry fixed effects. In this case, do you think I should be concerned about the rarity of the outcome?
Best regards!

1. Paul Allison says:
  
  September 16, 2024 at 7:20 pm
  
  Yes, the issue would be less problematic with the linear probability model. But does that model really make sense for your outcome?
  If you stick with logit, 400 events is ordinarily plenty to avoid small-sample bias. On the other hand, you’ve got a lot of fixed effects, and that could cause problems. I recommend doing conditional logit to deal with the 150 year-industry fixed effects, and then using dummies for the state effects. Or maybe doing separate industry and year fixed effects to reduce the number.
  
Blessing S Ofori-Atta says:

June 14, 2023 at 6:43 pm

Dear, Dr. Paul,
I have a statistical question regarding the appropriate method for estimating risk ratios for a binary outcome with the Poisson family. Specifically, I am interested in knowing if it is statistically correct to use Firth regression with robust standard errors (using coeftest in R) for this purpose.

The reason I am considering Firth regression is that my dataset contains rare percentages. I want to ensure that I am using the appropriate statistical method to obtain accurate risk ratio estimates.

Thank you for your time and expertise.

1. Paul Allison says:
  
  June 19, 2023 at 12:42 pm
  
  Yes, Firth regression should be fine.
  
Alicia says:

April 9, 2023 at 2:20 pm

Dear Dr. Allison,
Thank you for your helpful post.
I have a dataset with 84,361 observations with 760 events and 8 predictor variables. So the percentage of events is 0.9%. I have already run the Firth method and the regular logistic regression using RStudio, and there is no real difference in the results. But the Firth model has a smaller AIC and smaller standard errors. Which method would you recommend?

Thank you in advance

1. Paul Allison says:
  
  April 10, 2023 at 12:53 pm
  
  Either is fine, but I’d probably go with Firth. Although your events are rare in terms of percentage, they are not rare in terms of absolute numbers. Also, you can’t compare AIC’s across these two methods. The log-likelihoods are not comparable.
  
  1. Alicia says:
    
    April 10, 2023 at 1:15 pm
    
    Thank you very much for your answer Prof.
    Is this considered rare event data?
    
    1. Paul Allison says:
      
      April 18, 2023 at 12:28 pm
      
      It’s rare in terms of percentages but it’s not rare in terms of absolute numbers. And what matters for maximum likelihood estimation is the number of events, not the percentage.
      
Ridza Aryanata says:

March 20, 2023 at 2:53 am

Dear Dr. Allison,
Thank you very much for your helpful post and comments. I have a dataset with 169,172 observations and 9 predictor variables. Only 3,605 (2%) of the dependent variables take the value of 1. The remaining is 0. Would you recommend the Firth model or a regular logistic regression will be enough?

1. Paul Allison says:
  
  March 20, 2023 at 1:04 pm
  
  Even though the percentage of events is small, the large NUMBER of events should make regular logistic regression work fine. There’s no harm in trying the Firth method, but I doubt that you’ll find much difference in the results.
  
  1. Ridza Aryanata says:
    
    March 22, 2023 at 2:38 am
    
    Thank you very much for your answer, Prof.
    For this problem, can I use multilevel logistic regression to analyze hierarchical data?
    
    1. Paul Allison says:
      
      March 27, 2023 at 12:07 pm
      
      It should be OK.
      
Srijana says:

February 21, 2023 at 10:31 am

Dear Allison,
I have a total sample size of 214, and the outcome occurred in only 2 cases (0.9%), which is very rare. Can I use Firth logistic regression for this? Even the logistic curve is not sigmoid.

1. Paul Allison says:
  
  February 23, 2023 at 1:45 pm
  
  With only 2 events, you really can’t do anything useful. Unfortunately, sometimes you just have to give up.
  
  1. Srijana says:
    
    February 25, 2023 at 4:21 am
    
    Dear Allison,
    Can you suggest any option for an analytical study of an outcome with only 2 events (0.9%)? This is my academic thesis work, so giving up may not be possible, but I could explain that there is no useful result by showing certain statistical results.
    
    1. Paul Allison says:
      
      February 27, 2023 at 1:34 pm
      
      I understand your disappointment, but two events is just too few to say anything reliable.
      
Barbara says:

September 7, 2022 at 6:07 pm

Dear Dr. Allison,
Thank you very much for your helpful post and comments. I have a dataset with 2,193,067 observations and 13 predictor variables. Only 5852 of the dependent variables take the value of 1. The remaining is 0. I used logistic regression and I also tried Firth. I got similar results: about 64.76% accuracy, 68.43% sensitivity, and 64.75% specificity. Do you suggest another method to improve my results, such as undersampling? Thank you in advance.

1. Paul Allison says:
  
  September 26, 2022 at 2:14 pm
  
  I think what you’ve done is fine.
  
Jonash says:

July 13, 2022 at 2:43 pm

Dear Mr Allison,

I have a short (3-year) unbalanced panel data set with approximately 70000 observations and around 4000 events. I am under the impression that this is likely enough to not count as “rare events”. I’m using conditional logit with individual and time fixed effects (with X correlated error components). For one level of one of my categorical predictors, the model fails to estimate the coefficient, giving me this warning: “Loglik converged before variable 6 ; beta may be infinite” and NAs for the coefficient and SEs. This seems to me a case of perfect separation. However, when I cross-tabulate my response with this predictor by year, there are numerous cases in both outcomes 0 and 1 in all three waves. Do you have any ideas what could be the reason for this issue or how to investigate? This happens even if I include only the variable causing the problem as a predictor. I know this only touches the topic here peripherally but I would greatly appreciate your input.
Thanks.

1. Paul Allison says:
  
  September 26, 2022 at 2:23 pm
  
  Good question. I’m guessing that the problem arises because you’re doing conditional logit. That method essentially breaks the data down into small groups that have the same number of events. Within those groups, separation can occur even if it doesn’t occur for the whole sample.
  
Getahun M Awoke says:

July 6, 2022 at 8:39 pm

Hello Dears,
In my case, I have 151 events out of 1867 (repeated measure of unbalanced visits of 278 subjects, 21 events). I want to fit a preliminary model, like logit, taking events into account while disregarding time (survival time). I have fitted a logit model with 26 predictor variables. The challenge is high odds ratios and wide CIs, and one of my predictor variables has undue influence on the effects of other variables: when I remove it, about 5 of the others become non-significant, but in practice that variable is not so influential. What are your recommendations regarding my estimation method (Firth or the default one) and my disturbing predictor variable, please?

1. Paul Allison says:
  
  September 21, 2022 at 1:09 pm
  
  Use Firth.
  
najibullah baeradeh says:

April 24, 2022 at 9:18 am

Dear Dr. Allison,
Thank you for your answer. If the number of predictors is more than 100, can firth’s logistic regression be used?
I have a dataset with 10,000 observations and over 100 predictor variables. Only 800 of the dependent variables take the value of 1. The remaining is 0. Would you recommend the Firth model or a regular logistic regression will be enough?

1. Paul Allison says:
  
  September 4, 2022 at 7:59 pm
  
  That’s a lot of predictors. I’d probably go with Firth. With that many predictors there are bound to be some that cause convergence problems.
  
najibullah baeradeh says:

April 17, 2022 at 8:00 pm

Dear Dr. Allison,
Thank you very much for your helpful post and comments. I have a dataset with 10,000 observations and 20 predictor variables. Only 800 of the dependent variables take the value of 1. The remaining is 0. Would you recommend the Firth model or a regular logistic regression will be enough?

1. Paul Allison says:
  
  April 22, 2022 at 1:12 pm
  
  Regular logistic should work fine. If you have 800 events and 20 predictors, that’s 40 events per predictor. A common rule of thumb is that you should have at least 10 of the less frequent cases per coefficient estimated.
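
The rule of thumb in this reply is easy to compute. A small sketch, with a made-up function name and the threshold of 10 taken from the reply (the intercept is not counted here, which is an assumption):

```python
def events_per_coefficient(n_events, n_nonevents, n_coefficients):
    """Cases in the less frequent outcome category per estimated coefficient."""
    return min(n_events, n_nonevents) / n_coefficients


# 10,000 observations with 800 events, as in the question above
print(events_per_coefficient(800, 9200, 20))   # 40.0, well above 10
# Same data with 100 predictors, as in the same reader's later question
print(events_per_coefficient(800, 9200, 100))  # 8.0, below 10
```

Using the minimum of the two counts means the same function applies when the non-events are the rarer outcome.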
  
Peym says:

January 5, 2022 at 10:48 am

Dear Dr. Allison,
Thank you very much for your helpful post and comments. I have a dataset with 400,000 observations and 9 predictor variables. Only 289 of the dependent variables take the value of 1. The remaining is 0. Would you recommend the Firth model or a regular logistic regression will be enough?
Thank you.

1. Paul Allison says:
  
  January 5, 2022 at 12:09 pm
  
  Regular logistic regression should be fine, but it’s probably worth trying Firth as well.
  
André says:

December 10, 2021 at 12:22 am

Dear Dr. Allison,

I am trying to perform multiple logistic regression analysis with a binary outcome variable (A=110, B=49) and 13 independent variables that I already identified through univariate logistic regression models. Is it legitimate to do a backwards stepwise procedure? I would be “violating” the one in ten rule in the first models but I would end up with a final model with just 3-5 predictors.

Thanks in advance,
André

1. Paul Allison says:
  
  December 10, 2021 at 9:14 am
  
  I wouldn’t have a problem with that.
  
  1. André says:
    
    December 10, 2021 at 10:22 am
    
    Thanks! Eventually, if the reviewers don’t like that backwards stepwise procedure (because of the high number of predictors in relation to the small sample) I was considering trying another approach. What I am trying to do with this analysis is to see if some particular symptom is an independent predictor of bad prognosis. Of the 13 independent variables, 3 are already known independent predictors from the literature and 10 are specific symptoms I want to test. So I was considering developing 10 logistic regression models, each one with the 3 already known predictors and then one of the symptoms at a time. This way I would lose the interaction between all the variables but I would adjust each symptom for the already known predictors and answer my question. Do you think this would be a reasonable option? Thanks again, André
    
    1. Paul Allison says:
      
      March 18, 2022 at 6:10 pm
      
      Again, I would be OK with this, but your reviewers may not like it either.
      
Samia says:

July 16, 2021 at 1:56 pm

I’d like to use logistic regression with a binary outcome. I have 7 explanatory variables. I’m wondering whether my analysis is biased because only 405 events are recorded in one of the explanatory variables.
Thanks in advance.

1. Paul Allison says:
  
  July 23, 2021 at 9:32 am
  
  If I understand you correctly, I don’t think this should be a problem.
  
Richa says:

July 11, 2021 at 3:22 am

In my data there are a total of 629 observations. The dependent variable is binary with 594 observations falling in one category and only 35 falling in the other. Which logistic regression should I use? Please suggest.

1. Paul Allison says:
  
  July 11, 2021 at 4:36 pm
  
  A lot depends on how many predictors you have, and how the cases are distributed within any categorical predictors. I’d probably go with Firth logit, but I’d also try exact logistic regression just for reassurance.
  
  1. Richa says:
    
    May 24, 2022 at 12:09 am
    
    Thank you so much for your help sir. I have six predictors (3 categorical and 3 non-categorical), 35 events and 594 non-events. For 2 categorical predictors having 2 categories each, there are at least 10 cases for “event” in each of the category. However, for the third categorical predictor with 4 categories one category has no event and the other three categories have at least 4 cases for event. Can I still use Firth Logistic Regression? Please suggest.
    
    1. Paul Allison says:
      
      September 21, 2022 at 1:06 pm
      
      Yes, you can use Firth.
      
Peter says:

June 17, 2021 at 4:49 am

Dear Dr. Allison,

In my data (sample size = 30), participants all have each task accuracy (1 = accuracy; 0 = inaccuracy) and their accuracy is around 60% (total event is 60). I want to use one index to examine the relationship between this index and the odds of task accuracy. In this small within-subjects study, is it better to use exact logistic regression or the Firth method?

1. Paul Allison says:
  
  June 17, 2021 at 1:07 pm
  
  I’d go with exact logistic regression. Firth is good for reducing small-sample bias in coefficient estimates, but it’s less trustworthy for p-values and confidence intervals.
  
Dina says:

April 18, 2021 at 6:35 am

Dear Dr. Allison
I am using unbalanced panel data and a binary dependent variable. I have 1573 observations, only 64 have positive outcomes (takes the value of 1). I did not use firth logit as I understood it is not related to proportion but the no. of events themselves. Instead I used fixed effect logistic model. However, the reviewer told me to use logistic regression for rare events. Should I use firth logit or should I argue in my manuscript that it is not applicable to my case?

Many thanks

1. Paul Allison says:
  
  April 18, 2021 at 3:15 pm
  
  Fixed effects logit will take care of your rare events problem. However, it will effectively delete any individuals who have no events, leaving you with a sample of 64 individuals, at most. That’s OK if you have a well-specified model. But your power will be low, and I expect your reviewers would give you a hard time about excluding such a large fraction of your sample. If you decide to abandon fixed effects, it’s probably worth using firth logit. 64 is not a lot of events, and there’s little cost in doing firth.
  
EC says:

January 21, 2021 at 11:43 am

Hi Dr. Allison,

I conducted a Firth analysis with 15 predictor variables on two samples: the first has a binary dependent variable with 1113 observations with 82 events, and the second has 864 observations with 34 events. I am looking to demonstrate the power of the test, is there a specific power analysis that can be done for Firth?

Many thanks,
EC

1. Paul Allison says:
  
  January 28, 2021 at 9:10 am
  
  Not that I am aware of.
  
Efren Aza says:

January 15, 2021 at 9:17 am

I am running a firthlogit model with a binary dependent variable, 40 observations and 8 independent variables. Four of the independent variables are statistically significant and 4 are not. However, the P-value of the model is 0.36, which does not make sense. How reliable is the P-value of firthlogit?
Is there any other way to evaluate the P-value for this model?

In advance thank you very much for your support.

Best regards,

1. Paul Allison says:
  
  January 15, 2021 at 2:19 pm
  
  What software are you using? P-values for firthlogit should be calculated by (penalized) likelihood ratio tests, not by the usual Wald tests. Some software packages make this easy. Others not so much. Also, you might try deleting the four variables that are not significant and see what happens. Finally, of the 40 observations, how many of those are 1’s and how many 0’s? If either of those numbers is small, even firthlogit might not be adequate. You may need to do exact logistic regression (and probably should try it in any case).
  
bashar says:

November 27, 2020 at 8:30 am

Dear prof,
First of all, many thanks for your post and your replies.
Please, I would like some advice related to my case.
I have panel data with 410 companies and 3300 observations. My binary variable has 158 yes values and the rest are no. I want to use at most 14 independent variables in different model specifications. Some models have only 2 independent variables. I want to cluster the standard errors at the company level.
Is it OK in my case (rare event) to use the standard logistic model and then, for robustness, a random effects logit model? If yes, do you have any references I could use in my paper? Thank you so much.

1. Paul Allison says:
  
  December 3, 2020 at 9:19 am
  
  I think you can reasonably go with standard logistic with robust standard errors. But be on the lookout for possible quasi-complete separation. Some software will warn you about this, but if not, it’s usually indicated by very large coefficients and extremely large standard errors.
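
For categorical predictors, one simple complementary check is to cross-tabulate each predictor against the outcome and look for levels in which only one outcome value occurs. The sketch below uses a made-up function name, `zero_cells`, and assumes a 0/1 outcome. It only catches separation caused by a single categorical predictor, not separation produced by a combination of predictors.

```python
from collections import defaultdict


def zero_cells(x, y):
    """Cross-tabulate a categorical predictor x against a 0/1 outcome y and
    return the levels of x in which one of the two outcomes never occurs.
    Each returned value is a (count of y == 0, count of y == 1) pair."""
    table = defaultdict(lambda: [0, 0])
    for level, outcome in zip(x, y):
        table[level][int(outcome)] += 1
    return {level: tuple(cells) for level, cells in table.items() if 0 in cells}


# Toy example: every case in level "b" has outcome 0
x = ["a", "a", "b", "b", "c", "c"]
y = [0, 1, 0, 0, 1, 0]
print(zero_cells(x, y))  # {'b': (2, 0)}
```

A level flagged by this check is a candidate for combining with another category or for an estimation method that tolerates separation.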
  
Wayne Kenney says:

September 24, 2020 at 6:33 pm

You say that the number of rare events is what is important, not the proportion. My data has 216 million rows with 86 thousand of the minority class. The peer reviewers for my paper tell me to undersample, but I don’t think I should do it.

Do you have a paper or something I can reference for this?

1. Paul Allison says:
  
  September 24, 2020 at 8:10 pm
  
  Well, you certainly don’t have to sample to get valid results. But do you really want to do your analysis on 216 million cases? If you keep the 86,000 minority cases and take a sample of 86,000 from the majority cases, you’ll have a much more manageable data set, and you’ll see hardly any increase in the standard errors.
  
  Sorry but I don’t have a reference handy–I haven’t been to my office in months.
  
Christine says:

July 28, 2020 at 8:54 pm

Dr. Allison,

My data has 130 observations; the event is not very rare (59 out of 130) but one of my categorical variables (5 categories 4 coefficients) perfectly fails to predict the DV (in all 9 cases the DV is 0). In my case would you recommend the Firth method?

Thank you very much

1. Paul Allison says:
  
  July 30, 2020 at 12:28 pm
  
  Well, my initial strategy would be to combine some of the 5 categories, if that seems reasonable. Otherwise, firth is worth trying.
  
John K says:

July 20, 2020 at 1:18 pm

Hi Dr. Allison,

I have two separate cross-sectional samples, the first has a binary dependent variable with 1113 observations with 82 events, and the second has 864 observations with 34 events. We are wondering whether it’s feasible to use a Firth logistic regression model for these two samples. We have 15 predictor variables in each model.

Thank you,
John

1. Paul Allison says:
  
  July 21, 2020 at 7:00 am
  
  Firth could work for these data sets. But I’d also try exact logistic regression.
  
Anjali says:

June 11, 2020 at 5:39 am

Hello Sir,

Sir, I am faced with the following issue.

In the panel data setting, the binary dependent variable is skewed towards positive outcomes such that there are 1330 events (ones) and 25 non-events (zeros) in my sample. Can I use firth logit regression here? If not, what should be done to correct this bias? Please guide.

Ps. I used random effects logit because of multiple positive outcomes. Variation on Y was missing and led to the loss of 1000 or more observations. Used Stata for my analysis.

Kind Regards,
Anjali

1. Paul Allison says:
  
  June 11, 2020 at 8:23 am
  
  Yes, you could use firth logit. But there’s no easy way to combine that with random effects, at least not in Stata.
  
sylvia says:

May 15, 2020 at 8:16 am

Hi Dr Allison,

I have a sample of 202 observations with 17 events. I am conducting logistic regressions independently for each potential predictor. I am using exact logistic regression due to the small numbers of events observed for some of the factors.
I wonder if you could give me your advice on a couple of things:
– Stata has the option to choose the test (sufficient, score and probability). Which option would be more advisable?
– Would it be wrong to report exact logistic regression results for a binary predictor which has no events in one of the two categories?

Many thanks

1. Paul Allison says:
  
  May 15, 2020 at 8:44 am
  
  -I’d go with sufficient, which is the default.
  -No it would not be wrong to report those results. What you are describing is quasi-complete separation, and that’s one of the things that exact logistic regression is designed to deal with.
  
MUKESH KUMAR says:

May 14, 2020 at 8:51 am

I have a sample size of 27500 observations.
Three dependent, categorical variables.

> in the first case 21 percent observations have an event happened or (response, 1).

> in the second case 26 percent observations have an event happened (response, 1).

> in the third case 14 percent observations have an event happened (response, 1).

which logistic regression model should I use?

1. Paul Allison says:
  
  May 14, 2020 at 8:59 am
  
  In all three cases, I would use standard maximum likelihood. Even in the third case, you have over 3,000 events, so I wouldn’t expect any problems.
  
Xiaobo Yang says:

April 20, 2020 at 3:44 am

In a cross-sectional study, I surveyed about 2500 people and only 9 of them were positive for a specific disease. My intention was to explore 2 risk factors. When exact logistic was used, OR of risk factor1 was 578 [95%CI, 77-5876], and OR of risk factor2 was 0.29 [95%CI 0-0.81]. When Firth Logit was used, OR of risk factor1 was 520 [95%CI, 95-2837], and OR of risk factor2 was 0.22 [95%CI 0.05-1.00]. When Poisson regression was used, IRR of risk factor1 was 247 [95%CI, 71-864], and IRR of risk factor2 was 0.00005 [95%CI 0.00003-0.00006]. My question is which model should I choose?

Reply
1. Paul Allison says:
  
  April 20, 2020 at 1:26 pm
  
  I’d go with the exact logistic regression. With so few events, this is the only one you can really trust.
  
  Reply
Leander Höhne says:

March 20, 2020 at 6:56 am

Dear Dr. Allison,

thanks for your helpful post.
I would have a follow-up question for which I couldn’t find an answer so far, maybe you can help with that:
I have a binary response variable with 76 observations in total, but only 9 of them are events (i.e., “ones”). I would like to fit a mixed-effect model to this data using the glmer() function in R and penalized likelihood estimation, and then run a stepwise backward variable selection.
My question is: what is the maximum level of complexity that is reasonable to include in the “full” model? In particular, does the low number of “positive” outcomes affect the number of predictors that can be included in a logistic model? (So far, I am only aware of a rule of thumb that would allow one predictor for every 10 observations. But is this affected by the number of events in logistic models?)
Side-info: One random-effect shall be included.

Looking forward to reading your answer!
Leander

Reply
Katie Kim says:

March 18, 2020 at 8:55 pm

I have sample size: n=7,695(100%) / Yes 245(3.2%), No 7455(96.8%). I am trying to run a binary logistic regression analysis in SPSS. Is it possible to analyze these data this way?

Reply
1. Paul Allison says:
  
  April 20, 2020 at 1:41 pm
  
  Yes, not a problem.
  
  Reply
Emmet Kelly says:

January 11, 2020 at 12:33 pm

Dear Professor Allison,

I have a data set of 949 observations sampled from 19 locations with 48 disease-positive samples. I have included the location as a random effect in a glmer (R package lme4) logistic regression in R with approximately 4 predictor variables in the final multivariate model. What worries me is that the estimates and standard errors are high in this model, and as there was no disease in some of the locations, I wonder whether the model is suffering from quasi-complete separation. So my questions are: would I be better off using Firth’s method in this situation (R package logistf), and if so, what do I do about the random effect of location? As I understand it, a random effect cannot be used with Firth’s method, but I could be mistaken as I am not hugely familiar with it.

Reply
1. Paul Allison says:
  
  January 11, 2020 at 5:09 pm
  
  Even if some of the locations have no events, it shouldn’t cause quasi-complete separation. Maybe you’ve just got low power. Try running it without the random effect, both with and without the Firth correction.
  
  Reply
HARDEEP SINGH says:

December 23, 2019 at 3:19 am

Respected Dr. Allison,

I recently started as a PhD student. I am analyzing an adoption model where my dependent variable is binary. I have a total of 30107 observations, of which 2956 have positive responses on adoption. Given these numbers, can you please tell me which model I should use: a simple logistic regression model or a Firth logit model?
Looking forward to hearing from you.

Regards,
Hardeep

Reply
1. Paul Allison says:
  
  December 23, 2019 at 8:57 am
  
  Either would be fine, but there’s really no reason for Firth logit in this situation.
  
  Reply
Zung says:

November 11, 2019 at 11:46 pm

Dear Dr. Allison,

I am analyzing an adoption model where the adoption rate of improved varieties is very high (913 events out of 949 samples). This is kind of the opposite of rare events and also causes separation when using logistic regression. I can use Firth logit for this case, right? Is there any other potential bias that I should be aware of in this case?
Look forward to your response. Thank you very much!
Best regards,

Thanks

Reply
1. Paul Allison says:
  
  November 13, 2019 at 9:01 am
  
  Absolutely, you can use Firth in this situation. The logistic method doesn’t know which outcome is the “event” and which is the “non-event”. What matters is the number of cases in the less frequent category of the dependent variable. I know of no reason for any special concern about bias in your example.
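
The symmetry described in this reply can be checked numerically. The following is a minimal sketch in Python with NumPy, using made-up data and a hypothetical helper `fit_logit` (plain Newton-Raphson maximum likelihood, not any particular package's routine). Under those assumptions, recoding the outcome as 1 − y only flips the signs of the coefficients.

```python
import numpy as np

def fit_logit(X, y, n_iter=50):
    """Hypothetical helper: ordinary ML logistic regression via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                     # binomial variance weights
        score = X.T @ (y - p)                 # gradient of the log-likelihood
        info = X.T @ (X * w[:, None])         # Fisher information X'WX
        beta = beta + np.linalg.solve(info, score)
    return beta

# Made-up illustrative data: intercept plus one predictor, outcomes overlap (no separation).
x = np.arange(8, dtype=float)
X = np.column_stack([np.ones_like(x), x])
y = np.array([0, 0, 1, 0, 1, 1, 0, 1], dtype=float)

beta_event = fit_logit(X, y)           # "1" treated as the event
beta_flipped = fit_logit(X, 1.0 - y)   # "0" treated as the event
print(beta_event, beta_flipped)        # same magnitudes, opposite signs
```

Because the fit is the same up to sign, what drives small-sample behavior is the count in the less frequent category, whichever label it carries.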
  
  Reply
Rania says:

October 30, 2019 at 7:12 pm

I have a sample size of 215, with 10 of them disease positive, and I have about 20 covariables that I want to examine. Can I do univariate logistic regression for each, so that those with p-value more than 0.1 are included in a multivariate logistic regression?
If so, this will give me 10 covariables to start with in backwards stepwise logistic regression.
Some covariables appear to have a high OR and a very low p-value in univariate analysis and then appear to be non-significant in multivariate analysis. What does this mean?

Is there another way to do my statistical analysis?

Reply
1. Paul Allison says:
  
  October 31, 2019 at 12:39 pm
  
  With only 10 events, you really don’t want more than two predictors in the final model, and even that is pushing it. Given that situation, it’s not surprising that things change dramatically with the inclusion or exclusion of particular variables.
  
  Reply
  1. Rania says:
    
    November 17, 2019 at 2:25 pm
    
    Shall I use Firth logistic regression?
    
    Reply
    1. Paul Allison says:
      
      November 18, 2019 at 7:40 am
      
      In this case, the safest way to go is exact logistic regression. But I’d try Firth too.
      
      Reply
Gaspar says:

October 19, 2019 at 3:21 pm

What method do you recommend me to use when I have 3600 observations and 80 events? Logit, scobit, Firth logistic method?

Thank you!

Kind Regards

Reply
1. Paul Allison says:
  
  October 31, 2019 at 12:44 pm
  
  I’d probably go with Firth.
  
  Reply
Yueting Li says:

September 11, 2019 at 9:33 am

Dear Prof. Allison:
Thank you very much for your helpful post. It really helps a lot.
Do we now have any method to do a sensitivity analysis for this Firth model? If so, is there any existing R package or statistical software to do it?

Regards,
Yueting Li

Reply
1. Paul Allison says:
  
  October 31, 2019 at 12:44 pm
  
  Not that I’m aware of.
  
  Reply
Haider Mannan says:

July 17, 2019 at 9:23 am

Dear Prof Dr Paul Allison
I have to run logistic regression for complex survey data when the outcome is rare. So either Firth’s correction or Prof Gary King’s corrective approach needs to be used. In SAS, PROC SURVEYLOGISTIC doesn’t have options to estimate either Firth’s or Gary King’s method. Where can I find SAS code for fitting the above-mentioned models? Someone on the web suggested using PROC GLIMMIX, but how? Your assistance would be of great benefit to me. If SAS code is unavailable, Stata is fine also.
Regards
Haider Mannan

Reply
1. Paul Allison says:
  
  September 5, 2019 at 2:34 pm
  
  Sorry, but I don’t know how to do this in either SAS or Stata.
  
  Reply
f says:

July 16, 2019 at 2:44 pm

Dear Professor Allison,
first of all, thank you for the work you are doing with this blog.
I have a general question. Let’s say we have survey data with samples ranging from 1,000 to 2,000 cases.
How many cases would you indicate as a threshold to consider an event not-rare and run conventional logistic regression?
Thank you very much in advance. Best wishes
f

Reply
1. Paul Allison says:
  
  October 31, 2019 at 12:48 pm
  
  Overall number of cases doesn’t matter. It’s the number of events (or the number of the less frequent outcome) that matters.
  
  Reply
Tyler R says:

July 11, 2019 at 5:06 pm

Dr. Allison,

What logistic method do you recommend for dealing with 12 events when I have 210 observations? Is this enough to use a Firth logistic method?

Thanks

Reply
1. Paul Allison says:
  
  July 11, 2019 at 8:41 pm
  
  In that case, I’d probably go with exact logistic regression.
  
  Reply
tshering says:

July 10, 2019 at 3:58 am

Professor Paul,
I have a situation where my sample size is 1500. The proportion with the outcome event is very small. I am trying to assess the association between the outcome variable (binary) and an independent variable (categorical). But the problem with the small number of events (low prevalence) is that there are zero values in some cells of the contingency table. Logistic regression seems inappropriate in this case because of the very small numbers (zero) in some cells. Would Firth logistic regression be appropriate? Or should I scrap the idea of multivariate analysis?

Reply
1. Paul Allison says:
  
  July 11, 2019 at 8:49 pm
  
  As I point out in the post, what matters is not the proportion of events but the actual number of events. In any case, the fact that you have zeroes in some cells of the contingency table means that you’ve got quasi-complete separation, and that’s a big problem for conventional logistic regression. Firth might work OK. But I’d also try exact logistic regression.
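
A quick way to spot the zero-cell pattern described in this reply is to cross-tabulate the predictor against the outcome before fitting anything. The following is a minimal sketch in Python, with a hypothetical helper `zero_cells` and made-up data; it is only a screening step, not a full separation diagnostic.

```python
from collections import Counter

def zero_cells(predictor, outcome):
    """Return (level, outcome_value) pairs that never occur together."""
    counts = Counter(zip(predictor, outcome))
    levels = sorted(set(predictor))
    outcomes = sorted(set(outcome))
    return [(lv, oc) for lv in levels for oc in outcomes if counts[(lv, oc)] == 0]

# Made-up data: category "c" has no events at all.
predictor = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
outcome = [0, 1, 0, 1, 0, 0, 0, 0, 0]

print(zero_cells(predictor, outcome))  # [('c', 1)]
```

Any pair returned marks an empty cell in the predictor-by-outcome table, which is the situation where conventional maximum likelihood runs into quasi-complete separation and where Firth or exact methods are worth trying.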
  
  Reply
Jamie F. says:

May 14, 2019 at 8:35 am

Hi Dr. Allison, In your post above you note that, “Unlike exact logistic regression… penalized likelihood takes almost no additional computing time compared to conventional maximum likelihood. In fact, a case could be made for always using penalized likelihood rather than conventional maximum likelihood for logistic regression, regardless of the sample size.”

What is the argument for always using penalized likelihood rather than ML? and do you have any sources to this effect?

Background: I’ve estimated a couple of Firth logistic models but arguably have enough events in the dependent variables (12 IV, N=4112, # DV events = 10% (411) and 12 IV n=2860, # DV events = 17% (486)) to justify using ML logistic regression, but I get slightly different results when performing Firth (using the R extension in SPSS). Also, I lose data (in a random fashion) when running the models. I consider you my stat guru and based on what you had said above (a case can be made to always use a penalized approach), I reported the Firth results, but a reviewer is questioning this, wants me to further explain and justify the method, and wants me to report the chi-squares and Nagelkerke R squared, which are not produced with Firth.

BTW: I’ve taken your classes before on SEM and really appreciate your teaching approach.

Anything you could offer here is appreciated. Thanks Much!

Best,
Jamie

Reply
1. Paul Allison says:
  
  May 14, 2019 at 8:59 am
  
  Well, the argument for always using the Firth method is that there is always some bias in conventional ML estimation, no matter how large the sample. So if it’s cost-free to reduce the bias (even if it’s trivial), why not do it? The Firth method has a Bayesian justification (with a Jeffreys prior), although alternative priors have been proposed. See, e.g.,
  
  Greenland, Sander, and Mohammad Ali Mansournia. “Penalization, bias reduction, and default priors in logistic and related categorical and survival regressions.” Statistics in medicine 34.23 (2015): 3133-3143.
  
  As for randomly losing cases, there’s no reason that should happen. Better check your software.
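
For readers who want to see what the Firth correction does mechanically, here is a minimal sketch in Python with NumPy. It assumes the standard formulation in which the log-likelihood is penalized by half the log-determinant of the Fisher information (the Jeffreys-prior form mentioned above), which amounts to adjusting the score with the hat-matrix diagonals. The function name `firth_logit` and the data are made up for illustration, the iteration is a plain scoring step with no step-halving, and this is not the code used by any package discussed in this thread.

```python
import numpy as np

def firth_logit(X, y, max_iter=100, tol=1e-10):
    """Illustrative Firth-penalized logistic regression (no step-halving safeguards)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        info_inv = np.linalg.inv(X.T @ (X * w[:, None]))   # inverse of X'WX
        # Hat-matrix diagonals: h_i = w_i * x_i' (X'WX)^{-1} x_i
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Firth-modified score: the usual score plus the penalty's contribution
        score = X.T @ (y - p + h * (0.5 - p))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Made-up 2x2 example: group 0 has 0 events in 10 cases, group 1 has 5 in 10.
group = np.repeat([0.0, 1.0], 10)
X = np.column_stack([np.ones(20), group])
y = np.concatenate([np.zeros(10), [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]])

beta = firth_logit(X, y)
print(beta)
```

In this made-up example one group has no events, so conventional ML estimates do not exist. The penalized fit stays finite, and for a single binary predictor it matches what one gets by adding 0.5 to each cell of the 2×2 table before computing log odds.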
  
  Reply
  1. Jamie F. says:
    
    May 14, 2019 at 9:56 am
    
    Thanks so much!
    
    (The missing cases are from listwise deletion)
    
    You’re the best!
    
    -Jamie
    
    Reply
Florian Kadow says:

April 21, 2019 at 11:10 am

Dear Professor Allison,

Thank you very much for this blog post. I am currently working on my master’s thesis on the extent to which the number of war-injured (independent variable) influences bilateral ODA payments (dependent variable). My dataset consists of approximately 190,000 observations, of which 6,760 are incidents. Neither my dependent nor my independent variable is binary, but both variables are log-transformed. Do I have to pay attention to “rare events” in this case as well?

Many thanks for your help

Reply
1. Paul Allison says:
  
  April 22, 2019 at 10:28 am
  
  I’m a little confused. You have 6760 incidents, but you say your dependent variable is not binary. Are you estimating your model only for the incidents? If not, are payments 0 for the non-incidents? And if so, how are you logging 0? In any case, you have a sufficient number of incidents that there should be no concern about rare events.
  
  Reply
Alonso Bussalleu says:

April 5, 2019 at 10:38 am

Thank you for the insights. I would appreciate your thoughts about this case:
I am modelling ecological, spatially explicit data where the response variable is binomial. I have both continuous and categorical explanatory variables. The sample size is small (less than 200), the number of successes is very small, and all of them are in one of the categories of an explanatory variable (9 or 2 successes depending on how stringent the threshold to consider a success is). I was thinking of using a generalized additive model approach as I don’t expect the spatial relationships to be linear (and I hope not to get spatially correlated residuals), and this approach was also suggested by Zuur et al (2007).
I was wondering if the penalized maximum likelihood estimation method from Firth has been implemented in generalized additive models. Would a Bayesian MCMC approach (R package stan can deal with binomial GAMs) be an interesting option? Is the Firth method implemented for SAR or SMA models? I didn’t intend to use these methods (Bayesian, SAR, SMA) as I have a limited understanding of them and no practical experience. Considering the sample size, should I just do an exact logistic regression instead? It would also be very helpful if you could refer me to a discussion about sample size vs. number of predictors for logistic regression, as I want to avoid over-fitting (I am getting more data very soon).
Thank you for your answer

Reply
1. Paul Allison says:
  
  April 11, 2019 at 1:01 pm
  
  This is a very problematic sample size. Not only is the number of events small, but you also have quasi-complete separation. I would only consider exact logistic regression. That will do the job but you will have very little power to test your hypotheses.
  
  Reply
  1. Alonso Bussalleu says:
    
    April 12, 2019 at 4:51 am
    
    Dear Dr. Allison,
    
    Thank you very much for your answer.
    I wanted to explain a bit more about the sampling design:
    
    A bottom trawler fishes along a transect line. Overlaying this line on a substrate map (5 substrate categories) using GIS shows that it might have crossed one or more substrate types. I am measuring the distance to all five substrate types as 5 continuous explanatory variables. If the line crossed the substrate, the distance value is 0. I included a categorical explanatory variable called substrate type to account for these 0s (which are qualitatively different, as with smoker (yes/no) and number of cigarette packs per month). The response variable is the presence/absence of a size category of a particular fish.
    
    Larger size categories are not very common and only occur in one substrate type. I can see now that if this size category only occurs in one particular substrate, perhaps I should focus the analysis only on that substrate type.
    
    This would mean:
    – dropping the categorical variable out
    – dropping the continuous variable representing the distance to that particular substrate (as it only would be 0)
    – only analyse observations taken in that substrate
    
    Is this correct?
    
    Since I am modelling the distribution of multiple size categories of the same species, I wanted to account for the same variability sources using the same variables for all models to be able to compare them. However, perhaps using a variable that perfectly separates the data for a particular size category might not be useful.
    
    Would this restriction prevent me from comparing results between the different size class models?
    
    This association of larger size classes with a specific substrate type might be an artefact of the unbalanced sampling effort between substrate types (due to limitations in the methodology and the differences in the extent covered by each substrate type). In fact, larger size classes are only found in the substrate type most heavily sampled and covering most of the study area. One of the main objectives of this project was to try to use data collected for other purposes to study this species distribution from which there is very little information in this particular spatial scale.
    
    Do you consider a multinomial logistic regression would be a better approach?
    
    Any comments regarding this are very welcome and would be greatly appreciated.
    
    Thank you very much for your help.
    
    Best regards,
    
    Alonso
    
    Reply
    1. Paul Allison says:
      
      April 12, 2019 at 1:44 pm
      
      Sorry, but I just don’t have time to work through all the details. But I’m pretty confident in saying that multinomial logistic regression is unlikely to be better. A Bayesian approach could possibly be beneficial, but I have no experience doing that with logistic regression.
      
      Reply
Nino says:

March 18, 2019 at 11:21 am

Dear Prof Allison,

Thank you very much for this very useful blog post. I have 850 observations in a cross-sectional study, of which 38 are violent deaths (yes or no), my outcome variable. I have regrouped my predictor variables so that I have at least 10 events in each category of the predictor variables (all categorical). I used firthlogit regression and ended up with 4 significant predictors, although 2 of them were not significant in univariate analysis, but with p<0.2. Is my approach correct, and how can I perform postestimation analysis? Thank you very much for your response.

Reply
1. Paul Allison says:
  
  March 19, 2019 at 11:20 am
  
  This sounds good to me. What kind of postestimation analysis do you want to do?
  
  Reply
  1. Nino says:
    
    March 19, 2019 at 8:39 pm
    
    Dear Prof Allison,
    
    Thank you so much for your response. Regarding your question, this is exactly my question too. I am using Stata 14, so I do not know which postestimation tests I should use with firthlogit or what their syntax is. I would be very grateful if you could help me with this. Thank you very much in advance.
    
    Reply
    1. Paul Allison says:
      
      March 20, 2019 at 8:00 am
      
      I’m sorry but I really don’t have any knowledge or experience with postestimation after firthlogit. Try what’s available with logit and see if it works.
      
      Reply
2. Omar Aza says:
  
  January 12, 2021 at 9:53 am
  
  Dear Nino.
  
  I have the same problem. I am running a model with 40 observations and 8 independent variables using firthlogit regression. Four of the independent variables are statistically significant and 4 are not. However, the P value is 36. I checked the data for multicollinearity issues.
  Please let me know how you performed the postestimation analysis.
  
  Reply
Kathleen says:

February 21, 2019 at 5:12 pm

Hi Dr. Allison,
Thank you for this blog post. I am a PhD student working with a dataset of about 60,000 people. My ‘rare’ event occurs almost 2000 times. Although it only occurs 3% of the time, based on your explanation, it seems that is an adequately high number, so I referenced your post in my article. Reviewers were unhappy with my reference to your blog post, so I have looked everywhere but have been unable to find similar advice in a published article. Would you be able to direct me to something that you or others have published which supports your statements above in deciding whether an event is “rare” or not? Thank you so much.

Reply
1. Paul Allison says:
  
  February 26, 2019 at 2:58 pm
  
  I don’t know of any articles that provide exactly what you want. But there are many papers that investigate rules of thumb that involve the number of events per variable. Below is a citation to a recent paper that evaluates such rules and generally finds them wanting. However, in all this literature, there is an implicit presumption that it is not the percentage of events that matters, but rather the absolute frequency of those events.
  
  van Smeden, Maarten, Joris AH de Groot, Karel GM Moons, Gary S. Collins, Douglas G. Altman, Marinus JC Eijkemans, and Johannes B. Reitsma. “No rationale for 1 variable per 10 events criterion for binary logistic regression analysis.” BMC medical research methodology 16, no. 1 (2016): 163.
  
  Reply
Amelia says:

February 1, 2019 at 2:12 pm

Dear Professor Allison,

I am performing a logistic regression for rare frequencies across age. I have 6 age bins with about 1000 individuals in each. The frequencies range from 0.01 to 0.1 and so I have a small number of cases and hence do not have a lot of power to detect changes in frequency across age bins.

I was thinking the Firth could be helpful? Or is there a better method than logistic regression in this case?

Thank you!

Reply
1. Paul Allison says:
  
  February 26, 2019 at 2:40 pm
  
  Firth could definitely be helpful. But I would also try exact logistic regression. I don’t have any other recommendations.
  
  Reply
Robbie says:

January 16, 2019 at 3:18 pm

Dear Professor Allison,

I am not sure if this is a different or related problem than the one your post addresses. I am doing a logistic regression with a binary X binary interaction. The interaction is significant, after calculating AMEs and second differences. The sample is 221. The problem is that one of the cells in the cross-tab between the two terms in the interaction is very small (n=5). In this case, the rare event isn’t the outcome, but in the interaction. Is all hope lost?

Thanks for all you do!
Robbie

Reply
1. Paul Allison says:
  
  January 25, 2019 at 12:51 pm
  
  Yes, that can also produce problems. The standard solutions for rare events on the dependent variable also work in this case: exact logistic regression or the Firth method.
  
  Reply
Philipp says:

December 12, 2018 at 8:15 am

Dear Prof. Allison,

I am working with binary time-series cross-section data of news websites (N=19 and T=197) where the event of interest only appears 47 times. Theoretically, I am interested whether the reporting on specific topic proportions (for a given website/day) increases the likelihood of the event.

Furthermore, I included newspaper and week fixed effects as both should influence the reporting on a topic and the likelihood of the event. Is it appropriate to use the Firth correction in such situations, or is the number of events too small to obtain reliable results?

Thanks a lot!

Reply
1. Paul Allison says:
  
  December 28, 2018 at 11:35 am
  
  I think Firth could be helpful in this situation. Also, I usually would recommend against using newspaper dummies to do the fixed effects because it yields biased estimates. But with T=197 that shouldn’t be a problem.
  
  Reply
Surajit Chakraborty says:

July 25, 2018 at 10:35 pm

Dear Prof Allison,

I am trying to build a logistic regression model with an event rate of 0.6% (374/61279).

In this scenario, is the Firth method recommended, or stratified sampling (taking all events and a proportion of non-events such that the number of non-events is greater than the number of events)?

If stratified sampling, then what should be the desirable event to non-event ratio (obviously not 50-50 as the objective would be to consider more non-events in building the model) ?

Thanks
Surajit

Reply
1. Paul Allison says:
  
  October 2, 2018 at 4:09 pm
  
  There’s no benefit to sampling in this case. Just use all the observations.
  
  Reply
Malika says:

May 29, 2018 at 7:49 pm

Dear Prof Allison
I would like to know out of 180 records, how ‘off’ the ratio of zeros and ones for the binomial dependent variable needs to be to have to use something different than a normal logistic regression? Considering I will be having 5 predictors, two of which being in interaction with each other.
I have been simulating data and running binomial models to see how they behave in extreme cases, but I am still unsure which cutoff should be used.
Any advice ?
Thanks and best wishes
Malika

Reply
1. Paul Allison says:
  
  May 30, 2018 at 9:56 am
  
  The standard rule of thumb is that one should have at least 5 and preferably 10 “events” per each coefficient in the model. This is only a rough guide, however.
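
As a rough illustration of this rule of thumb, the sketch below (Python; the function name is hypothetical) converts outcome counts into the number of coefficients the rule would allow. It counts the less frequent outcome, as the post recommends, and it inherits all the roughness of the rule itself.

```python
def max_coefficients(n_events, n_nonevents, events_per_coef):
    """Coefficients allowed by an events-per-coefficient rule of thumb."""
    rarer = min(n_events, n_nonevents)  # the less frequent outcome is what matters
    return rarer // events_per_coef

# 20 events among 127 observations, as in another question in this thread
print(max_coefficients(20, 107, 5))   # lenient version, 5 events per coefficient: 4
print(max_coefficients(20, 107, 10))  # stricter version, 10 events per coefficient: 2
```

The same arithmetic applies when the "event" is the common outcome: with 913 events out of 949 cases, the 36 non-events are the binding count.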
  
  Reply
sanjeewa dayarathne says:

April 29, 2018 at 2:26 am

Hello Professor Allison ,

I have 1000 credit files, of which 300 are not creditworthy and 700 are creditworthy, plus 15 other predictors. I have creditworthiness as my dependent variable in a binary logistic regression. Do you think my data are imbalanced, since the ratio is 300/700? If it is an issue, can I randomly select 300 from the creditworthy group and run the binary logistic regression on a sample of 600, or can I artificially double the 300 not-creditworthy cases to 600 and run the regression on a sample of 600 + 700 = 1300? Thank you.

Reply
1. Paul Allison says:
  
  May 25, 2018 at 12:32 pm
  
  There is no problem here to solve. Just run your logistic regression in the usual way.
  
  Reply
Kostas says:

April 28, 2018 at 10:45 am

Hello Professor Allison,

I have a question regarding binary logistic regression on which I would like your insight, if possible.

I have a binary dependent variable and I face rare events. My sample comprises 317 companies for the period 2005-2016, namely 3804 firm-year observations, of which only 66 take the value of 1 while the remaining 3738 take the value of 0. The total number of my independent variables is between 20 and 27 (measuring firm characteristics, panel data).

Thus, my question is whether you believe that I face a rare events problem with my logistic regression and how I can tackle it, taking into account that I am using SPSS because it is the easiest program for me since I do not have an econometrics background.

Thank you in advance.

Regards,

Kostas

Reply
1. Paul Allison says:
  
  May 25, 2018 at 12:34 pm
  
  You may have a problem. For a possible solution using SPSS, see https://github.com/IBMPredictiveAnalytics/STATS_FIRTHLOG
  
  Reply
Jean-Luc Fanny says:

March 3, 2018 at 1:09 am

Hello Professor Allison.
I have a serious conflict at work.
I am dealing with a population of about 8,500 members. I work for an insurance company.
The outcome is readmission = 1 and non-readmission = 0, and the ratio is 12%/88%, which could be seen as rare events or as zero-inflated, depending on how you want to look at it.
My colleague is insisting on using zero-inflated Poisson or negative binomial models in R.
I disagree because we do not have a count distribution here to justify that modeling method. Although readmission events are rare, we still cannot use a zero-inflated method.
I suggested a penalized likelihood logistic regression, or the lasso or ridge.
Would you please advise me?
Thank you.

Reply
1. Paul Allison says:
  
  April 2, 2018 at 2:43 pm
  
  You’re right, you can’t use zero inflated. Seems like you’ve got over 1,000 events, so unless you’ve got a very large number of predictors, you should be fine with standard logistic regression.
  
  Reply
Karthi says:

February 23, 2018 at 11:51 pm

I have a dataset of 2904 samples – 11 predictors. One of the two classes (class 1) has only 108 samples. When I perform a logistic regression on this data, the accuracy rate is low. To add to that, the performance of the prediction is even worse. All the class 1 are predicted as class 0.
Do you think this is because of the skewed data set? Is logistic regression the right test to be conducted?

Reply
1. Paul Allison says:
  
  May 25, 2018 at 1:08 pm
  
  Logistic regression is fine in this case. But if the fraction of events is small and you use .5 as your cutoff for predicting a 1 or 0, it’s quite common that all the events are predicted as non-events. This gets into the whole issue of sensitivity vs. specificity, which is beyond the scope of this reply. But, very simply, to get more sensitivity, you can lower your cutoff for predicting events.
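
The effect of the cutoff can be sketched with a few made-up predicted probabilities (Python with NumPy; the numbers and the helper name are hypothetical, not taken from the data in the question). With rare events the fitted probabilities are mostly small, so a 0.5 cutoff flags nothing, while a lower cutoff detects more events at the cost of misclassifying some non-events.

```python
import numpy as np

def sensitivity_specificity(y_true, p_hat, cutoff):
    """Share of events flagged, and share of non-events left unflagged, at a cutoff."""
    y_true = np.asarray(y_true)
    pred = np.asarray(p_hat) >= cutoff
    sens = np.mean(pred[y_true == 1])
    spec = np.mean(~pred[y_true == 0])
    return float(sens), float(spec)

# Made-up outcomes and fitted probabilities: 2 events among 10 cases.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
p_hat = [0.30, 0.08, 0.12, 0.05, 0.04, 0.03, 0.03, 0.02, 0.02, 0.01]

for cutoff in (0.5, 0.1, 0.05):
    print(cutoff, sensitivity_specificity(y_true, p_hat, cutoff))
# 0.5  -> (0.0, 1.0)
# 0.1  -> (0.5, 0.875)
# 0.05 -> (1.0, 0.75)
```

Which cutoff is appropriate depends on the relative costs of the two kinds of error, which the model itself cannot decide.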
  
  Reply
Annette says:

February 8, 2018 at 9:32 am

Dear Dr Paul,

Sample size: 8,100
Events: 10 (or 26 if less specific)

Can I present crude logistic regression models if the exposure is present in about 450 cases? I assume I can’t use multiple predictors or adjust for anything given these numbers?

Thank you!

Reply
1. Paul Allison says:
  
  May 25, 2018 at 1:09 pm
  
  I’d be OK with that.
  
  Reply
Thomas Chu says:

February 1, 2018 at 8:40 am

I have a universe of 130K with only 700 YES (0.53% response rate). The response rate is too low to develop a good model. When I developed a logistic regression model, the validation result was not good.

Is the Firth method able to handle this situation? Any recommendations?

Reply
1. Paul Allison says:
  
  February 7, 2018 at 9:27 am
  
  With 700 events, you should be in pretty good shape to develop a decent model. You may not be using appropriate validation criteria. You could use the Firth method, but I doubt that it will make much difference in this situation.
  
  Reply
Clarie says:

January 17, 2018 at 5:24 pm

Dear Dr. Paul,

I am working on a project to predict rare events (one to ten events) with a sample size of 280 and three predictors (one has 2 categories, one has 5 categories, and one is continuous, for a total of 10 coefficients including the interaction of the two categorical covariates). Based on the conventional rule of thumb of at least 10 events per variable, there should be at least 30 (or 100?) events in order to perform the logistic regression analysis, right? So regression analysis with the Firth or exact method will not be appropriate for this situation either, right? Any suggestions as to what kind of analysis to use?
Thank you very much!
Clarie

Reply
1. Paul Allison says:
  
  February 7, 2018 at 9:24 am
  
  The rule of 10 events per coefficient may be too stringent in many situations. In any case, you can always do exact logistic regression (although it may not always work) even if that rule is not satisfied. And the Firth method can be useful too when you don’t meet that criterion.
  
  Reply
Rohaida says:

December 28, 2017 at 6:09 am

Hi Dr. Paul,

I’m working on unbalanced panel data with a small sample size, 127 observations. I have 20 events for my dependent variable (dummy, 1 and 0). What would you suggest? Can I use logit/probit regression?

Many thanks.

Reply
1. Paul Allison says:
  
  January 1, 2018 at 5:26 pm
  
  Yes, you can use logit or probit. But the number of predictors should not exceed 4. And you may want to use the Firth method or exact logistic regression.
  
  Reply
Kamal Kishore says:

December 22, 2017 at 1:11 pm

Dr Paul

Shall we go for multivariable logistic regression for a sample size of 25 with three predictor variables? There is a minimum of 5 events for one variable. Shall we restrict to Fisher Exact test?

Reply
1. Paul Allison says:
  
  January 1, 2018 at 5:28 pm
  
  I would try exact logistic regression.
  
  Reply
Rizwan says:

December 19, 2017 at 9:14 am

I have a data set of 16000 cases which I drew from a loan population of 198,000. The event (default) rate was 1.3% in the population and 1.41% in the sample of 16,000; 312 cases. When I ran the logistic regression with cutoff points from 0.1 down to 0.01, the correct classification of good loans declined from 100% to 55% while the correct prediction of defaults increased from 1% to 87%.

Although the fit is becoming more appropriate for the event cases, the compromise on good loans is huge; more than the benefit of avoiding the bad loans. At the 0.01 cutoff, for an additional correct classification of 36 default loans, it wipes out 2,196 good loans. What do you suggest I should do?

Thanks

Reply
1. Paul Allison says:
  
  February 7, 2018 at 9:35 am
  
  Not much to be done about this. You have to decide which is more important to you: high sensitivity or high specificity.
  
  Reply
Jin says:

December 2, 2017 at 4:22 pm

Hello, Dr. Allison

Thank you for your post. As a political science PhD student, I always confront rare event problems (or small-sample bias). I am still confused about when rare events induce bias. The total number of observations on my binary (0 or 1) dependent variable is 1,332. Here, the number of 0s (nay votes) is 223, and the number of 1s (yes votes) is 1,109. The variation in the dependent variable itself does not seem problematic. However, one category of my main binary independent variable is very small. Women legislators cast 17 yes votes and 10 nay votes, compared to 1,092 yes votes and 213 nay votes for men. In this case, can we call it a rare events problem? Or should we call it small-sample bias in the independent variable? If the latter is correct, can I still apply firthlogit estimation?

Reply
Franzel says:

October 31, 2017 at 11:36 am

Hi,

Firstly, I am amazed about the help in this thread!

I have an unbalanced panel of ca. 100,000 observations. Binary outcome, with around 450 events. I use fixed effects. An LPM gives reasonable results, obviously with negative predictions.
A logit with fixed effects (i.e., conditional logit) gives huge standard errors. Should I try to fix the fixed effects model with the Firth method? Stata with fixed effects has not converged and has been on the first iteration for 7 hours. logistf does not perform better. Any other possibilities to explore?

Thank you a lot!

Reply
1. Paul Allison says:
  
  November 17, 2017 at 8:08 am
  
  Sounds like a problem with quasi-complete separation. The Firth method could be helpful but it doesn’t seem to be working for you. You need to look more closely at the source of the separation problem.
  
  Reply
Yang Cao says:

September 19, 2017 at 1:37 pm

Does the Firth method work in multinomial logistic regression?

Reply
1. Paul Allison says:
  
  September 19, 2017 at 2:25 pm
  
  In principle, yes, but there’s not a lot of software available. There’s an R package called pmlr that claims to do it, but I’ve never tried it.
  
  Reply
Adel says:

July 23, 2017 at 8:40 pm

Many Thanks for your Quick Reply.
Please find below:
1: 850 cases
2: 3000 cases.
3: 11500 cases.
4: 1500,000 cases.

Also, the data are panel data.

Many thanks.
Adel

Reply
1. Paul Allison says:
  
  July 25, 2017 at 10:27 am
  
  With 850 and 3000 cases in the “rare” categories, I think you can just go with conventional maximum likelihood. Again, it’s the NUMBER of cases that you have to worry about, not the percentage. But if you really want to do something better, there’s a package for R called pmlr that implements the Firth method for multinomial logit (https://cran.r-project.org/web/packages/pmlr/pmlr.pdf).
  
  Reply
Adel says:

July 19, 2017 at 8:49 pm

Dear Dr Paul,
I am dealing with a response variable with 4 alternatives (so the dependent variable is y = 1, 2, 3, or 4), and my goal is to predict the probabilities p(y=1|x) and p(y=2|x) well.
The issue is that alternatives 1 and 2 are rare.
Given the nature of the dependent variable, I have to use a multinomial logit (MNL).
Is there an easy way to implement MNL with rare events?

Many thanks

Reply
1. Paul Allison says:
  
  July 20, 2017 at 9:59 am
  
  How many cases in categories 1 and 2? Are the four categories ordered or unordered?
  
  Reply
Tess Halonen says:

June 27, 2017 at 1:27 pm

I’m working with a bariatric surgeon and we want to predict the likelihood of leaks post surgery (0 = no leak, 1 = leak) on a sample of 1,070 patients. We have about 7 dichotomous predictors and want to do a logistic regression. Leaks are quite rare, about 1.3% in our sample, which means we need to use some correction. I’ve chosen Firth’s penalized likelihood test. My concern is that we have a predictor that is quite significant per confidence intervals but it too only occurs at about 1.2% in our sample. Do you know if Firth’s test or perhaps another test corrects for rare events on the predictor side?

Reply
1. Paul Allison says:
  
  July 5, 2017 at 2:53 pm
  
  So you’ve only got 14 events on the dependent variable and 13 on the predictor. I would be concerned about the accuracy of the Firth method in this situation. Better to go with exact logistic regression.
  
  Reply
Ivo van der Lans says:

May 8, 2017 at 8:18 am

Dear professor Allison,

Searching the web for comparisons between different programs for Firth penalized logistic regression, I hit upon your web page several times. Therefore I guess that this web page is an appropriate forum for the question that I have.

The issue that I am struggling with is that the R logistf function gives me p = 0.005 for the parameter that I am interested in, whereas SAS’ Firth option in PROC LOGISTIC and STATA’s firthlogit give me p = 0.055/p = 0.053.

My question to you is whether you have ever seen such a difference in results before, and, if so, whether you have any idea where it comes from.

Your thoughts will be very much appreciated.

Kind regards,
Ivo van der Lans, Wageningen University

P.S.: Exact logistic regression gave p = 0.012 in the packages in which it didn’t give memory problems (R logistiX and SAS PROC LOGISTIC)

Reply
1. Paul Allison says:
  
  May 8, 2017 at 8:27 am
  
  When using the Firth method, it’s essential to use p-values and/or confidence intervals that are based on the profile likelihood method. Wald p-values can be very inaccurate. Maybe this is the source of your discrepancy.
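
To see how far the two kinds of p-values can drift apart, here is a minimal Python sketch for the simplest possible case: one binary predictor. In that special case the Firth estimate has a closed form (add 0.5 to each cell of the 2x2 table), so no iterative fitting is needed. The counts are invented for illustration, `firth_2x2` is a hypothetical helper name, and the penalized likelihood ratio test here fixes the slope at zero while keeping the penalty from the full model, which is an assumption about how such tests are usually constructed rather than a description of any particular package.

```python
import math

def firth_2x2(y1, n1, y0, n0):
    """Firth-penalized logit fit for one binary predictor (closed form).

    y1, n1: events and cases where x = 1; y0, n0: events and cases where x = 0.
    Returns (slope, Wald p-value, penalized likelihood ratio p-value).
    """
    # With a single binary predictor, the Firth estimate adds 0.5 to each cell.
    p1 = (y1 + 0.5) / (n1 + 1)
    p0 = (y0 + 0.5) / (n0 + 1)
    slope = math.log(p1 / (1 - p1)) - math.log(p0 / (1 - p0))

    # Wald test: standard error from the information matrix at the estimate.
    se = math.sqrt(1 / (n1 * p1 * (1 - p1)) + 1 / (n0 * p0 * (1 - p0)))
    p_wald = math.erfc(abs(slope / se) / math.sqrt(2))

    # Penalized log-likelihood of one group (constant terms dropped).
    def pen_loglik(y, n, p):
        return (y + 0.5) * math.log(p) + (n - y + 0.5) * math.log(1 - p)

    # Maximum with the slope free to vary.
    full = pen_loglik(y1, n1, p1) + pen_loglik(y0, n0, p0)

    # Maximum with the slope fixed at 0 (one common probability).
    p_null = (y1 + y0 + 1) / (n1 + n0 + 2)
    null = pen_loglik(y1, n1, p_null) + pen_loglik(y0, n0, p_null)

    # Chi-square (1 df) tail probability of the likelihood ratio statistic.
    lr_stat = max(2 * (full - null), 0.0)
    p_lr = math.erfc(math.sqrt(lr_stat / 2))
    return slope, p_wald, p_lr

# Hypothetical sparse table: 5 events in 10 cases (x = 1), 1 event in 40 cases (x = 0).
slope, p_wald, p_lr = firth_2x2(y1=5, n1=10, y0=1, n0=40)
print(f"slope={slope:.3f}  Wald p={p_wald:.4f}  penalized LR p={p_lr:.4f}")
```

With a table this sparse the two p-values differ noticeably, which is the kind of gap that can show up when one package reports Wald results and another reports likelihood-based results.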
  
  Reply
  1. Ivo van der Lans says:
    
    May 8, 2017 at 9:32 am
    
    Many thanks for your quick reply. This indeed seems to be the source. After some checks my conclusion is that:
    
    – R logistf by default gives p-values (for my parameter <0.05) and CI's (for my parameter not including 0) based on the profile likelihood method. I could easily obtain the Wald CI's (which do include 0) for comparison.
    – SAS PROC LOGISTIC gives p-values based on Wald. I managed to get both profile-likelihood and Wald CI's for comparison.
    – STATA FIRTHLOGIT by default gives Wald-based p-values and CI's. So far, I didn't find a way to get the profile-likelihood CI's for comparison.
    
    So, I'll use the R logistf results.
    
    Thanks again,
    Ivo van der Lans
    
    Reply
Nehal says:

April 22, 2017 at 5:56 pm

Hi Dr. Allison
I’m working on my graduation project and I badly need your help.
My response variable is binary (0: youth are mentally healthy; 1: youth are mentally unhealthy), and there are 10-15 explanatory variables, almost all of them categorical except for 2 or 3 continuous variables.
The category of interest is mentally unhealthy youth: 281 out of a sample of 4153, or 6.77% of the sample.
I’m applying a logistic regression model.
So I would like to ask:

1) Will this give biased results because of the low percentage? If so, how can I solve it, or which model can be applied instead?

Reply
1. Paul Allison says:
  
  April 24, 2017 at 4:31 pm
  
  I think you’re OK.
  
  Reply
  1. Nehal says:
    
    April 29, 2017 at 1:33 pm
    
    Thank you so much Dr. Allison. But what if the whole model is significant while 90% or more of the variables are insignificant?
    
    Do you have any explanation for this issue?
    
    Thank you so much.
    
    Reply
    1. Paul Allison says:
      
      May 8, 2017 at 8:28 am
      
      This can happen if one of the variables has a very strong effect.
      
      Reply
Laura says:

March 31, 2017 at 3:13 am

Hello Dr. Allison,

I have a sample that will likely have approximately 70 participants D+ (with disease) and 1630 D- (without disease). I plan to use a multivariate logistic regression model (is proc genmod with a logit link or proc logistic better for this use? Based on what I’ve read, they yield the same results but use different methods of parameterization to obtain them) with age (continuous) and approximately 5-7 additional dichotomous variables with some interactions (3 at most). Do you think my sample size is adequate to perform this analysis? If not, would the sample size be sufficient if I removed the interaction terms?

Thank you.

Reply
1. Paul Allison says:
  
  April 3, 2017 at 5:04 pm
  
  PROC LOGISTIC and PROC GENMOD are both fine for basic logistic regression. But LOGISTIC can also do exact logistic regression and penalized likelihood (Firth). Your sample is probably large enough. But I would also try exact and Firth to get more confidence in your results.
  
  Reply
Somia says:

March 2, 2017 at 10:42 pm

I am trying to estimate which demographic variables are associated with smoking and alcohol drinking. The prevalence of smoking and alcohol drinking in the study sample (cross-sectional study) is 15% and 2%, respectively. The sample size is 4,900. I am using modified Poisson regression with robust standard errors to estimate the prevalence ratio. Based on the literature, modified Poisson regression is recommended if the prevalence of the outcome is >10%. Is this model still appropriate for estimating the prevalence ratio to identify which demographic variables are associated with alcohol drinking, where the prevalence is 2%?
Appreciate your advice.

Reply
1. Paul Allison says:
  
  March 3, 2017 at 9:02 am
  
  Should be OK. But with only 98 alcohol cases, I’d limit the number of predictors, say, to 15 at most. Another (possibly better) way to estimate a prevalence ratio (risk ratio) model is to specify a binomial outcome with a log link in generalized linear modeling software.
  
  Reply
Bill says:

February 13, 2017 at 12:34 am

Dear Dr. Allison,

Thank you so much for your responses to these questions. I’ve found your courses and this blog to be extremely helpful. I have a slightly different data situation. I am using xtnbreg in STATA to analyze about 40 separate groups over 11 years (so roughly 440 group-year observations). The mean value for my DV is around 1.5. I am wondering about the following:

1) Do any of the characteristics of my dataset (number of groups(clusters), total number of observations, total number of “events” reflected in my DV) raise any concerns regarding inaccurate P-values?
2) Is there a rule of thumb regarding the maximum number of Independent Variables I can include in my models? Would it be based on number of groups, observations, total events reflected in DV, or some combination)?
3) Are there any references that I can cite to justify my data and analysis regarding sample size and number of Independent Variables?

Thank you very much.

Reply
1. Paul Allison says:
  
  February 14, 2017 at 1:02 pm
  
  First of all, I strongly discourage the use of xtnbreg, either for fixed or random effects models. If you want to do random effects, use menbreg. For fixed effects, use dummy variables in nbreg.
  There’s nothing in what you told me that raises concerns about p-values. I don’t have any rule of thumb for predictor variables in this situation.
  
  Reply
Ryan says:

February 9, 2017 at 11:11 pm

Hello, Dr. Allison – Can multiple imputation procedures be used with firth logit or exact logistic regression methods? If so, can this be done in stata or another software?

Reply
1. Paul Allison says:
  
  February 10, 2017 at 10:59 am
  
  In principle, yes. However, the Stata commands for these methods, exlogistic and firthlogit (a user-written command), are not supported by the mi command. I can get firthlogit to work by using “mi estimate, cmdok:”. The cmdok option is short for “command OK”. This doesn’t seem to work for exlogistic, however. I’m sure that I could get these methods to work in SAS, but haven’t tried it yet.
  
  Reply
Riaz Ahmed says:

October 12, 2016 at 7:12 am

Dear Dr. Allison,

Thanks for your wonderful recommendations and input you are continuously putting in the discussion.
I have 1.4 million household-level observations at the district level (116 districts), which come from six waves (6 years of surveys) of a nationally representative population. My rare event is whether an individual got divorced (divorced=1) out of all married individuals. Divorced individuals are just 0.56% of the total population, and in some districts no divorces occurred. My variable of interest is whether a disaster (a dummy = 1 if the flood affected the district, 0 otherwise) can affect marital life. Three surveys were conducted before the floods and three after. I am running a type of quasi-natural experiment. It is a type of DID research design in which I am using district fixed effects, year fixed effects, and an interaction of flooded-districts x post-disaster (my variable of interest) in the model, and errors are clustered at the district level. The model is estimated by ordinary logit. The number of regressors in the full sample is about 140. When I do that, the Wald chi2 and its p-value are missing. What type of model could I use for this data set? And could I use survey weights? Is there any way I could use survey weights by specifying strata and PSU in Stata?
Appreciate your suggestions.

Reply
1. Paul Allison says:
  
  November 7, 2016 at 9:39 am
  
  You probably have non-convergent coefficients for some of your variables. 140 regressors is a lot in this kind of situation. You might try the user command firthlogit. But then you can’t use svyset to handle strata and PSUs.
  
  Reply
Kate Tkacova says:

July 13, 2016 at 7:22 pm

Could the problem with the biased estimates be solved by using sequential logit or selection model?

Reply
1. Paul Allison says:
  
  July 18, 2016 at 4:36 pm
  
  I don’t think so.
  
  Reply
Yingzhou says:

July 7, 2016 at 11:30 am

Hi Dr. Allison,

I am doing a simulation study with 400 obs + 4 cases per trt. I tried Firth, but the size of the test is significantly below 5%. Do you have any suggestion for this extreme situation?

Thanks

Reply
Leo says:

July 6, 2016 at 12:38 pm

Hi Dr. Allison,

I have a sample size of 1940 and 81 events. I have a large number of predictor variables (between 9-12) that I want to put into my model. I have reduced the number to 9. I think I am at the sample where I could use either firth or exact. Do you agree or would you tend towards using one of them?

Thank you

Reply
MZH says:

July 1, 2016 at 12:09 pm

Prof Allison, Greetings: by reading through earlier posts, it seems that exact logistic and firth’s modification may enable accurate estimation even when we have fewer than 10 events per parameter. Is that a correct understanding of your position?

Further,

(i) exact logistic would not entail any limit on events per parameter.
(ii) With Firth, it seems that you have suggested “5 to 10 events”.

On to specifics: I have a data set with ~1000 observations but ~100 events. However, the number of parameters I need to estimate is large: ~30 as we have a few categorical variables with many categories. Also, some of the independent variables are continuous.

We have a decent sized machine (w/ 128 GB of memory and lots of processing power) so I could try exact logistic …

But if exact logistic does not work, is there anything that can be done with Firth’s penalization to bring down the “events per parameter” requirement so I can estimate about 30 parameters on 100 events?

Thank you for putting out this post and subsequent Q&A … it is a very useful resource.

Reply
1. Paul Allison says:
  
  August 5, 2016 at 2:53 pm
  
  I’m guessing that you will have difficulty applying exact logistic regression to a sample of this size. As for Firth, you could try bootstrapping to get standard errors and/or confidence intervals that will be good approximations in this case. But you may still run into convergence problems, and you may have low power to test hypotheses.
  
  Reply
Nadia says:

June 24, 2016 at 6:02 pm

Hi Dr. Allison,

I have a sample size of 1200 observations and only 40 events. From what you said, I decided to use the Firth method. However, I am not sure if this method works in the case of categorical variables (I have a binary response variable and 10 categorical independent variables).

What is your suggestion?

Reply
1. Paul Allison says:
  
  July 4, 2016 at 7:44 pm
  
  Yes, the Firth method is appropriate for categorical predictors.
  
  Reply
  1. Sehba says:
    
    July 10, 2016 at 7:30 pm
    
    Sir,
    Can we repeat the same data set multiple times for a binary logistic regression to overcome the problem of few events? Right now I have 12 predictor variables, 27 “no” events and 76 total entries in my data. My analysis results are pretty absurd, but when I copy and paste the whole data set 5-6 times, they give reasonable results. Thanks.
    
    Reply
    1. Paul Allison says:
      
      July 18, 2016 at 4:37 pm
      
      This is not a valid solution.
      
      Reply
      1. wangting says:
        
        November 20, 2017 at 9:28 pm
        
        Dear Paul Allison
        Thank you in advance for all the valuable information you have provided in this post. However, you don’t seem to approve of sampling methods for overcoming the rare events problem or the unbalanced data set. Please tell me why.
        
        As far as I know, undersampling and oversampling are commonly used methods for imbalanced samples.
        
        Thanks for your help.
      2. Paul Allison says:
        
        November 21, 2017 at 2:15 pm
        
        As I explained in my post, what matters is not the rarity of events, but rather how many you’ve got in your sample. If you have a sample of 50 events and 10,000 non-events, the only benefit in sampling down to 50 non-events would be a tiny reduction in computation time. And there would be a cost: larger standard errors.
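
A rough way to see the cost is the textbook large-sample standard error of a log odds ratio from a 2x2 table, which is the square root of the sum of the reciprocals of the four cell counts. The sketch below uses invented counts (50 events, 35 of them exposed to a hypothetical binary predictor, and non-events assumed to be half exposed) purely for illustration.

```python
import math

def se_log_odds_ratio(a, b, c, d):
    """Large-sample SE of the log odds ratio for a 2x2 table with cells a, b, c, d."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# Hypothetical counts: 50 events, 35 of them exposed to a binary predictor.
events_exposed, events_unexposed = 35, 15

# All 10,000 non-events, assumed to be half exposed.
se_full = se_log_odds_ratio(events_exposed, events_unexposed, 5000, 5000)

# Non-events sampled down to 50, with the same exposure rate.
se_sampled = se_log_odds_ratio(events_exposed, events_unexposed, 25, 25)

print(f"SE with 10,000 non-events: {se_full:.3f}")
print(f"SE with 50 non-events:     {se_sampled:.3f}")
```

The two event cells dominate the sum either way, which is why the number of events is what limits precision. Discarding non-events only adds to the standard error.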
Valerie says:

June 17, 2016 at 10:11 am

Dear Dr. Allison,
Maybe my question is quite naive but I wonder whether the same rule of thumb you mentioned in your post applies to weighted survey data, where each individual case weighs more than one person to represent the whole population. The relationship between the actual number of cases and the personal sampling weights involved in statistical tests is unclear to me since I’m a beginner in the area.
Also, are there options to deal with low number of events using proc surveylogistic?

Reply
1. Paul Allison says:
  
  June 17, 2016 at 2:20 pm
  
  PROC SURVEYLOGISTIC does not have an option for dealing with low numbers of events. The rule of thumb would also apply to weighted data.
  
  Reply
Shalaw says:

June 17, 2016 at 9:14 am

I am going to analyze a situation where there are 300 non-injuries and only 17 injuries. Four categorical variables are significant according to chi-square tests, so I used multiple logistic regression with the significant variables. Three of them are significant again. Does this make any sense? I would like to know whether I can use multiple logistic regression, because only 17 of the 317 respondents were injured.

Reply
1. Shalaw says:
  
  June 17, 2016 at 11:27 am
  
  I used SPSS to analyze the data.
  
  Reply
  1. Paul Allison says:
    
    June 17, 2016 at 2:24 pm
    
    I don’t know what options are available in SPSS.
    
    Reply
2. Paul Allison says:
  
  June 17, 2016 at 2:22 pm
  
  With such a small number of events, I recommend using either the Firth method or exact logistic regression.
  
  Reply
Mwiza Gideon Singini says:

June 16, 2016 at 3:17 pm

Dear Professor Allison.
I am trying to analyse fistula in Zambia. There are 98 reported cases of fistula in a sample of 16148 women. The dependent variable has a binary outcome. I would like to use logistic regression for the analysis. Would it be wrong to use binary logistic regression for my analysis?
Thanks

Reply
1. Paul Allison says:
  
  June 16, 2016 at 4:29 pm
  
  Binary logistic regression would certainly be appropriate.
  
  Reply
Tim Müller says:

June 16, 2016 at 12:50 pm

Dear Dr. Allison,

I am wondering whether or not there is any way to combine a rare events logistic regression with cluster robust SEs. This would be vital for my research design and any help would be highly appreciated!
The Stata command “firthlogit”, which is more recent than ReLogit, does not allow for cluster robust SEs, which is why I am hoping that there is another way.

Kind regards
Tim

Reply
1. Paul Allison says:
  
  June 16, 2016 at 4:29 pm
  
  Sorry but I don’t know any way to do this in Stata.
  
  Reply
Chang says:

June 14, 2016 at 3:41 am

Dear Professor Allison,

You noted that “The problem is not specifically the rarity of events, but rather the possibility of a small number of cases on the rarer of the two outcomes. If you have a sample size of 1000 but only 20 events, you have a problem. If you have a sample size of 10,000 with 200 events, you may be OK. If your sample has 100,000 cases with 2000 events, you’re golden” above. Is there any reference close to your explanation that I can cite?

I have about 100,000 observations with 1100 events. Based on your explanation, it might be okay, although it is not golden. I am trying to estimate my models by using firthlogit, but it is extremely slow, and I am not sure whether it can estimate my models.

Reply
1. Paul Allison says:
  
  June 14, 2016 at 7:25 am
  
  Firth logit shouldn’t be necessary in your case, unless you have one or more categorical predictors that are also very unbalanced.
  
  Reply
Marvin says:

June 7, 2016 at 5:53 pm

Dr. Allison,

Thank you in advance for all the valuable information you have provided in this post. Just to be sure: I have a sample of 32,740 observations with 271 events. I have 12 variables but some variables have multiple categories. So let’s say I have 16 coefficients (I read a reply from you saying that what matters are the coefficients). Can I use regular logistic regression or should I use alternative methods such as Firth, penalized or exact logistic regression?

I would greatly appreciate any help.

Best,
Marvin

Reply
1. Paul Allison says:
  
  June 9, 2016 at 2:35 pm
  
  You should probably be OK. But if the Firth method is available, there’s little downside to using it.
  
  Reply
Ashby says:

May 24, 2016 at 6:14 pm

Dr. Allison
You may have covered this above so apologies in advance if so. If the response rate (i.e. dying, winning, etc.) is 10% in a sample of 10,000, don’t the p-hats (scores from the logistic regression model in SAS, for instance) need to be interpreted relative to that? So if a new observation is scored and has a value of 0.20 then that observation is twice as likely to have the response as the average observation. Thus, it should not be interpreted as the observation having a 20% probability of having the response. So often, results are described as the latter.

Reply
1. Paul Allison says:
  
  May 26, 2016 at 4:44 pm
  
  Yes and no. If we are correct in assuming that everybody’s probability is generated from the same logistic regression model, then a predicted probability of .20 can be interpreted as a 20% probability of having the event. But suppose there is unobserved heterogeneity. That is, there are omitted predictors (independent of the included predictors) or variability in the coefficients. Then the predicted probabilities will be “shrunk” toward the overall mean in the population.
  
  Reply
Sara says:

May 22, 2016 at 8:37 pm

Hi Dr. Allison,
I’ve got a data set with 2050 observations and a variable “risk”. This variable has two values, “Low Risk” and “High Risk”. The number of “High Risk” cases is 340 (about 17%) and the number of “Low Risk” cases is 1700 (about 83%). I’ve applied logistic regression using glm to model “risk”, and I found that the model predicts the “Low Risk” cases with very good accuracy, but the accuracy for the “High Risk” cases is only about 50%. I then applied brglm in R, which does maximum penalized likelihood estimation.
However, the misclassification rate is again 50% for the “High Risk” cases. Since “High Risk” occurs much less often than “Low Risk”, I thought that there was bias in my data set and that using penalized likelihood estimation would help, but there was no success. I’m wondering if there is another method that can be used to deal with this issue.
Many Thanks

Reply
1. Paul Allison says:
  
  May 23, 2016 at 11:20 am
  
  The problem you describe is endemic to predicting rare events, and cannot be solved by simply changing estimation methods. If you’re using .5 as your cutoff for predicting an event vs. a non-event, you’re always going to get a much higher percentage correct for the non-events (“specificity”) than for the events (“sensitivity”). By lowering the cutoff, you can increase sensitivity but that may greatly reduce specificity.
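
The tradeoff is easy to see in a minimal Python sketch. The predicted probabilities and outcomes below are made up for illustration (4 events, 12 non-events), and `sens_spec` is a hypothetical helper name, not a function from any package.

```python
def sens_spec(probs, outcomes, cutoff):
    """Return (sensitivity, specificity) when an event is predicted if prob >= cutoff."""
    pairs = list(zip(probs, outcomes))
    tp = sum(1 for p, y in pairs if y == 1 and p >= cutoff)  # events caught
    fn = sum(1 for p, y in pairs if y == 1 and p < cutoff)   # events missed
    tn = sum(1 for p, y in pairs if y == 0 and p < cutoff)   # non-events cleared
    fp = sum(1 for p, y in pairs if y == 0 and p >= cutoff)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical predicted probabilities: first 4 are events, remaining 12 are non-events.
probs = [0.62, 0.35, 0.22, 0.08,
         0.55, 0.30, 0.25, 0.18, 0.15, 0.12, 0.10, 0.09, 0.07, 0.05, 0.04, 0.03]
outcomes = [1, 1, 1, 1] + [0] * 12

for cutoff in (0.5, 0.2, 0.1):
    sens, spec = sens_spec(probs, outcomes, cutoff)
    print(f"cutoff={cutoff:.1f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

Lowering the cutoff moves cases from the "predicted non-event" column to the "predicted event" column for events and non-events alike, so any gain in sensitivity is paid for in specificity.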
  
  Reply
  1. Sara says:
    
    May 23, 2016 at 7:03 pm
    
    Thanks Paul for your helpful reply. I’ve changed the p threshold to .2 and the confusion matrix looks a lot better, but my problem is how to interpret this. Does this change prove that I can trust my model in the end?
    
    Reply
    1. Paul Allison says:
      
      May 23, 2016 at 9:02 pm
      
      No it does not. Read my earlier blog posts on R-squared and goodness of fit in logistic regression.
      
      Reply
Nadia says:

May 5, 2016 at 2:43 pm

Hi Dr. Allison,

I have a dataset of 1241 observations and only 39 cases. I have a binary response variable as well as 12 predictor variables. I am wondering which logistic regression method is suitable for my data(exact, firth, rare-event,..???) and which software I have to use to find a good fit for my data?

Reply
1. Paul Allison says:
  
  May 5, 2016 at 2:54 pm
  
  Hard to say for sure. But with only 39 events and 12 predictors, you certainly don’t meet standard recommendations. I would try both Firth and exact, although I’d put more confidence in exact. If they give similar results, that’s reassuring.
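
The arithmetic behind that judgment is simple enough to sketch in Python. The guideline values of 10 and 5 events per variable are the commonly cited rules of thumb discussed in this thread. `events_per_variable` is a hypothetical helper name, and treating the 12 predictors as 12 coefficients is an assumption (categorical predictors with several levels would add more).

```python
def events_per_variable(n_cases, n_events, n_coefficients):
    """Cases in the rarer outcome category divided by the number of coefficients."""
    rarer = min(n_events, n_cases - n_events)
    return rarer / n_coefficients

# Figures from the question: 1241 observations, 39 events, 12 predictors.
epv = events_per_variable(n_cases=1241, n_events=39, n_coefficients=12)
print(f"{epv:.2f} events per coefficient")

for guideline in (10, 5):
    print(f"meets the {guideline}-per-variable guideline: {epv >= guideline}")
```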
  
  Reply
  1. Nadia says:
    
    May 9, 2016 at 4:39 pm
    
    thanks for your response.
    I read somewhere that exact logistic regression only works when N is very small, usually less than 200. However, in my problem N is 1241, which is much bigger than 200.
    
    Reply
    1. Paul Allison says:
      
      May 9, 2016 at 6:44 pm
      
      Yes, exact logistic regression can be very computationally intensive. But a lot depends on the number of cases in the smaller category. With only 39 events, it might be doable. Try it and see. You can also reduce the computation by requesting coefficients for only the predictor(s) of interest while treating the others as controls.
      
      Reply
ozge says:

April 26, 2016 at 3:29 am

Dear Paul,
I have a dataset of 10000 observations. In this data, I have 5000 positive (ownership of a product) and 5000 negative cases. And I have 5 independent variables.
2 of them are categorical having 82 and 6 unique values respectively. Rest of them are numeric having 1155, 2642 and 1212 unique values.
I am planning to use binary logistic regression as the dependent variable can only take 0 or 1. As the training set I will use 66% of this data and the rest as the test set.
Is this data set proper for logit? (I can play with the number of observations or positive/negative cases.)
If it is, can I use TPR to evaluate this model? Can I say that if TPR is close to 50%, this model works well?
Thanks in advance

Reply
1. Paul Allison says:
  
  April 26, 2016 at 1:16 pm
  
  Logit should be fine, although 82 values is a lot for your categorical variable. You might want to try to collapse it in meaningful ways. TPR (true positive rate) is the same as “sensitivity”. By itself, I don’t think this is sufficient to evaluate the model. You also need to pay attention to specificity (i.e., the true negative rate) and the relationship between these two. An ROC curve is helpful for this, and the area under the curve is a good summary. For any measure of predictive power, there’s no cutoff for when a model can be said to “work well.” It all depends on your objectives.
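
The area under the ROC curve has a convenient interpretation: it equals the probability that a randomly chosen event gets a higher predicted probability than a randomly chosen non-event, with ties counted as half. A minimal Python sketch of that calculation follows. The predicted probabilities are invented for illustration and `auc` is a hypothetical helper name.

```python
def auc(event_probs, nonevent_probs):
    """Share of event/non-event pairs in which the event has the higher
    predicted probability; ties count as half a pair."""
    wins = 0.0
    for pe in event_probs:
        for pn in nonevent_probs:
            if pe > pn:
                wins += 1.0
            elif pe == pn:
                wins += 0.5
    return wins / (len(event_probs) * len(nonevent_probs))

# Hypothetical predicted probabilities from a fitted model.
event_probs = [0.9, 0.6, 0.4]
nonevent_probs = [0.5, 0.3, 0.2, 0.1]

print(f"AUC = {auc(event_probs, nonevent_probs):.3f}")
```

Because it compares every event with every non-event, this summary does not depend on any single classification cutoff. The pairwise loop is fine for small samples; for large ones a rank-based calculation would be faster.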
  
  Reply
  1. ozge says:
    
    April 27, 2016 at 2:34 am
    
    Thank you for your answer. I have one last question. While running the tests for this model, I plan to do a percentage split: 66% for training and 34% for testing. How many times should I run the test? Will doing it only once be enough? Or should I run the model k times, each time randomly selecting 66%-34% of the universe?
    
    Reply
Zhaoxue says:

March 28, 2016 at 2:44 am

Dear prof. Paul,
Thank you very much for your good post and other comments. I have one question.
You said that ‘what matters is the number of events, not the proportion’. Do you know of any reference that I can look at or cite in my paper? Recently my manuscript was declined by one journal. In this manuscript, the total sample is 2984, the number of events is 156, and 17 variables were included in the model. One reviewer said that 'in general, it is doubtfull if logistic regression is appropriate to perform in the group of younger adults, as the prevalence of the outcome <10%, resulting in flawed OR's?' So I am happy with the idea that 'what matters is the number of the events', and now I need a reference as support. Can you give some suggestions for replying to the reviewer? Thank you very much, best, Zhaoxue

Reply
1. Paul Allison says:
  
  March 28, 2016 at 9:52 am
  
  Try this one:
  Vittinghoff, Eric, and Charles E. McCulloch. “Relaxing the rule of ten events per variable in logistic and Cox regression.” American journal of epidemiology 165.6 (2007): 710-718.
  
  Reply
Claire says:

March 16, 2016 at 9:08 am

Hi Paul,

I have a sample size of 513 and 4 out of the 5 predictor variables I am using have over 100 events. However, there is one which only has 11.

Would you recommend that I utilise the Firth method of logistic regression for the adjusted model?

Many thanks,

Claire

Reply
1. Paul Allison says:
  
  March 16, 2016 at 1:22 pm
  
  Not sure what you mean by predictor variables having events. I’m guessing that your predictors are binary and that the less frequent category has at least 100 for 4 of the variables but only 11 for the 5th. You’re probably OK, but there’s little harm in using the Firth method to be more confident. Keep in mind, however, that statistical inference with the Firth method should be based on likelihood ratio tests and profile likelihood confidence intervals.
  
  Reply
Akis says:

March 16, 2016 at 4:51 am

Nice discussion. I have a sample size of 90000 cases but only 17 events. Is it OK to use the rare events logistic regression? I plan to make some trials. In each trial I take all events plus 170 random non-events (ratio 10:1).

Reply
1. Akis says:
  
  March 16, 2016 at 4:51 am
  
  *with 2 predictors only
  
  Reply
2. Paul Allison says:
  
  March 16, 2016 at 1:25 pm
  
  Not sure what you mean by rare events logistic regression. In your case, I’d probably go with exact logistic regression. I don’t see much value in your “trials” however. Why not just use all 90,000? Given the small number of events, I think exact logistic regression could handle this. If not, use the Firth method.
  
  Reply
  1. Akis says:
    
    March 16, 2016 at 3:00 pm
    
    Thanks for your reply! We probably mean the same thing (the method proposed by King and Zeng, 2001 for rare events). King and Zeng propose to perform stratified sampling, where the sample will include all events plus a number of non-events with ratio 10:1. With this approach a researcher saves much effort in data collection, as it is not required to collect all data but only 10x the events.
    
    Reply
    1. Paul Allison says:
      
      March 18, 2016 at 12:40 pm
      
      The method of King and Zeng is similar to that of Firth. The method of stratified sampling can be very helpful in reducing computational demands, but it does nothing to reduce the problems of rare events. And, necessarily, there is some loss of information. So my advice is, if you can tolerate the computational burden, use the whole sample with the Firth method to reduce bias.
      
      Reply
Jackie says:

February 29, 2016 at 11:56 am

By the way, I was also going to ask whether there is a way of calculating pseudo R-squared when we use Firth’s correction. When we use glm as the logistic regression command in R, there are some packages to install for pseudo R-squared. However, I could not find any package doing the same job when we employ Firth’s correction as the estimation method.

Reply
1. Paul Allison says:
  
  March 2, 2016 at 7:48 am
  
  The question is, is it appropriate to use McFadden’s R-squared or the Cox-Snell R-squared based on the penalized likelihood? I’m not sure. Both could be easily calculated even though they’re not built in to standard Firth packages. I’d probably go with Tjur’s R-squared which is also easy to calculate based on predicted probabilities, but doesn’t depend on a likelihood (standard or penalized).
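
Tjur’s R-squared is just the mean predicted probability among the events minus the mean predicted probability among the non-events, so it can be computed from the fitted probabilities of any model, penalized or not. A minimal Python sketch, with invented predicted probabilities and a hypothetical helper name `tjur_r2`:

```python
def tjur_r2(probs, outcomes):
    """Tjur's R-squared: mean predicted probability for events
    minus mean predicted probability for non-events."""
    event_probs = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevent_probs = [p for p, y in zip(probs, outcomes) if y == 0]
    mean_events = sum(event_probs) / len(event_probs)
    mean_nonevents = sum(nonevent_probs) / len(nonevent_probs)
    return mean_events - mean_nonevents

# Hypothetical predicted probabilities, e.g. taken from a Firth fit.
probs = [0.6, 0.4, 0.5, 0.1, 0.2, 0.3, 0.2]
outcomes = [1, 1, 1, 0, 0, 0, 0]

print(f"Tjur's R-squared = {tjur_r2(probs, outcomes):.3f}")
```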
  
  Reply
Jackie says:

February 29, 2016 at 6:12 am

Dear Prof. Allison,
My dataset consists of 84 observations including 36 events. This allows me to include 3 (minimum) to 7 (maximum) independent variables in the estimations. My base model has 3 control variables, none of which can be excluded on theoretical grounds. I have to test 4 different hypotheses using 4 different independent variables. My original idea was to run a hierarchical logistic regression (using a Firth’s correction package in R), as it makes it possible to see how model fit and coefficients change as each explanatory variable is added to the equation. However, I do not know if adding a new variable in each model will distort the estimation of the true parameter values (I am not a statistician). Would it be acceptable to test each hypothesis separately by including 3 controls and 1 independent variable in each model, and to include all 7 variables in the full model at the end?
I am looking forward to hearing your suggestion. Thank you very much in advance.

Reply
Nicolas says:

February 16, 2016 at 5:57 am

Dear Prof Allison,
I have a dataset with 166 observations and 55 events.
What is the variable limit for inclusion in my model?
Thank you in advance for your answer.
Regards,
Nicolas

Reply
1. Paul Allison says:
  
  February 16, 2016 at 10:56 am
  
  The conventional rule of thumb is at least 10 events per variable, implying that you could have 5 or 6 predictors. But some literature suggests that you could go as low as 5 per variable, yielding up to 11 predictors. It depends in part on the distributions of the predictors within the event group and the non-event group.
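The arithmetic behind the rule of thumb can be sketched as follows. The helper `max_predictors` is a hypothetical name, and the rounding down is an assumption of this sketch; the rule itself is only a rough guide.

```python
def max_predictors(n_events, n_nonevents, events_per_variable=10):
    """Number of coefficients supported under an events-per-variable rule.
    'Events' here means cases in the less frequent outcome category."""
    rarer = min(n_events, n_nonevents)
    return rarer // events_per_variable  # integer division rounds down

# 166 observations with 55 events, as in the question above.
print(max_predictors(55, 166 - 55, 10))  # conservative 10-EPV rule
print(max_predictors(55, 166 - 55, 5))   # more liberal 5-EPV rule
```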
  
  Reply
Charline says:

February 11, 2016 at 12:14 pm

Dear Paul Allison,

I’m trying to apply the Firth method to an individually matched case-control study, but SAS does not allow combining a STRATA statement with the FIRTH option. From the SAS documentation, I found the following syntax (with “pair” indicating the strata):
PROC LOGISTIC data=…;
CLASS pair X1;
MODEL Y=pair X1 X2 /firth;
RUN;
I’m wondering whether it is correct to do that and whether it reduces the small-sample bias.
I also tried the SAS macro CFL developed by Heinze, but without success.

Thank you for your advice.

Reply
1. Paul Allison says:
  
  February 11, 2016 at 1:08 pm
  
  This is not correct and will yield upwardly biased parameter estimates. Based on the documentation for CFL, it ought to deal with your situation. But I haven’t tried it myself. Another option is to do exact logistic regression, which should work with stratification.
  
  Reply
Matthew says:

February 3, 2016 at 6:14 pm

For discrete time hazard logistic models, how would one calculate the percentage of events? I assume that you would take only the terminal events, but some have suggested that I should include all of the intermediate censored events. Wouldn’t I want to calculate the percentage as if it were cross-sectional data, as opposed to panel data?

Reply
1. Paul Allison says:
  
  February 8, 2016 at 8:32 am
  
  I would primarily want to know the percentage of INDIVIDUALS who have events. But I might also be interested in the percentage of discrete time units that have events.
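To make the two percentages concrete, here is a small sketch on person-period data. The record layout `(person_id, period, event)` and the toy values are assumptions made up for illustration.

```python
# Hypothetical person-period records: (person_id, period, event).
records = [
    (1, 1, 0), (1, 2, 0), (1, 3, 1),
    (2, 1, 0), (2, 2, 0),
    (3, 1, 1),
    (4, 1, 0), (4, 2, 0), (4, 3, 0),
]

def event_percentages(rows):
    """Return (% of individuals with an event, % of person-periods with an event)."""
    people = {pid for pid, _, _ in rows}
    people_with_event = {pid for pid, _, ev in rows if ev == 1}
    pct_people = 100.0 * len(people_with_event) / len(people)
    pct_periods = 100.0 * sum(ev for _, _, ev in rows) / len(rows)
    return pct_people, pct_periods

pct_people, pct_periods = event_percentages(records)
print(pct_people, round(pct_periods, 1))
```

The person-period percentage is always the smaller of the two when individuals contribute more than one record each.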
  
  Reply
Kevin says:

January 28, 2016 at 12:31 pm

Dr. Allison,

I’m running a logistic regression on web events data. I have a good number of successes (at least a thousand), though the rate is abysmally low due to potentially millions of failures.
I have around 30 predictors. Being an observational study, these predictors are unbalanced, and the exposures could range from the hundreds to the millions.
My question is this: for some of the predictors, the number of successes associated with them is extremely low. For instance, Pred 5 could have been exposed to 50 samples, of which 2 were successes. I know that the judgment of rare events pertains to the overall data set and not to individual variables, but I can’t help thinking that variables like Pred 5 are potentially very unstable. If just 1 case had been wrongly coded and the successes became 1 instead of 2, I’d imagine the coefficient could turn out vastly different. Is there any merit in judging the number of successes per predictor as well?
Thanks.

Reply
1. Paul Allison says:
  
  February 8, 2016 at 8:38 am
  
  Yes, that’s definitely relevant. It will be reflected in high standard errors for the coefficients of such predictors, and could possibly lead to quasi-complete separation, in which the coefficient doesn’t converge at all.
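One way to screen for this before fitting is to tabulate events and non-events within each level of a categorical predictor. The sketch below is illustrative only: the function name `sparse_levels` and the threshold `min_count=5` are arbitrary assumptions, not an established cutoff.

```python
from collections import defaultdict

def sparse_levels(levels, outcomes, min_count=5):
    """Tabulate events and non-events per level of a categorical predictor
    and flag levels where either count falls below min_count.
    A count of zero signals (quasi-)complete separation for that level."""
    table = defaultdict(lambda: [0, 0])  # level -> [non-events, events]
    for level, y in zip(levels, outcomes):
        table[level][y] += 1
    flagged = {}
    for level, (non_events, events) in table.items():
        if min(non_events, events) < min_count:
            flagged[level] = {
                "events": events,
                "non_events": non_events,
                "separation": min(non_events, events) == 0,
            }
    return flagged

# Hypothetical data echoing the question: 50 exposures with 2 successes.
levels = ["pred5"] * 50 + ["other"] * 200
outcomes = [1] * 2 + [0] * 48 + [1] * 40 + [0] * 160
print(sparse_levels(levels, outcomes))
```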
  
  Reply
Sebastian says:

January 26, 2016 at 3:32 am

Dear Professor Allison,
I have a hierarchical dataset consisting of three levels (N1=146,000; N2=402; N3=16). The dependent variable has 600 events. In my models I use a maximum of 12 predictors. I wonder whether your EPV rule of thumb also applies to a multilevel setting because up to now, following your rule, I apply a simple multilevel logistic regression.
If not, are there any possibilities to correct for rare events in multilevel models (Firth-regression seems not to be available for multilevel logit, at least in Stata which I am using).

Reply
1. Paul Allison says:
  
  February 8, 2016 at 8:40 am
  
  I am also unaware of any software that does Firth logit for multi-level models. However, with 600 events and 12 predictors, you should be in reasonably good shape.
  
  Reply
Chughtai says:

January 11, 2016 at 12:16 am

Please consider the reverse, which is more logical.

What if the outcome in one arm is zero? E.g., the rate of influenza after vaccination.
If vaccinated – 0% (0/100)
If not vaccinated – 15% (15/100)

How can we check significance with logistic regression?

Reply
1. Paul Allison says:
  
  January 25, 2016 at 10:49 am
  
  Well, the ML estimator does not exist in this situation. But if vaccination is the only predictor, a simple Pearson chi-square test for the 2 x 2 table should be fine. If you have other predictors, do exact logistic regression.
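For the single-predictor case, the Pearson chi-square test can be sketched in a few lines using only the standard library. The function name `pearson_chi2_2x2` is hypothetical, and no continuity correction is applied in this sketch.

```python
import math

def pearson_chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table
    [[a, b], [c, d]], with a p-value on 1 degree of freedom."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 df, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Rows: vaccinated, not vaccinated; columns: influenza, no influenza.
chi2, p = pearson_chi2_2x2(0, 100, 15, 85)
print(round(chi2, 2), p)
```

The expected count in each influenza cell here is 7.5, so the usual chi-square approximation is reasonable for this table.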
  
  Reply
Chughtai says:

January 11, 2016 at 12:14 am

What if the outcome in one arm is zero? E.g., the rate of influenza after vaccination.
If vaccinated – 15% (15/100)
If not vaccinated – 0% (0/100)

How can we check significance with logistic regression?

Reply
Jen says:

December 4, 2015 at 4:12 pm

Hello,

In my data, I have 10 events in a sample of 81. The design is a 2X2 factorial design. I am wondering if I should use firth or exact– both seem to give valid parameters but I wasn’t sure if the sample is too small for firth.

Thanks!

Reply
1. Paul Allison says:
  
  December 10, 2015 at 4:36 pm
  
  I’d probably go with exact.
  
  Reply
Georgios Nikolakaros says:

November 29, 2015 at 9:55 am

Dear Dr. Allison,

Thank you for the very useful article and subsequent posts. I have 1629 observations and a binary outcome variable. There are two predictor variables, each with three values. One of the subgroups has no observations. I ran a conventional LR model with both predictors and their interaction (SAS, PROC LOGISTIC). The model converges and I get p-values for all effects and p-values/CIs for ML estimates. My question is whether I can trust the p-value for the interaction term (this is the only thing I need from this model). I can use exact LR for subgroup analyses, but I cannot use exact LR for the model with all 1629 observations because of computational constraints. I understand that I could use Firth LR, but I have another model with multinomial LR on the same data, and SAS does not have Firth for multinomial LR. R does, but if I can do everything with SAS that would be more convenient.
Thanking you very much in advance,

Reply
1. Paul Allison says:
  
  December 10, 2015 at 4:35 pm
  
  Depending on the split on your dependent variable, you’re probably OK. To be more confident, examine the frequency counts in the 3 x 3 x 2 table. If none of them is very small, you’re probably in good shape. Also compare results between Firth and ordinary logit in the binary case. If they are very similar, that’s reassuring for the multinomial case.
  
  Reply
  1. Georgios Nikolakaros says:
    
    January 9, 2016 at 7:08 am
    
    Thank you very much for the advice. I am sorry I was not clear enough with my question. One of the subgroups has no observations, and this is my concern. I only need the p-value of the interaction term, and I need to know if this is valid in a situation where one subgroup has no observations. If the p-value of the interaction term is valid and small enough, one can conclude that there is a significant statistical interaction which justifies subgroup analyses. These subgroups of the data are then small enough for exact LR to be used. And exact LR can give valid estimates for groups with no observations.
    
    Thanking you in advance for your attention,
    
    Georgios
    
    Reply
    1. Paul Allison says:
      
      January 25, 2016 at 10:42 am
      
      Sorry, but I’m still not sure what you mean when you say that “one of the subgroups has no observations.” What subgroups are you talking about? The cells defined by the 3 x 3 table of the predictor variables? I need more detail.
      
      Reply
Asad Ali says:

November 28, 2015 at 3:52 pm

Hello Paul,

I am using a sample of 24,000 observations in which my dichotomous dependent variable has only 155 treated events; the rest are non-treated. I have used simple logistic regression but have been criticized for using such a large sample of untreated observations compared to treated ones. Can you suggest which method I should use to avoid such criticism?

Reply
1. Paul Allison says:
  
  December 10, 2015 at 4:28 pm
  
  Not sure what you mean by “treated events.” I need a little more detail on this study.
  
  Reply
  1. Asad Ali says:
    
    December 16, 2015 at 1:28 am
    
    My dependent variable is financial fraud, a dummy variable with 155 observations having a value of 1; the remaining approximately 24,000 observations in my sample are 0. My question is which method you think would be appropriate for this study. I have applied simple logistic regression and Firth logit, and my results are significant with both methods.
    
    Reply
    1. Paul Allison says:
      
      December 16, 2015 at 5:12 pm
      
      Unless you have a lot of predictor variables, you’re probably OK with either method.
      
      Reply
Valeria says:

November 19, 2015 at 5:53 am

Dear Dr. Allison,

thank you very much for your helpful comments. I ask if you could please provide further clarifications.

I am running a regression on the determinants of school attendance. My dependent variable is “1=individual attends school”.
The total number of observations is 420 and there are only 24 individuals not attending school.

I am trying to use firthlogit. Could you please clarify the following issues:
– would you suggest to limit the number of independent variables (now I am including about 10 independent variables, plus a number of fixed effects, i.e. dummy variables for age and province, so that in total I am including about 40 independent variables)
– is there a way to obtain marginal effects after the firthlogit command in Stata? I tried mfx after firthlogit, but this displays again the coefficients
– are p-values (and so significance level) from firthlogit reliable?

Thank you for your attention

Reply
1. Paul Allison says:
  
  November 19, 2015 at 8:19 pm
  
  I think you should have a lot fewer predictors.
  I don’t think there’s any built-in way to get marginal effects.
  firthlogit uses Wald tests to get p-values. These can be unreliable when there is separation or near-separation, in which case likelihood ratio tests are preferable. See the firthlogit help file for information on how to calculate these.
  
  Reply
Chris says:

November 18, 2015 at 7:53 pm

Great posts on this site! I’d like to model probabilities / proportions rather than dichotomous events. If the dependent measure is the proportion of correct answers, it seems that it would make sense to transform this measure into log-odds and then run a standard linear regression on the transformed variable. However, there are also some cases where no or all correct answers were given, and obviously the log-odds transformation doesn’t work for these cases. My thought was that I could adjust those extreme proportions slightly. For example, if the maximum number of correct responses is 10, I could assign a proportion halfway between 9 and 10 correct answers to those cases where all 10 correct answers were given (making the maximum adjusted proportion 0.95) and likewise assign a proportion correct of 0.05 to cases where no correct answers were given. This seems to be quite arbitrary though, and there’s got to be a “standard” way to do this and/or some literature on different approaches, but I couldn’t really find much on this … any thoughts?

Thanks very much in advance!

Reply
1. Paul Allison says:
  
  November 19, 2015 at 8:26 pm
  
  Here are two non-arbitrary ways to solve the problem:
  1. Estimate a logistic regression using events-trials syntax. You can do this with PROC LOGISTIC in SAS or the glm command in Stata (using the family(binomial) option).
  2. Estimate the model p = [1/(1+exp(-bx))] + e using nonlinear least squares. You can do this with PROC NLIN in SAS or the nl command in Stata.
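As a rough illustration of the first option, here is a minimal sketch of maximum likelihood for grouped (events/trials) data with a single predictor, written with the standard library only. The function name, the Newton-Raphson implementation, and the toy data are assumptions made for illustration; in practice the SAS or Stata routines named above would be used. Note that scores of 0 and 10 out of 10 enter the binomial likelihood directly, with no need to adjust the proportions.

```python
import math

def fit_binomial_logit(x, events, trials, n_iter=50, tol=1e-10):
    """Maximum likelihood for logit(p_i) = b0 + b1 * x_i with grouped
    (events/trials) data, using Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0          # score (gradient of the log-likelihood)
        i00 = i01 = i11 = 0.0  # information matrix entries
        for xi, yi, ni in zip(x, events, trials):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            resid = yi - ni * p
            w = ni * p * (1.0 - p)
            g0 += resid
            g1 += resid * xi
            i00 += w
            i01 += w * xi
            i11 += w * xi * xi
        # Solve the 2 x 2 system (information) * step = score.
        det = i00 * i11 - i01 * i01
        d0 = (i11 * g0 - i01 * g1) / det
        d1 = (i00 * g1 - i01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: 10 questions per subject, scores of 0 and 10 included.
hours = [0, 1, 2, 3, 4]
correct = [0, 3, 5, 8, 10]
trials = [10] * 5
b0, b1 = fit_binomial_logit(hours, correct, trials)
print(b0, b1)
```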
  
  Reply
  1. Chris says:
    
    November 21, 2015 at 9:56 am
    
    Thanks so much for the quick reply. Do you know if/how either one of these options can be implemented as a regularized model (ideally in python or R)?
    
    Reply
    1. Paul Allison says:
      
      November 21, 2015 at 10:14 am
      
      Sorry, but I don’t.
      
      Reply
Chloe says:

November 17, 2015 at 11:54 am

Hi Dr. Allison,

I am having a hard time determining what constitutes a rare number of events. Based on reading the comments here, it seems there is no set standard (e.g., N of 200 events), and that the proportion of event cases isn’t the deciding factor (e.g., more than 5% of sample).

Is it more a matter of whether your number of events exceeds the allowable number of desired predictors? For example, I am working on a project with 1528 cases, with 54 events. Given your 10 predictors to 1 event rule of thumb, would it be reasonable to run conventional logistic regression with five predictors? In contrast, if I wanted to use 10 predictors, would I then choose exact logistic regression or the Firth method? In other words, is it the number of predictors relative to the number of events that makes an event rare?

Reply
1. Paul Allison says:
  
  November 19, 2015 at 8:34 pm
  
  The question should not be “When are events rare?” but rather “Under what conditions does logistic regression have substantial small sample bias and/or inaccurate p-values?” The answer to the latter is when the NUMBER of events is small, especially relative to the number of predictors. You have the rule reversed. It should be 10 events per coefficient estimated. So, yes, with 10 predictors, I’d switch to Firth or exact logistic.
  
  Reply
Emmanuel E. says:

October 22, 2015 at 6:10 pm

Dear Allison.

Does the 10 EPV rule also apply to ordinal logistic regression? I have only seen it discussed with a binary outcome example. Thanks.

Reply
1. Paul Allison says:
  
  November 19, 2015 at 5:05 pm
  
  Good question. I haven’t seen anything about this either. Extending this rule to the ordered case would suggest that for every coefficient there should be 10 cases in the least frequent category of the outcome variable. But that doesn’t seem right. Suppose you have three ordered categories with counts of 10/250/250. My “rule” would imply that you could only estimate one coefficient. In ordered logit, you are combining different ways of dichotomizing the dependent variable. In one dichotomization, you’ve got 10/500 and in the other it’s 260/250. Both contribute to the estimation of the coefficients. In this example, one dichotomization is problematic but the other is not. I think the 10 EPV rule ought to be applied to the more balanced dichotomization, which would allow 25 coefficients. So my rather complicated proposed rule is this. Consider all possible ordered dichotomizations. Find the one that is most balanced. You can then estimate 1 coefficient for every 10 cases in the less frequent category of this dichotomization. But, of course, that’s just speculation. Someone should study this.
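The speculative rule described above can be written out as a short sketch. The function name `ordinal_max_coefficients` is hypothetical, and the rule itself is explicitly untested speculation rather than an established guideline.

```python
def ordinal_max_coefficients(category_counts, cases_per_coefficient=10):
    """Apply the speculative rule: take the most balanced ordered
    dichotomization and allow one coefficient per `cases_per_coefficient`
    cases in its less frequent side."""
    total = sum(category_counts)
    best_minority = 0
    cumulative = 0
    for count in category_counts[:-1]:   # each cut point between categories
        cumulative += count
        minority = min(cumulative, total - cumulative)
        best_minority = max(best_minority, minority)
    return best_minority // cases_per_coefficient

# Counts of 10/250/250: cut points give 10 vs 500 and 260 vs 250.
print(ordinal_max_coefficients([10, 250, 250]))  # 25
```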
  
  Reply
Halvor Bjørntvedt says:

October 19, 2015 at 8:31 am

Dear professor Allison,

I very much appreciated this article, and the comment section is also very helpful. I work with an unbalanced longitudinal data set of 49 observations over 51 units of time, or 2499 total observations. At best there are only 22 events observed, at worst 17. My dependent variable actually has three categories, so the events are transitions between the categories. Is exact logistic regression going to work for a longitudinal data set, or do you recommend other methods?

Reply
1. Paul Allison says:
  
  November 19, 2015 at 8:43 pm
  
  The number of events in your data set is extremely small. I would recommend exact logistic regression, conditioning on the 49 observations (a kind of fixed effects model). But I would not be surprised if you run into computational difficulties. And the power may be too low to get good tests of your hypotheses.
  
  Reply
Swati says:

September 28, 2015 at 5:55 am

Is it possible to test individual parameters one by one? I am not clear about this. Should I consider only one or two predictors and leave out the other predictors?

Reply
1. Paul Allison says:
  
  October 2, 2015 at 10:57 am
  
  Yes, it’s possible to test individual parameters one by one. Your final model shouldn’t have more than two.
  
  Reply
  1. Swati says:
    
    October 5, 2015 at 6:49 am
    
    Dear Sir
    
    In our prior conversation you said that I can’t have more than two predictors in the final model. I have 12 predictors in my study, and the sample size is 120, of which 7 are events. How can I decide which two of these 12 predictors to include in the final model?
    
    Reply
    1. Paul Allison says:
      
      October 5, 2015 at 12:55 pm
      
      You could do a forward-inclusion selection method.
      
      Reply
      1. Swati says:
        
        November 23, 2015 at 12:59 am
        
        Dear Sir
        
        I have two queries
        
        1) Is it fine to apply the forward inclusion selection method using normal logistic regression to reduce the number of predictors in this type of data?
        2) Applying exact logistic regression will provide only the p-values, so how can I check the fit and the R-square value? Or should I concentrate only on the p-values?
      2. Paul Allison says:
        
        November 23, 2015 at 8:18 am
        
        1. It could be helpful.
        2. Tjur’s R-square could be applied to exact logistic regression. I think that standard goodness-of-fit tests would be problematic.
Swati says:

September 19, 2015 at 2:19 am

I have a sample of 120 groups with only 7 events. Is it possible to include 12 independent variables in the model? Some of the independent variables are categorical, which is causing the problem of quasi-complete separation. Which method can I apply using SAS?

Reply
1. Paul Allison says:
  
  September 25, 2015 at 11:03 am
  
  With only 7 events, 12 predictors is way too many. I would use only one or two, and I would estimate the model using exact logistic regression. With so few events, the Firth method will not give trustworthy p-values.
  
  Reply
sam says:

September 15, 2015 at 9:33 am

Hi Dr. Allison,

Many thanks for this post. I have an analysis situation where, at best, I have 22 events out of 99 observations, and at worst (once the analysis is stratified by HIV status) 9 events out of 66 observations (and 13 events out of 33). In both cases, I would like to consider predictors that are also rarely observed, leading to quasi-complete and complete separation when considered in the same model as one another (if small cell sizes are not already present in the cross tabs). I am attempting to select the best multivariate model.

Typically, when small cell sizes are not an issue, my general protocol for MV model selection is to first run bivariate analyses, treating predictors with a bivariate p-value of .25 or less as eligible for backwards elimination, as recommended by Hosmer and Lemeshow, and then to run a backwards elimination by iteratively removing the predictor with the highest p-value until all remaining predictors have a p-value of .05 or less, arriving at the final multivariate model.

Given my small sample size and rare number of observed events, how would you recommend model building should proceed? Do you advocate forward selection? If so, do you have a recommendation on inclusion criteria, possibly obtained during bivariate analyses? Judging from some of your comments above, it appears that you prefer the p-values obtained from exact logistic regression over those from Firth’s penalized likelihood (and the coefficients from Firth over those from exact). Presumably it would be a bad idea to try to run a backwards elimination model with sparse data, both because it would violate the events-per-predictor rule of thumb you mention and because exact logistic might never converge due to the inclusion of too many initial predictors.

I work primarily in SAS and have access to both PROC LOGISTIC with the Firth option and the user-written FL macro for performing Firth’s method for binary outcomes. I also have access to Stata and the user-written -firthlogit- command, though I prefer to work in SAS if possible.

I look forward to hearing your thoughts! Many thanks!

Reply
1. Paul Allison says:
  
  September 25, 2015 at 11:50 am
  
  I think you’d have to do a forward selection process, but I don’t have any specific recommendations on how to do it.
  
  Reply
Yalan Hu says:

September 11, 2015 at 8:55 pm

Dear Professor, I hope I didn’t waste your time with my previous questions. I found the answers to two of the questions myself and found that the two “weights” I mentioned are different. Now my only remaining question is why the proportion matters rather than the number of events. I understand that a small sample is not good and that adding non-events is not very helpful in terms of variance and bias, but I still cannot work out why the proportion does not matter.

Reply
1. Yalan Hu says:
  
  September 11, 2015 at 8:58 pm
  
  Correction: why the number of events matters rather than proportion.
  
  Reply
  1. Paul Allison says:
    
    September 25, 2015 at 12:16 pm
    
    As King and Zeng note in their article, in a rare event situation the events contribute much more information to the likelihood than the non-events.
    
    Reply
Yalan Hu says:

September 10, 2015 at 10:13 pm

Dear Professor, I am in the financial services industry and need to estimate the default rate, which might be very small (0.1%). I read your post, which says the proportion doesn’t matter, only the count of bads matters. However, when I look at Gary King’s paper, I find that var(b) is proportional to pi(1-pi), where pi is the proportion of bads (formula 6 in Gary’s paper), and no counts are involved. I have three specific questions: 1. You’ve mentioned that MLE suffers from small-sample bias. Where can I find a reference? 2. Through which formula (or paper) can I see that the number of bads matters rather than the total number for logistic regression? 3. Sometimes, when the number of bads is too small, people add weights to the bads. Gary also mentions this in his paper (formula 9). In my eyes, this is equivalent to bootstrapping the bads and reduces the bias, but you’ve said in another reply that bootstrapping will not be helpful, and that what the bootstrap MIGHT be able to do is provide a more realistic assessment of the sampling distribution of your estimates than the usual asymptotic normal distribution. Where did I make a mistake? Thank you in advance.

Reply
Talbot Katz says:

September 4, 2015 at 4:37 pm

Dear Professor Allison, thank you so much for your service to the analytics community. I’ve gotten great use from your SAS books on Logistic Regression and Survival Analysis. I’d like to revisit the idea of “zero-inflated logistic regression.” The aim of the model is to predict at inception of each record whether an event will occur during that record’s active lifetime. There is a sub-population of no longer active records, some of which had the event and some which didn’t; a record does not remain active after the event occurs, so it’s “one and done.” There is also a sub-population of still active records that have not had the event yet, but may in future; some of them have earlier inception dates than records that have had the event already. This may sound like a set-up for a “time to event” survival-type model, but let’s ignore that possibility for now (the current data has inception date, but not event date). Active records can’t just be considered non-events, right? So, is it appropriate to use a finite-mixture approach to model this? If so, is there a preferred way to implement it? Is there another approach you’d recommend? Thanks!

Reply
Swati says:

August 31, 2015 at 12:27 am

Hello Dr. Allison

I have a sample of 120 groups with 7 events. There is no problem of separation. Can I use the Firth method, or should I go for exact logistic regression?

Reply
1. Paul Allison says:
  
  September 3, 2015 at 4:17 pm
  
  I’d probably go with exact logistic, especially for p-values.
  
  Reply
  1. Swati says:
    
    September 7, 2015 at 6:30 am
    
    I was reading a blog which states that you have to convert the data into a collapsed dataset before applying exact logistic regression in the elrm package, but no procedure for doing that is given. Can you suggest a method? I also want to know whether it is possible to apply exact logistic regression if the independent variables are continuous or categorical with more than 3 or 4 categories.
    
    Reply
    1. Paul Allison says:
      
      September 7, 2015 at 8:45 am
      
      Both SAS and Stata have exact logistic regression procedures that allow continuous variables and do not require that the data be collapsed in any way.
      
      Reply
      1. Swati says:
        
        September 8, 2015 at 2:30 am
        
        But I am using the elrm package in R for the analysis. Is it possible to include continuous and categorical variables in the elrm package?
      2. Paul Allison says:
        
        September 8, 2015 at 8:49 am
        
        Sorry, but I know nothing about elrm.
Peter L says:

August 26, 2015 at 3:58 pm

Dear Professor, In database marketing we must conduct an out-of-sample test when building a predictive model. This requires setting aside a portion of the data, applying the model to it, and then comparing the predicted results with the actual events. Given this methodology, a rule of thumb is that you need a ‘good’ number of events, say at least 1000, so that you can afford cross-validation. One of my recent projects can’t get me a quantity of events anywhere near that golden number. Should I give up building the model? Thanks!

Reply
1. Paul Allison says:
  
  September 3, 2015 at 4:24 pm
  
  Hard to say. I’m usually not inclined to “give up”. But if you’re working for a company that insists on such cross-validation, it could certainly be a serious problem.
  
  Reply
Diego Jorrat says:

August 20, 2015 at 3:44 pm

Hi Dr Allison.
I’m estimating the effect of police training on the likelihood of committing acts of use of force. I have data on 2900 police officers before and after treatment (monthly frequency), and assignment to training is by alphabetical order of surname. Because of the structure of the data, I am estimating a difference-in-differences model. It should be noted that uses of force are rare events (five per month on average, and 148 events in the entire sample). I estimated the ITT by OLS and probit, which give me similar coefficients. Would you suggest that I use another method, such as the Firth method?

Thank you

Reply
1. Paul Allison says:
  
  August 24, 2015 at 8:28 am
  
  Could you give some details on how you are estimating the DID model?
  
  Reply
GC says:

August 10, 2015 at 11:10 am

Dear Dr. Allison,

One query. I am looking at a data set with c. 1.4 million observations and c. 1000 events. One of the explanatory variables has many levels (over 40), and in some cases there are 0 positive events for certain factor levels. In this case, would it be best to subset the dataset to include only those factor levels with a certain number of events (i.e., at least 20 or similar, which would leave 15-20 levels to be estimated)?

Any comments would be much appreciated.
(PS superb resource above)

Reply
1. Paul Allison says:
  
  August 24, 2015 at 8:00 am
  
  If you try to estimate the model with the factor levels that have no events, the coefficients for those levels will not converge. However, the coefficients for the remaining levels are still OK, and they are exactly the same as if you had deleted the observations from factor levels with no events. A reasonable alternative is to use the Firth method, which will give you coefficients for the factor levels with no events.
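To see why the Firth method returns a finite coefficient for a level with no events, here is a minimal sketch for the simplest possible case: an intercept plus one binary predictor. Everything in it (the function name, the toy counts, the plain Newton-type iteration) is an illustrative assumption, not the algorithm used by any particular package. It solves Firth's modified score equations, in which each case contributes (y - p + h(0.5 - p))x, with h the case's diagonal element of the hat matrix.

```python
import math

def firth_logit_binary(n0, y0, n1, y1, n_iter=100, tol=1e-10):
    """Firth-penalized fit of logit(p) = b0 + b1 * x for a binary x,
    given y0 events among n0 cases with x = 0 and y1 among n1 with x = 1."""
    groups = [(0.0, n0, y0), (1.0, n1, y1)]
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        # Fisher information at the current estimates.
        i00 = i01 = i11 = 0.0
        probs = []
        for x, n, _ in groups:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            probs.append((p, w))
            i00 += n * w
            i01 += n * w * x
            i11 += n * w * x * x
        det = i00 * i11 - i01 * i01
        # Firth-modified score, summed over the n cases in each group.
        u0 = u1 = 0.0
        for (x, n, y), (p, w) in zip(groups, probs):
            h = w * (i11 - 2.0 * i01 * x + i00 * x * x) / det
            contrib = (y - n * p) + n * h * (0.5 - p)
            u0 += contrib
            u1 += contrib * x
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical counts: reference level has 40 events in 1000 cases;
# the other level has 0 events in 50 cases (ML estimate would be -infinity).
b0, b1 = firth_logit_binary(1000, 40, 50, 0)
print(b0, b1)
```

In this saturated two-group case the penalized estimates coincide with ordinary ML after adding 0.5 to each cell of the 2 x 2 table, which is why the coefficient stays finite. That shortcut does not carry over to models with more predictors, where established software should be used.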
  
  Reply
Meghan Grabill says:

July 8, 2015 at 10:54 am

Hello Dr. Allison,
Thank you for all the information contained in this article and especially the comments following. My dataset has about 75,000 observations (parcels) with about 1,000 events (abandoned properties). I plan to begin with 20 predictors and use the Penalized Method due to some of my predictor variables also being ‘rare’ (< 20 in some categories). My goal is to be able to use the model to predict future events of abandonment.
My major questions are about sampling. According to comments above, the full dataset should be used so as not to lose good data, but if I use stratified sampling to get a 50/50 split, my coefficients will not be biased and my odds ratios will be unchanged. After trying both models using the full dataset and multiple 50/50 datasets (all 1s and a random sample of 0s), I get quite different results, with the full dataset performing worse on all measures, specifically AIC and SC. In the classification table, with the full dataset I correctly predict only about 10% of the abandoned properties, and with the 50/50 dataset I can predict about 90%. If I use the 50/50 model to try to predict future abandonment (with updated data), am I breaking principles of logistic regression? Thank you in advance for any insight.

Reply
1. Azzurra says:
  
  April 27, 2016 at 11:22 am
  
  Hi Meghan,
  I got the same issue and the same question. How did you deal with your analysis?
  
  Reply
Iuliia Shpak says:

July 7, 2015 at 3:34 am

Dear Dr. Allison,

Thank you so much for your article. I have a sample of 320 observations with 22 events. Is it suitable to proceed with conventional ML? Or would exact logistic regression be a better option? Do you know whether rare event methods such as Firth or exact logistic regression can be implemented in EViews? Thank you.

Reply
1. Paul Allison says:
  
  July 7, 2015 at 8:14 am
  
  You might be able to get by with conventional ML, depending on how many predictors you have. But in any case, I would verify p-values using exact logistic regression. Firth is probably better for coefficient estimates. I don’t know if these methods are available in EViews.
  
  Reply
  1. Paul Allison says:
    
    July 8, 2015 at 11:08 am
    
    You can’t compare AIC and SC across different data sets. Similarly, the percentage of correctly predicted events will not be comparable across the full and subsampled data sets. There’s certainly no reason to think that the model estimated from the subsampled data will be any better than the model estimated from the full data. Try using the model from the subsampled data to predict outcomes in the full data. I expect that it will do worse than the model estimated from the full data.
    
    1. Meghan Grabill says:
      
      July 15, 2015 at 11:44 am
      
      Dr. Allison,
      
      Thank you for your helpful comments. As you suspected the subsampled model did a much worse job predicting the full data than the full data model. It hugely over predicted the 1s resulting in false positives for almost every observation. Thank you!
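
The over-prediction reported here is what one would expect when the 0s are subsampled: the slope coefficients can still be estimated consistently, but the intercept is shifted upward by the log of the sampling ratio. Below is a minimal sketch of the intercept adjustment (the "prior correction" discussed by King and Zeng). The function names and the 5% population event share are hypothetical, chosen only for illustration.

```python
import math

def corrected_intercept(b0_subsample, sample_event_share, population_event_share):
    """Move an intercept estimated on an outcome-based subsample back to the population scale.

    Subtracts ln[((1 - tau) / tau) * (ybar / (1 - ybar))], where tau is the
    event share in the population and ybar is the event share in the subsample.
    """
    tau = population_event_share
    ybar = sample_event_share
    return b0_subsample - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def probability(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical numbers: a 50/50 subsample drawn from data with 5% events,
# and a subsample model whose intercept happens to be 0.
b0_sub = 0.0
b0_full = corrected_intercept(b0_sub, 0.50, 0.05)

p_uncorrected = probability(b0_sub)  # prediction using the subsample intercept
p_corrected = probability(b0_full)   # prediction after the intercept adjustment
print(p_uncorrected, p_corrected)
```

Without the adjustment, a model fitted to a 50/50 subsample predicts as if half of all cases were events, which would produce false positives of the kind described above.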
      
Luke W. says:

June 28, 2015 at 3:59 pm

Thanks so much for this article. I am performing logistic regression in SPSS on a sample of 200 with only 8 events. I believe SPSS does not offer exact logistic regression or the Firth method. The p-value for my model is statistically significant (p<0.05), and one of my independent variables seems to contribute significantly to the model (p<0.05).

Without any independent variables, the model correctly classifies 96% of the cases; the model correctly classifies 98% of cases with the independent variables added. R^2 = 33%. I realize that the number of rare events is quite small, which you mentioned could be problematic. How meaningful do you believe the results are, and would you have any suggestions on improving the statistical work? Thank you!

1. Paul Allison says:
  
  June 28, 2015 at 7:04 pm
  
  With only eight events, I really think you should do exact logistic regression to get p-values that you can put some trust in. Lack of availability in SPSS is not an acceptable excuse.
  
Sandra Roduner says:

June 24, 2015 at 5:31 am

I have a question regarding the applicability of the firthlogit command to panel data in Stata:

How can I implement penalized logistic regression for panel data? I understand that I can use the xtlogit commands for FE and RE, but how do I do this with the firthlogit command?

Thank you very much for your help!

1. Paul Allison says:
  
  June 25, 2015 at 6:50 am
  
  Unfortunately, the firthlogit command does not have any options for dealing with panel data.
  
gec says:

June 23, 2015 at 3:07 pm

Dear Dr. Allison,

Stata’s firthlogit command does not allow for clustered standard errors. Does Firth logit automatically account for clustered observations?

I am fitting a discrete hazard model, so it feels strange not to specify clustered standard errors. In any case, firthlogit has produced results nearly identical to the results from logit and rare events logit models with clustered standard errors.

1. Paul Allison says:
  
  June 23, 2015 at 4:54 pm
  
  No, Firth logit does not correct for clustering. However, if you are fitting a discrete hazard model with no more than one event per individual, there is no need to adjust for clustering. That may explain why all the results are so similar.
  
Brian Z. says:

June 19, 2015 at 2:59 pm

I have a question about the recommended 5:1 ratio of events to predictors.

Is this ratio suggestion for the number of predictors you start with, or the number of predictors you ultimately find statistically significant for the final model?

p.s. fascinating discussion

1. Paul Allison says:
  
  June 19, 2015 at 9:40 pm
  
  Ideally it would be the number you start with. But that might be too onerous in some applications.
  
Mahdiyeh says:

June 15, 2015 at 4:47 pm

Dear Paul Allison,
Thanks for this insightful article. In my research, I have an unbalanced panel of merging and non-merging firms covering about 20 years, and I am investigating the factors driving the probability of merging. Among the 5000 firms in the sample, only 640 experience a merger, which means the dependent variable has many zeros. Based on my reading of this article, the firthlogit command in Stata is your recommended choice. Is this true for unbalanced panel data as well? Thanks for your time and consideration.

1. Paul Allison says:
  
  June 16, 2015 at 6:40 am
  
  With that many mergers, standard logistic regression should be just fine. But if some firms contribute more than one merger, you should probably be doing a mixed model logistic regression using xtlogit or melogit.
  
Amanda says:

May 6, 2015 at 11:15 am

Dr. Allison –

I am performing a logistic regression with 20 predictors. There are 36,000 observations. The predictor of interest is a binary variable with only 84 events that align with the dependent variable. Is firth logistic regression the best method for me to use in this case?

Regards,

Amanda

1. Paul Allison says:
  
  May 11, 2015 at 1:18 pm
  
  That’s what I would use.
  
Caroline says:

May 4, 2015 at 5:39 am

Hi Dr Allison,

I am doing a logistic regression on 19100 cases with 18 predictors. Six of my predictors have rare events (the lowest counts are 217 of 19100, 630 of 19100, etc.). In the goodness-of-fit output, Pearson is 0 and Deviance is 1, which I know to be problematic.

Firstly, do you think this is likely to be due to the rare events? Secondly, is oversampling necessary? Reading your previous comments, it seems that although the predictors are proportionally unbalanced, there would be a sufficient number of events in each category.

Thanks for taking the time to reply to these comments.
Caroline

1. Paul Allison says:
  
  May 4, 2015 at 7:03 am
  
  You should be fine with conventional ML. No oversampling is necessary. The discrepant results for Pearson and Deviance are simply a consequence of the fact that you are estimating the regression on individual-level data rather than grouped data. These statistics are worthless with individual-level data.
  
Heather says:

April 27, 2015 at 11:05 am

Are there any suggested goodness-of-fit tests for Firth logistic regression? I see that Hosmer-Lemeshow is said to be invalid when using the Firth method.
Also, are AIC values valid with Firth?
Thank you.

1. Paul Allison says:
  
  April 28, 2015 at 7:45 am
  
  Well, as I’ve stated in other posts, I am not a fan of the Hosmer-Lemeshow test in any case. But I don’t see why it should be specially invalid for the Firth method. The goodness of fit tests that I discuss in my posts of 7 May 2014 and 9 April 2014 could be useful.
  
Laura says:

April 16, 2015 at 5:21 pm

Hello Dr. Allison,

Thank you for this posting; it has been very helpful.

I have a sample of 170 observations which I have run a predictive model on. As the main focus of this study is exploring gender patterns I would like to build models stratified by gender leaving me with 76 women, 94 men. There are 50 events in the women and 59 in the men.

I found that with logistic regression my CIs are very wide for my ORs, so I have used Firth logistic regression instead.

I am still finding I have wide CIs: the widest for any of the predictors in the women is 1.82-15.69, and for the men it is 1.01-11.56.

I am finding, however, that variables in the model are significant below 0.05, and even as low as 0.001. These variables make clinical and statistical sense. Is it still reasonable to present this model, noting that there are limitations in terms of sample size?

I have read, however, that wide CIs are common with Firth. Can you speak to this?

Are there any other suggestions you may have for modelling with such a small sample size?

Thank you in advance!

1. Paul Allison says:
  
  April 17, 2015 at 8:51 am
  
  If you use the Firth method, make sure that your CIs are based on the profile likelihood method rather than the usual normal approximation. The latter may be somewhat inaccurate. In any case, the fact that your CIs are wide is simply a consequence of the fact that your samples are relatively small, not the particular method that you are using. That said, there’s no reason not to present these results.
  
  1. Laura says:
    
    April 20, 2015 at 10:44 am
    
    Thank you for your quick response!
    
    In Stata the firth model output notes a penalized log likelihood rather than a log likelihood. I am assuming this penalty ensures the CIs are not based on a normal approximation. Is this correct, or is there something else I should be looking for in my output to identify the profile likelihood method is being used?
    
    Thank you!
    
    1. Paul Allison says:
      
      April 20, 2015 at 1:27 pm
      
      For confidence intervals, the firthlogit command uses the standard normal approximation rather than the profile likelihood method. However, you can get likelihood ratio tests that coefficients are 0 by using the set of commands shown in the example section of the help file for firthlogit.
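
To illustrate what the penalty does and does not do: the Firth penalty changes the function being maximized, and hence the point estimates, while the choice between normal-approximation and profile likelihood intervals is a separate step. Below is a minimal sketch for the simplest case, an intercept-only model, where the penalized estimate has a closed form. The function names are hypothetical, and this is not a description of how firthlogit is implemented.

```python
import math

def ml_log_odds(events, n):
    """Conventional ML estimate of the intercept; infinite when events is 0 or n."""
    if events == 0:
        return float("-inf")
    if events == n:
        return float("inf")
    return math.log(events / (n - events))

def firth_log_odds(events, n):
    """Firth estimate for an intercept-only model: add half an event to each outcome."""
    return math.log((events + 0.5) / (n - events + 0.5))

def penalized_loglik(b, events, n):
    """Log-likelihood plus the Firth penalty, 0.5 * log(Fisher information)."""
    log_p = b - math.log1p(math.exp(b))   # log of the event probability
    log_q = -math.log1p(math.exp(b))      # log of the non-event probability
    loglik = events * log_p + (n - events) * log_q
    penalty = 0.5 * (math.log(n) + log_p + log_q)
    return loglik + penalty

# Hypothetical data: 8 events in 200 cases, and 0 events in 20 cases.
print(ml_log_odds(8, 200), firth_log_odds(8, 200))
print(ml_log_odds(0, 20), firth_log_odds(0, 20))
```

In this simple case the penalized estimate is pulled slightly toward zero and stays finite even when there are no events at all, which is the single-parameter analogue of the separation problem.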
      
      1. Laura says:
        
        April 22, 2015 at 12:11 am
        
        Thank you for this suggestion. Following the commands in the help section, I have tested that the coefficients = 0. The coefficients for the variables that are significant in the Firth model do not = 0, while those that are not significant (my forced-in variables) do = 0, according to the likelihood ratio test.
        
        Despite doing this testing, my CIs for this Firth logistic regression model are still not based on the profile likelihood method and are being calculated using the normal approximation. It seems from your previous post that likelihood ratio tests of the coefficients may offer an alternative to presenting CIs based on the profile likelihood method? Would you be able to clarify this?
        
        Thank you very much!
      2. Paul Allison says:
        
        April 22, 2015 at 9:20 am
        
        The normal approximation CIs are probably OK in your case. They are most problematic when there is quasi-complete separation or something approaching that.
        Both likelihood ratio tests and profile likelihood confidence intervals are based on the same principles. Thus, if the profile likelihood CI for the odds ratio does not include 1, the likelihood ratio test will be significant, and vice versa.
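
The correspondence between likelihood ratio tests and likelihood-based intervals can be sketched numerically. The example below uses an intercept-only logistic model, where the profile likelihood is simply the likelihood, and finds the set of log-odds values that a likelihood ratio test would not reject at the 0.05 level. The data (8 events in 200 cases), the null value, and all function names are hypothetical.

```python
import math

CHI2_95_1DF = 3.841458820694124  # 95th percentile of chi-square with 1 df

def loglik(b, events, n):
    """Binomial log-likelihood as a function of the log-odds b."""
    return events * b - n * math.log1p(math.exp(b))

def lr_stat(b, events, n):
    """Likelihood ratio statistic for the hypothesis that the log-odds equals b."""
    b_hat = math.log(events / (n - events))
    return 2.0 * (loglik(b_hat, events, n) - loglik(b, events, n))

def bisect(f, lo, hi, tol=1e-10):
    """Find a root of f between lo and hi, assuming f changes sign there."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def likelihood_ci(events, n):
    """Interval of log-odds values with LR statistic below the 95% cutoff."""
    b_hat = math.log(events / (n - events))
    g = lambda b: lr_stat(b, events, n) - CHI2_95_1DF
    return bisect(g, b_hat - 20.0, b_hat), bisect(g, b_hat, b_hat + 20.0)

def wald_ci(events, n):
    """Usual normal-approximation interval for the log-odds."""
    b_hat = math.log(events / (n - events))
    p = events / n
    se = 1.0 / math.sqrt(n * p * (1.0 - p))
    return b_hat - 1.96 * se, b_hat + 1.96 * se

lower, upper = likelihood_ci(8, 200)
null_value = math.log(0.10 / 0.90)  # hypothetical null: event probability of 10%
excluded = not (lower <= null_value <= upper)
significant = lr_stat(null_value, 8, 200) > CHI2_95_1DF
print((lower, upper), wald_ci(8, 200), excluded, significant)
```

By construction, a null value falls outside the likelihood-based interval exactly when the likelihood ratio test rejects it, whereas the normal-approximation interval is forced to be symmetric around the estimate.
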
David says:

April 10, 2015 at 4:53 am

Thanks for this nice post. When do you start thinking that it is not possible to perform a reliable statistical analysis? My problem is that I have around 40 events in a sample of 40000, and I also have around 10 covariates to explain the outcomes. What would you suggest? Do you rely on other implemented software? R?
Thanks in advance

1. Paul Allison says:
  
  April 10, 2015 at 8:09 am
  
  Well, I think you have enough events to do some useful analysis. And I’d probably start with conventional logistic regression. But then I’d want to corroborate results using both the Firth method and exact logistic regression. Both of these methods are available in SAS, Stata (with the user-written command firthlogit) and R. Your final model would ideally have closer to 5 covariates rather than 10. And keep in mind that while you may have enough events to do a correct analysis, your power to test hypotheses of interest may be low.
  
  1. Lara Ruter says:
    
    April 13, 2015 at 6:37 am
    
    Dear Dr. Allison,
    
    Thank you for this clear explanation above.
    We are studying an event with a low incidence (0.8:1000 up to 10:1000) in a large dataset (n=1,570,635).
    
    In addition, we also performed conventional logistic regression analysis on the recurrence rate of this event in a linked dataset (n=260,000 for both time points). Roughly 30 out of 320 patients with a first event had a recurrent event compared to 184 in the remaining population (de novo event at the second timepoint of the study). We adjusted for a maximum of 5 variables in the multivariate analysis.
    Was it correct to use conventional logistic regression or should we have used Firth or exact logistic regression analysis instead?
    
    Thanks in advance,
    
    1. Paul Allison says:
      
      April 13, 2015 at 10:31 am
      
      Well, you’re certainly OK with conventional ML for the non-recurrent analysis. For the recurrent analysis, you might want to replicate with Firth regression (downsides minimal) or possibly exact logistic (less power, more computing time).
      
      1. Lara Ruter says:
        
        April 15, 2015 at 5:37 am
        
        Thank you for your quick answer. I’ll have a look at performing a Firth regression in SAS on the recurrent analysis and see what different results are given.
      2. Lara Ruter says:
        
        April 16, 2015 at 10:15 am
        
        We performed the logistic regression analysis with the Firth correction by adding / cl firth to our syntax. The odds with conventional logistic regression were 83 (55-123); with Firth’s method they are now 84 (56-124). Mainly, the CI became a bit wider.
        
        May we thus conclude from these results that the recurrence rate remains statistically significant?
        Thank you in advance,
      3. Paul Allison says:
        
        April 16, 2015 at 10:22 am
        
        Probably. But the CI based on the usual normal approximation may be inaccurate with the FIRTH method. Instead of CL, use PLRL which stands for profile likelihood risk limits.
Lise Delagrange says:

April 3, 2015 at 5:17 am

Hi Paul,

I am working on my master’s thesis and I’m having some difficulties with it.
It is about the relationship between socio-demographic and health-related variables and the chance of passing the first year of college. So my dependent variable is passing (=1) or failing (=0).
Now, I’m doing univariate logistic regressions to see which variables are significant and thus which I should include in my multivariate logistic regression analysis.

When I look at the Hosmer and Lemeshow test for the categorical predictors (e.g., gender, being on a diet or not), I get the following:
chi²: 0.000
df: 0
sig.: .

Why is this? Is this due to the fact that there are only four groups possible
(male passed, male failed, female passed, female failed)?

Furthermore, I also have a predictor with 5 response options (once a week, twice a week, 3-4 times a week, …), and there my p-value is significant. What should I do when it is significant? I entered this variable as a continuous variable, but maybe this is not correct?

Also, is the Hosmer and Lemeshow test important in univariate logistic regressions, or is it only done in multivariate ones?

Thanks in advance,
a desperate master student

1. Paul Allison says:
  
  April 3, 2015 at 8:40 am
  
  See my post on the Hosmer-Lemeshow statistic: https://statisticalhorizons.com/hosmer-lemeshow/
  
Jeff Tang says:

March 31, 2015 at 3:50 pm

Hi Paul,
In my case, I want to use logistic regression to model fraud or no fraud with 5 predictors, but the problem is I have only 1 fraud out of 5,000 observations. Is it still possible to use logistic regression with Firth logit to model it? What is your suggestion for the best approach in this case?
Thank you so much,
Jeff Tang

1. Paul Allison says:
  
  March 31, 2015 at 4:24 pm
  
  I’m afraid you’re out of luck Jeff. With only 1 event, there’s no way you can do any kind of reliable statistical analysis.
  
  1. Jeff Tang says:
    
    April 1, 2015 at 11:15 am
    
    That’s what I thought. Thank you, Paul.
    By the way, what if I just convert the raw data for each predictor to a standard score (say 1-10) and then sum the scores, to at least give me some idea of how likely each person is to commit fraud?
    What do you think?
    Thanks again,
    Jeff
    
    1. Paul Allison says:
      
      April 1, 2015 at 11:17 am
      
      Problem is, how do you know these are the right predictors?
      
      1. Jeff Tang says:
        
        April 1, 2015 at 1:54 pm
        
        I see. I’ll figure it out. Suppose after I find the right predictors, do you think it’s a good idea to use the standard score for this very limited data? What’s your advice?
        Thank you,
        Jeff
      2. Paul Allison says:
        
        April 3, 2015 at 8:38 am
        
        Might be useful. But getting the right predictors is essential.
Ina says:

March 30, 2015 at 7:59 am

Hello sir, I am also trying to model (statistically) my binary response variable with 5 different independent variables. My dataset is imbalanced: my sample size is 2153, of which only 67 are of one kind and the rest are of the other kind. What would be a good suggestion in this regard? Will it be possible for me to model my dataset statistically, given that it is imbalanced?

1. Paul Allison says:
  
  March 30, 2015 at 1:24 pm
  
  The problem is not lack of balance, but rather the small number of cases on the less frequent outcome. A very rough rule of thumb is that you should have at least 10 cases on the less frequent outcome for each coefficient that you want to estimate. So you may be OK. That rule of thumb is intended to ensure that the asymptotic approximations for p-values and confidence intervals are close enough. It doesn’t ensure that you have enough power to detect the effects of interest. I’d probably just run the model with conventional ML. Then corroborate the results with Firth logit or exact logistic regression.
  
Kim says:

March 23, 2015 at 11:56 am

Hi.
Thank you in advance for this fascinating discussion and for your assistance (if you reply, but if not I understand).

I have a model with 1125 cases. I have used binary logistic regression but have been told I do not take into account that 0/1 responses in the dependent variable are very unbalanced (8% vs 92%) and that the problem is that maximum likelihood estimation of the logistic model suffers from small-sample bias. And the degree of bias is strongly dependent on the number of cases in the less frequent of the two categories. It has been suggested that in order to correct any potential biases, I should utilise the penalised likelihood/Firth method/exact logistic regression.
Do you agree with this suggestion or is my unbalanced sample OK because there are enough observations in the smaller group?
Regards,
Kim

1. Paul Allison says:
  
  March 25, 2015 at 11:05 am
  
  So, you’ve got about 90 cases on the less frequent category. A popular (but very rough) rule of thumb is that you should have about 10 cases (some say 5) for each coefficient to be estimated. That suggests that you could reasonably estimate a model with about 10 predictors. But I’d still advise using the Firth method just to be more confident. It’s readily available for SAS and Stata. Exact logistic regression is a useful method, but there can be a substantial loss of power along with a substantial increase in computing time.
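
The arithmetic behind this rule of thumb is easy to sketch. The helper below is hypothetical and only encodes the rough guideline quoted above (10 cases on the rarer outcome per coefficient, or 5 under the looser version); it says nothing about statistical power.

```python
def max_coefficients(n_events, n_nonevents, per_coefficient=10):
    """Rough upper limit on the number of coefficients under the rule of thumb.

    The limit is driven by the count on the less frequent outcome,
    not by the total sample size.
    """
    rarer = min(n_events, n_nonevents)
    return rarer // per_coefficient

# 8% of 1125 cases is 90 cases on the less frequent outcome.
rarer_count = round(0.08 * 1125)
other_count = 1125 - rarer_count

strict_limit = max_coefficients(rarer_count, other_count)      # 10 per coefficient
loose_limit = max_coefficients(rarer_count, other_count, 5)    # 5 per coefficient
print(rarer_count, strict_limit, loose_limit)
```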
  
adiangga says:

March 19, 2015 at 12:25 am

Hi Paul, I’m currently working on my thesis about classification of child labor, comparing the C5.0 decision tree algorithm with multivariate adaptive regression splines (MARS). I have imbalanced data for child labor (a total sample of 2402, with 96% child labor and 4% not child labor) and 16 predictor variables.
Using a decision tree with imbalanced data is not much of a problem because there are many techniques for balancing data, but I’m very confused about MARS (MARS with a logit function). I have a few questions:
1. Could I just use MARS without balancing the data? Or
2. Could I use a sampling method (oversampling, undersampling, SMOTE) to balance the data? Or
3. Could you propose some other methods for me? Thank you for the advice.

1. Paul Allison says:
  
  March 19, 2015 at 11:57 am
  
  Sorry but I don’t know enough about MARS to answer this with any confidence. Does MARS actually require balancing? It’s hard to see how oversampling or undersampling could help in this situation.
  
Alex says:

March 9, 2015 at 11:05 am

This is a great resource, thanks so much for writing it. It answered a lot of my questions.

I am planning to use MLwiN for a multilevel logistic regression, with my outcome variable having 450 people in category 1 and around 3200 people in category 0.

My question is: MLwiN uses quasi-likelihood estimation methods as opposed to maximum likelihood methods. Do the warnings of bias stated in the article above still apply with this estimation technique, and if so, would it be smart to change the estimation method to penalized quasi-likelihood?

Thanks so much for any light you can shed on this issue.

1. Paul Allison says:
  
  March 10, 2015 at 10:18 am
  
  First of all, I’m not a fan of quasi-likelihood for logistic regression. It’s well known to produce downwardly biased estimates unless the cluster sizes are large. As for rare events, I really don’t know how well quasi-likelihood does in that situation. My guess is that it would be prone to the same problems as regular ML. But with 450 events, you may be in good shape unless you’re estimating a lot of coefficients.
  
Pat says:

February 25, 2015 at 1:07 pm

Dear Dr. Allison,

I am analyzing a rare event (about 60 in 15,000 cases) in a complex survey using Stata. I get good results (it seems) on the unweighted file using “firthlogit” but it is not implemented with svy: I need either another way to adjust for the complex survey design or an equivalent of firthlogit that can work with the svyset method.
Any suggestions?

1. Paul Allison says:
  
  February 26, 2015 at 8:49 am
  
  Sorry, but I don’t have a good solution for Stata. Here’s what I’d do. Run the model unweighted using both firthlogit and logistic. If results are pretty close, then just use logistic with svyset. If you’re willing to use R, the logistf package allows for case weights (but not clustering). Same with PROC LOGISTIC in SAS.
  
DC says:

December 10, 2014 at 8:43 am

Dear Dr. Allison,

I work in fundraising and have developed a logistic regression model to predict the likelihood of a constituent making a gift above a certain level. The first question my coworkers asked is what the time frame is for the predicted probability. In other words, if the model suggests John Smith has a 65% chance of making a gift, they want to know if that’s within the next 2 years, 5 years, or what. The predictor variables contain very little information about time, so I don’t think I have any basis to make this qualification.

The event we’re modeling is already pretty rare (~200 events at the highest gift level) so I’m concerned about dropping data, but the following approach has been suggested: If we want to say someone has a probability of giving within the next 3 years, we should rerun the model but restrict the data to events that happened within the last 3 years. Likewise, if we use events from only the last 2 years, then we’d be able to say someone has a probability of giving within the next 2 years.

Apart from losing data, I just don’t see the logic in this suggestion. Does this sound like a reasonable approach to you?

Any suggestions on other ways to handle the question of time would be much appreciated. It seems like what my coworkers want is a kind of survival analysis predicting the event of making a big gift, but I’ve never done that type of analysis, so that’s just a guess.

Thanks for your time,
DC

1. Paul Allison says:
  
  December 10, 2014 at 2:50 pm
  
  Ideally this would be a survival analysis using something like Cox regression. But the ad hoc suggestion is not unreasonable.
  
Emmanuel Dhyne says:

November 27, 2014 at 7:37 am

Dear Dr Allison,

I’m running some analysis about firms’ relations. I’ve got info on B-to-B relations (suppliers – customers) for almost all Belgian firms (let’s assume that I have all transactions – around 650,000 transactions after cleaning for missing values in explanatory variables) and I want to run a probit or a logit regression of the probability that two firms are connected (A supplies B) and I need to create the 0’s observations. What would be the optimal strategy, taking into account that I cannot create all potential transactions (19,249,758,792) ?
I’ve considered either selecting a random sample of suppliers (10% of original sample) and a random sample of customers (same size) and consider all potential transactions between those two sub-sample or to consider all actual transactions and randomly selected non transactions.

1. Paul Allison says:
  
  December 1, 2014 at 10:03 am
  
  I’d go with the 2nd method: all transactions and a random sample of non-transactions. But with network data, you also need special methods to get the standard errors right. There’s an R function called netlogit (in the sna package) that can do this.
  
madhu says:

November 26, 2014 at 1:28 am

Hi Paul,

In my case I have 14% (2.9 million) of the data with events. Is it fine if I go with MLE estimation?

Thanks!!!!

1. Paul Allison says:
  
  December 1, 2014 at 9:56 am
  
  Yes
  
Su Lin says:

November 23, 2014 at 10:50 pm

Dear Allison,

I recently conducted a study of bleeding complications after a procedure. A total of 185 patients were enrolled, and the procedure was performed 500 times. Only 16 events were finally observed. So what kind of method can I use to analyze the predictive factors for these events? I’ve tried logistic regression in SPSS; however, the reviewers said “The number of events is very low, which limits the robustness of the multivariable analysis with such a high number of variables.”

Thanks in advance for your help!

1. Paul Allison says:
  
  November 24, 2014 at 6:36 am
  
  Do you really have 500 potential predictors? If so, you need to classify the procedures into a much smaller number. Then, here’s what I recommend: (1) Do forward inclusion stepwise logistic regression to reduce the predictors to no more than 3. Use a low p-value as your entry criterion, no more than .01. (2) Re-estimate the final model with Firth logit. (3) Verify the p-values with exact logistic regression.
  
Marina Z. says:

November 15, 2014 at 9:17 am

Dear Dr. Allison,

I have 10 events in a sample with 46 observations (including the 10 events). I have run firthlogit in Stata, but I could not use the command fitstat to estimate r2. I would like to ask how I can estimate r2 with Stata? Is there any command?

Thanks in advance for your time and attention.

1. Paul Allison says:
  
  November 17, 2014 at 7:51 am
  
  I recommend calculating Tjur’s R2 which is described in an earlier post. Here’s how to do it after firthlogit:
  
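  * Fit the Firth penalized-likelihood logit model
  firthlogit y x1 x2
  * Store the predicted log-odds
  predict yhat
  * Convert the log-odds to predicted probabilities
  gen phat = 1/(1+exp(-yhat))
  * Compare mean predicted probabilities for y=0 and y=1
  ttest phat, by(y)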
  
  The gen command converts log-odds predictions into probabilities. In the ttest output, what you’re looking for is the difference between the average predicted values. You’ll probably have to change the sign.
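
For readers working outside Stata, the same calculation can be sketched in a few lines: convert the predicted log-odds to probabilities, then take the mean predicted probability among events minus the mean among non-events. The function name and the example values below are hypothetical.

```python
import math

def tjur_r2(log_odds, outcomes):
    """Tjur's R2: mean predicted probability for y=1 minus the mean for y=0."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in log_odds]
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("both outcome categories must be present")
    return sum(events) / len(events) - sum(nonevents) / len(nonevents)

# Hypothetical predicted log-odds and observed outcomes for six cases.
example_log_odds = [1.2, 0.4, -0.3, -1.5, -2.0, -0.8]
example_outcomes = [1, 1, 0, 0, 0, 1]
print(tjur_r2(example_log_odds, example_outcomes))
```

Taking the difference in this order (events minus non-events) gives the statistic with the correct sign directly, so no sign change is needed.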
  
Young-joo says:

November 7, 2014 at 3:55 pm

Thank you so much for the post. I am working on data with only 0.45 percent “yes”s, and your posts were really helpful. The Firth method and the rare events logit produce the very same coefficients, as you explained in your post. The regular postestimation commands such as mfx, however, do not give me the magnitudes of the effects that I would like to see after either method. I read all the posts in the blog, but could not find a clue.
Thank you for your help, Dr. Allison!

1. Paul Allison says:
  
  November 13, 2014 at 3:19 pm
  
  The mfx command in Stata has been superseded by the margins command. The firthlogit command is user written and thus may not support the post estimation use of the margins command. The problem with the exlogistic command is that it doesn’t estimate an intercept and thus cannot generate predicted values, at least not in the usual way.
  
Mathan says:

October 14, 2014 at 8:34 am

Dear Dr. Allison,

I need your expertise on selecting an appropriate method. I have 5 rare events (machine failures) out of 2000 observations.

Now, I need to predict when a machine will be down based on the historical data. I have 5 columns:

1) Error logs – which were generated by the machine (non-numeric)
2) Time stamp – when error message was generated
3) Severity – Severity of each error log (1-low, 2- Medium, 3- High)
4) Run time – No. of hours the machine ran till failure
5) Failed? – Yes/No

Thanks in advance for your help!

1. Paul Allison says:
  
  October 14, 2014 at 8:42 am
  
  With just five events, you’re going to have a hard time estimating a model with any reliability. Exact logistic regression is essential. Are the error logs produced at various times BEFORE the failure, or only at the time of the failure? If the latter, then they are useless in a predictive model. Since you’ve got run time, I would advise some kind of survival analysis, probably a discrete time method so that you can use exact logistic regression.
  
  1. Mathan says:
    
    October 15, 2014 at 1:09 am
    
    Thanks, Dr. Allison. The error logs were produced at various times BEFORE the failure. Is there a minimum required number of events (or proportion of events) for estimating a model? In any case, I will try other methods as you advised (survival, Poisson model).
    
    1. Paul Allison says:
      
      October 15, 2014 at 6:12 am
      
      Well, if you had enough events, I’d advise doing a survival analysis with time dependent covariates. However, I really don’t think you have enough events to do anything useful. One rule of thumb is that you should have at least 5 events for each coefficient to be estimated.
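
For reference, the discrete-time approach mentioned in this thread restructures the data so that each machine contributes one record per time interval at risk, with a binary outcome that equals 1 only in the interval where the failure occurs. Those records can then be analyzed with (exact or Firth) logistic regression. Below is a minimal sketch of the restructuring step; the field names and the 24-hour interval width are hypothetical.

```python
import math

def person_period(machine_id, run_hours, failed, interval_hours=24):
    """Expand one machine's history into one record per interval at risk.

    The event indicator is 1 only in the final interval, and only if the
    machine actually failed; censored machines contribute all zeros.
    """
    n_intervals = max(1, math.ceil(run_hours / interval_hours))
    rows = []
    for k in range(1, n_intervals + 1):
        is_last = (k == n_intervals)
        rows.append({
            "machine": machine_id,
            "interval": k,
            "event": int(failed and is_last),
        })
    return rows

# Hypothetical machines: A failed after 60 hours, B was still running at 50 hours.
records = person_period("A", 60, True) + person_period("B", 50, False)
for row in records:
    print(row)
```

Time-dependent covariates, such as the number or severity of error logs recorded before each interval, would be added as extra fields on each record.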
      
Yong-jun Choi says:

October 6, 2014 at 1:20 am

I have a sample of 11,935 persons, of whom 944 made one or more visits to an emergency department during one year. Can I safely apply logistic regression to these data? (My colleague recommended a count data model like the ZINB model because conventional logistic regression generates a problem of underestimated ORs due to excess zeros. But I think the event itself can sometimes be more important information than the number of events per patient.)

1. Paul Allison says:
  
  October 6, 2014 at 9:15 am
  
  Yes, I think it’s quite safe to apply logistic regression to these data. You could try the ZINB model, but see my blog post on this topic. A conventional NB model may do just fine. Personally, I would probably just stick to logistic, unless I was trying to develop a predictive model for the number of visits.
  
  1. Yong-jun Choi says:
    
    October 7, 2014 at 2:09 am
    
    Dr. Allison,
    
    I highly appreciate you for the valuable advice. But I have one more question.
    
    He (my colleague) wrote to me:
    
    “Our data have too many zeros of which some may be ‘good’ zeros but others may be ‘bad’ zeros. Then, we should consider that the results of logistic regression underestimate the probability of event (emergency department visit).”
    
    If he is correct, what should I do to minimize this possibility? (Your words ‘quite safe’ in your reply imply that he is wrong, I guess)
    If he is wrong, why is he wrong?
    
    Thank you for sparing your time for me.
    
    1. Paul Allison says:
      
      October 7, 2014 at 10:10 am
      
      I would ask your colleague what he means by “too many zeros”. Both logistic regression and standard negative binomial regression can easily allow for large fractions of zeros. I would also want to know what the difference is between “good zeros” and “bad zeros”. Zero-inflated models are most useful when there is strong reason to believe that some of the individuals could not have experienced the event no matter what the values of their predictor variables. In the case of emergency room visits, however, it seems to me that everyone has some non-zero risk of such an event.
      
      1. Yong-jun Choi says:
        
        October 14, 2014 at 1:30 am
        
        Dr. Allison,
        
        Thank you very much. We bought some books on statistics including your books 🙂 Your advice stimulated us to study important statistical techniques. Thank you.
Danny Rosenstein says:

September 19, 2014 at 7:33 am

Hello Dr. Allison,

The data I use are also characterized by very rare events (~0.5% positives). There are, however, enough positives (thousands), so it should hopefully be OK to employ logistic regression according to your guidelines.
My question comes from a somewhat different angle (which I hope is ok).
I have ~20 predictors which by themselves represent estimated probabilities. The issue is that the level of confidence in these probabilities/predictors may vary significantly. Given that these confidence levels could be estimated, I’m looking for a way to take these confidence levels into account as well, since the predictor’s true weight may significantly depend on its confidence.
One suggested option was to divide each predictor/feature into confidence based bins, so that for each case (example) only a single bin will get an actual (non zero) value. Similar to using “Dummy Variables” for category based predictors. Zero valued features seem to have no effect in the logistic regression formulas (I assume that features would need to be normalized to a 0 mean value)
Could this be a reasonable approach ?
Any other ideas (or alternative models) for incorporating the varying confidence levels of the given predictor values?

Thanks in advance for your time and attention

Reply
1. Paul Allison says:
  
  September 20, 2014 at 9:36 am
  
  One alternative: if you can express your confidence in terms of a standard error or reliability, then you can adjust for the confidence by estimating a structural equation model (SEM). You would have to use a program like Mplus or the gsem command in Stata that allows SEM with logistic regression. BTW, if you do dummy variables, there is no need to normalize them to a zero mean.
  
  Reply
  1. Danny Rosenstein says:
    
    September 20, 2014 at 4:05 pm
    
    Thank you so much for your response and advice.
    Regarding the option of using dummy variables, here is what I find confusing:
    – On the one hand, whenever a feature assumes a value of 0 its weight learning does not seem to be affected (according to the gradient descent formula), or maybe I’m missing something.
    – On the other hand, the features in my case represent probabilities (which are a sort of prediction of the target value). So if in a given example the feature assumes a 0 value (implying a prediction of 0) but the actual target value is 1 it should cause the feature weight to decrease (since, in this example, it’s as far as possible from the true value)
    
    Another related question that I have:
    In logistic regression the linear combination is supposed to represent the logit, i.e. the log odds, log(p/(1-p)). In my case the features are themselves probabilities (actually sort of “predictions” of the target value). So their linear combination seems more appropriate for representing the probability of the target value itself rather than its logit. Since p is typically very small, ~0.5% (implying that log(p/(1-p)) ~= log(p)), would it be preferable to use the log of the features instead of the original feature values as input for the logistic regression model?
    Again, thanks a lot for your advice.
    
    Reply
    1. Paul Allison says:
      
      September 26, 2014 at 8:09 am
      
      Because there is an intercept (constant) in the model, a value of 0 on a feature is no different than any other value. You can add any constant to the feature, but that will not change the weight or the model’s predictions. It will change the intercept, however.
      It’s possible that a log transformation of your feature may do better. Try it and see.
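
The point about the intercept can be written out in one line. As a sketch, assume a single feature x with weight β1 and intercept β0 (symbols introduced here only for illustration), and shift the feature by a constant c:

```latex
\[
\operatorname{logit}(p) = \beta_0 + \beta_1 x
  = (\beta_0 - \beta_1 c) + \beta_1 (x + c)
\]
```

The weight β1 and the fitted probabilities are the same on both sides; only the intercept absorbs the shift.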
      
      Reply
Tony Bredehoeft says:

August 28, 2014 at 3:11 pm

Dr.Allison,

The article and comments here have been extremely helpful. I’m working on building a predictive model for bus breakdowns with logistic regression. I have 207960 records total with 1424 events in the data set. Based on your comments above, it seems I should have enough events to continue without oversampling. The only issue is that I’m also working with a large number of potential predictors, around 80, which relate to individual diagnostic codes that occur in the engine. I’m not suggesting that all of these variables will be in final model, but is there a limit to the number of predictors I should be looking to include in the final model? Also, some of predictors/diagnostic codes happen rarely as well. Is there any concern having rare predictors in a model with rare events?

Thanks,

Tony

Reply
1. Paul Allison says:
  
  August 28, 2014 at 3:57 pm
  
  Well, a common rule of thumb is that you should have at least 10 events for each coefficient being estimated. Even with 80 predictors, you easily meet that criterion. However, the rarity of the predictor events is also relevant here. The Firth method could be helpful in reducing any small-sample bias of the estimators. For the test statistics, consider each 2 x 2 table of predictor vs. response. If the expected frequency (under the null hypothesis of independence) is at least 5 in all cells, you should be in good shape.
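
The expected-frequency check described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the example counts for a single diagnostic code are hypothetical (only the totals of 207,960 records and 1,424 events come from the question).

```python
def expected_counts(table):
    """Expected cell counts for a 2 x 2 table under independence.

    table[i][j] holds the observed count for predictor level i and
    response level j.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def all_cells_at_least(table, minimum=5):
    """True if every expected count meets the rule-of-thumb minimum."""
    return all(e >= minimum for row in expected_counts(table) for e in row)


# Hypothetical diagnostic code seen in 500 of 207,960 records,
# 10 of them with a breakdown.
# Rows: code present / absent. Columns: breakdown / no breakdown.
observed = [[10, 490], [1414, 206046]]
print(expected_counts(observed))
print(all_cells_at_least(observed))
```

With a code this rare, the expected number of breakdowns among records that have the code falls below 5, so the usual test statistics for that predictor would deserve some caution.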
  
  Reply
Jens says:

August 25, 2014 at 10:58 am

Dear Dr. Allison,

I have a small dataset (90 with 23 events) and have performed an exact logistic regression which leads to significant results.
I wanted to add an analysis of the model fit statistics and goodness-of-fit statistics such as AIC, the Hosmer-Lemeshow test or McFadden’s R-squared. After reading your book about logistic regression using SAS (second edition), my understanding is that all these calculations only make sense, or are only possible, if conventional logistic regression is used. Is my understanding correct? Are there other ways to check goodness of fit when using exact logistic regression? Thank you.

Reply
1. Paul Allison says:
  
  August 25, 2014 at 12:29 pm
  
  Standard measures of fit are not available for exact logistic regression. I am not aware of any other opportunities.
  
  Reply
Chacha Mangu says:

August 23, 2014 at 5:39 pm

Hello,

Paul,

I am currently doing my MSc project. I have a dataset with 2312 observations with only 29 observations. I want to perform a logistic regression analysis. Which method would you recommend?

Reply
1. Paul Allison says:
  
  August 24, 2014 at 12:03 pm
  
  I assume you mean 29 events. I’d probably use the Firth method to get parameter estimates. But I’d double-check the p-values and confidence intervals with conditional logistic regression. And I’d keep the number of predictors low–no more than 5, preferably fewer.
  
  Reply
Stefan says:

August 20, 2014 at 8:39 am

Hello Mr.Allison,
I’m writing to you because I have a similar problem. I have unbalanced panel data with 23 time periods (the attrition is due to the loss of individuals over the periods). I would like to ask your opinion on 2 issues:

1. How can I do the regression, should I use the pooled data or panel data with FE/RE?
2. I also have a problem of rare events: for the pooled data I have almost 10000000 obs and only 45000 obs with the event=1 (0.45%). What do you think I should do in this case?

Thank you very much, I appreciate your help.
Stefan

Reply
1. Paul Allison says:
  
  August 20, 2014 at 9:40 am
  
  1. I would do either fixed or random effects logistic regression.
  2. With 45,000 events, you should be fine with conventional maximum likelihood methods.
  
  Reply
  1. Stefan says:
    
    August 20, 2014 at 10:15 am
    
    First of all, thank you for your answers.
    The problem is that when I do logistic regression for the pooled data I obtain a small Somers’ D (0.36) and my predicted probabilities are very small, even for the cases with event=1 (the probabilities are no bigger than 0.003). I don’t know what to do.
    What do you think is the problem, and what can I do?
    Thank you again.
    
    Reply
Robert Pacis says:

August 14, 2014 at 7:17 pm

Sorry, follow-up question… what’s the minimum acceptable c-stat… I usually hear .7, so if I get, say 0.67, should I consider a different modeling technique?

Reply
1. Paul Allison says:
  
  August 15, 2014 at 2:29 am
  
  The c-stat is the functional equivalent of an R-squared. There is no minimum acceptable value. It all depends on what you are trying to accomplish. A different modeling technique is not necessarily going to do any better. If you want a higher c-stat, try getting better predictor variables.
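
For readers unsure what the c-stat measures, here is a minimal sketch in plain Python, assuming the usual definition: the share of event/non-event pairs in which the event case has the higher predicted probability, with ties counted as one half. The function names are illustrative and not taken from any package.

```python
def c_statistic(outcomes, predicted):
    """Concordance (c) statistic for binary outcomes coded 0/1."""
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    nonevents = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0
    tied = 0
    # Compare every event case with every non-event case.
    for p_event in events:
        for p_nonevent in nonevents:
            if p_event > p_nonevent:
                concordant += 1
            elif p_event == p_nonevent:
                tied += 1
    return (concordant + 0.5 * tied) / (len(events) * len(nonevents))


def somers_d(c):
    """Somers' D implied by a given c statistic."""
    return 2 * c - 1


print(c_statistic([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

Under this definition Somers’ D equals 2c - 1, so a c-stat of 0.68 and a Somers’ D of 0.36 describe the same degree of discrimination.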
  
  Reply
Robert Pacis says:

August 14, 2014 at 7:14 pm

Dr. Allison,

Hi. You may have already answered this from earlier threads, but is a sample size of 9000 with 85 events/occurrence considered a rare-event scenario? is logistic regression appropriate?

Many thanks.

Rob

Reply
1. Paul Allison says:
  
  August 15, 2014 at 2:27 am
  
  Yes, it’s a rare event scenario, but conventional logistic regression may still be OK. If the number of predictors is no more than 8, you should be fine. But probably a good idea to verify your results with exact logistic regression and/or the Firth method.
  
  Reply
Olga says:

August 10, 2014 at 6:00 pm

Exact logistic regression, rare events, and Firth method work well for binary outcomes. What would you suggest for rare continuous outcomes?

Say, I have multiple sources of income (20,000+ sources). Taken separately, each source throughout a year generates profit only on rare occasions. Each source could have 362 days of zero profit, and 3 days of positive profit. The number of profit days slightly vary from source to source.

I have collected daily profit values generated by each source into one data set. It looks like pooled cross sections. This profit is my dependent variable. Independent variables associated with it are also continuous variables.

Can you provide me any hints of which framework to use? (I tried tobit model that assumes left censoring.) Can I still use Firth or rare events?

Thanks.

Reply
1. Paul Allison says:
  
  August 11, 2014 at 11:33 am
  
  Well, I’d probably treat this as a binary outcome rather than continuous: profit vs. no profit. Then, assuming that your predictors of interest are time-varying, I’d do conditional logistic regression, treating each income source as a different stratum. Although I can’t cite any theory, my intuition is that the rarity of the events would not be a serious problem in this situation.
  
  Reply
Ming says:

July 2, 2014 at 3:59 pm

For a rare event example, 20 events in 10,000 cases, may we add multiple copies of the events (like 19 times the events, so that we can get 200 events) to the data? Once we get the predicted probability, we just need to adjust the probability by the percentages (in this case 10/10000 -> 200/10200).
Or may we use a bootstrapping method to resample the data?

Reply
1. Paul Allison says:
  
  July 3, 2014 at 10:20 am
  
  No, it’s definitely not appropriate to just duplicate your 20 events. And I can’t see any attraction to resampling. For p-values, I’d recommend exact logistic regression.
  
  Reply
Becky says:

June 23, 2014 at 3:13 am

Dr. Allison, this is an excellent post with continued discussion. I am currently in debate with contractors who have ruled out 62 events in a sample of 1500 as too small to analyse empirically. Is 62 on the cusp of simple logistic regression or would the Firth method still be advisable? Further, is there a rule of thumb table available which describes minimum number of events necessary relative to sample and number of independent variables? Many thanks. Becky

Reply
1. Paul Allison says:
  
  June 23, 2014 at 6:55 am
  
  It may not be too small. One very rough rule of thumb is that there should be at least 10 cases on the less frequent category for each coefficient in the regression model. A more liberal rule of thumb is at least 5 cases. I would try both Firth regression and exact logistic regression.
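
As a rough illustration of the two rules of thumb, the arithmetic for 62 events in a sample of 1500 can be sketched as follows. The helper name is hypothetical, and the result is only a guideline, not a hard limit.

```python
def max_coefficients(n_events, n_nonevents, cases_per_coefficient=10):
    """Largest number of coefficients allowed by a cases-per-coefficient rule.

    The rule is applied to the less frequent of the two outcome categories.
    """
    rarer = min(n_events, n_nonevents)
    return rarer // cases_per_coefficient


# 62 events out of 1500 cases, as in the question above.
print(max_coefficients(62, 1500 - 62))                           # rule of 10
print(max_coefficients(62, 1500 - 62, cases_per_coefficient=5))  # rule of 5
```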
  
  Reply
abi says:

June 11, 2014 at 9:31 am

Dear Dr. Paul Allison,

In which case can we use 10% level of significance( p-value cut off point) instead of using 5%? For instance, if you have nine independent variables,and run univariate logistic regression, you find that the p-value for your three independent variables is below 10%. If you drop those variables which are above 10% (using 10% level of significance) and use firth to analyse your final model, you will end up with significant value(P<0.05) of the three variables. Is it possible to use this analysis and what would be the reason why you use 10% as cut off value?

Reply
1. Paul Allison says:
  
  June 12, 2014 at 10:46 am
  
  I don’t quite understand the question. Personally, I would never use .10 as a criterion for statistical significance.
  
  Reply
abi says:

June 9, 2014 at 8:30 am

Dear Dr. Paul Allison, I would like to know which kind of logistic regression analysis shall I use if have 1500 samples and only 30 positives? Shall I use exact or firth? What would be the advantage of using either of them in the analysis?

Reply
1. Paul Allison says:
  
  June 9, 2014 at 12:52 pm
  
  Firth has the advantage of reducing small sample bias in the parameter estimates. Exact is better at getting accurate p-values (although they tend to be conservative). In your case I would do both: Firth for the coefficients and exact for the p-values (and/or confidence limits).
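
To give a feel for what the Firth penalty does, consider the special case of one categorical predictor with a separate parameter for each group (a saturated model). In that case the penalized estimate of each group’s log-odds is the ordinary log-odds after adding half an event and half a non-event to the group. The sketch below covers that special case only; it is not a general Firth implementation, and the function names are made up for illustration.

```python
from math import log


def ml_log_odds(events, n):
    """Conventional ML log-odds for one group; None when it does not exist."""
    if events == 0 or events == n:
        return None  # the ML estimate is infinite (separation)
    return log(events / (n - events))


def firth_log_odds(events, n):
    """Firth-type log-odds for one group of a saturated model.

    Adds 0.5 to both the event and the non-event count.
    """
    return log((events + 0.5) / (n - events + 0.5))


for events in (0, 1, 5):
    print(events, ml_log_odds(events, 30), firth_log_odds(events, 30))
```

The penalized estimate is always finite and is pulled toward zero relative to the ML estimate, which is the direction needed to offset the small-sample bias.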
  
  Reply
  1. Colleen says:
    
    July 9, 2014 at 9:11 pm
    
    I think this situation is most similar to my own but I’d like to check if possible. I have an experiment that has 1 independent variable with 3 levels, sample size of 30 in each condition. Condition 1 has 1 success/positive out of 30. Condition 2 has 4/30, and Condition 3 has 5/30. Can I rely on Firth or do I need both? (And is it acceptable to report coefficients from one but probability from another? I wouldn’t have guessed that would be ok.)
    
    Reply
    1. Paul Allison says:
      
      July 10, 2014 at 8:13 am
      
      I don’t think you even need logistic regression here. You have a 3 x 2 table, and you can just do Fisher’s exact test, which is equivalent to doing exact logistic regression. I don’t think there’s much point in computing odds ratios, either, because they would have very wide confidence intervals. I’d just report the fraction of successes under each condition.
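
For a table larger than 2 x 2, the exact test (often called the Fisher-Freeman-Halton test) can be sketched by brute force: enumerate every table with the same margins and add up the probabilities of those that are no more likely than the observed one. The code below is a minimal illustration for an r x 2 table using only the standard library; the function name is hypothetical, and statistical packages use much faster algorithms.

```python
from itertools import product
from math import comb


def exact_test_rx2(successes, totals):
    """Exact p-value for an r x 2 table with fixed margins.

    successes[i] is the number of successes in group i, totals[i] its size.
    """
    k = sum(successes)             # total number of successes
    denom = comb(sum(totals), k)   # ways to place k successes overall

    def probability(counts):
        ways = 1
        for c, t in zip(counts, totals):
            ways *= comb(t, c)
        return ways / denom

    p_observed = probability(successes)
    p_value = 0.0
    ranges = [range(min(t, k) + 1) for t in totals]
    for counts in product(*ranges):
        if sum(counts) != k:
            continue  # keep only tables with the observed margins
        p = probability(counts)
        # Small tolerance so that tables tied with the observed one count.
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return min(p_value, 1.0)


# 1/30, 4/30 and 5/30 successes in the three conditions.
print(exact_test_rx2([1, 4, 5], [30, 30, 30]))
```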
      
      Reply
Valentina says:

May 27, 2014 at 10:54 am

Dear Dr. Paul Allison, I understood we have to pay attention to small sample bias for small categories. But I have continuous independent variables, and 50 events over 90.000 cases (all times 11 years). If I use in a logit estimation, for example, 4 independent variables can I have some problems in the interpretation of their estimated coefficients and their significance? Thanks

Reply
1. Paul Allison says:
  
  May 27, 2014 at 11:23 am
  
  I’d probably want to go with the Firth method, using p-values based on profile likelihood. To get more confidence in the p-values, you might even want to try exact logistic regression. Although the overall sample size is pretty large for exact logistic, the small number of events may make it feasible.
  
  Reply
Jayenta says:

May 23, 2014 at 6:17 am

Dr. Paul Allison, I am very thankful to you for your post and the discussions followed, from which I have almost solved my problem except one. My event is out-migrant having 849 cases which is 1.2% of the total sample(69,207). Regarding the small proportion, I think my data is in the comfort zone to apply for logistic regression. But the dependent variable is highly skewed (8.86 skewness). Does it pose any problems, and if so, how can I take care of this? Reducing the number of non-events by taking random sample has been found helpful but I doubt whether it affects the actual characteristics of the population concerned. Plz clarify me on this. I use SPSS program. Thanks.

Reply
1. Paul Allison says:
  
  May 23, 2014 at 7:42 am
  
  The skewness is not a problem. And I see no advantage in reducing the number of non-events by taking a random sample.
  
  Reply
Sander Greenland says:

May 21, 2014 at 4:07 pm

This is a nice discussion, but penalization is a much more general method than just the Firth bias correction, which is not always successful in producing sensible results. There are real examples in which the Firth method could be judged inferior (on both statistical and contextual grounds) to stronger penalization based on conjugate-logistic (log-F) priors. These general methods are easily implemented in any logistic-regression package by translating the penalty into prior data. For examples see Sullivan & Greenland (2013, Bayesian regression in SAS software. International Journal of Epidemiology, 42, 308-317). These methods have a frequentist justification in terms of MSE reduction (shrinkage) so are not just for Bayesians; see the application to sparse data and comparison to Firth on p. 313.

Reply
1. Paul Allison says:
  
  May 21, 2014 at 8:01 pm
  
  Thanks for the suggestions.
  
  Reply
Jitendra says:

May 17, 2014 at 12:11 pm

Dear Dr. Allison,

I am trying to build a logistic regression model for a dataset with 1.4 million records, with the rare event comprising 50,000 records. The number of variables is about 50, most of which are categorical variables with about 4 classes each on average. I wanted to check with you whether it is advisable to use the Firth method in this case.
Thank You

Reply
1. Paul Allison says:
  
  May 21, 2014 at 8:04 pm
  
  You’re probably OK with conventional ML, but check to see how many events there are in each category of each variable. If any of the numbers are small, say, less than 20, you may want to use Firth. And there’s little harm in doing so.
  
  Reply
Maria says:

May 16, 2014 at 8:27 am

Hi Paul!
I would be most grateful if you could help me with the following questions: 1) I have a logistic regression model with supposedly low power (65 events and ten independent variables). Several variables do however come out significant. Are these significance tests unreliable in any way?
And 2) do you know if it is possible to perform the penalized likelihood in SPSS?

Reply
1. Paul Allison says:
  
  May 21, 2014 at 8:06 pm
  
  They could be unreliable. In this case, I would try exact logistic regression. I don’t know if penalized likelihood is available in SPSS.
  
  Reply
S Ray says:

May 15, 2014 at 3:28 am

Hi Dr. Allison,
I am working on natural resource management issues. In my project, ‘yes’ responses on my dependent variable are 80-85% while ‘no’ responses are 14-18%. Can I use a binary logit model here?

with Regards
S. Ray

Reply
1. Paul Allison says:
  
  May 21, 2014 at 8:08 pm
  
  Probably, but as I said in my post, what matters more is the number of “no”s, not the percentage.
  
  Reply
Xinyuan says:

May 15, 2014 at 1:40 am

Hi Dr. Allison,

When I have 20 events out of 1000 samples, can a re-sampling method like the bootstrap help to improve estimation? Thanks very much!

Reply
1. Paul Allison says:
  
  May 21, 2014 at 8:08 pm
  
  I strongly doubt it.
  
  Reply
  1. Xinyuan says:
    
    May 21, 2014 at 11:24 pm
    
    Dr. Allison, it is great to get your reply, thanks very much. Could you help to explain why the bootstrap can’t help when events are rare? Besides, if I have 700 responders out of 60,000 samples and the number of variables in the final model is 15, but the number of variables is 500 in the original variable selection process, do you think the 700 events are enough? Thanks again!
    
    Reply
    1. Paul Allison says:
      
      May 23, 2014 at 7:44 am
      
      What do you hope to accomplish by bootstrapping?
      
      Reply
      1. Xinyuan says:
        
        May 26, 2014 at 9:49 pm
        
        I want to increase the number of events by bootstrapping, so that there are enough events for parameter estimation.
      2. Paul Allison says:
        
        May 27, 2014 at 8:06 am
        
        Bootstrapping can’t achieve that. What it MIGHT be able to do is provide a more realistic assessment of the sampling distribution of your estimates than the usual asymptotic normal distribution.
Annie says:

May 7, 2014 at 11:09 am

Hi Dr. Allison,

Thanks for this post. I have been learning how to use logistic regression and your blog has been really helpful. I was wondering if we need to worry about the number of events in each category of a factor when using it as a predictor in the model. I’m asking this because I have a few factors set as independent variables in my model and some are highly unbalanced, which makes me worry that the number of events might be low in some of the categories (when size is low). For example, one variable has 4 categories and sizes range from 23 (15 events) to 61064! Total number of events is 45334 for a sample size of 83356. Thanks!

Reply
1. Paul Allison says:
  
  May 7, 2014 at 11:16 am
  
  This is a legitimate concern. First of all, you wouldn’t want to use a category with a small number of cases as the reference category. Second, the standard errors of the coefficients for small categories will probably be high. These two considerations will apply to both linear and logistic regression. In addition, for logistic regression, the coefficients for small categories are more likely to suffer from small-sample bias. So if you’re really interested in those coefficients, you may want to consider the Firth method to reduce the bias.
  
  Reply
Katherine Barbieri says:

April 22, 2014 at 2:57 pm

I should have mentioned that I have 8 independent variables in my models.

Reply
Katherine Barbieri says:

April 22, 2014 at 1:28 pm

I am in political science and wanted to use rare events logit in Stata, but it does not allow me to use fixed or random effects. After reading your work, I am not even sure my events are rare. Could you please let me know if I have a problem and how I might resolve it in Stata?
I have one sample with 7851 observations and 576 events. I have another sample with 6887 observations and 204 events.
I appreciate your advice.
Katherine

Reply
1. Paul Allison says:
  
  April 23, 2014 at 9:57 am
  
  I don’t see any need to use rare event methods for these data.
  
  Reply
Bart says:

April 19, 2014 at 1:23 pm

“Does anyone have a counter-argument?”

In the 2008 paper “a weakly informative default prior distribution for logistic and other regression models” by Gelman, Jakulin, Pittau and Su, a different fully Bayesian approach is proposed:
– shifting and scaling non-binary variables to have mean 0 and std dev 0.5
– placing a Cauchy-distribution with center 0 and scale 2.5 on the coefficients.

Cross-validation on a corpus of 45 data sets showed superior performance. Surprisingly the Jeffreys’ prior, i.e. Firth method, performed poorly in the cross-validation. The second-order unbiasedness property of Jeffreys’ prior, while theoretically defensible, doesn’t make use of valuable prior information, notably that changes on the logistic scale are unlikely to be more than 5.

This paper has focused on solving the common problem of infinite ML estimates when there is complete separation, not so much on rare events per se. The corpus of 45 data sets are mostly reasonably balanced data sets with Pr(y=1) between 0.13 and 0.79.
Yet the poor performance of the Jeffreys’ prior in the cross-validation is striking. Its mean logarithmic score is actually far worse than that of conventional MLE (using glm).
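
The rescaling step mentioned in the first bullet above (mean 0, standard deviation 0.5) amounts to centering and dividing by two standard deviations. A minimal sketch follows, assuming the sample standard deviation is used; the function name and the example values are hypothetical, and the Cauchy prior itself is not implemented here.

```python
from statistics import mean, stdev


def rescale(values):
    """Shift and scale a non-binary variable to mean 0 and SD 0.5."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (an assumption)
    return [(v - m) / (2 * s) for v in values]


ages = [23.0, 35.0, 41.0, 58.0, 62.0]  # made-up example values
scaled = rescale(ages)
print(mean(scaled), stdev(scaled))
```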

Reply
Paul B says:

April 15, 2014 at 3:24 am

I know you’ve answered this many times above regarding logistic regression and discrete-time models — that if you have a huge number of observations, then it is best to take all of the events and a simple random sample of all of the non-events which is at least as large as the number of events. My question is: Does this advice apply also to continuous time models, specifically the Cox PH with time-varying covariates? I ask because I have a dataset with 2.8 million observations, 3,000 of which are events. Due to the many time-varying covariates and other fixed covariates (about 10 of each), we had to split the data into counting process format, so the 3,000 events have become 50,000 rows. Thus, our computing capabilities are such that taking a simple random sample from the non-events that is 15,000 (which become about 250,000 rows) and running these in PHREG with the events takes considerable computing time (it uses a combination of counting process format AND programming statements). Long story short, the question is – is 15,000 enough? And what corrections need to be made to the results when the model is based on a SRS of the non-events?

Reply
1. Paul Allison says:
  
  April 15, 2014 at 3:09 pm
  
  I think 15,000 is enough, but the methodology is more complex with Cox PH. There are two approaches: the nested case-control method and the case-cohort method. The nested case-control method requires a fairly complicated sampling design, but the analysis is (relatively) straightforward. Sampling is relatively easy with the case-cohort method, but the analysis is considerably more complex.
  
  Reply
  1. Paul B says:
    
    April 16, 2014 at 5:18 am
    
    Thank you so much for the quick response! I really appreciate the guidance. I’ve just been doing some reading about both of these methods and your concise summary of the advantages and disadvantages of each approach is absolutely right on. I wanted to share, in case others are interested, two good and easy-to-understand articles on these sampling methodologies which I found: “Comparison of nested case-control and survival analysis methodologies for analysis of time-dependent exposure”, Vidal Essebag et al., and “Analysis of Case-Cohort Designs”, William E. Barlow et al.
    
    Reply
Aaron says:

March 14, 2014 at 8:44 am

Can you use model fit statistics from SAS such as the AIC and -2 log likelihood to compare models when penalized likelihood estimation with the firth method is used?

Reply
1. Paul Allison says:
  
  March 16, 2014 at 9:57 am
  
  I believe that the answer is yes, although I haven’t seen any literature that specifically addresses this issue.
  
  Reply
J says:

March 10, 2014 at 11:34 am

Since it sounds like the bias relates to maximum likelihood estimation, would Bayesian MCMC estimation methods also be biased?

Reply
1. Paul Allison says:
  
  March 10, 2014 at 11:52 am
  
  Good question, but I do not know the answer.
  
  Reply
  1. Sam, applied med stats. says:
    
    April 3, 2014 at 5:04 pm
    
    Is this a relevant article?
    
    Mehta, Cyrus R., Nitin R. Patel, and Pralay Senchaudhuri. “Efficient Monte Carlo methods for conditional logistic regression.” Journal of The American Statistical Association 95, no. 449 (2000): 99-108.
    
    Reply
    1. Paul Allison says:
      
      April 3, 2014 at 9:12 pm
      
      This article is about computational methods for doing conditional logistic regression. It’s not really about rare events.
      
      Reply
Joy says:

March 8, 2014 at 1:47 am

Hi Paul! I’ve been reading this thread, and I have also encountered problems in modeling outcomes for rare events occurring at 10% in the population we’re studying. One option that we tried, to capture the unique behaviour, was to take equal samples of outcomes and non-outcomes, just to determine the behavior that predicts such outcomes. But when we ran the logistic model, we did not apply any weight to make the results representative of the population. Is this OK? I am really not that happy with the accuracy of the model: only 50% of those predicted to have the outcome actually had it. Is our problem just a function of the equal sampling proportion? And will the Firth method help to improve our model? Hope to get good insights/recommendations from you… Thanks!

Reply
1. Paul Allison says:
  
  March 8, 2014 at 9:38 am
  
  Unless you’re working with very large data sets where computing time is an issue, there’s usually nothing to be gained by sampling to get equal fractions of events and non-events. And weighting such samples to match the population usually makes things worse by increasing the standard errors. As I tried to emphasize in the blog, what’s important is the NUMBER of rare events, not the fraction of rare events. If the number of rare events is substantial (relative to the number of predictors), the Firth method probably won’t help much.
  
  Reply
  1. Joy says:
    
    March 9, 2014 at 6:19 am
    
    Hi, thank you so much for your response. We’re working indeed with very large data. We need to sample to make computing time more efficient. I understand that what matters are the number of rare events and not the fraction, that’s why we made sure that we have a readable sample of the events. But I feel that the problem of accuracy of predicting the event is because of the equal number of events and non events used in the model. Is this valid? And yes, applying weights did no good. It made the model results even worse. For the model build for my base, should I just use random sampling of my entire population and just make sure that I have a readable base of my events?
    
    Reply
    1. Paul Allison says:
      
      March 9, 2014 at 10:03 am
      
      When sampling rare events from a large data base, you get the best estimates by taking all of the events and a random sample of the non-events. The number of non-events should be at least equal to the number of events, but the more non-events you can afford to include, the better. When generating predicted probabilities, however, you should adjust for the disproportionate sampling. In SAS, this can be done using the PRIOREVENT option on the SCORE statement.
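
The adjustment for disproportionate sampling can be sketched by hand. Assume all events are kept and non-events are sampled at a known fraction f; then the odds estimated from the sample are too high by a factor of 1/f, so multiplying the sample odds by f (equivalently, adding log(f) to the intercept) puts predictions back on the population scale. The function below is an illustration of that arithmetic with a made-up name, not a description of how SAS implements PRIOREVENT.

```python
def adjust_probability(p_sample, nonevent_fraction):
    """Convert a predicted probability from the sampled data to the
    population scale.

    nonevent_fraction is the share of non-events retained in the sample;
    all events are assumed to have been kept.
    """
    odds_sample = p_sample / (1.0 - p_sample)
    odds_population = odds_sample * nonevent_fraction
    return odds_population / (1.0 + odds_population)


# A case predicted at 0.50 in a sample that kept 1% of the non-events.
print(adjust_probability(0.50, 0.01))
```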
      
      Reply
      1. Jit says:
        
        April 28, 2015 at 6:22 am
        
        Dr. Allison,
        
        You mentioned “The number of non-events should be at least equal to the number of events” — is this a necessity for logistic regression? That is, the event rate has to be lower than 50%?
      2. Paul Allison says:
        
        April 28, 2015 at 7:38 am
        
        Certainly not. That was just a recommendation for those cases in which you want to do stratified sampling on the dependent variable. If the number of events is small, it wouldn’t be sensible to then sample fewer non-events than events. That would reduce statistical power unnecessarily.
Saurabh Tanwar says:

January 9, 2014 at 1:30 am

Hi Dr. Allison,

I am working on a rare event model with response rates of only 0.13% (300 events in a data sample of 200,000). I was reading through your comments above and you have stressed that what matters is the number of the rarer event, not the proportion. Can we specify a “minimum number of events” that the data must have for modeling?

In my case, I am asking this as I do have an option of adding more data to increase the number of events(however the response rate will remain the same 0.13%). How many events will be sufficient?

Also, what should be the best strategy here. Stratified sampling or Firth method?

Thanks,
Saurabh

Reply
1. Paul Allison says:
  
  January 22, 2014 at 9:04 am
  
  A frequently mentioned but very rough rule of thumb is that you should have at least 10 events for each parameter estimated. The Firth method is usually good. Stratified sampling (taking all events and a simple random sample of the non-events) is good for reducing computation time when you have an extremely large data set. In that method, you want as many non-events as you can manage.
  
  Reply
Alfonso says:

January 4, 2014 at 10:55 am

Dear Colleagues, sorry to interrupt your discussion but I need help from experts.
I am a young cardiologist and I am studying the outcome in patients with coronary ectasia during acute myocardial infarction (a very rare condition). I have only 31 events (combined outcome for death, revascularization and myocardial infarction). After univariate analysis I selected 5 variables. Is it possible in your opinion to carry out a Cox regression analysis in this case? The EPV is only 31/5 = 6.2.
Thanks

Reply
1. Paul Allison says:
  
  January 22, 2014 at 9:08 am
  
  It’s probably worth doing, but you need to be very cautious about statistical inference. Your p-values (and confidence intervals) are likely to be only rough approximations. A more conservative approach would be to do exact logistic regression.
  
  Reply
Kelly says:

December 14, 2013 at 12:20 pm

I have a rare predictor (n=41) and a rare outcome. Any guidelines on how many events are needed for the predictor? (Or, the n in a given chi-square cell?)

Thanks so much!

Reply
1. Paul Allison says:
  
  January 22, 2014 at 9:12 am
  
  Check the 2 x 2 table and compute expected frequencies under the independence hypothesis. If they are all > 5 (the traditional rule of thumb) you should be fine.
  
  Reply
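The check described in the reply above (expected frequencies under independence, all greater than 5) can be sketched in a few lines of Python. The function names and the counts in the example are made up for illustration; only the rule itself comes from the reply.

```python
def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def all_expected_above(table, threshold=5.0):
    """True if every expected count exceeds the threshold (traditional rule: 5)."""
    return all(e > threshold for row in expected_counts(table) for e in row)


# Hypothetical 2 x 2 table: rows = predictor present / absent,
# columns = event / non-event. These counts are invented for the example.
observed = [[3, 38],
            [47, 912]]

for row in expected_counts(observed):
    print([round(e, 2) for e in row])
print(all_expected_above(observed))
```

In this invented table the predictor-present/event cell has an expected count of about 2, so the traditional rule would not be satisfied.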
Bernhard Schmidt says:

December 10, 2013 at 6:48 am

Dear Dr. Allison,

I have data of 41 patients with 6 events (=death). I am studying the prognostic value of a diagnostic parameter (DP) (numerical) for outcome (survival/death).
In a logistic regression of outcome versus DP, DP was significant. However, I would like to clarify whether this prognostic value is independent of age and 3 other dichotomous parameters (gender, disease, surgery). In a multiple logistic regression DP was the only significant parameter out of these 5. But I was told the event/no-of-parameters ratio should be at least 5. Therefore, this result has no meaning. Is there any method which could help in coming closer to an answer? Or is it simply not enough data (unfortunately, small populations are a common problem in clinical studies)? Thank you very much for any suggestion.
Bernhard

Reply
1. Paul Allison says:
  
  January 22, 2014 at 9:15 am
  
  Try exact logistic regression, available in SAS, Stata, and some other packages. This is a conservative method, but it has no lower bound on the number of events. You may not have enough data to get reliable results, however.
  
  Reply
John says:

November 22, 2013 at 3:02 pm

Hi Dr. Allison,

You have mentioned that 2000 events out of 100,000 is a good sample for logistic regression, which is a 98%-2% split. I have always been advised that we should have an 80-20 or 70-30 split for logistic regression, and that if such a split is not there, then we should reduce the data. For example, we should keep the 2000 events, randomly select 8000 non-event observations, and run the model on 10,000 records in place of 100,000. Please advise.

Reply
1. Paul Allison says:
  
  November 25, 2013 at 1:28 pm
  
  There is absolutely no requirement that there be an 80-20 split or better. And deleting cases to achieve that split is a waste of data.
  
  Reply
PN says:

October 3, 2013 at 3:05 am

I have a data set of about 60,000 observations with 750 event cases. I have 5 predictor variables. When I run the logistic regression, all the predictors come out significant. The concordant pairs are about 80%. However, the overall model fit is not significant. Any suggestions to deal with this?

Reply
1. Paul Allison says:
  
  November 25, 2013 at 2:17 pm
  
  It’s rather surprising that all 5 predictors would be significant (at what level?) but the overall model fit is not significant. Send me your output.
  
  Reply
Scott says:

October 2, 2013 at 2:32 pm

(Correction2 – I sincerely apologize for my errors – the following is a correct and complete version of my question)
Dr. Allison,
I have a sample of 7108 with 96 events. I would like to utilize logistic regression and seem to be OK with standard errors. However, when analyzing standardized residuals for outliers, all but 5 of the 96 cases positive for the event have a SD>1.96. I have a few questions:
1) Is 96 events sufficient for logistic regression?
2) With 96 events, how many predictors would you recommend?
3) In that rare events analysis is really analysis of outliers, how do you deal with identifying outliers in such a case?
Thank you.

Reply
1. Paul Allison says:
  
  November 25, 2013 at 2:20 pm
  
  1. Yes, 96 events is sufficient.
  2. I’d recommend no more than 10 predictors.
  3. I don’t think standardized residuals are very informative in a case like this.
  
  Reply
M says:

September 26, 2013 at 1:05 am

Dear Dr. Allison,

I have a sample with 5 events out of a total sample of 1500. Is it possible to perform logistic regression with this sample (I have 5 predictors)? Do you know if the Firth method is available in SPSS?

Thank you.

Reply
1. Paul Allison says:
  
  January 22, 2014 at 10:16 am
  
  Not much you can do with just five events. Even a single predictor could be problematic. I’d go with exact logistic regression, not Firth. As far as I know, Firth is not available in SPSS.
  
  Reply
  1. Jon Peck says:
    
    February 4, 2019 at 12:47 pm
    
    Firth logistic regression is available in SPSS Statistics via the STATS FIRTHLOG extension command, which can be installed from the Extensions menu.
    
    Reply
    1. Paul Allison says:
      
      February 26, 2019 at 2:37 pm
      
      Thanks for the info.
      
      Reply
Rich says:

September 24, 2013 at 11:35 pm

According to Stata Manual on the complementary log-log, “Typically, this model is used when the positive (or negative) outcome is rare” but there isn’t much explanation provided.

I tried looking up a few papers and textbooks about clog-log but most simply talk about the asymmetry property.

Can we use clog-log for rare event binary outcome? Which is preferred?

Reply
1. Paul Allison says:
  
  January 22, 2014 at 10:18 am
  
  I’m not aware of any good reason to prefer complementary log-log over logit in rare event situations.
  
  Reply
Karen says:

September 16, 2013 at 12:01 pm

Dr. Allison–
Thank you very much for this helpful post. I am analyzing survey data using SAS. I am looking at sexual violence and there are only 144 events. Although the overall sample is quite large (over 18,000), due to skip patterns in the survey, I am looking at a subpopulation of only sexually active males (the only ones in the survey asked the questions of interest). The standard errors for the overall sample look excellent, but when applying subpopulation analysis the standard errors are large. Do you have any suggestions to address this? I believe that I can’t use the Firth method in this case because I use SAS and it doesn’t seem to be available for Proc Surveylogistic.

Thank you.
–Karen

Reply
1. Paul Allison says:
  
  January 22, 2014 at 10:24 am
  
  How many events in your subpopulation? There may not be much you can do about this.
  
  Reply
Juliette C says:

August 27, 2013 at 8:25 am

Dear Dr. Allison,

I have a population of 810,000 cases with 500 events. I would like to use a logit model with about 10 predictors. If I ran a logistic regression, would it give good estimates of the coefficients (especially the constant term)?

Thank you!

Reply
1. Paul Allison says:
  
  August 27, 2013 at 9:10 am
  
  I see no problem with this. You can judge the quality of the constant term estimate by its confidence interval.
  
  Reply
  1. Juliette C says:
    
    August 27, 2013 at 12:11 pm
    
    I don’t understand, because I read in the article https://files.nyu.edu/mrg217/public/binaryresponse.pdf (page 38, discussing King and Zeng’s article) that “logit coefficients can lead us to underestimate the probability of an event even with sample sizes in the thousands when we have rare events data”. In fact, they explain that the constant term is affected (largely negative), but I think they also talk about biased coefficients (page 42).
    
    Also, we can read a lot about prior correction for rare events in samples. I am wondering what the point of this correction is. Why should we use a sample rather than the whole population available if the estimates are biased in both cases?
    
    Reply
    1. Paul Allison says:
      
      August 28, 2013 at 10:09 am
      
      As I said in my post, what matters for bias is not the rarity of events (in terms of a small proportion) but the number of events that are actually observed. If there is concern about bias, the Firth correction is very useful and readily available. I do not believe that undersampling the non-events is helpful in this situation.
      
      Reply
      1. Vy Vuong says:
        
        March 2, 2021 at 3:20 pm
        
        Dear Dr. Allison,
        
        Thank you for your post. I have three questions:
        
        1. Could you please help me understand more what you meant by “what matters for bias is not the rarity of events (in terms of a small proportion) but the number of events that are actually observed”? What is the cut-off to define a rare event if we can’t use the ratio between the event and the non-event?
        
        2. I have a sample size of 5,760 patients – 296 of them overused a medication while the rest (5,464) didn’t overuse. Would you say the 296 patients are rare events?
        
        3. I ran firth logistic regression and regular logistic regression, the results are pretty similar (but not the same). Should I report the firth logistic or regular logistic’s results in the manuscript?
        
        Thank you!
      2. Paul Allison says:
        
        March 3, 2021 at 9:13 am
        
        1. The standard (but very rough) rule of thumb is that you should have at least 10 events for each coefficient that you want to estimate.
        2. By the standard rule of thumb, you should be able to estimate a model with about 29 coefficients.
        3. Either is fine. I’d probably just go with regular logistic.
F. says:

June 13, 2013 at 7:40 am

Dear Dr. Allison,

I have a slightly different problem, but maybe you have an idea. I use a multinomial logit model. One value of the dependent variable has 100 events, the other 4000 events. The sample size is 1,900,000. I am thinking the 100 events could be too few.

Thank you!

Reply
1. Paul Allison says:
  
  June 13, 2013 at 10:14 pm
  
  100 might be OK if you don’t have a large number of predictors. But don’t make this category the reference category.
  
  Reply
  1. F. says:
    
    June 16, 2013 at 9:59 am
    
    Thank you,
    
    I am using about ten predictors; would you consider this a low number in this case?
    
    in general: is there an easy to implement way to deal with rare events in a multinomial logit model?
    
    Reply
    1. Paul Allison says:
      
      June 16, 2013 at 8:24 pm
      
      Should be OK.
      
      Reply
John Burton says:

June 2, 2013 at 4:09 pm

Dear Dr Allison,

Is there a threshold that one should adhere to for an independent variable to be used for LR , in terms of ratio of two categories within the independent categorical variable. e.g. If I am trying to assess that in a sample size of 100 subjects, gender is a predictor of getting an infection (coded as 1), but 98 subjects are male and only 2 are females, will the results be reliable due to such disparity between the two categories within the independent categorical variables. [The event rate to variable ratio is set flexibly at 5].

thank you for your advice.

regards

John

Reply
1. Paul Allison says:
  
  June 3, 2013 at 7:27 am
  
  With only 2 females, you will certainly not be able to get reliable estimates of sex differences. That should be reflected in the standard error for your sex coefficient.
  
  Reply
Hongmei says:

May 29, 2013 at 10:26 pm

Paul, I saw your post while searching for more information related to rare events logistic regressions. Thank you for the explanation, but why not zip regression?

Reply
Yotam says:

May 19, 2013 at 3:18 pm

Dear Dr. Allison,

I am analyzing the binary decisions of 500,000 individuals across two periods (so one million observations total). There were 2,500 successes in the first period, and 6,000 in the second. I estimate the effects of 20 predictors per period (40 total). For some reason, both logit and probit models give me null effects to variables that are significant under a linear probability model.

Any thoughts on why this might be the case? Thanks very much.

Reply
1. Paul Allison says:
  
  May 20, 2013 at 9:28 am
  
  Good question. Maybe the LPM is reporting inaccurate standard errors. Try estimating the LPM with robust standard errors.
  
  Reply
  1. Yotam says:
    
    May 21, 2013 at 9:57 pm
    
    Thanks so much for the suggestion. I did use robust standard errors (the LPM requires it as it fails the homoskedasticity assumption by construction), and the variables are still significant under the LPM. I recall reading somewhere that the LPM and logit/probit may give different estimates when modeling rare events, but cannot find a reference supporting this or intuit myself why this might be the case.
    
    Reply
    1. Paul Allison says:
      
      May 22, 2013 at 8:23 am
      
      It does seem plausible that results from LPM and logit would be most divergent when the overall proportion of cases is near 1 or 0, because that’s where there should be most discrepancy between a straight line and the logistic curve. I have another suggestion: check for multicollinearity in the variable(s) that are switching significance. Seemingly minor changes in specification can have major consequences when there is near-extreme collinearity.
      
      Reply
      1. Yotam says:
        
        June 2, 2013 at 8:16 pm
        
        Thanks, and so sorry for the late reply. I think you are right that collinearity may be responsible for the difference. In my analysis, I aim to find the incremental effect of several variables in the latter period (post-treatment) above and beyond effects in the earlier period (pre-treatment). Every variable thus enters my model twice, once alone and once interacted with a period indicator. The variables are, of course, highly correlated with themselves interacted with the indicator. Thanks again!
      2. Susan says:
        
        September 25, 2014 at 9:48 pm
        
        Dr.Allison,
        I appreciate your comments on this topic. I want to know, are there any articles about the influence of the events of independent variables? Thanks a lot.
      3. Paul Allison says:
        
        September 26, 2014 at 8:03 am
        
        Sorry, but I don’t know what you mean by “events of independent variables.”
Mathews says:

May 17, 2013 at 7:58 am

Hi Dr.Allison ,

In the case of rare event logistic regressions (sub 1%), would the pseudo R2 (Cox and Snell, etc.) be a reliable indicator of the model fit, since its upper bound depends on the overall probability of occurrence of the event itself? Would a low R2 still represent a poor model? I’m assuming the confusion matrix may no longer be a great indicator of the model accuracy either.

Thanks

Reply
1. Paul Allison says:
  
  May 20, 2013 at 9:31 am
  
  McFadden’s R2 is probably more useful in such situations than the Cox-Snell R2. But I doubt that either is very informative. I certainly wouldn’t reject a model in such situations just because the R2 is low.
  
  Reply
Kyle says:

March 4, 2013 at 9:28 pm

Dr. Allison,

I’m wondering your thoughts on this off-the-cuff idea: Say I have 1000 samples and only 50 cases. What if I sample 40 cases and 40 controls, and fit a logistic regression either with a small number of predictors or with some penalized regression. Then predict the other 10 cases with my coefficients, save the MSE, and repeat the sampling, many, many times (say, B). Then build up an estimate for the ‘true’ coefficients based on a weighted average of the B inverse MSEs and beta vectors. ok idea or hugely biased?

Reply
1. Paul Allison says:
  
  March 5, 2013 at 9:41 am
  
  I don’t see what this buys you beyond what you get from just doing the single logistic regression on the sample of 1000 using the Firth method.
  
  Reply
Vaidy says:

March 3, 2013 at 2:44 am

Dear Dr. Allison,

I have an unbalanced panel data on low birth weight kids. I am interested in evaluating the probability of hospital admissions (per 6-months) between 1 to 5 years of age. Birth weight categories are my main predictor variables of interest, but I would also want to account for their time varying effects, by interacting BW categories with age-period. The sample size of the cohort at age1 is ~51,000 but the sample size gets reduced to 19,000 by age5. Hospital admissions in the sample at yrs 1 and 5 are respectively 2,246 and 127. Are there issues in using the logistic procedure in the context of an unbalanced panel data such as the one I have ? Please provide your thoughts as they may apply to 1)pooled logistic regression using cluster robust SE and 2)using a fixed/random effects panel approach ? Many thanks in advance.

Best,

Vaidy

Reply
1. Paul Allison says:
  
  March 4, 2013 at 7:33 am
  
  Regarding the unbalanced sample, a lot depends on why it’s unbalanced. If it’s simply because of the study design (as I suspect), I wouldn’t worry about it. But if it’s because of drop out, then you have to worry about the data not being missing completely at random. If that’s the case, maximum likelihood methods (like random effects models) have the advantage over simply using robust standard errors. Because FE models are also ML estimates, they should have good properties also.
  
  Reply
  1. Vaidy says:
    
    March 6, 2013 at 12:34 am
    
    Dr.Allison,
    
    Thanks for your response. I guess I am saying I have two different issues here with my unbalanced panel: 1) the attrition issue that you rightly brought up; 2) I am concerned about the incidental parameters problem when using fixed/random effects logistic regression with heavily attrited data. I ran some probit models to predict attrition and it appears that attrition in my data is mostly random. Is the second issue regarding the incidental parameters problem really of concern? Each panel in my data is composed of a minimum of two waves. Thanks.
    
    Reply
    1. Paul Allison says:
      
      March 6, 2013 at 6:47 am
      
      First, it’s not possible to tell whether your attrition satisfies the missing at random condition. MAR requires that the probability of a datum being missing does not depend on the value of that datum. But if you don’t observe it, there’s no way to tell. Second, incidental parameters are not a problem if you estimate the fixed effects model by way of conditional likelihood.
      
      Reply
      1. Vaidy says:
        
        March 6, 2013 at 1:49 pm
        
        Thanks for clarifying about the incidental parameters problem. I get your point about the criteria for MAR, that the missingness should not depend on the value of the datum. Key characteristics that could affect attrition are not observed in my data (e.g. SES, maternal characteristics, family income etc.). If there is no way to determine MAR, will it be fine to use a weighting procedure based on the theory of selection on observables? For example, Fitzgerald and Moffitt (1998) developed an indirect method to test attrition bias in panel data by using lagged outcomes to predict non-attrition. They call the lagged outcomes auxiliary variables. I ran probit regressions using different sets of lagged outcomes (such as lagged costs, hospitalization status, disability status etc.) and none of the models predicted >10% variation in non-attrition. This essentially means that attrition is probably not affected by observables. But should I still weight my observations in the panel regressions using the predicted probabilities of non-attrition from the probit models?
        
        Of course, I understand that this still does not address selection on unobservables [and hence your comment about I cannot say that data is missing at random].
        
        Thanks,
        
        Vaidy
      2. Paul Allison says:
        
        March 6, 2013 at 3:39 pm
        
        MAR allows for selection on observables. And ML estimates of fixed and random effects automatically adjust for selection on observables, as long as those observables are among the variables in the model. So there’s no need to weight.
Kara says:

February 28, 2013 at 10:55 am

I am looking at comparing trends in prescription rates over time from a population health database. The events are in the range of 1500 per 100000 people +/- each of 5 years.
The Cochran-Armitage test for trend or logistic regression always seems to be significant even though the event rate is going from 1.65 to 1.53. Is there a better test I should be performing, or is this just due to large population numbers yielding high power?

thank you,

Reply
1. Paul Allison says:
  
  February 28, 2013 at 3:15 pm
  
  It’s probably the high power.
  
  Reply
Athanasios Theofilatos says:

February 26, 2013 at 11:23 am

I am going to analyze a situation where there are 97 non-events and only 3 events… i will try rare-events logistic as well as bayesian logistic…

Reply
1. Paul Allison says:
  
  February 26, 2013 at 3:05 pm
  
  With only three events, no technique is going to be very reliable. I would probably focus on exact logistic regression.
  
  Reply
Mohammed Shamsul Karim says:

February 21, 2013 at 11:31 am

Dear Paul

I am using a data set of 86,000 observations to study business start-up. Most of the responses are dichotomous. The business start-up rate, which is the dependent variable, is 5%. I used logistic regression and the results show that all 10 independent variables are highly significant. I tried a rare events method and got the same result. People are complaining about the highly significant results and saying they may be biased. What would you suggest?

Regards

Reply
1. Paul Allison says:
  
  February 26, 2013 at 3:27 pm
  
  Given what you’ve told me, I think your critics are being unreasonable.
  
  Reply
Joe says:

February 13, 2013 at 6:00 pm

Hi Dr. Allison,

You mention in your original post that if a sample has 100,000 cases with 2,000 events, you’re golden. My question is this: from that group of 100,000 cases with 2,000 or so events, what is the appropriate sample size for analysis? I am working with a population of about 100,000 cases with 4,500 events; I want to select a random sample from this, but don’t want the sample to be too small (want to ensure there are enough events in the analysis). A second follow up question – is it ok for my cutoff value in logistic regression to be so low (around 0.04 or so?)
Thanks so much for any help you can provide!
Joe

Reply
1. Paul Allison says:
  
  February 15, 2013 at 11:17 am
  
  My question is, do you really need to sample? Nowadays, most software packages can easily handle 100,000 cases for logistic regression. If you definitely want to sample, I would take all 4500 cases with events. Then take a simple random sample of the non-events. The more the better, but at least 4500. This kind of disproportionate stratified sampling on the dependent variable is perfectly OK for logistic regression (see Ch. 3 of my book Logistic Regression Using SAS). And there’s no problem with only .04 of the original sample having events. As I said in the blog post, what matters is the number of events, not the proportion.
  
  Reply
  1. Oded says:
    
    November 4, 2015 at 3:36 am
    
    Dr. Allison,
    
    How do you then change the value of the LR probability prediction for an event, so that it reflects the probability over all the traffic (or rows) and not over a sample of them?
    Thanks
    Oded
    
    Reply
    1. Paul Allison says:
      
      November 19, 2015 at 8:09 pm
      
      There is a simple formula for adjusting the intercept. Let r be the proportion of events in the sample and let p be the proportion in the population. Let b be the intercept you estimate and B be the adjusted intercept. The formula is
      
      B = b - log{[r/(1-r)]*[(1-p)/p]}
      
      Reply
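The intercept adjustment in the reply above is easy to apply in code. Here is a minimal Python sketch using the same symbols (b, r, p); the function names and the example intercept of -0.2 are hypothetical, chosen only for illustration. The second helper applies the same offset on the logit scale to move a predicted probability from the sample scale to the population scale, which follows from the formula as long as only the intercept is affected by the sampling.

```python
import math


def adjust_intercept(b, r, p):
    """Adjusted intercept B = b - log{[r/(1-r)] * [(1-p)/p]}.

    b: intercept estimated on the stratified sample
    r: proportion of events in the sample
    p: proportion of events in the population
    """
    if not (0 < r < 1 and 0 < p < 1):
        raise ValueError("r and p must be strictly between 0 and 1")
    return b - math.log((r / (1 - r)) * ((1 - p) / p))


def adjust_probability(prob_sample, r, p):
    """Shift a sample-scale predicted probability by the same offset on the logit scale."""
    logit = math.log(prob_sample / (1 - prob_sample))
    adjusted_logit = adjust_intercept(logit, r, p)
    return 1 / (1 + math.exp(-adjusted_logit))


# All 4,500 events plus 4,500 sampled non-events from a population of 100,000:
r = 4500 / 9000        # 0.5 in the sample
p = 4500 / 100_000     # 0.045 in the population
print(adjust_intercept(-0.2, r, p))      # -0.2 is a made-up estimated intercept
print(adjust_probability(0.5, r, p))     # a 50% sample-scale prediction maps back to p
```

When r equals p (no disproportionate sampling), the correction term is log(1) = 0 and the intercept is unchanged.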
Kar says:

January 23, 2013 at 10:47 pm

Hi Dr. Allison,

I have a small data set (100 patients), with only 25 events. Because the dataset is small, I am able to do an exact logistic regression. A few questions…

1. Is there a variable limit for inclusion in my model? Does the 10:1 rule that is often suggested still apply?
2. Is there a “number” below which conventional logistic regression is not recommended…i.e. 20?

Thanks and take care.

Reply
1. Paul Allison says:
  
  February 11, 2013 at 3:12 pm
  
  1. I’d probably be comfortable with the more “liberal” rule of thumb of 5 events per predictor. Thus, no more than 5 predictors in your regression.
  2. No there’s no lower limit, but I would insist on exact logistic regression for accurate p-values.
  
  Reply
  1. SAM2013 says:
    
    April 3, 2013 at 11:26 pm
    
    Dr. Allison,
    
    I benefited a lot from your explanation of Exact logistic regression and I read your reply on this comment that you would relax the criteria to only 5 events per predictor instead of 10. I am in this situation right now and I badly need your help. I will have to be able to defend that and I wanna know if there is evidence behind the relaxed 5 events per predictor rule with exact regression?
    
    Thanks a lot.
    
    Reply
    1. Paul Allison says:
      
      April 4, 2013 at 2:00 pm
      
      Below are two references that you might find helpful. One argues for relaxing the 10 events per predictor rule, while the other claims that even more events may be needed. Both papers focus on conventional ML methods rather than exact logistic regression.
      
      Vittinghoff, E. and C.E. McCulloch (2007) “Relaxing the rule of ten events per variable in logistic and Cox regression.” American Journal of Epidemiology 165: 710-718.
      
      Courvoisier, D.S., C. Combescure, T. Agoritsas, A. Gayet-Ageron and T.V. Perneger (2011) “Performance of logistic regression modeling: beyond the number of events per variable, the role of data structure.” Journal of Clinical Epidemiology 64: 993-1000.
      
      Reply
  2. SAM2013 says:
    
    April 3, 2013 at 11:28 pm
    
    Hello again,
    
    I also wanted to confirm this from you, that if I have the gender as a predictor (male, female), this is considered as TWO and not one variables, right?
    
    Thanks.
    
    Reply
    1. Paul Allison says:
      
      April 4, 2013 at 11:11 am
      
      Gender is only one variable.
      
      Reply
      1. SAM2013 says:
        
        April 5, 2013 at 6:05 am
        
        Thank you very much for your help. I guess I gave you a wrong example for my question. I wanted to know if a categorical variable has more than two levels, would it still be counted as one variable for the sake of the rule we are discussing?
        
        Also, do we have to stick to the 5 events per predictor if we use Firth, or can we violate the rule completely, and if it is OK to violate it, do I have to mention a limitation about that?
        
        Sorry for the many questions.
        
        Thanks
      2. Paul Allison says:
        
        April 5, 2013 at 8:42 am
        
        What matters is the number of coefficients. So a categorical variable with 5 categories would have four coefficients. Although I’m not aware of any studies on the matter, my guess is that the same rule of thumb (of 5 or 10 events per coefficient) would apply to the Firth method. Keep in mind, however, that this is only the roughest rule of thumb. Its purpose is to ensure that the asymptotic approximations (consistency, efficiency, normality) aren’t too bad. But it is not sufficient to determine whether the study has adequate power to test the hypotheses of interest.
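Counting coefficients rather than variables, as described above, can be sketched as follows in Python. The function name, the predictor names, and the choice to leave the intercept out of the count are all assumptions made for this illustration.

```python
def count_coefficients(levels_per_predictor):
    """Count slope coefficients for a set of predictors.

    levels_per_predictor maps a predictor name to its number of categories,
    or to None for a numeric predictor. A categorical predictor with k
    categories contributes k - 1 coefficients. The intercept is not counted
    here (an assumption of this sketch).
    """
    total = 0
    for name, levels in levels_per_predictor.items():
        if levels is None:
            total += 1
        elif levels < 2:
            raise ValueError(f"{name!r} needs at least 2 categories")
        else:
            total += levels - 1
    return total


# Hypothetical model: one numeric predictor, gender (2 categories),
# and a 5-category variable (4 coefficients).
predictors = {"age": None, "gender": 2, "region": 5}
n_coefs = count_coefficients(predictors)
print(n_coefs)            # 6
print(25 / n_coefs)       # events per coefficient with 25 events
```

With 25 events this hypothetical model falls below even the liberal guideline of 5 events per coefficient, although, as noted above, the guideline is very rough.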
Adwait says:

December 21, 2012 at 2:47 pm

Hi Dr. Allison,

If the event I am analyzing is extremely rare (1 in 1000) but the available sample is large (5 million) such that there are 5000 events in the sample, would logistic regression be appropriate? There are about 15-20 independent variables that are of interest to us in understanding the event. If an even larger sample would be needed, how much larger should it be at a minimum?

If logistic regression is not suitable, what are our options to model such an event?

Thanks,
Adwait

Reply
1. Paul Allison says:
  
  December 21, 2012 at 3:08 pm
  
  Yes, logistic regression should be fine in this situation. Again, what matters is the number of the rarer event, not the proportion.
  
  Reply
elham says:

December 18, 2012 at 4:28 am

Hi,
I am a PhD student in biostatistics. I have a data set with approximately 26,000 cases in which there are only 110 events. I used the weighting method for rare events from Gary King’s article. My goal was to estimate ORs in a logistic regression. Unfortunately, the standard errors and confidence intervals are big, and there is little difference from the usual logistic regression. I don’t know why. What do you think? Can I use penalized likelihood?

Reply
1. Paul Allison says:
  
  December 18, 2012 at 9:36 am
  
  My guess is that penalized likelihood will give you very similar results. 110 events is enough so that small sample bias is not likely to be a big factor–unless you have lots of predictors, say, more than 20. But the effective sample size here is a lot closer to 110 than it is to 26,000. So you may simply not have enough events to get reliable estimates of the odds ratios. There’s no technical fix for that.
  
  Reply
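The point that the effective sample size is closer to 110 than to 26,000 can be illustrated in the simplest possible setting: a single binary predictor, where the standard error of the log odds ratio is sqrt(1/a + 1/b + 1/c + 1/d) for the four cell counts. This is a sketch under that simplifying assumption, not an analysis of the commenter's data; the cell counts below are invented.

```python
import math


def se_log_odds_ratio(a, b, c, d):
    """Large-sample standard error of the log odds ratio in a 2 x 2 table.

    a, b: events and non-events among the exposed
    c, d: events and non-events among the unexposed
    """
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)


# Invented tables with 110 events split evenly between exposure groups.
base = se_log_odds_ratio(55, 12_945, 55, 12_945)            # about 26,000 cases
more_nonevents = se_log_odds_ratio(55, 129_450, 55, 129_450)  # 10x the non-events
more_events = se_log_odds_ratio(110, 12_945, 110, 12_945)     # 2x the events

print(f"base:              {base:.4f}")
print(f"10x non-events:    {more_nonevents:.4f}")
print(f"2x events:         {more_events:.4f}")
```

The 1/a and 1/c terms for the small event cells dominate the sum, so adding non-events barely moves the standard error, while adding events shrinks it noticeably.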
  1. kiran kumar says:
    
    February 11, 2014 at 7:09 am
    
    Paul,
    
    Please clarify this for me. I have a sample of 16,000 observations with equal numbers of goods and bads. Is this a good way of building the model, or should I reduce the bads?
    
    Reply
    1. Paul Allison says:
      
      February 11, 2014 at 2:09 pm
      
      Don’t reduce the bads. There would be nothing to gain in doing that, and you want to use all the data you have.
      
      Reply
Georg Heinze says:

October 24, 2012 at 9:19 am

I fully agree with Paul Allison. We have done extensive simulation studies with small samples, comparing the Firth method with ordinary maximum likelihood estimation. Regarding point estimates, the Firth method was always superior to ML. Furthermore, it turned out that confidence intervals based on the profile penalized likelihood were more reliable in terms of coverage probability than those based on standard errors. Profile penalized likelihood confidence intervals are available, e.g., in SAS/PROC LOGISTIC and in the R logistf package.

Reply
Partha says:

October 16, 2012 at 2:51 pm

On a different note, I have read in Paul’s book that when there is a proportionality violation, creating time-varying covariates with the main predictor, and testing for its significance is both the diagnosis and the cure.

So, if the IV is significant after the IV*duration is also significant, then, are we ok to interpret the effect?

How does whether the event is rare or not affect the value of the above procedure?

Reply
1. Paul Allison says:
  
  January 11, 2013 at 8:23 am
  
  Yes, if the IV*duration is significant, you can go ahead and interpret the “effect” which will vary with time. The rarity of the event reduces the power of this test.
  
  Reply
harry says:

October 16, 2012 at 8:18 am

“Does anyone have a counter-argument? If so, I’d like to hear it.”
I usually default to using Firth’s method, but in some cases the true parameter really is infinite: for example, if the response variable is presence of testicular cancer and one of the covariates is sex. In that case, it’s obvious that sex should not be in the model, but in other cases it might not be so obvious, or the model might be getting fit as part of an automated process.

Reply
Partha says:

October 10, 2012 at 11:28 am

Is this the case with PHREG as well? If you have 50 events in 2000 observations, is the FIRTH option the appropriate one if your goal is to model not only the likelihood but also the median time to event?

Reply
1. Paul Allison says:
  
  October 10, 2012 at 11:59 am
  
  The Firth method can be helpful in reducing small-sample bias in Cox regression, which can arise when the number of events is small. The Firth method can also be helpful with convergence failures in Cox regression, although these are less common than in logistic regression.
  
  Reply
  1. Tarana Lucky says:
    
    February 20, 2013 at 7:57 pm
    
    I am interested in determining which factors are significantly associated with an “outcome”, which is a binary variable in my sample. My sample size from a cross-sectional survey is 20,000 and the number of respondents with the “outcome” present is 70. Which method would be appropriate, multiple logistic or Poisson regression?
    
    Thanks.
    
    Reply
    1. Paul Allison says:
      
      February 26, 2013 at 3:07 pm
      
      There is no reason to consider Poisson regression. For logistic regression, I would use the Firth method.
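      For readers curious about what the Firth method does mechanically, the following is a minimal sketch, not a substitute for a tested implementation in statistical software. It assumes the usual modified-score form of the penalty, in which each residual y - pi is adjusted by h * (0.5 - pi), with h the diagonal of the weighted hat matrix. The function name `firth_logistic` and the toy data are hypothetical; the data are constructed so that no events occur when x = 1, a case where conventional maximum likelihood has no finite slope estimate.

```python
import numpy as np

def firth_logistic(X, y, max_iter=100, tol=1e-8):
    """Sketch of Firth-type penalized logistic regression.
    X must already include a column of ones for the intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        pi = 1.0 / (1.0 + np.exp(-X @ beta))
        w = pi * (1.0 - pi)
        info = X.T @ (X * w[:, None])          # Fisher information X'WX
        info_inv = np.linalg.inv(info)
        # Diagonal of the hat matrix W^(1/2) X (X'WX)^(-1) X' W^(1/2)
        h = w * np.einsum("ij,jk,ik->i", X, info_inv, X)
        # Modified score: the ordinary score plus the penalty adjustment
        score = X.T @ (y - pi + h * (0.5 - pi))
        step = info_inv @ score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

# Toy data (made up): 3 events in 10 cases when x = 0, 0 events in 10 when x = 1.
x = np.repeat([0.0, 1.0], 10)
y = np.array([1, 1, 1] + [0] * 7 + [0] * 10, dtype=float)
X = np.column_stack([np.ones_like(x), x])

beta = firth_logistic(X, y)
print(beta)  # finite intercept and slope despite the empty cell
```

      In this saturated two-group case the penalized estimates should coincide with the logits obtained after adding 0.5 to each cell of the 2 x 2 table, which gives a simple check on the sketch.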
      
      Reply
Paul Allison says:

July 14, 2012 at 7:56 am

This has no advantage over logistic regression. There’s still small-sample bias if the number of events is small. Better to use exact logistic regression (if computationally practical) or the Firth method.

Reply
1. Rose Ignacio says:
  
  September 12, 2012 at 11:18 am
  
  Can you please explain further why you say Poisson regression has no advantage over logistic regression when we have rare events? Thanks.
  
  Reply
  1. Paul Allison says:
    
    September 12, 2012 at 4:20 pm
    
    When events are rare, the Poisson distribution provides a good approximation to the binomial distribution. But it’s still just an approximation, so it’s better to go with the binomial distribution, which is the basis for logistic regression.
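    The quality of that approximation can be checked numerically. The sketch below compares the binomial and Poisson probability functions for two made-up settings with the same expected count of 20: a rare event (n = 1000, p = 0.02) and a common one (n = 40, p = 0.5). The function names and the two settings are illustrative assumptions, not from the discussion above. Probabilities are computed on the log scale to avoid overflow.

```python
from math import exp, lgamma, log, log1p

def binomial_pmf(k, n, p):
    """Binomial probability of k events in n trials, computed via logs."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log1p(-p))

def poisson_pmf(k, lam):
    """Poisson probability of k events with mean lam, computed via logs."""
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def max_gap(n, p):
    """Largest pointwise difference between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    return max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
               for k in range(n + 1))

rare_gap = max_gap(1000, 0.02)   # rare event, expected count 20
common_gap = max_gap(40, 0.5)    # common event, same expected count
print(rare_gap, common_gap)
```

    The gap is much smaller in the rare-event setting, but it is not zero, which is the sense in which the Poisson distribution remains only an approximation.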
    
    Reply
Dr. Md. Zakir Hossain says:

July 14, 2012 at 4:49 am

I am thinking of using Poisson regression in cases where the event is rare, since p (the probability of success) is very small and n (the sample size) is large.

Reply
1. SAY Ahmadi says:
  
  April 12, 2020 at 5:29 am
  
  Dear MD,
  
  I like Poisson regression. But unfortunately, in such cases the number of events is not recorded for individual samples, but for individual groups of samples.
  
  What is your suggestion?
  
  Regards!
  
  Reply
