In my April post, I described a new method for testing the goodness of fit (GOF) of a logistic regression model without grouping the data. That method was based on the usual Pearson chi-square statistic applied to the ungrouped data. Although Pearson’s chi-square does not have a chi-square distribution when data are not grouped, it is approximately normally distributed (under the null hypothesis that the fitted model is correct). By subtracting the mean (which happens to be the sample size) and dividing by an appropriate standard deviation, you get a z-statistic that has pretty good properties, performing better than the Hosmer-Lemeshow test in simulation studies.

But there are other GOF tests for ungrouped data. One that deserves serious consideration is Stukel’s test, which is easily calculated with standard logistic regression software. Stukel (1988) proposed a generalization of the logistic regression model that has two additional parameters. These allow for departures from the logistic curve as it approaches either 1 or 0. Special cases of the model also include (approximately) the complementary log-log model and the probit model.

The logistic model can be tested against this more general model by a simple procedure. Let g_i be the linear predictor from the fitted model, that is, g_i = x_i b, where x_i is the vector of covariate values for individual i and b is the vector of estimated coefficients. Then create two new variables:

     za = g^2 if g >= 0, otherwise za = 0
     zb = g^2 if g < 0, otherwise zb = 0.

Add these two variables to the logistic regression model and test the null hypothesis that both of their coefficients are equal to 0. Stukel suggested a score test, but there’s no obvious reason to prefer that to a Wald test or a likelihood ratio test. Note that in many data sets, g is either never greater than 0 or never less than 0. In those cases, only one z variable is necessary.
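
For readers who work outside Stata or SAS, here is a minimal Python sketch of how the two variables are built. It covers only the construction step, not the model fitting or the test. The function names `stukel_terms` and `stukel_columns` and the sample linear predictor values are my own illustrations, not part of Stukel's paper or of any library.

```python
def stukel_terms(g):
    """Return (za, zb) for a single linear predictor value g."""
    za = g * g if g >= 0 else 0.0  # squared predictor when g is non-negative
    zb = g * g if g < 0 else 0.0   # squared predictor when g is negative
    return za, zb


def stukel_columns(linear_predictors):
    """Build the za and zb columns and report which ones are usable.

    A column that is 0 for every case adds nothing to the model, so it is
    left out of the 'usable' list (the one-variable case described above).
    """
    pairs = [stukel_terms(g) for g in linear_predictors]
    za = [p[0] for p in pairs]
    zb = [p[1] for p in pairs]
    usable = [name for name, col in (("za", za), ("zb", zb))
              if any(v != 0 for v in col)]
    return za, zb, usable


# Hypothetical linear predictor values, for illustration only
g_values = [1.5, -0.5, 2.0, 0.0]
za, zb, usable = stukel_columns(g_values)
print(za)      # [2.25, 0.0, 4.0, 0.0]
print(zb)      # [0.0, 0.25, 0.0, 0.0]
print(usable)  # ['za', 'zb']
```

If every value of g were non-negative, `usable` would contain only `za`, and the test would be based on that single added variable.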

Here’s an example of how to calculate a Wald version of Stukel’s test with Stata. I used a well-known data set on labor force participation of 753 married women (Mroz 1987). The dependent variable inlf is coded 1 if a woman was in the labor force, otherwise 0. A logistic regression model was fit with six predictors.

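* Fit the original model and save the linear predictor
logistic inlf kidslt6 age educ huswage city exper
predict g, xb
* Squared linear predictor, split by sign
gen za=(g>=0)*g^2
gen zb=(g<0)*g^2
* Refit with the two added variables and test them jointly (Wald test)
logistic inlf kidslt6 age educ huswage city exper za zb
test za zb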

This program produced a chi-square of .11 with 2 df and a p-value of .95. Clearly there is no evidence for misspecification. A likelihood ratio test comparing the two models produced almost exactly the same result.
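
The reported p-value is easy to check by hand, because the upper-tail probability of a chi-square variate with exactly 2 df has the closed form exp(-x/2). The short Python sketch below does that check. The function name `chi2_sf_2df` is my own, and the shortcut applies only to the 2 df case.

```python
import math


def chi2_sf_2df(x):
    """Upper-tail probability P(X > x) for a chi-square variate with 2 df."""
    if x < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.exp(-x / 2.0)


# Chi-square of .11 with 2 df, as reported above
p_value = chi2_sf_2df(0.11)
print(round(p_value, 2))  # 0.95
```

When only one z variable can be included, the test has 1 df and this formula no longer applies.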

Here’s the equivalent SAS code:

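proc logistic data=my.mroz;
model inlf(desc) = kidslt6 age educ huswage city exper;
output out=a xbeta=g;
run;
data b;
set a;
/* squared linear predictor, split by sign, as in the Stata code */
za = (g>=0)*g**2;
zb = (g<0)*g**2;
run;
proc logistic data=b;
model inlf(desc) = kidslt6 age educ huswage city exper za zb;
test za=0, zb=0;
run;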

How well does the Stukel test stack up against alternatives? For detecting quadratic departures from linearity, simulation studies suggest that the Stukel test is a little less powerful than either the standardized Pearson test (mentioned above) or the traditional Hosmer-Lemeshow test (Hosmer et al. 1997). For detecting interactions, however, Stukel’s test is more powerful than the standardized Pearson (Allison 2014), which was previously shown to be more powerful than Hosmer-Lemeshow (Hosmer and Hjort 2002). Finally, Stukel is considerably more powerful than either of the other two at detecting departures from the logit link function (Hosmer et al. 1997).

So the Stukel test is definitely worth using, possibly in conjunction with the standardized Pearson test. It’s also worth noting the resemblance of the Stukel test to a misspecification test that is frequently recommended by econometricians. Ramsey (1969) proposed including the square (and possibly higher powers) of the predicted values in a regression, and testing for statistical significance. The Stukel test is different only insofar as it splits the squared predicted values into two separate components.


Allison, Paul D. (2014) “Measures of fit for logistic regression.” Paper 1485-2014 presented at the SAS Global Forum, Washington, DC.

Hosmer, D.W. and N.L. Hjort (2002) “Goodness-of-fit processes for logistic regression: Simulation results.” Statistics in Medicine 21: 2723–2738.

Hosmer, D.W., T. Hosmer, S. Le Cessie and S. Lemeshow (1997) “A comparison of goodness-of-fit tests for the logistic regression model.” Statistics in Medicine 16: 965–980.

Mroz, T.A. (1987) “The sensitivity of an empirical model of married women’s hours of work to economic and statistical assumptions.” Econometrica 55: 765–799.

Ramsey, J.B. (1969) “Tests for specification errors in classical linear least squares regression analysis.” Journal of the Royal Statistical Society, Series B. 31: 350–371.

Stukel, T.A. (1988) “Generalized logistic models.” Journal of the American Statistical Association 83: 426–431.