When researchers estimate multinomial logit models, they are often advised to test a property of the models known as the independence of irrelevant alternatives (IIA). I’ve long been suspicious of IIA tests, but I never took the time to carefully investigate them. Until now. What I’ve learned is that IIA tests have a better foundation than I thought, but that they are still problematic in important ways.

In discrete choice theory, the IIA assumption says that when people are asked to choose among a set of alternatives, their odds of choosing A over B should not depend on whether some other alternative C is present or absent. As an example, consider the 1992 U.S. presidential election. The two major party candidates were Bill Clinton and George H. W. Bush. But H. Ross Perot was also on the ballot in all 50 states. The IIA assumption says that, for each voter, the odds of choosing Clinton over Bush were not affected by Perot’s presence on the ballot.

Let’s put this into formulas. Let $p_{iC}$, $p_{iB}$ and $p_{iP}$ be the probabilities that individual $i$ chooses Clinton, Bush and Perot, respectively. The multinomial logit model can be expressed as two simultaneous binary logit models,

$$\log(p_{iC}/p_{iB}) = b_1 x_i$$

$$\log(p_{iP}/p_{iB}) = b_2 x_i$$

where $x_i$ is a column vector of predictors for individual $i$, and $b_1$ and $b_2$ are row vectors of coefficients. The IIA property says that $b_1$ is the same regardless of whether Perot is on the ballot and that $b_2$ is the same regardless of whether Clinton is on the ballot.
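To see what the property means numerically, here is a minimal sketch in Python. It assumes a single predictor plus an intercept, and the coefficient values and the function name `mnl_probs` are made up for illustration. Under the multinomial logit formulas, dropping Perot from the choice set changes each candidate's probability but leaves the Clinton-versus-Bush odds untouched.

```python
import math

def mnl_probs(x, coefs):
    """Multinomial logit probabilities over the alternatives in `coefs`.

    `coefs` maps each alternative to (intercept, slope). The baseline
    alternative (Bush here) has both fixed at 0.
    """
    utilities = {alt: a + b * x for alt, (a, b) in coefs.items()}
    denom = sum(math.exp(u) for u in utilities.values())
    return {alt: math.exp(u) / denom for alt, u in utilities.items()}

# Hypothetical coefficients, chosen only for illustration.
coefs = {"Bush": (0.0, 0.0), "Clinton": (0.3, 0.8), "Perot": (-1.0, 0.5)}
x = 1.2  # a single made-up predictor value

with_perot = mnl_probs(x, coefs)
without_perot = mnl_probs(x, {k: v for k, v in coefs.items() if k != "Perot"})

odds_with = with_perot["Clinton"] / with_perot["Bush"]
odds_without = without_perot["Clinton"] / without_perot["Bush"]

# Both odds equal exp(0.3 + 0.8 * x), whether or not Perot is available.
print(odds_with, odds_without, math.exp(0.3 + 0.8 * x))
```

Of course, this only shows what the model implies. Whether real voters behave this way is the empirical question.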

Several tests of this assumption have been proposed. The two most common are the Hausman-McFadden test (1984) and the Small-Hsiao test (1985). Both employ the same general strategy: for each alternative, delete individuals who chose that alternative and re-estimate the model for the remaining alternatives; then construct a test comparing the new estimates with the original estimates.

In our voting example, we could exclude the people who voted for Perot and estimate a binary logit model predicting Clinton vs. Bush. We could then test whether the binary coefficients were the same as the multinomial coefficients. We could also exclude the people who voted for Clinton and re-estimate the second equation by binary logit. Again, we could compare the binary coefficients with the multinomial coefficients. And, finally, we could redo the test with Bush as the excluded category.
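For the Hausman-McFadden version, the comparison takes the usual Hausman form: a quadratic form in the difference between the restricted and full estimates, weighted by the inverse of the difference between their covariance matrices. Here is a rough sketch, assuming you already have both sets of estimates for the same coefficients. The function name, argument names and numbers are mine, invented purely for illustration.

```python
import numpy as np

def hausman_mcfadden(b_restricted, V_restricted, b_full, V_full):
    """Hausman-type statistic comparing restricted and full estimates.

    b_restricted, V_restricted: coefficients and covariance matrix from the
        model fit after dropping one alternative (e.g., binary logit).
    b_full, V_full: the corresponding coefficients and covariance block
        from the full multinomial logit.
    Returns (statistic, degrees of freedom).
    """
    diff = np.asarray(b_restricted, dtype=float) - np.asarray(b_full, dtype=float)
    V_diff = np.asarray(V_restricted, dtype=float) - np.asarray(V_full, dtype=float)
    stat = float(diff @ np.linalg.solve(V_diff, diff))
    return stat, diff.size

# Made-up estimates, not results from any real data set.
stat, df = hausman_mcfadden(
    b_restricted=[1.0, -0.5],
    V_restricted=np.diag([0.05, 0.08]),
    b_full=[0.8, -0.3],
    V_full=np.diag([0.01, 0.04]),
)
print(stat, df)  # compare stat with a chi-square distribution on df degrees of freedom
```

One practical wrinkle: in finite samples the difference of the two covariance matrices need not be positive definite, so the statistic can come out negative.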

Here’s what always bothered me about tests like this:  Since Perot was on the ballot in every state, how can we possibly get evidence of how people would vote if he were not on the ballot?  It hardly seems sufficient to redo the analysis after excluding people who voted for Perot. The people who remain still had Perot as an option when they cast their vote for Clinton or Bush.

Keep in mind that the principle of IIA and the tests of that assumption were developed in the framework of discrete choice theory, where people often have different choice sets and the model is estimated by conditional logistic regression. In such settings, there are clear opportunities to test the IIA assumption. But even in discrete choice applications, many data sets have the same set of choices available to all individuals.

So that’s where I stood until a few weeks ago. But after examining the literature, getting the advice of some knowledgeable people (notably, Scott Long and Bob Gray), and doing some simulations, I’ve concluded that I was wrong about my central objection. Comparing the binary logits with multinomial logits can tell you something about the appropriateness of the multinomial model. But maybe not as much as we would like. Here’s what I’ve learned:

1.  If you’re estimating a “saturated” model, the binary logits will always be identical to the multinomial logits, and no test of IIA is possible. What’s a saturated model?  One with a single categorical predictor.  Or one with multiple categorical predictors together with all possible interactions.  Because such models perfectly predict the cell frequencies in the multi-way contingency table, there’s no information “left over” to test the fit of the model. So any tests of IIA necessarily depend on parametric restrictions on the right hand side of the regression model.
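Here is a small illustration of that point. It assumes a single two-category predictor and made-up vote counts, and it relies on the standard result that a saturated model's fitted probabilities equal the observed cell proportions. Given that, the Clinton-versus-Bush logit in each cell is the log of the ratio of two counts, and dropping the Perot voters cannot change it.

```python
import math

# Hypothetical vote counts by a single categorical predictor (made-up numbers).
counts = {
    "urban": {"Clinton": 420, "Bush": 300, "Perot": 180},
    "rural": {"Clinton": 250, "Bush": 380, "Perot": 170},
}

def logit_full(group):
    """Clinton-vs-Bush logit implied by the saturated multinomial fit."""
    n = sum(counts[group].values())
    p = {alt: c / n for alt, c in counts[group].items()}  # fitted = observed shares
    return math.log(p["Clinton"] / p["Bush"])

def logit_restricted(group):
    """Clinton-vs-Bush logit from a saturated binary fit without Perot voters."""
    kept = {alt: c for alt, c in counts[group].items() if alt != "Perot"}
    n = sum(kept.values())
    p_clinton = kept["Clinton"] / n
    return math.log(p_clinton / (1 - p_clinton))

for g in counts:
    # The two columns agree up to floating-point rounding.
    print(g, logit_full(g), logit_restricted(g))
```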

2.  In the more typical case where the predictors are not saturated, there are alternative models that imply that the binary logit coefficients will not converge in probability to the same values as the multinomial logit coefficients.

One such model is the nested logit model, which does not have the IIA property. For our voting example, suppose that people first decided whether or not they would vote for Perot, and suppose that decision was governed by a binary logit model. Then, among those who decided not to vote for Perot, the choice between Clinton and Bush was governed by a second binary logit model.*

It can easily be demonstrated by simulation that if you mistakenly estimate a multinomial logit model rather than the correct nested logit model, the coefficients comparing Clinton with Bush will not converge to the correct values. But the binary logit coefficients, among people who didn’t vote for Perot, will converge to the correct values. So the Hausman-McFadden test or the Small-Hsiao test would seem like sensible ways to discriminate between the nested logit and the multinomial logit models.
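A simulation along these lines might look like the following sketch. Everything specific in it is an assumption of mine: the single normally distributed predictor, the "true" parameter values, the sample size, the seed, and the bare-bones Newton-Raphson fitter `fit_logit` (written out so the example needs only NumPy, not meant as production code). It generates votes from the two-stage model, then fits a multinomial logit to everyone and a binary logit to the non-Perot voters.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed so the run is repeatable
n = 50_000
x = rng.normal(size=n)                  # one made-up continuous predictor
X = np.column_stack([np.ones(n), x])    # intercept + predictor

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical "true" parameters (intercept, slope), chosen for illustration.
a = np.array([-1.0, 1.5])   # stage 1: Perot vs. not Perot
c = np.array([0.2, 1.0])    # stage 2: Clinton vs. Bush, given not Perot

perot = rng.random(n) < expit(X @ a)
clinton = rng.random(n) < expit(X @ c)
y = np.where(perot, 2, np.where(clinton, 1, 0))  # 0=Bush, 1=Clinton, 2=Perot

def fit_logit(X, y, n_alt, max_iter=50, tol=1e-10):
    """Newton-Raphson ML for a (multinomial) logit; alternative 0 is the baseline.

    Returns a (predictors x non-baseline alternatives) coefficient matrix.
    With n_alt=2 this is an ordinary binary logit.
    """
    k = X.shape[1]
    J = n_alt - 1
    Y = np.eye(n_alt)[y][:, 1:]          # indicators for non-baseline outcomes
    B = np.zeros((k, J))
    for _ in range(max_iter):
        expeta = np.exp(X @ B)
        P = expeta / (1.0 + expeta.sum(axis=1, keepdims=True))
        grad = (X.T @ (Y - P)).ravel(order="F")   # score, stacked by alternative
        info = np.zeros((k * J, k * J))           # information matrix, by blocks
        for j in range(J):
            for l in range(J):
                w = P[:, j] * ((j == l) - P[:, l])
                info[j*k:(j+1)*k, l*k:(l+1)*k] = X.T @ (X * w[:, None])
        step = np.linalg.solve(info, grad)
        B += step.reshape((k, J), order="F")
        if np.max(np.abs(step)) < tol:
            break
    return B

B_full = fit_logit(X, y, n_alt=3)            # column 0: Clinton vs. Bush
keep = y != 2                                # drop the Perot voters
B_bin = fit_logit(X[keep], y[keep], n_alt=2)

print("true Clinton-vs-Bush (intercept, slope):", c)
print("binary logit, Perot voters dropped:     ", B_bin[:, 0])
print("multinomial logit, all voters:          ", B_full[:, 0])
```

With a sample this large, the binary estimates should land close to the true values. How far the multinomial estimates drift will depend on the parameter values you pick, so it is worth trying a few.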

3. Simulation studies by Fry and Harris (1996, 1998) and Cheng and Long (2007) have shown that both the Hausman-McFadden test and the Small-Hsiao test perform rather poorly, even in large samples. Specifically, the actual probability of rejecting the null hypothesis was often quite different from the nominal alpha level. The magnitude of these discrepancies varied greatly across different data structures. Based on their results, Cheng and Long went so far as to say that “tests of the IIA assumption that are based on the estimation of a restricted choice set are unsatisfactory for applied work.”

Given point 3, I still can’t recommend these tests to anyone. And given point 1, it seems that any test of the IIA property will depend critically on the parameterization of the model (which is consistent with the findings of Cheng and Long). But the fact remains that the multinomial logit model can, in principle, be empirically disconfirmed, even when everyone has the same choice set.

*The model proposed here is simpler than the nested logit model proposed by Hausman and McFadden (1984), which has an additional parameter. However, that parameter is not identified when there are only three possible outcomes.

References:

Cheng, Simon and J. Scott Long (2007) “Testing for IIA in the Multinomial Logit Model.” Sociological Methods & Research 35:583-600.

Fry, Tim R. L. and Mark N. Harris (1996) “A Monte Carlo Study of Tests for the Independence of Irrelevant Alternatives Property.” Transportation Research Part B: Methodological 30:19-30.

Fry, Tim R. L. and Mark N. Harris (1998) “Testing for Independence of Irrelevant Alternatives: Some Empirical Results.” Sociological Methods & Research 26:401-23.

Hausman, Jerry A. and Daniel McFadden (1984) “Specification Tests for the Multinomial Logit Model.” Econometrica 52:1219-40.

Small, Kenneth A. and Cheng Hsiao (1985) “Multinomial Logit Specification Tests.” International Economic Review 26:619-27.