When researchers estimate multinomial logit models, they are often advised to test a property of the models known as the independence of irrelevant alternatives (IIA). I’ve long been suspicious of IIA tests, but I never took the time to carefully investigate them. Until now. What I’ve learned is that IIA tests have a better foundation than I thought, but that they are still problematic in important ways.
In discrete choice theory, the IIA assumption says that when people are asked to choose among a set of alternatives, their odds of choosing A over B should not depend on whether some other alternative C is present or absent. As an example, consider the 1992 U.S. presidential election. The two major party candidates were Bill Clinton and George H. W. Bush. But H. Ross Perot was also on the ballot in all 50 states. The IIA assumption says that, for each voter, the odds of choosing Clinton over Bush were not affected by Perot’s presence on the ballot.
Let’s put this into formulas. Let $p_{iC}$, $p_{iB}$ and $p_{iP}$ be the probabilities that individual $i$ chooses Clinton, Bush and Perot, respectively. The multinomial logit model can be expressed as two simultaneous binary logit models,
$$\log(p_{iC}/p_{iB}) = b_1 x_i$$
$$\log(p_{iP}/p_{iB}) = b_2 x_i$$
where $x_i$ is a column vector of predictors for individual $i$, and $b_1$ and $b_2$ are row vectors of coefficients. The IIA property says that $b_1$ is the same regardless of whether Perot is on the ballot and that $b_2$ is the same regardless of whether Clinton is on the ballot.
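One way to see where the IIA property comes from is to solve the two equations for the three probabilities. The block below is the standard algebra for a three-category multinomial logit with Bush as the reference category, written with the same symbols as the equations above.

```latex
\begin{align*}
p_{iB} &= \frac{1}{1 + e^{b_1 x_i} + e^{b_2 x_i}}, &
p_{iC} &= \frac{e^{b_1 x_i}}{1 + e^{b_1 x_i} + e^{b_2 x_i}}, &
p_{iP} &= \frac{e^{b_2 x_i}}{1 + e^{b_1 x_i} + e^{b_2 x_i}}, \\
\frac{p_{iC}}{p_{iB}} &= e^{b_1 x_i}.
\end{align*}
```

The common denominator cancels, so the odds of Clinton over Bush depend only on b1 and the predictors, not on b2. If the model holds and Perot is removed from the choice set, the Clinton probability becomes exp(b1xi)/(1 + exp(b1xi)), which implies exactly the same odds.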
Several tests have been proposed to test this assumption. The two most common are the Hausman-McFadden test (1984) and the Small-Hsiao test (1985). Both employ the same general strategy: for each alternative, delete individuals who chose that alternative and re-estimate the model for the remaining alternatives; then construct a test comparing the new estimates with the original estimates.
In our voting example, for instance, we could exclude the people who voted for Perot, and estimate a binary logit model predicting Clinton vs. Bush. We could then test whether the binary coefficients were the same as the multinomial coefficients. We could also exclude the people who voted for Clinton and re-estimate the second equation by binary logit. Again, we could compare the binary coefficients with the multinomial coefficients. And, finally, we could redo the test with Bush as the excluded category.
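For reference, the Hausman-McFadden version of this comparison is usually written as a quadratic form. In the sketch below, the subscripts r and f are labels introduced here for illustration: the r quantities are the coefficient vector and its estimated covariance matrix from the restricted choice set (for example, the binary logit for Clinton vs. Bush), and the f quantities are the corresponding elements from the full multinomial model.

```latex
HM = (\hat{b}_r - \hat{b}_f)' \, \bigl[\hat{V}_r - \hat{V}_f\bigr]^{-1} \, (\hat{b}_r - \hat{b}_f)
```

Under IIA, this statistic is asymptotically chi-square with degrees of freedom equal to the rank of the difference between the two covariance matrices, which is ordinarily the number of coefficients being compared. In finite samples that difference need not be positive definite, so the computed statistic can come out negative.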
Here’s what always bothered me about tests like this: Since Perot was on the ballot in every state, how can we possibly get evidence of how people would vote if he were not on the ballot? It hardly seems sufficient to redo the analysis after excluding people who voted for Perot. The people who remain still had Perot as an option when they cast their vote for Clinton or Bush.
Keep in mind that the principle of IIA and the tests of that assumption were developed in the framework of discrete choice theory, where people often have different choice sets and the model is estimated by conditional logistic regression. In such settings, there are clear opportunities to test the IIA assumption. But even in discrete choice applications, many data sets have the same set of choices available to all individuals.
So that’s where I stood until a few weeks ago. But after examining the literature, getting the advice of some knowledgeable people (notably, Scott Long and Bob Gray), and doing some simulations, I’ve concluded that I was wrong about my central objection. Comparing the binary logits with multinomial logits can tell you something about the appropriateness of the multinomial model. But maybe not as much as we would like. Here’s what I’ve learned:
1. If you’re estimating a “saturated” model, the binary logits will always be identical to the multinomial logits, and no test of IIA is possible. What’s a saturated model? One with a single categorical predictor. Or one with multiple categorical predictors together with all possible interactions. Because such models perfectly predict the cell frequencies in the multi-way contingency table, there’s no information “left over” to test the fit of the model. So any test of IIA necessarily depends on parametric restrictions on the right-hand side of the regression model.
2. In the more typical case where the predictors are not saturated, there are alternative models that imply that the binary logit coefficients will not converge in probability to the same values as the multinomial logit coefficients.
One such model is the nested logit model, which does not have the IIA property. For our voting example, suppose that people first decided whether or not they would vote for Perot, and suppose that decision was governed by a binary logit model. Then, among those who decided not to vote for Perot, the choice between Clinton and Bush was governed by a second binary logit model.*
It can easily be demonstrated by simulation that if you mistakenly estimate a multinomial logit model rather than the correct nested logit model, the coefficients comparing Clinton with Bush will not converge to the correct values. But the binary logit coefficients, among people who didn’t vote for Perot, will converge to the correct values. So the Hausman-McFadden test or the Small-Hsiao test would seem like sensible ways to discriminate between the nested logit and the multinomial logit models.
3. Simulation studies by Fry and Harris (1996, 1998) and Cheng and Long (2007) have shown that both the Hausman-McFadden test and the Small-Hsiao test perform rather poorly, even in large samples. Specifically, the actual probability of rejecting the null hypothesis was often quite different from the nominal alpha level. The magnitude of these discrepancies varied greatly across different data structures. Based on their results, Cheng and Long went so far as to say that “tests of the IIA assumption that are based on the estimation of a restricted choice set are unsatisfactory for applied work.”
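Point 1 can be checked with simple arithmetic. The sketch below uses made-up vote counts for a single two-category predictor (the group labels and numbers are hypothetical). It relies on the fact that a saturated model reproduces the observed proportions in each cell, so the Clinton vs. Bush log-odds are the same whether or not Perot voters are kept.

```python
import math

# Hypothetical vote counts by a single categorical predictor (made-up numbers).
counts = {
    "men": {"Clinton": 410, "Bush": 380, "Perot": 210},
    "women": {"Clinton": 450, "Bush": 370, "Perot": 180},
}


def multinomial_log_odds(cell):
    """Clinton-vs-Bush log-odds implied by the saturated multinomial fit.

    In a saturated model the fitted probabilities equal the observed
    proportions within each cell, computed over all three candidates.
    """
    total = sum(cell.values())
    return math.log((cell["Clinton"] / total) / (cell["Bush"] / total))


def binary_log_odds(cell):
    """Clinton-vs-Bush log-odds after dropping the Perot voters."""
    kept = cell["Clinton"] + cell["Bush"]
    return math.log((cell["Clinton"] / kept) / (cell["Bush"] / kept))


for group, cell in counts.items():
    # The two columns agree in every cell, so there is nothing to test.
    print(group, multinomial_log_odds(cell), binary_log_odds(cell))
```

In both cases the cell total cancels and the log-odds reduce to the log of the ratio of Clinton to Bush counts, which is why the comparison carries no information in the saturated case.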
Given point 3, I still can’t recommend these tests to anyone. And given point 1, it seems that any test of the IIA property will depend critically on the parameterization of the model (which is consistent with the findings of Cheng and Long). But the fact remains that the multinomial logit model can, in principle, be empirically disconfirmed–even when everyone has the same choice set.
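The claim in point 2 can be illustrated without random sampling. The sketch below is not the simulation described in the post; it is a population-level calculation under assumptions chosen here for illustration: a single predictor x spread evenly over five values, arbitrary coefficients a0, a1, c0, c1 for the two-stage model, and a hypothetical helper fit_logit that maximizes the expected log-likelihood by plain gradient ascent. It compares the Clinton vs. Bush coefficients from the misspecified multinomial logit with those from the binary logit that excludes Perot voters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(xs, mass, steps=20000, lr=0.5):
    """Maximize the expected multinomial-logit log-likelihood by gradient ascent.

    xs[i] is a predictor value and mass[i][j] is the population mass at xs[i]
    that chooses category j. Category 0 is the reference category.
    Returns [[intercept, slope], ...] for categories 1..J-1.
    """
    n_cat = len(mass[0])
    total = sum(sum(m) for m in mass)
    beta = [[0.0, 0.0] for _ in range(n_cat - 1)]
    for _ in range(steps):
        grad = [[0.0, 0.0] for _ in range(n_cat - 1)]
        for x, m in zip(xs, mass):
            eta = [0.0] + [b[0] + b[1] * x for b in beta]
            denom = sum(math.exp(e) for e in eta)
            n = sum(m)
            for j in range(1, n_cat):
                # Observed mass minus the mass the model predicts for category j.
                resid = m[j] - n * math.exp(eta[j]) / denom
                grad[j - 1][0] += resid / total
                grad[j - 1][1] += resid * x / total
        for j in range(n_cat - 1):
            beta[j][0] += lr * grad[j][0]
            beta[j][1] += lr * grad[j][1]
    return beta


# Illustrative (assumed) parameters of the two-stage model.
a0, a1 = 0.0, 1.0  # stage 1: Perot vs. not Perot
c0, c1 = 0.0, 1.0  # stage 2: Clinton vs. Bush among non-Perot voters
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]

# Population mass on [Bush, Clinton, Perot] at each x.
full = []
for x in xs:
    p_perot = sigmoid(a0 + a1 * x)
    p_clinton = sigmoid(c0 + c1 * x)
    full.append([(1 - p_perot) * (1 - p_clinton), (1 - p_perot) * p_clinton, p_perot])

mnl = fit_logit(xs, full)                      # misspecified three-category model
binary = fit_logit(xs, [m[:2] for m in full])  # Clinton vs. Bush, Perot voters dropped

print("true Clinton-vs-Bush coefficients:", [c0, c1])
print("binary logit:", binary[0])
print("multinomial logit:", mnl[0])
```

Because the fits use population shares rather than a random sample, any gap between the two sets of coefficients reflects misspecification rather than sampling error. The binary logit recovers c0 and c1, while the multinomial slope for Clinton vs. Bush settles at a different value.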
*The model proposed here is simpler than the nested logit model proposed by Hausman and McFadden (1984), which has an additional parameter. However, that parameter is not identified when there are only three possible outcomes.
References:
Cheng, Simon and J. Scott Long (2007) “Testing for IIA in the Multinomial Logit Model.” Sociological Methods & Research 35:583-600.
Fry, Tim R. L. and Mark N. Harris (1996) “A Monte Carlo Study of Tests for the Independence of Irrelevant Alternatives Property.” Transportation Research Part B: Methodological 30:19-30.
Fry, Tim R. L. and Mark N. Harris (1998) “Testing for Independence of Irrelevant Alternatives: Some Empirical Results.” Sociological Methods & Research 26:401-23.
Hausman, Jerry A. and Daniel McFadden (1984) “Specification Tests for the Multinomial Logit Model.” Econometrica 52:1219-40.
Small, Kenneth A. and Cheng Hsiao (1985) “Multinomial Logit Specification Tests.” International Economic Review 26:619-27.
Comments
What alternative could I use if the IIA test shows that the multinomial assumptions are violated?
Well, my post concludes that none of the available tests are satisfactory when every individual has the same choice set. Maybe someone has come up with a better test since then, but I’m not aware of it. Irrespective of any test, you may have good theoretical reasons for believing that IIA is not satisfied. In that case, you may want to consider alternative models, like the nested logit model.
This is very interesting! May I ask a question? I have a dataset where individuals are nested within clusters that correspond to different countries in different years. My dv is a categorical variable containing 4 options. The problem is that not all 4 options are always available in all countries_years. For instance, Canada 1997 and Ireland 2011 both have options 1 to 4, but Canada 2004 only has options 1 to 3 because option 4 was discontinued in Canada for 2004. Can I still run a multilevel multinomial logit model with non-homogeneous options? Does this relate to IIA?
You should probably be doing conditional logit, which conditions on the choice set available to each individual.
Thanks for this wonderful explanation. Currently I’m dealing with an mlogit model and I wanted to run the IIA test to be sure that the assumption is not violated. The command mlogtest, iia is not working for me and I don’t know why. I’m doing my analysis with Stata. Any help will be appreciated.
Thanks in advance
Have you installed spost13? If so, what kind of error message are you getting?
I have read quite a few papers using MNL (multinomial logit) models. Some have used the IIA tests (both Hausman-McFadden and Small-Hsiao), while some have not. So is it necessary to conduct these tests for the MNL or can we bypass them?
In my opinion, they have little to no value for the vast majority of applications.
It feels to me that this is akin to a missing data problem.
We condition on the third category of our outcome when comparing the other two, which raises the possibility of collider bias. Since we have no idea what the distribution of binary Y (0/1) would have been amongst those who chose the third option, we are faced with the possibility that the unobserved distribution of 0/1 will be either marginally or conditionally different from the observed one. I am coming around to the idea that lack of IIA may actually be the same as MNAR (binary Y conditionally related to R(Y)), in which case it’s no surprise that there is no formal test for it.
This sounds quite plausible.
I’ve been thinking about this a lot over the last week. I believe the problem boils down to an unmeasured common cause of your different outcomes. This can be demonstrated with causal DAGs.
It’s been quite a challenge to get my head round what the outcomes might be, particularly as I work in medical stats where it’s hard to think of illness or death as a choice.
Provided all common causes are measured, the outcomes can be rendered conditionally independent; in other words, we have IIA.
Makes sense to me.
Is it possible to test for IIA in Stata when using a multinomial logistic regression constructed with gsem?
I don’t think there’s any built-in method, but it might be possible with some custom programming.
I’m using Stata. How can I test the IIA assumption?
Use the command mlogtest, iia
Thanks for this insightful post.
I have a question regarding the purported repercussions of violating the IIA assumption: does a violation of IIA only distort the causal inference from the regression coefficients, or does it contaminate the predictive power of the model too? That’s also related to an earlier post of yours here (Prediction vs. Causation in the Regression Analysis).
I am running a multinomial logit model to predict the propensity of selecting into different doses of a treatment (these predicted values are used to produce sample weights for the regression of the effect in the next stage). So accuracy of the coefficients (i.e., causation) is not a concern in this matching multinomial logit model (I’m just interested in predictive efficiency, to estimate the propensity of each observation to select into different doses of treatment and so produce accurate weights). Plus, none of the references (on multi-dose matching) that I reviewed tested for IIA, although doses of a given treatment are apparently not distinct enough to satisfy the IIA. A reviewer is asking us whether our model satisfies the IIA assumption, but if violating IIA has no repercussions for the predictive capacity of the model (as the earlier matching literature suggests), its violation should have no bearing on our model choice. A “generalized ordered logit model” (which relaxes the IIA) produces similar results, but at the cost of inflating the standard errors (plus I found nobody testing for IIA in this context, even though its violation is very likely, and then opting for a “generalized ordered logit model”).
For predictive modeling, I personally think the IIA assumption is irrelevant.
In Point 1, you say “If you’re estimating a “saturated” model, the binary logits will always be identical to the multinomial logits, and no test of IIA is possible. What’s a saturated model? One with a single categorical predictor. Or one with multiple categorical predictors together with all possible interactions. …there’s no information “left over” to test the fit of the model. So any tests of IIA necessarily depend on parametric restrictions on the right hand side of the regression model.”
By this, do you mean that no test of IIA is possible if the mlogit model has been parameterized to include all possible categories of the dependent variable? For instance, let’s say a survey question pertains to choice of contraception, and the options are classified into three exclusive categories: (a) traditional, (b) modern, and (c) none. All choices are classified into these three categories, none are left out, and there are only these three categories. If the model is set up with reference category (c), such that (a) is compared to (c), and (b) is compared to (c), then there is no alternative category even available, right? And hence in this case, would you say that the IIA assumption is met, or that it is not testable?
No, “saturation” is not about the dependent variable, it’s about the predictor variables.
If I’m using SPSS, how do I test the Independence of Irrelevant Alternatives assumption?
As far as I know, there’s no standard test for IIA in SPSS. SPSS does offer the nested logit and multinomial probit models, both of which relax the IIA assumption, but only for choice-based data. To do it yourself, consult the references in my post.
Good article.
However, if the survey analyst has (much more useful) ‘feeling thermometer’/’likelihood to vote for’ scores for each of the presidential candidates (say, scored 0-100) available to them, pretty much this entire methodological discussion falls away and one could use generalized linear models instead.
A question, since I am not sure that I understood.
If the multinomial coefficients comparing Clinton with Bush will not converge to the correct values, would it be OK to run separate binary logistic regressions, such as C vs. B, P vs. C, and P vs. B?
Franco, I guess that in this case the sample is not random (as is also the case with IIA tests), so you cannot estimate the parameters using such a truncated dataset. Am I right? Isn’t this also a problem with IIA tests?