At the 1998 Annual Meeting of the American Political Science Association, Gary King and three co-authors presented a paper titled “Listwise deletion is evil: What to do about missing data in political science.” The paper was later published under a different title in the American Political Science Review, but the original title has stuck in my head ever since. Is listwise deletion really evil? How much harm has it caused? Should we avoid it at all costs?
King et al. claimed that data analyses in political science typically lose about a third of their cases to listwise deletion of missing data (also known as complete case analysis). As a consequence, the increase in mean squared error is comparable to what you might expect from omitted variable bias. They then made the case that multiple imputation would be far superior in the vast majority of research projects.
Now I’ve been a proponent of multiple imputation for many years, although I’ve also argued that maximum likelihood (ML) methods for handling missing data may be even better. Nevertheless, I still believe that listwise deletion is not as bad as many people think, and even has advantages over ML and multiple imputation in some applications.
King et al. were obviously correct that listwise deletion can lead to massive losses of data, which can substantially increase the probability of Type II errors. But with the rise of “big data”, many researchers now find themselves in situations where statistical power is not a major issue. If listwise deletion reduces your sample size from a million to 500,000, loss of power is probably not going to keep you up at night.
In cases like this, the focus should shift to bias. Which method—listwise deletion or multiple imputation—is going to give you the least bias? It’s well known that listwise deletion does not introduce bias if the data are missing completely at random (MCAR). Under MCAR, listwise deletion is equivalent to simple random sampling, and we know that simple random sampling does not lead to bias.
But MCAR is a very strong assumption, and there are usually many reasons to suspect violations. For example, if men are less likely to report their income than women, then data on income are not MCAR. If men and women also differ in average income, listwise deletion would then give a biased estimate of mean income for the whole population.
By contrast, standard methods for multiple imputation and ML are approximately unbiased under the much weaker assumption that data are missing at random (MAR). Under MAR, it’s OK if men are less likely to report their income than women, as long as the probability of reporting income does not depend on income itself (within each gender). Thus, MAR allows the probability of missingness to depend on observed variables, and that’s a major advantage over listwise deletion in reducing bias.
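To make the income example concrete, here is a minimal simulation sketch in Python. All of the numbers (group means, reporting rates, sample size) are assumptions chosen for illustration, not figures from any study. Missingness depends only on gender, so the data are MAR but not MCAR. The gender-weighted correction at the end is only a simple stand-in for the kind of information that MAR-based methods exploit; it is not multiple imputation or ML.

```python
import random
import statistics

rng = random.Random(1)

# Hypothetical population: men earn more on average but report income less often.
rows = []
for _ in range(200_000):
    male = rng.random() < 0.5
    income = rng.gauss(60_000 if male else 50_000, 10_000)
    # Reporting depends on gender only, not on income itself (MAR, not MCAR).
    reported = rng.random() < (0.5 if male else 0.9)
    rows.append((male, income, reported))

true_mean = statistics.fmean(inc for _, inc, _ in rows)
# Listwise deletion: average only the incomes that were reported.
complete_case_mean = statistics.fmean(inc for _, inc, rep in rows if rep)

def reported_mean(is_male):
    """Mean income among respondents of one gender who reported it."""
    return statistics.fmean(
        inc for male, inc, rep in rows if rep and male == is_male
    )

# Weight each gender's observed mean by its share of the full sample.
share_male = sum(1 for male, _, _ in rows if male) / len(rows)
adjusted_mean = (share_male * reported_mean(True)
                 + (1 - share_male) * reported_mean(False))

print(f"true mean:            {true_mean:,.0f}")
print(f"complete-case mean:   {complete_case_mean:,.0f}")
print(f"gender-adjusted mean: {adjusted_mean:,.0f}")
```

Under these assumed settings, the complete-case mean should fall noticeably below the true mean because women are overrepresented among the reporters, while the gender-adjusted mean should land close to the truth.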
This is all well known. What most researchers don’t know is that listwise deletion may actually be less biased than multiple imputation or ML when data are missing on predictor variables in regression analysis. For example, suppose you’re doing a linear regression in which the dependent variable is number of children and one of the predictors is annual income. Suppose, further, that 30% of respondents did not report their income. Multiple imputation or ML, when done correctly, can produce approximately unbiased estimates of the regression coefficients when income data are MAR. But so will listwise deletion. And, remarkably, listwise deletion will produce unbiased estimates even if the data are not missing at random.
You can find a simple proof of this result in footnote 1 of my book Missing Data, although I was certainly not the first to prove it. What this means is that even if people with high income are less likely to report their income (a violation of MAR), that won’t lead to bias if you use listwise deletion. But it certainly could lead to bias if you use standard implementations of multiple imputation or ML.
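The intuition can be sketched in one line. The notation is introduced here for illustration and is not necessarily how the footnote argues it: let y be the dependent variable, x the predictors (including any values that go unreported), and R = 1 for a complete case. By Bayes' rule:

```latex
f(y \mid x, R = 1)
  = \frac{\Pr(R = 1 \mid x, y)\, f(y \mid x)}{\Pr(R = 1 \mid x)}
  = f(y \mid x)
\quad \text{whenever} \quad
\Pr(R = 1 \mid x, y) = \Pr(R = 1 \mid x).
```

So the complete cases have the same conditional distribution of y given x as the full population, even when the chance of being a complete case depends on x itself, including on the unreported income. The condition on the right is what the result hinges on.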
There are two important caveats here. First, the probability that predictors are missing cannot depend on the dependent variable. Thus, in our example, the probability that income is missing cannot depend on the number of children.
Second, this property of listwise deletion presumes that you are estimating a correctly specified regression model. In particular, if the regression coefficients are actually different for different subgroups but your model doesn’t allow for this, then listwise deletion can skew your results more toward one subgroup or the other. Consequently, your estimates won’t be unbiased estimates of the (misspecified) regression in the population.
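The main result and the first caveat can be checked with a small simulation. This is only a sketch under assumed settings: a single standardized predictor, a true slope of -0.5, and logistic missingness mechanisms that I made up for illustration. In case A, the predictor is more likely to be missing when its own value is high (not missing at random). In case B, the predictor is more likely to be missing when the dependent variable is high, which violates the first caveat.

```python
import math
import random

def ols_slope(pairs):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    return sxy / sxx

def logistic(z):
    return 1 / (1 + math.exp(-z))

rng = random.Random(42)
TRUE_SLOPE = -0.5  # assumed effect of the predictor on the outcome

data = []
for _ in range(100_000):
    x = rng.gauss(0, 1)
    y = 2.0 + TRUE_SLOPE * x + rng.gauss(0, 1)
    data.append((x, y))

# Case A: x is more likely to be missing when x itself is high.
# Missingness does not depend on y once x is given.
kept_a = [(x, y) for x, y in data if rng.random() > logistic(2 * x)]

# Case B: x is more likely to be missing when y is high (violates caveat 1).
kept_b = [(x, y) for x, y in data if rng.random() > logistic(2 * (y - 2))]

slope_full = ols_slope(data)
slope_a = ols_slope(kept_a)  # listwise deletion, missingness depends on x
slope_b = ols_slope(kept_b)  # listwise deletion, missingness depends on y

print(f"full data: {slope_full:.3f}")
print(f"case A:    {slope_a:.3f}")
print(f"case B:    {slope_b:.3f}")
```

With these settings, the case A slope should stay close to the true value even though roughly half the cases are dropped, while the case B slope should be clearly pulled toward zero.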
The robustness of listwise deletion to “not missing at random” extends to any kind of regression, not just linear. For predicting number of children, for example, you might prefer a negative binomial regression to a linear regression, and the same result would apply.
For logistic regression, the result is even stronger. You can have missing data that are not missing at random on the dependent variable, and logistic regression using listwise deletion will still give approximately unbiased estimates of the regression coefficients (but not the intercept). For example, suppose the dependent variable is whether or not people graduate from college, and people who graduate are much more likely to report their graduation status than those who do not. That’s not a problem for listwise deletion, but it would definitely be a problem for multiple imputation or ML.
For this property of logistic regression, the caveat is that missingness on the dependent variable cannot depend on the predictor variables. Incidentally, the proof of this result is also the justification of the famous case-control method in epidemiology.
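Here is a rough simulation of the graduation example, again with assumed numbers: a true intercept of -0.5, a true slope of 1.0, and reporting rates of 90% for graduates and 30% for non-graduates. The logistic regression is fit with a small hand-written Newton-Raphson routine so that the sketch needs nothing beyond the standard library; in practice you would use your usual statistical package.

```python
import math
import random

def fit_logistic(pairs, iterations=12):
    """Fit P(y=1|x) = 1/(1+exp(-(a+b*x))) by Newton-Raphson; returns (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in pairs:
            p = 1 / (1 + math.exp(-(a + b * x)))
            w = p * (1 - p)
            g_a += y - p          # gradient of the log-likelihood
            g_b += (y - p) * x
            h_aa += w             # information matrix entries
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (information) * delta = gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b

rng = random.Random(7)
TRUE_A, TRUE_B = -0.5, 1.0  # assumed intercept and slope

full = []
for _ in range(60_000):
    x = rng.gauss(0, 1)
    p = 1 / (1 + math.exp(-(TRUE_A + TRUE_B * x)))
    y = 1 if rng.random() < p else 0
    full.append((x, y))

# Graduates (y = 1) report 90% of the time, non-graduates (y = 0) only 30%.
# Missingness depends on y alone, not on the predictor.
P_REPORT = {1: 0.9, 0: 0.3}
complete = [(x, y) for x, y in full if rng.random() < P_REPORT[y]]

a_full, b_full = fit_logistic(full)
a_cc, b_cc = fit_logistic(complete)  # listwise deletion
intercept_shift = math.log(P_REPORT[1] / P_REPORT[0])

print(f"slope, full data:      {b_full:.3f}")
print(f"slope, complete cases: {b_cc:.3f}")
print(f"intercept change:      {a_cc - a_full:.3f} (log ratio of reporting rates: {intercept_shift:.3f})")
```

The slope from the complete cases should be close to the true slope, while the intercept should shift by roughly the log of the ratio of the two reporting rates. That is the same shift that appears in case-control sampling.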
Here’s another little-known fact about listwise deletion. If data are missing only on the dependent variable and the data are missing at random, then listwise deletion is equivalent to maximum likelihood. And ML can’t be improved upon by multiple imputation, so you might as well just do listwise deletion. Multiple imputation would only add more sampling variation to the estimates.
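One way to see the equivalence, as a sketch in notation introduced here: suppose the regression model is f(y | x; θ), the predictors x are fully observed, and the missingness on y is MAR so that the missingness mechanism can be ignored. Cases with y missing contribute to the likelihood only through an integral over the unobserved y, and that integral equals 1:

```latex
L(\theta)
  = \prod_{i \in \text{obs}} f(y_i \mid x_i; \theta)
    \prod_{i \in \text{mis}} \int f(y \mid x_i; \theta)\, dy
  = \prod_{i \in \text{obs}} f(y_i \mid x_i; \theta)
```

The right-hand side is just the likelihood for the complete cases, so maximizing it is the same as running the regression after listwise deletion.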
So the upshot is this. If listwise deletion still leaves you with a large sample, you might reasonably prefer it over maximum likelihood or multiple imputation. At the least, you should think carefully about the relative advantages and disadvantages of these methods, and not dismiss listwise deletion out of hand.
Finally, if you compare listwise deletion with other traditional methods like pairwise deletion, dummy variable adjustment, or conventional imputation, there’s really no contest. The other methods either get the standard errors wrong, the parameter estimates wrong, or both. At a minimum, listwise deletion gives you “honest” standard errors that reflect the actual amount of information used. And it’s by far the easiest method to apply.
A bibliographic note. The King et al. conference paper did *not* have 12 co-authors. That appears to be a web scraping error. The revised paper was published in the APSR with the original set of authors.
King, G., Honaker, J., Joseph, A., & Scheve, K. (2001). Analyzing incomplete political science data: An alternative algorithm for multiple imputation. American Political Science Review, 95(1), 49-69.
Honaker, J., King, G., & Blackwell, M. (2011). Amelia II: A program for missing data. Journal of Statistical Software, 45(7), 1-47.