Missing Data

A 3-Day Remote Seminar Taught by
Paul Allison, Ph.D.

If you’re using conventional methods for handling missing data, you may be missing out. Conventional methods for missing data, like listwise deletion or regression imputation, are prone to three serious problems:

  • Inefficient use of the available information, leading to low power and Type II errors.
  • Biased estimates of standard errors, leading to incorrect p-values.
  • Biased parameter estimates, due to failure to adjust for selectivity in missing data.
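The first of these problems can be seen in a small sketch. The records below are hypothetical, and `listwise_delete` is an illustrative helper written for this example, not a function from any statistical package. It shows how dropping every case with a missing value also discards the values that were observed in those cases.

```python
# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 34, "income": 52000, "educ": 16},
    {"age": 29, "income": None, "educ": 12},
    {"age": 41, "income": 61000, "educ": None},
    {"age": None, "income": 48000, "educ": 14},
    {"age": 55, "income": 75000, "educ": 18},
    {"age": 38, "income": None, "educ": 16},
]


def listwise_delete(rows):
    """Keep only the rows in which every variable is observed."""
    return [r for r in rows if all(v is not None for v in r.values())]


complete = listwise_delete(records)

# Count observed values before deletion and values retained after it.
observed_cells = sum(v is not None for r in records for v in r.values())
kept_cells = sum(len(r) for r in complete)

print(f"cases: {len(records)} before, {len(complete)} after")
print(f"observed values: {observed_cells} available, {kept_cells} used")
```

In this toy example only 2 of the 6 cases survive, so 6 of the 14 observed values are used in the analysis. The lost information is what reduces power.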

More accurate and reliable results can be obtained with maximum likelihood or multiple imputation.

Although these newer methods for handling missing data have been around for more than two decades, they have only become practical with the introduction of widely available and user-friendly software.

Starting September 9, we are offering this seminar as a 3-day synchronous*, remote workshop for the first time. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. Additional lab sessions will be held on Thursday and Friday afternoons, where you can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions.


Maximum likelihood and multiple imputation have very similar statistical properties. If the assumptions are met, they are approximately unbiased and efficient. What’s remarkable is that these newer methods depend on less demanding assumptions than those required for older methods for handling missing data.

Maximum likelihood is available for logistic regression, Cox regression, and regression for count data. Multiple imputation can be used for virtually any statistical problem.
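One reason multiple imputation is so general is that the analysis is run separately on each imputed data set and the results are then combined with simple formulas, usually called Rubin's rules. The sketch below pools a single coefficient. The function name `pool_rubin` and the input values are hypothetical, chosen only for illustration. In practice, software such as SAS or Stata performs this step for you.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Combine one parameter's estimates from m imputed data sets.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    # Within-imputation variance: the average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the m estimates.
    between = variance(estimates)
    # Total variance adds a correction for using a finite number of imputations.
    total = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total)


# Hypothetical results from m = 5 imputed data sets.
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
std_errors = [0.10, 0.10, 0.10, 0.10, 0.10]

pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate: {pooled_estimate:.3f}, pooled SE: {pooled_se:.3f}")
```

The pooled standard error comes out slightly larger than the 0.10 reported in each individual analysis, because it also reflects the variation of the estimates across imputations. That extra term is how multiple imputation accounts for the uncertainty due to missing data.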

This course will cover the theory and practice of both maximum likelihood and multiple imputation. Maximum likelihood for linear models will be demonstrated with SAS, Stata, and Mplus. Multiple imputation will be demonstrated with both SAS and Stata. Slides and exercises using R are also available to participants on request.


This is a hands-on course. To optimally benefit, you are strongly encouraged to use a computer with a recent version of SAS or Stata installed. Mplus and LEM will also be used in the section on maximum likelihood estimation, but will not be used for exercises. For those who prefer R, slides and exercises in R can be downloaded from the course site.

There is now a free version of SAS, called SAS OnDemand for Academics, that works in your web browser.

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or their 30-day software return policy.

Who should register?

Virtually anyone who does statistical analysis can benefit from new methods for handling missing data. To optimally benefit from this course, you should have a good working knowledge of the principles and practice of linear regression, as well as elementary statistical inference. But you do not need to know matrix algebra, calculus, or likelihood theory. You should be at least moderately proficient at using one of these packages: SAS, Stata, or R.


Seminar outline

  1. Assumptions for missing data methods
  2. Problems with conventional methods
  3. Maximum likelihood (ML)
  4. ML with EM algorithm
  5. Direct ML with Mplus, Stata and SAS
  6. ML for contingency tables
  7. Multiple Imputation (MI)
  8. MI under multivariate normal model
  9. MI with SAS and Stata
  10. MI with categorical and nonnormal data
  11. Interactions and nonlinearities
  12. Using auxiliary variables
  13. Other parametric approaches to MI
  14. Linear hypotheses and likelihood ratio tests
  15. Nonparametric and partially parametric methods
  16. Fully conditional models
  17. MI and ML for nonignorable missing data

Reviews of Missing Data

“I have only recently become aware of the importance of missing data. This course is a great introduction to a topic I knew very little about. I feel like it has opened up a whole new frontier in how I handle data.”
  Bob Reed, University of Canterbury

“Although I have struggled to understand how to handle missing data for several years, I could not get a clear understanding until I attended this course. The depth and breadth of the ways to deal with missing data taught by Professor Allison are beyond rival!”
  Yunhwan Lee, Ajou University School of Medicine

“I’ve been struggling with how to deal with missing data in an analysis and have been putting off that analysis because of this. I had some missing data techniques training in grad school, but it was super helpful to have an in-depth review of strategies for handling missing data in this short course. I now feel prepared to tackle the analysis I’ve been avoiding after this course, especially how to handle categorical variables and interactions. I highly recommend this course!”
  Sylvia Badon, Kaiser Permanente

“My graduate program minimally covered missing data. When you get into real-world analysis, especially research and health care data, missing data is a problem. It is difficult to fully grasp the complexities of the underlying mechanisms to know the best approach when many data guides and manuals are very technical and few resources offer compare and contrast. I learned information and concepts I would have never known otherwise and certainly it would make any published findings problematic. I highly recommend this course to ensure high quality analysis when encountering missing data. You get enough foundation to take this back to your workplace in the amount of time offered for the workshop.”
  Deejay Zwaga, University of Wisconsin