Missing Data Using SAS®

A Self-Paced, Online Seminar Designed and Narrated by
Paul D. Allison, Ph.D.

For more than 10 years, Dr. Paul Allison has been presenting a 2-day, in-person seminar on missing data at various locations around the US. Based in part on his book Missing Data (Sage, 2001), this seminar covers both the theory and practice of two modern methods for handling missing data: multiple imputation and maximum likelihood.

Many researchers have told us that they would love to take the course but just can’t manage the time or the money to attend the live sessions. So, for the past two years, we have been working on a web-based version of this seminar. It’s finally up and running, and we are really pleased at how it turned out.

The course is completely self-paced and can be accessed with any recent web browser on almost any platform, including iPhone, iPad, and Android devices. It consists of 12 modules:

  1. Basic principles and assumptions.
  2. Conventional methods for missing data.
  3. Maximum likelihood (ML) for categorical variables.
  4. ML and the EM algorithm.
  5. Direct ML with SEM software and with mixed models.
  6. Basic principles of multiple imputation (MI).
  7. MI for non-monotone data using MCMC.
  8. MCMC options and complications.
  9. Fully conditional specification.
  10. Multivariate inference, interactions, and nonlinearities.
  11. Other methods, panel data, clustered data.
  12. Non-ignorable missing data.

Each module begins with an introductory video, followed by a narrated PowerPoint presentation. The modules contain all the slides in the live, 2-day version of the course. But there are also many additional slides that wouldn’t fit into the live course, including several slides on imputation with clustered data.

Each module is followed by a short multiple-choice quiz to test your knowledge. Half of the modules are also followed by exercises that ask you to apply what you’ve learned to a real data set using SAS.

There is also an online discussion board where you can post questions or comments about any aspect of the course. All questions will be promptly answered by Dr. Allison.

Estimated time to complete the course is 15-20 hours.


Conventional methods for missing data, like listwise deletion or regression imputation, are prone to three serious problems:

  • Inefficient use of the available information, leading to low power and Type II errors.
  • Biased estimates of standard errors, leading to incorrect p-values.
  • Biased parameter estimates, due to failure to adjust for selectivity in missing data.
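The second problem can be seen in a small sketch. This example is not taken from the course materials, which use SAS. It is a Python illustration on a made-up data set, and it uses simple mean imputation (a conventional method that is simpler than the regression imputation mentioned above) because the effect is easy to see by hand. The data values and the helper name standard_error are assumptions introduced only for this illustration.

```python
import math
import statistics

# Hypothetical toy data (not from the course): None marks a missing value.
observed = [2.0, 4.0, 6.0, 8.0, None, None, None, None]

# Listwise deletion: keep only the cases that were actually observed.
complete_cases = [v for v in observed if v is not None]
mean_observed = statistics.mean(complete_cases)

# Mean imputation: fill every missing value with the observed mean.
mean_imputed = [mean_observed if v is None else v for v in observed]


def standard_error(values):
    """Conventional standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


se_complete = standard_error(complete_cases)
se_imputed = standard_error(mean_imputed)

# The imputed data set treats the filled-in values as if they were real
# observations, so the sample size is overstated and the spread is shrunk.
print(f"SE from observed cases only: {se_complete:.3f}")
print(f"SE after mean imputation:    {se_imputed:.3f}")
```

The estimated mean is the same in both cases, but the standard error after mean imputation is much smaller, even though no new information was added. A p-value computed from that standard error would be too small.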

More accurate and reliable results can be obtained with maximum likelihood or multiple imputation.

These new methods for handling missing data have been around for at least a decade, but have only become practical in the last few years with the introduction of widely available and user-friendly software. Maximum likelihood and multiple imputation have very similar statistical properties. If the assumptions are met, they are approximately unbiased and efficient; that is, they have minimum sampling variance.

What’s remarkable is that these newer methods depend on less demanding assumptions than those required for conventional methods for handling missing data. Maximum likelihood is available for linear models, logistic regression and Cox regression. Multiple imputation can be used for virtually any statistical problem.

This course will cover the theory and practice of both maximum likelihood and multiple imputation. All the methods will be demonstrated using SAS, with detailed code for PROC MI, PROC CALIS, PROC MIXED, and PROC GLIMMIX. 

WHO SHOULD SIGN UP?

Virtually anyone who does statistical analysis can benefit from new methods for handling missing data. To take this course, you should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. But you do not need to know matrix algebra, calculus, or likelihood theory. 

REVIEWS OF THE LIVE VERSION OF MISSING DATA

“Professor Allison’s short courses have always been very practical. The math was discussed at the right level and the time on application was very well spent. His notes on when one should and shouldn’t use certain methods are also very important. I would recommend his short courses to others.”
  Yihua Gu, AbbVie

“Paul is very knowledgeable. The workshop has a good balance of theory and application, including instruction in various software programs. If you want to improve your understanding of missing data treatments and/or receive the latest information on such methods, I highly recommend this workshop.”
  Keenan Pituch, University of Texas at Austin

“This is an excellent course that provides participants with a comprehensive review of all important methods about missing data. It is also an amazing course covering many statistical models (though the linear model or regression serves as the key model), and almost all available software packages. I highly recommend this course to every researcher, from beginners to sophisticated analysts!”
  Shenyang Guo, University of North Carolina at Chapel Hill

“Excellent course – great opportunity to learn many aspects of data development and model development using many different software and statistical methods.”
  Paul Holness, Statistics Canada

“While I always knew missing data issues were a problem, they were only mentioned in passing in other statistics courses. This course was a great broad and also in-depth tour of the issues and how best to handle them in different situations. I now feel equipped to apply these methods in both basic and complex analyses and with some confidence.”
  Alison Papadakis, Loyola University Maryland