Missing Data

A 2-Day Seminar Taught by Paul Allison, Ph.D.

If you’re using conventional methods for handling missing data, you may be missing out. Conventional methods for missing data, like listwise deletion or regression imputation, are prone to three serious problems:

  • Inefficient use of the available information, leading to low power and Type II errors.
  • Biased estimates of standard errors, leading to incorrect p-values.
  • Biased parameter estimates, due to failure to adjust for selectivity in missing data.

More accurate and reliable results can be obtained with maximum likelihood or multiple imputation.
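
As a rough illustration of the second and third problems, here is a minimal Python sketch. Everything in it is an assumption made for demonstration and is not taken from the course materials: the simulated model (y = x + noise), the rule that y goes missing 80% of the time when x is positive, and the function name compare_conventional_methods. It compares the complete-case (listwise deletion) mean of y and a deterministic regression imputation against the fully observed data.

```python
import random
import statistics


def compare_conventional_methods(n=20000, seed=0):
    """Simulate y = x + noise, delete some y values, and compare two fixes."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [xi + rng.gauss(0, 1) for xi in x]  # true mean 0, true variance 2

    # Hypothetical missingness rule: y is lost 80% of the time when x > 0,
    # so missingness depends only on the always-observed x.
    observed = [not (xi > 0 and rng.random() < 0.8) for xi in x]

    # Listwise deletion: keep only the cases where y was observed.
    x_obs = [xi for xi, keep in zip(x, observed) if keep]
    y_obs = [yi for yi, keep in zip(y, observed) if keep]

    # Least-squares regression of y on x among the complete cases.
    mean_x, mean_y = statistics.fmean(x_obs), statistics.fmean(y_obs)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x_obs, y_obs))
    sxx = sum((a - mean_x) ** 2 for a in x_obs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Regression imputation: fill each missing y with its predicted value,
    # with no random residual added.
    y_imputed = [yi if keep else intercept + slope * xi
                 for xi, yi, keep in zip(x, y, observed)]

    return {
        "full_mean": statistics.fmean(y),
        "full_variance": statistics.variance(y),
        "listwise_mean": mean_y,
        "imputed_mean": statistics.fmean(y_imputed),
        "imputed_variance": statistics.variance(y_imputed),
    }


if __name__ == "__main__":
    for name, value in compare_conventional_methods().items():
        print(f"{name}: {value:.3f}")
```

In this setup the listwise mean should fall well below the true mean of zero, because cases with large x are under-represented. The imputed mean should land close to zero, but the variance of the imputed variable should come out too small, since the filled-in values sit exactly on the regression line. Understated variability of this kind is one way conventional imputation produces standard errors that are too small.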

These new methods for handling missing data have been around for at least a decade, but have only become practical in the last few years with the introduction of widely available and user-friendly software. Maximum likelihood and multiple imputation have very similar statistical properties. If the assumptions are met, they are approximately unbiased and efficient; that is, they have minimum sampling variance.

What’s remarkable is that these newer methods depend on less demanding assumptions than those required for conventional methods for handling missing data. Maximum likelihood is available for linear models, logistic regression and Cox regression. Multiple imputation can be used for virtually any statistical problem.
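
One reason multiple imputation applies so broadly is that the final step is always the same: analyze each imputed data set with whatever method you would normally use, then combine the results. The sketch below shows the standard combining rules (often called Rubin's rules) for a single parameter. The function name pool_rubin and the three example estimates are hypothetical, chosen only for illustration. In practice, software such as SAS or Stata does this pooling for you.

```python
import math
import statistics


def pool_rubin(estimates, variances):
    """Combine one parameter's estimates from m imputed data sets.

    estimates: the m point estimates, one per imputed data set
    variances: the m squared standard errors, in the same order
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least two estimates, each with a variance")

    pooled = statistics.fmean(estimates)      # average of the m estimates
    within = statistics.fmean(variances)      # average sampling variance
    between = statistics.variance(estimates)  # variance across imputations
    total = within + (1 + 1 / m) * between    # total variance
    return pooled, math.sqrt(total)


if __name__ == "__main__":
    # Made-up regression coefficients from m = 3 imputed data sets,
    # each with a standard error of 0.2 (variance 0.04).
    estimate, std_error = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.04, 0.04])
    print(f"pooled estimate = {estimate:.3f}, pooled SE = {std_error:.3f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing data, so the pooled standard error is larger than the standard error from any single imputed data set whenever the estimates disagree.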

This course will cover the theory and practice of both maximum likelihood and multiple imputation. Maximum likelihood for linear models will be demonstrated with SAS, Stata, and Mplus. Mplus will also be used for maximum likelihood with logistic regression. Multiple imputation will be demonstrated with both SAS and Stata.


To get the most out of the seminar, you are strongly encouraged to bring your own laptop with a recent version of SAS or Stata installed. However, no previous experience with either software is assumed. Power outlets will be provided at each seat.

There is now a free version of SAS, called the SAS University Edition, that is available to anyone. It has everything needed to run the exercises in this course, and it will run on Windows, Mac or Linux computers. However, you do need a 64-bit machine with at least 1 GB of RAM. You also have to download and install virtualization software that is available free from third-party vendors. The SAS Studio interface runs in your browser, but you do not have to be connected to the Internet. The download and installation are a bit complicated, but well worth the time and effort.  

Seminar participants who are not yet ready to purchase Stata could take advantage of StataCorp’s free 30-day evaluation offer or their 30-day software return policy. 

Who should attend?

Virtually anyone who does statistical analysis can benefit from new methods for handling missing data. To take this course, you should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. But you do not need to know matrix algebra, calculus, or likelihood theory. 


The class will meet from 9 am to 5 pm each day with a 1-hour lunch break at Temple University Center City, 1515 Market Street, Philadelphia, PA 19103. 

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer printout, and many other useful features. This book frees participants from the distracting task of note taking. 

Registration and lodging

The fee of $995.00 includes all seminar materials.

Refund Policy
If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions
A block of guest rooms has been reserved at the Club Quarters Hotel, 1628 Chestnut Street, Philadelphia, PA at a special rate of $154. The hotel is about a 5-minute walk from the seminar location. To make reservations, call 203-905-2100 during business hours and identify yourself with group code STA607. For guaranteed rate and availability, you must reserve your room no later than Monday, May 8, 2017.

If you make reservations after the cut-off date, ask for the Statistical Horizons room rate (do not use the code) and they will try to accommodate your request.


  1. Assumptions for missing data methods
  2. Problems with conventional methods
  3. Maximum likelihood (ML)
  4. ML with EM algorithm
  5. Direct ML with Mplus, Stata and SAS
  6. ML for contingency tables
  7. Multiple Imputation (MI)
  8. MI under multivariate normal model
  9. MI with SAS and Stata
  10. MI with categorical and nonnormal data
  11. Interactions and nonlinearities
  12. Using auxiliary variables
  13. Other parametric approaches to MI
  14. Linear hypotheses and likelihood ratio tests
  15. Nonparametric and partially parametric methods
  16. Fully conditional models
  17. MI and ML for nonignorable missing data

Comments by recent participants

“The structure and pacing of this course is the best I have ever encountered. The break schedule once per hour is ideal. The level of complexity is just right, and the rate of progression through the material is comfortable. I never felt bored or overwhelmed. Plus, Paul Allison is a great speaker, and genuinely cares about the topic, as well as having extensive personal and professional experience with missing data methods and applications. In the midst of all this, there was plenty of opportunity for questions and discussion.”
  Clark Andersen, University of Texas Medical Branch, Shriners Hospital

“I am very happy I took this course. It addresses a number of myths and common misconceptions about missing data and valid ways of dealing with it. I learned a great deal. In addition, I feel confident that I can implement what I learned in my own work. I highly recommend this course.”
  John Jemmott, University of Pennsylvania 

“Paul’s Missing Data workshop provides a comprehensive overview of missing data theories and practical advice on how to deal with this often difficult matter. What impressed me the most is how effective a teacher Paul is. Having taken many statistics classes, I found it is extremely rare to find a great instructor. Thank you for being excellent at both statistics and teaching.”
  Jing Li, Rice University

“The “Missing Data” course is a great introductory class to understand how to deal with missing data in real life. Dr. Paul Allison is a wonderful teacher. He has structured the course in such a manner that one can understand the concepts and then be able to perform the analyses for the dataset in their hand.”
  Nandita Biswas, GlaxoSmithKline

“Paul is a great instructor, and uses multiple software programs to demonstrate the main ideas behind data analyses when data are missing at random.”

“The course is even-paced, and the content is appropriate for mid and high-level research. You will be exposed to various techniques handling missing data and learn to analyze missing data with popular statistical software. You’ll grasp the gist of missing data analysis in the two-day workshop!”
  Ling Na, University of Pennsylvania 

“This course offered practical applications and methods to a common problem in analysis of data. I will come back to the course materials and examples when analyzing my own data in the future.”
  Derek Brown, Murtha Cancer Center