2017 Stata Summer School:

Missing Data Using Stata

Taught by Paul Allison, Ph.D.
August 15-16, Hotel Birger Jarl Conference
Stockholm, Sweden 

If you’re using conventional methods for handling missing data, you may be missing out. Conventional methods for missing data, like listwise deletion or regression imputation, are prone to three serious problems:

  • Inefficient use of the available information, leading to low power and Type II errors.
  • Biased estimates of standard errors, leading to incorrect p-values.
  • Biased parameter estimates, due to failure to adjust for selectivity in missing data.

More accurate and reliable results can be obtained with maximum likelihood or multiple imputation.

These new methods for handling missing data have been around for at least a decade but have only become practical in the last few years with the introduction of widely available and user-friendly software. Maximum likelihood and multiple imputation have very similar statistical properties. If the assumptions are met, they are approximately unbiased and efficient; that is, they have minimum sampling variance.

What’s remarkable is that these newer methods depend on less demanding assumptions than those required for conventional methods for handling missing data. Maximum likelihood is available for linear models, logistic regression and Cox regression. Multiple imputation can be used for virtually any statistical problem.

This course will cover the theory and practice of both maximum likelihood and multiple imputation using Stata. It will focus on the mi command for multiple imputation and the sem command for maximum likelihood. 


This seminar will use Stata 14 for the many empirical examples and exercises. However, no previous experience with Stata is assumed. Lecture notes using SAS are available on request to registered participants. To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer.  

If you do not already have Stata installed, a temporary license will be provided free of charge. A power outlet and wireless access will be available at each seat.


Virtually anyone who does statistical analysis can benefit from new methods for handling missing data. To take this course, you should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. But you do not need to know matrix algebra, calculus, or likelihood theory. 


Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer printout, and many other useful features. This book frees participants from the distracting task of note taking. 


Please go to the Metrika website for information on registration and discounted hotel accommodations.

Seminar Outline

  1. Assumptions for missing data methods
  2. Problems with conventional methods
  3. Maximum likelihood (ML)
  4. ML with EM algorithm
  5. Direct ML
  6. ML for contingency tables
  7. Multiple Imputation (MI)
  8. MI under multivariate normal model
  9. MI with Stata
  10. MI with categorical and nonnormal data
  11. Interactions and nonlinearities
  12. Using auxiliary variables
  13. Other parametric approaches to MI
  14. Linear hypotheses and likelihood ratio tests
  15. Nonparametric and partially parametric methods
  16. Fully conditional models
  17. MI and ML for nonignorable missing data


“The structure and pacing of this course is the best I have ever encountered. The break schedule once per hour is ideal. The level of complexity is just right, and the rate of progression through the material is comfortable. I never felt bored or overwhelmed. Plus, Paul Allison is a great speaker, and genuinely cares about the topic, as well as having extensive personal and professional experience with missing data methods and applications. In the midst of all this, there was plenty of opportunity for questions and discussion.”
  Clark Andersen, University of Texas Medical Branch, Shriners Hospital

“I am very happy I took this course. It addresses a number of myths and common misconceptions about missing data and valid ways of dealing with it. I learned a great deal. In addition, I feel confident that I can implement what I learned in my own work. I highly recommend this course.”
  John Jemmott, University of Pennsylvania 

“Paul’s Missing Data workshop provides a comprehensive overview of missing data theories and practical advice on how to deal with this often difficult matter. What impressed me the most is how effective a teacher Paul is. Having taken many statistics classes, I found it is extremely rare to find a great instructor. Thank you for being excellent at both statistics and teaching.”
  Jing Li, Rice University

“The ‘Missing Data’ course is a great introductory class to understand how to deal with missing data in real life. Dr. Paul Allison is a wonderful teacher. He has structured the course in such a manner that one can understand the concepts and then be able to perform the analyses for the dataset in their hand.”
  Nandita Biswas, GlaxoSmithKline

“The course is even-paced, and the content is appropriate for mid- and high-level research. You will be exposed to various techniques for handling missing data and learn to analyze missing data with popular statistical software. You’ll grasp the gist of missing data analysis in the two-day workshop!”
  Ling Na, University of Pennsylvania 

“This course offered practical applications and methods to a common problem in analysis of data. I will come back to the course materials and examples when analyzing my own data in the future.”
  Derek Brown, Murtha Cancer Center