Missing Data

A 2-Day Seminar Taught by Paul Allison, Ph.D.

If you’re using conventional methods for handling missing data, you may be missing out. Conventional methods for missing data, like listwise deletion or regression imputation, are prone to three serious problems:

  • Inefficient use of the available information, leading to low power and Type II errors.
  • Biased estimates of standard errors, leading to incorrect p-values.
  • Biased parameter estimates, due to failure to adjust for selectivity in missing data.
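
The first two problems can be illustrated with a small sketch. The Python below is not course material: the toy data and the `ols` helper are hypothetical, chosen only to show that listwise deletion discards cases and that regression imputation, by filling each gap with a value exactly on the fitted line, leaves the slope unchanged while making its standard error look smaller than the observed data justify.

```python
from math import sqrt


def ols(xs, ys):
    """Simple linear regression of ys on xs.

    Returns (intercept, slope, standard error of the slope).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
    slope_se = sqrt(sse / (n - 2) / sxx)
    return intercept, slope, slope_se


# Hypothetical toy data: y is missing (None) for four of the ten cases.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [2.1, None, 3.9, 4.2, None, 6.8, None, 8.1, 9.4, None]

# Listwise deletion: keep only the cases with y observed.
complete = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
cx = [xi for xi, _ in complete]
cy = [yi for _, yi in complete]
a_cc, b_cc, se_listwise = ols(cx, cy)

# Regression imputation: replace each missing y with its predicted value,
# then analyze the filled-in data as if every value had been observed.
y_filled = [yi if yi is not None else a_cc + b_cc * xi for xi, yi in zip(x, y)]
a_imp, b_imp, se_imputed = ols(x, y_filled)

print("cases used after listwise deletion:", len(complete), "of", len(x))
print("slope (listwise, imputed):", b_cc, b_imp)
print("slope SE (listwise, imputed):", se_listwise, se_imputed)
```

The imputed values add no residual variation, so the second standard error is smaller even though no new information about y has been gained.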

More accurate and reliable results can be obtained with maximum likelihood or multiple imputation. Although these newer methods for handling missing data have been around for more than two decades, they became practical only with the introduction of widely available and user-friendly software.

Maximum likelihood and multiple imputation have very similar statistical properties. If the assumptions are met, they are approximately unbiased and efficient. What’s remarkable is that these newer methods depend on less demanding assumptions than those required for older methods for handling missing data.

Maximum likelihood is available for linear models, logistic regression and Cox regression. Multiple imputation can be used for virtually any statistical problem.
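
One reason multiple imputation applies so broadly is that the combining step is the same whatever the analysis: fit the model on each imputed data set, then pool the results. The sketch below shows the standard pooling formulas (Rubin's rules) in Python. The function name `pool_rubin` and the five estimates are hypothetical, and the page itself does not say which formulas the course presents, so treat this as an assumption-laden illustration.

```python
from math import sqrt
from statistics import mean, variance


def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed data sets using Rubin's rules.

    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    pooled = mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = mean([se ** 2 for se in std_errors])
    # Between-imputation variance: sample variance of the estimates.
    between = variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical results for the same coefficient from five imputed data sets.
estimates = [1.0, 1.2, 0.8, 1.1, 0.9]
std_errors = [0.2, 0.2, 0.2, 0.2, 0.2]

pooled_estimate, pooled_se = pool_rubin(estimates, std_errors)
print(pooled_estimate, pooled_se)
```

The pooled standard error exceeds each single-data-set standard error because it adds the variation between imputations, which is how the method reflects uncertainty about the missing values.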

This course will cover the theory and practice of both maximum likelihood and multiple imputation. Maximum likelihood for linear models will be demonstrated with SAS, Stata, and Mplus. Multiple imputation will be demonstrated with both SAS and Stata. Slides and exercises using R are also available to participants on request.


This is a hands-on course with at least one hour each day devoted to carefully structured and supervised assignments. To get the most out of the course, you are strongly encouraged to bring your own laptop with a recent version of SAS or Stata installed.

There is now a free version of SAS, called the SAS University Edition, that is available to anyone. It has everything needed to run the exercises in this course, and it will run on Windows, Mac or Linux computers. However, you do need a 64-bit machine with at least 1 GB of RAM. You also have to download and install virtualization software that is available free from third-party vendors. The SAS Studio interface runs in your browser, but you do not have to be connected to the Internet. The download and installation are a bit complicated, but well worth the time and effort.  

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or its 30-day software return policy.

Who should attend?

Virtually anyone who does statistical analysis can benefit from new methods for handling missing data. To take this course, you should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. But you do not need to know matrix algebra, calculus, or likelihood theory. 

Location, Format, and Materials

The class will meet from 9 am to 5 pm each day, with a 1-hour lunch break, at the SpringHill Suites San Diego Downtown/Bayfront, 900 Bayfront Court, San Diego, CA 92101.

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer output, and many other useful features. This manual frees participants from the distracting task of note taking.

Registration and Lodging

The fee of $995 includes all course materials. The early registration fee of $895 is available until January 22.

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions

A block of guest rooms has been reserved at the SpringHill Suites San Diego Downtown/Bayfront, where the seminar takes place, at a special rate of $209 per night. To make a reservation, call 888-287-9400 during business hours and identify yourself as part of the Statistical Horizons LLC group. For the guaranteed rate and availability, you must reserve your room no later than Tuesday, January 22, 2019.


Course Outline

  1. Assumptions for missing data methods
  2. Problems with conventional methods
  3. Maximum likelihood (ML)
  4. ML with EM algorithm
  5. Direct ML with Mplus, Stata and SAS
  6. ML for contingency tables
  7. Multiple imputation (MI)
  8. MI under multivariate normal model
  9. MI with SAS and Stata
  10. MI with categorical and nonnormal data
  11. Interactions and nonlinearities
  12. Using auxiliary variables
  13. Other parametric approaches to MI
  14. Linear hypotheses and likelihood ratio tests
  15. Nonparametric and partially parametric methods
  16. Fully conditional models
  17. MI and ML for nonignorable missing data

Comments by recent participants

“I learned a lot from this course. Paul’s expertise helped translate this complicated topic into manageable action steps for real world data analysis.”
  Laura Finan, Prevention Research Center

“Very informative, thorough; covers all conventional methods of addressing missing data. Paul explains the questions clearly and also taps into other indirectly relevant but fundamental principles in statistics.”
  Yang Yang, University of Louisiana at Lafayette

“Dr. Allison is an excellent instructor with a thorough knowledge of missing data methods. The course cemented my understanding of multiple imputations. With examples given in-class and as exercises, it is possible to perform many analyses and ask questions for clarification. Dr. Allison was available to answer in-class and work questions about missing data. Great course!”
  Elaine Hoffman, PPD

“The Missing Data course provided an excellent overview of both ML and MI. The inclusion of SAS, Stata, and Mplus allowed for practical implementation of the concepts.”

“Examples were extremely well thought out and particularly useful to understand the theoretical background of missing data.”
  Luis Ahumada, Johns Hopkins

“It’s the best way to learn a subject in a minimal amount of time. I’ve taken a few courses from Statistical Horizons. The quality has been consistently high. The efficiency drives me to come back for more courses.”
  Sylvia Shen, Verscend Inc.

“Great SAS examples with explanation of each step. Highly applied to my field of work and not ‘abstract.’”
  Salma Musaad, University of Illinois

“The course is very instructive and comprehensive. Handout activities are very useful for software methodologies at work. Conceptual problems and data issues are treated in a very detailed and understandable way.”
  Carlo Di Chiacchio, Istituto Nazionale per la Valutazione del Sistema dell’Istruzione