Missing Data

A 2-Day Seminar Taught by Paul Allison, Ph.D.


If you’re using conventional methods for handling missing data, you may be missing out. Conventional methods for missing data, like listwise deletion or regression imputation, are prone to three serious problems:

  • Inefficient use of the available information, leading to low power and Type II errors.
  • Biased estimates of standard errors, leading to incorrect p-values.
  • Biased parameter estimates, due to failure to adjust for selectivity in missing data.
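
The third problem can be illustrated with a small simulation. The following is a minimal sketch in Python, not part of the course materials: the data are simulated, and the missingness mechanism (y is missing 80% of the time whenever x is positive) is an assumption chosen purely for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Simulated data (illustrative only): y depends on x plus noise.
n = 10_000
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + random.gauss(0, 1) for xi in x]

# Assumed missingness mechanism: y is missing 80% of the time when x > 0,
# so missingness depends on an observed variable rather than on y itself.
observed = [not (xi > 0 and random.random() < 0.8) for xi in x]

# Mean of y using every case (not available in practice).
full_mean = statistics.mean(y)

# Listwise deletion: keep only the cases where y was observed.
complete_case_y = [yi for yi, keep in zip(y, observed) if keep]
listwise_mean = statistics.mean(complete_case_y)

print(f"mean of y, all cases:     {full_mean:.3f}")
print(f"mean of y, listwise only: {listwise_mean:.3f}")
print(f"cases kept: {len(complete_case_y)} of {n}")
```

Because cases with high x are dropped more often, the listwise mean of y falls well below the mean computed from all cases, and a large share of the sample is discarded.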

More accurate and reliable results can be obtained with maximum likelihood or multiple imputation. Although these newer methods for handling missing data have been around for more than two decades, they have only become practical with the introduction of widely available and user-friendly software.

Maximum likelihood and multiple imputation have very similar statistical properties. If the assumptions are met, they are approximately unbiased and efficient. What’s remarkable is that these newer methods depend on less demanding assumptions than those required for older methods for handling missing data.

Maximum likelihood is available for linear models, logistic regression and Cox regression. Multiple imputation can be used for virtually any statistical problem.
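
One reason multiple imputation applies so broadly is that the results from each imputed data set are combined with the same simple formulas (Rubin's rules), whatever the analysis model. The sketch below is illustrative only: the function name `pool_estimates` and the numbers are hypothetical, and it assumes you already have one estimate and one squared standard error from each imputed data set.

```python
import math
import statistics


def pool_estimates(estimates, variances):
    """Combine per-imputation results with Rubin's rules.

    estimates: one point estimate per imputed data set
    variances: the squared standard error of each estimate
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    q_bar = statistics.mean(estimates)        # pooled point estimate
    within = statistics.mean(variances)       # average within-imputation variance
    between = statistics.variance(estimates)  # between-imputation variance (m - 1 divisor)
    total = within + (1 + 1 / m) * between    # total variance
    return q_bar, math.sqrt(total)


# Hypothetical results from m = 3 imputed data sets.
estimates = [1.0, 1.2, 0.8]
squared_ses = [0.04, 0.05, 0.03]

pooled_estimate, pooled_se = pool_estimates(estimates, squared_ses)
print(f"pooled estimate: {pooled_estimate:.3f}")
print(f"pooled standard error: {pooled_se:.3f}")
```

The between-imputation term is what carries the extra uncertainty due to the missing data; an analysis of a single imputed data set omits it and so tends to understate the standard error.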

This course will cover the theory and practice of both maximum likelihood and multiple imputation. Maximum likelihood for linear models will be demonstrated with SAS, Stata, and Mplus. Multiple imputation will be demonstrated with both SAS and Stata. Slides and exercises using R are also available to participants on request.


Computing

This is a hands-on course with at least one hour each day devoted to carefully structured and supervised assignments. To get the most out of the course, you are strongly encouraged to bring your own laptop with a recent version of SAS or Stata installed.

There is now a free version of SAS, called the SAS University Edition, that is available to anyone. It has everything needed to run the exercises in this course, and it will run on Windows, Mac or Linux computers. However, you do need a 64-bit machine with at least 1 GB of RAM. You also have to download and install virtualization software that is available free from third-party vendors. The SAS Studio interface runs in your browser, but you do not have to be connected to the Internet. The download and installation are a bit complicated, but well worth the time and effort.  

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or its 30-day software return policy.


Who should attend?

Virtually anyone who does statistical analysis can benefit from new methods for handling missing data. To take this course, you should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. But you do not need to know matrix algebra, calculus, or likelihood theory. 


Location, Format, and Materials

The class will meet from 9 am to 5 pm each day with a 1-hour lunch break at Temple University Center City, 1515 Market Street, Philadelphia, PA 19103. 

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer output, and many other useful features. This book frees participants from the distracting task of note taking.


Registration and Lodging

The fee of $995.00 includes all seminar materials.

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions

A block of guest rooms has been reserved at the Club Quarters Hotel, 1628 Chestnut Street, Philadelphia, PA at a special rate of $164 per night. The hotel is about a 5-minute walk from the seminar location. To make a reservation, call 203-905-2100 during business hours and identify yourself with group code SH1003. For a guaranteed rate and availability, you must reserve your room no later than Tuesday, September 3, 2019.

If you need to make reservations after the cut-off date, you may call Club Quarters directly and ask for the “Statistical Horizons” rate (do not use the code or mention a room block) and they will try to accommodate your request.


SEMINAR OUTLINE 

  1. Assumptions for missing data methods
  2. Problems with conventional methods
  3. Maximum likelihood (ML)
  4. ML with EM algorithm
  5. Direct ML with Mplus, Stata and SAS
  6. ML for contingency tables
  7. Multiple Imputation (MI)
  8. MI under multivariate normal model
  9. MI with SAS and Stata
  10. MI with categorical and nonnormal data
  11. Interactions and nonlinearities
  12. Using auxiliary variables
  13. Other parametric approaches to MI
  14. Linear hypotheses and likelihood ratio tests
  15. Nonparametric and partially parametric methods
  16. Fully conditional models
  17. MI and ML for nonignorable missing data

Comments by recent participants

“Great introduction to these materials. The course provided a great introductory foundation to these methods and a good theoretical understanding for future self-study and follow up.”
  Daniel Chu, University of Southern California

“Come here for two days and you will learn more about missing data (and what the right and wrong things to do about missing data) than you can learn by yourself in more than a month.”
  Li Chao, University of Pennsylvania

“Paul is an erudite scholar who understands the importance of speaking plainly and with patience to his students. This comprehensive overview provides beginners and advanced students alike a veritable toolbox to handle the problems associated with missing data.”
  William Resh, University of Southern California

“I really benefited from this course. It was a lot of information to take in, but was presented in a clear, organized format. There was also a good balance between content on detailed methods, real-world examples, and consideration of modifications for various data types or analyses. The content provided will be a valuable resource for me as I work with multiple datasets with missing data.”
  Jessi Tobin, University of Southern California

“This course was good to review important statistical concepts combined with very practical examples using provided data. Writing SAS code and interpreting the results really solidified the concepts.”
  Meghan Warren, Northern Arizona University

“The MD seminar has rich contents. I was able to learn this topic systematically.”
  Pey-Jiuan Lee, University of Southern California

“It gives a practical guidance on how to analyze real-life data with missingness as well as the brief foundation of the algorithm behind each method.”
  Anonymous

“This was my first course on missing data since my graduate school days more than 15 years ago. Paul did an excellent job at taking the time to explain every concept and every step behind the different approaches to dealing with missing data. The two-day course was intensive but Paul’s teaching style made it very enjoyable. Thank you. I had a great time!”
  Michaël Bonnal, University of Tennessee, Chattanooga

“Very knowledgeable instructor who communicated clearly and had well-organized, detailed materials. Would recommend the course.”
  Wilson Vincent, University of California, San Francisco 

“I really like this course. The materials are very helpful in terms of directing how to actually perform the tests.”
  Xiaoyan Wang, Washington University in St. Louis

“I highly recommend this course since it covers from the theory to practical points. Also the hands-on experience with various syntax for several softwares (SAS, Stata, R, Mplus) was the most helpful point of this course.”
  Hae Sun Suh, Pusan National University

“Dr. Allison’s explanations are clear and effectively balance theory with application. It was especially helpful that syntax was provided for SAS, Stata, and R.”
  Emily Dworkin, University of Washington