Longitudinal Data Analysis Using Stata

A 2-Day Seminar Taught by Paul Allison, Ph.D.


The most common type of longitudinal data is panel data, consisting of measurements of predictor and response variables at two or more points in time for many individuals. Such data have two major attractions: the ability to control for unobserved variables and the ability to investigate causal ordering.

However, there is also a major difficulty with panel data: repeated observations are typically correlated, and this invalidates the usual assumption that observations are independent. As a result, confidence intervals and p-values can be severely biased. In some cases, coefficients may also be biased downward.
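To make the dependence problem concrete, here is a minimal simulation sketch. It is written in Python purely for illustration (the seminar itself uses Stata), and the function names, sample sizes, and variances are all assumptions chosen for the example. Each simulated person contributes five observations that share an unobserved person effect. The conventional standard error treats all rows as independent. The cluster-robust (sandwich) standard error, computed here without a small-sample correction, sums within person first.

```python
import math
import random

def simulate_panel(n_people=200, n_waves=5, seed=1):
    """Simulate y = x + u_i + e_it, where x and u_i are constant within a person."""
    rng = random.Random(seed)
    rows = []  # each row is (person id, x, y)
    for person in range(n_people):
        x = rng.gauss(0, 1)  # person-level predictor
        u = rng.gauss(0, 1)  # unobserved person effect: source of dependence
        for _ in range(n_waves):
            y = 1.0 * x + u + rng.gauss(0, 1)
            rows.append((person, x, y))
    return rows

def slope_and_standard_errors(rows):
    """OLS slope of y on x, with conventional and cluster-robust standard errors."""
    n = len(rows)
    x_bar = sum(x for _, x, _ in rows) / n
    y_bar = sum(y for _, _, y in rows) / n
    sxx = sum((x - x_bar) ** 2 for _, x, _ in rows)
    slope = sum((x - x_bar) * (y - y_bar) for _, x, y in rows) / sxx
    intercept = y_bar - slope * x_bar

    sse = 0.0
    scores = {}  # per-person sum of (x - x_bar) * residual
    for person, x, y in rows:
        resid = y - intercept - slope * x
        sse += resid ** 2
        scores[person] = scores.get(person, 0.0) + (x - x_bar) * resid

    # Conventional SE: assumes all n rows are independent.
    naive_se = math.sqrt(sse / (n - 2) / sxx)
    # Cluster-robust SE: scores are summed within person before squaring.
    cluster_se = math.sqrt(sum(s ** 2 for s in scores.values())) / sxx
    return slope, naive_se, cluster_se

slope, naive_se, cluster_se = slope_and_standard_errors(simulate_panel())
print(f"slope={slope:.3f}  naive SE={naive_se:.4f}  cluster-robust SE={cluster_se:.4f}")
```

In this setup the predictor does not vary within a person, so the conventional standard error should come out noticeably smaller than the cluster-robust one. That is the sense in which confidence intervals and p-values are biased when dependence is ignored.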

This course covers four methods for solving the problem of dependent observations: robust standard errors, generalized estimating equations, random effects models and fixed effects models. You’ll learn how to use these methods for quantitative outcomes, categorical outcomes, and count data outcomes. You’ll also learn which methods are best suited for different kinds of applications.

This is a hands-on seminar with ample opportunities to practice these new methods.

Here are a few of the topics you won’t want to miss:

  • How to use panel data to control for unobserved variables.
  • Why fixed effects methods often give very different results from random effects methods.
  • How to reshape data from long form to wide form and back again.
  • Why the default correlation structure for GEE is usually not the best.
  • The difference between maximum likelihood and restricted maximum likelihood.
  • How to estimate and interpret random coefficient models.
  • Why first-order autoregressive structures are usually unsatisfactory.
  • The difference between subject-specific coefficients and population-averaged coefficients, and why it matters.
  • How to do longitudinal analysis using ordered logit or multinomial logit.
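As a small illustration of the long-form and wide-form layouts mentioned in the list above, the following sketch converts between them. It is plain Python for illustration only; in the seminar this task is handled with Stata's reshape command. The variable names (id, wave, y) and the data values are hypothetical.

```python
def long_to_wide(long_rows, id_key="id", time_key="wave", value_key="y"):
    """One row per person-wave -> one row per person, with columns y1, y2, ..."""
    wide = {}
    for row in long_rows:
        person = wide.setdefault(row[id_key], {id_key: row[id_key]})
        person[f"{value_key}{row[time_key]}"] = row[value_key]
    return [wide[k] for k in sorted(wide)]

def wide_to_long(wide_rows, id_key="id", time_key="wave", value_key="y"):
    """One row per person -> one row per person-wave."""
    long_rows = []
    for row in wide_rows:
        for column, value in row.items():
            if column == id_key:
                continue
            wave = int(column[len(value_key):])  # "y2" -> 2
            long_rows.append({id_key: row[id_key], time_key: wave, value_key: value})
    return sorted(long_rows, key=lambda r: (r[id_key], r[time_key]))

# Long form: two people observed at two waves each (made-up values).
long_data = [
    {"id": 1, "wave": 1, "y": 10.0},
    {"id": 1, "wave": 2, "y": 12.5},
    {"id": 2, "wave": 1, "y": 7.0},
    {"id": 2, "wave": 2, "y": 6.5},
]
wide_data = long_to_wide(long_data)
round_trip = wide_to_long(wide_data)
print(wide_data)
print(round_trip == long_data)
```

Long form is what most panel estimators expect (one record per person per time point), while wide form puts each person on a single row with one column per wave.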

In this seminar, we will use the following Stata commands: reg, reshape, xtreg, areg, mixed, xtset, xtgee, logit, xtlogit, clogit, melogit, meologit, nbreg, menbreg, lrtest, margins, marginsplot, hausman, xthybrid, and xtdpdml. Lecture notes using SAS are available to registered participants on request.


This seminar will use Stata for the many empirical examples and exercises. However, no previous experience with Stata is assumed. Lecture notes and exercises using SAS and R are also available on request. To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer with Stata installed (release 13 or higher; IC, SE, or MP versions are all acceptable). A power outlet and wireless access will be available at each seat.

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or its 30-day software return policy.


If you need to analyze longitudinal data and have a basic statistical background, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. It is also helpful to have some familiarity with logistic regression. But you do not need to know matrix algebra, calculus, or likelihood theory. 

Location, Format, and Materials

The class will meet from 9 am to 5 pm each day, with a 1-hour lunch break, at the Courtyard by Marriott Washington Embassy Row, 1600 Rhode Island Ave NW, Washington, D.C. 20036.

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer output, and many other useful features. This manual frees participants from the distracting task of note taking.

Registration and Lodging

The fee of $995 includes all course materials. The early registration fee of $895 is available until October 30.

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions 

A block of guest rooms has been reserved at the Courtyard by Marriott Washington Embassy Row, 1600 Rhode Island Ave NW, Washington, D.C. 20036, where the seminar takes place, at a special rate of $115. To make a reservation, call 888-236-2427 or 202-448-8004 during business hours and identify yourself as part of the Statistical Horizons LLC group. For guaranteed rate and availability, you must reserve your room no later than Thursday, November 1, 2018.


1. Opportunities and challenges of panel data
        a. Basic data structure and notation
        b. Why do we want panel data?
        c. Problem of dependence
        d. Software considerations

2. Linear models
        a. Robust standard errors
        b. Generalized estimating equations
        c. Random effects models
        d. Fixed effects models
        e. Between-within (hybrid) models

3. Logistic regression models
        a. Robust standard errors
        b. Generalized estimating equations
        c. Subject-specific vs. population-averaged methods
        d. Random effects models
        e. Fixed effects models
        f. Between-within (hybrid) models

4. Methods for count data
        a. Poisson and negative binomial models
        b. Robust standard errors
        c. GEE
        d. Random effects
        e. Fixed effects
        f. Between-within (hybrid) models

5. Linear structural equation models
        a. Fixed and random effects in the SEM framework
        b. xtdpdml command
        c. Models for reciprocal causation with lagged effects


“I felt like my ability to model moved to the next level. Dr. Allison was very clear and the resources included are going to be referenced by me for decades to come! I look forward to my next Statistical Horizons class!”
  Albert Do, Yale University

“As an epidemiologist, this course has helped me to appreciate the depth and possibilities that are available with advanced stats.”
  Sebhat Erqou, University of Pittsburgh

“Excellent introduction to longitudinal analysis with different applications. Builds knowledge and competence to analyze data appropriately.”
  Israel Sánchez-Cardona, Carlos Albizu University

“Paul provides clear explanations for complex topics, which is reinforced by hands-on practice in Stata with syntax you can use in the future.”
  Samantha Farris, Brown University

“If you are like me, it is difficult to learn statistical methods from a book. These two days were better than weeks trying to glean the information from books. Try asking a question to a book!”
  James Nebus, Suffolk University

“The course provided a deep dive into longitudinal analysis and made me feel comfortable with swimming in that new ocean. Great details and well thought out lectures and examples.”
  Yochai Eisenberg, University of Illinois at Chicago 

“Best stats course on this topic that I have taken. For the first time, I feel comfortable trying out the methods with my own data. My background in longitudinal analysis was minimal and I was able to follow along well. Highly recommend.”
  Erika Bloom, Brown University

“The longitudinal data analysis course helped me immensely with my dissertation. I learned enough in 2 days to tackle issues I had been having with my analysis for weeks. I feel like it saved me time and headaches. I actually can’t wait to get back home and implement what I’ve learned.”
  Sarah Rutland, The University of Alabama at Birmingham