Longitudinal Data Analysis Using Stata

A 3-Day Remote Seminar Taught by
Paul Allison, Ph.D.

For many years, Dr. Paul Allison has been teaching his acclaimed two-day seminar on Longitudinal Data Analysis Using Stata to audiences around the world. This course covers several popular methods for the analysis of longitudinal data with repeated measures: robust standard errors, generalized least squares, generalized estimating equations, random effects models, and fixed effects models.

Starting January 28, we are offering this seminar as a 3-day synchronous*, remote workshop. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each lecture session will conclude with a hands-on exercise reviewing the content covered, to be completed on your own. Additional sessions will be held on Thursday and Friday afternoons as “office hours”, where you can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for two weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously. 



The most common type of longitudinal data is panel data, consisting of measurements of predictor and response variables at two or more points in time for many individuals. Such data have two major attractions: the ability to control for unobservables, and the determination of causal ordering.
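
To see why repeated measurements help with unobservables, consider a minimal sketch for two time points. The notation is assumed here for illustration and is not taken from the seminar materials: y_it is the response and x_it the predictor for individual i at time t, and α_i stands for everything about individual i that is stable over time and unmeasured.

```latex
\begin{align*}
y_{i1} &= \mu_1 + \beta x_{i1} + \alpha_i + \varepsilon_{i1} \\
y_{i2} &= \mu_2 + \beta x_{i2} + \alpha_i + \varepsilon_{i2} \\
y_{i2} - y_{i1} &= (\mu_2 - \mu_1) + \beta\,(x_{i2} - x_{i1}) + (\varepsilon_{i2} - \varepsilon_{i1})
\end{align*}
```

Subtracting the first equation from the second removes α_i, so β can be estimated without ever measuring it, provided the unobservable has the same effect at both time points.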

However, there is also a major difficulty with panel data: repeated observations are typically correlated, and this invalidates the usual assumption that observations are independent. As a result, confidence intervals and p-values can be severely biased. In some cases, coefficients may also be biased downward.
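
The size of the problem can be illustrated with a simple special case, under simplifying assumptions made here for illustration only: n individuals, each observed T times, a common variance σ² and a common correlation ρ between any two observations on the same individual, and independence across individuals. The variance of the overall mean is then:

```latex
\[
\operatorname{Var}(\bar{y}) = \frac{\sigma^2}{nT}\,\bigl[\,1 + (T-1)\rho\,\bigr]
\]
```

With T = 5 and ρ = 0.5, the bracketed factor is 3, so a standard error computed as if all nT observations were independent would be too small by a factor of √3 ≈ 1.73.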

You’ll learn how to use these methods (robust standard errors, generalized estimating equations, random effects models and fixed effects models) for quantitative outcomes, categorical outcomes, and count data outcomes. You’ll also learn which methods are best suited for different kinds of applications.

This is a hands-on seminar with ample opportunities to practice these new methods.

Here are a few of the topics you won’t want to miss:

  • How to use panel data to control for unobserved variables.
  • Why fixed effects methods often give very different results from random effects methods.
  • How to reshape data from long form to wide form and back again.
  • Why the default correlation structure for GEE is usually not the best.
  • The difference between maximum likelihood and restricted maximum likelihood.
  • How to estimate and interpret random coefficient models.
  • Why first-order autoregressive structures are usually unsatisfactory.
  • The difference between subject-specific coefficients and population-averaged coefficients, and why it matters.
  • How to do longitudinal analysis using ordered logit or multinomial logit.
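
As a small illustration of the reshaping topic in the list above, the sketch below builds a tiny made-up panel in long form, converts it to wide form, and converts it back. The variable names (id, time, y) and the values are hypothetical, chosen only for this example.

```stata
* Hypothetical long-form panel: 2 individuals, 3 time points each
clear
input id time y
1 1 10
1 2 12
1 3 15
2 1 20
2 2 19
2 3 23
end

* Long to wide: one row per individual, with columns y1 y2 y3
reshape wide y, i(id) j(time)
list

* Wide back to long: one row per individual per time point
reshape long y, i(id) j(time)
list
```

Most of the panel commands used in the seminar expect the long form, with one record per individual per time point.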

In this seminar, we will use the following Stata commands: reg, reshape, xtreg, areg, mixed, xtset, xtgee, logit, xtlogit, clogit, melogit, meologit, nbreg, menbreg, lrtest, margins, marginsplot, hausman, xthybrid, and xtdpdml. Lecture notes using SAS and R are available to registered participants on request.
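
For a sense of how a few of these commands fit together, here is a rough sketch that is not taken from the course materials. It assumes an internet connection so that webuse can load nlswork, an example dataset distributed by StataCorp; the choice of outcome and predictors is arbitrary and made only for illustration.

```stata
* Load an example panel dataset from StataCorp (requires internet access)
webuse nlswork, clear

* Declare the panel structure: idcode identifies individuals, year is time
xtset idcode year

* Fixed effects model, stored under the name "fixed"
xtreg ln_wage age tenure, fe
estimates store fixed

* Random effects model, stored under the name "random"
xtreg ln_wage age tenure, re
estimates store random

* Hausman test comparing the two sets of coefficients
hausman fixed random
```

A small p-value from hausman is usually read as evidence against the random effects specification.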


This seminar will use Stata for the many empirical examples and exercises. However, no previous experience with Stata is assumed. Lecture notes and exercises using SAS and R are also available on request. To participate in the hands-on exercises, you are strongly encouraged to use a computer with Stata installed (release 13 or higher; IC, SE, or MP versions are all acceptable).

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or their 30-day software return policy.

WHO SHOULD REGISTER?

If you need to analyze longitudinal data and have a basic statistical background, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. It is also helpful to have some familiarity with logistic regression. But you do not need to know matrix algebra, calculus, or likelihood theory.


1. Opportunities and challenges of panel data
        a. Basic data structure and notation
        b. Why do we want panel data?
        c. Problem of dependence
        d. Software considerations

2. Linear models
        a. Robust standard errors
        b. Generalized least squares
        c. Random effects models
        d. Fixed effects models
        e. Between-within (hybrid) models

3. Logistic regression models
        a. Robust standard errors
        b. Generalized estimating equations
        c. Subject-specific vs. population-averaged methods
        d. Random effects models
        e. Fixed effects models
        f. Between-within (hybrid) models

4. Methods for count data
        a. Poisson and negative binomial models
        b. Robust standard errors
        c. GEE
        d. Random effects
        e. Fixed effects
        f. Between-within (hybrid) models

5. Linear structural equation models
        a. Fixed and random effects in the SEM framework
        b. xtdpdml command
        c. Models for reciprocal causation with lagged effects

