Longitudinal Data Analysis Using Stata

A 2-Day Seminar Taught by Paul Allison, Ph.D.

This course is currently full. If you would like to be added to the waitlist, please send us an email at ashley@statisticalhorizons.com.


The most common type of longitudinal data is panel data, consisting of measurements of predictor and response variables at two or more points in time for many individuals. Such data have two major attractions: they make it possible to control for unobserved variables and to investigate causal ordering.

However, there is also a major difficulty with panel data: repeated observations are typically correlated, and this invalidates the usual assumption that observations are independent. As a result, confidence intervals and p-values can be severely biased. In some cases, coefficients may also be biased downward.
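
As a rough illustration of the dependence problem, here is a minimal sketch in plain Python rather than Stata. The data are made up: three people measured three times each, with every person's values identical across time, which is an extreme case of within-person correlation. The naive standard error treats all nine observations as independent. The alternative collapses the data to one mean per person, which is one simple way to respect clustering in a balanced panel. It is not the robust standard error method taught in the course.

```python
import math
import statistics

# Hypothetical balanced panel: 3 people x 3 time points.
# Values are identical within each person, i.e. perfectly correlated.
panel = {
    "person_1": [1.0, 1.0, 1.0],
    "person_2": [5.0, 5.0, 5.0],
    "person_3": [9.0, 9.0, 9.0],
}

# Naive standard error of the mean: pretend all 9 observations are independent.
all_values = [y for values in panel.values() for y in values]
naive_se = math.sqrt(statistics.variance(all_values) / len(all_values))

# Cluster-aware standard error: collapse to one mean per person, then
# treat the 3 person means as the independent units.
person_means = [statistics.mean(values) for values in panel.values()]
cluster_se = math.sqrt(statistics.variance(person_means) / len(person_means))

print(f"naive SE:   {naive_se:.3f}")
print(f"cluster SE: {cluster_se:.3f}")
print(f"ratio:      {cluster_se / naive_se:.1f}")
```

In this constructed example the naive standard error is exactly half the cluster-aware one, so confidence intervals based on it would be far too narrow.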

This course covers four methods for solving the problem of dependent observations: robust standard errors, generalized estimating equations, random effects models and fixed effects models. You’ll learn how to use these methods for quantitative outcomes, categorical outcomes, and count data outcomes. You’ll also learn which methods are best suited for different kinds of applications.
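
To give a flavor of why fixed effects models can control for unobservables, here is a minimal sketch in plain Python rather than Stata. Everything in it is hypothetical: the data are constructed without noise, the true effect of x on y is set to 2.0, and `alpha` stands for an unobserved, time-constant personal trait that raises both x and y. The within transformation shown here (subtracting each person's own means) is the standard idea behind linear fixed effects, though the course's Stata commands do considerably more.

```python
# Hypothetical panel: 3 people x 3 time points, no random noise.
# The true effect of x on y is 2.0; alpha is an unobserved, time-constant
# person trait that raises both x and y.
TRUE_SLOPE = 2.0
alpha = {1: 0.0, 2: 5.0, 3: 10.0}

rows = []  # each row: (person id, x, y)
for person, a in alpha.items():
    for t in (1, 2, 3):
        x = a / 5.0 + t
        y = TRUE_SLOPE * x + a
        rows.append((person, x, y))


def ols_slope(xs, ys):
    """Slope from a simple least-squares regression of ys on xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Pooled OLS ignores the person structure, so alpha contaminates the slope.
pooled_slope = ols_slope([x for _, x, _ in rows], [y for _, _, y in rows])

# Fixed effects via the within transformation: subtract each person's own
# mean from x and y, which removes anything constant within a person.
xs_within, ys_within = [], []
for person in alpha:
    px = [x for p, x, _ in rows if p == person]
    py = [y for p, _, y in rows if p == person]
    mx, my = sum(px) / len(px), sum(py) / len(py)
    xs_within += [x - mx for x in px]
    ys_within += [y - my for y in py]

fe_slope = ols_slope(xs_within, ys_within)

print(f"pooled OLS slope:    {pooled_slope:.2f}")
print(f"fixed effects slope: {fe_slope:.2f}")
```

With these made-up numbers the pooled slope is 4.50, more than double the true value, while the within estimate recovers 2.00 because the person-level trait drops out when each person is compared only with themselves.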

This is a hands-on seminar with ample opportunities to practice these new methods.

Here are a few of the topics you won’t want to miss:

  • How to use panel data to control for unobserved variables.
  • Why fixed effects methods often give very different results from random effects methods.
  • How to reshape data from long form to wide form and back again.
  • Why the default correlation structure for GEE is usually not the best.
  • The difference between maximum likelihood and restricted maximum likelihood.
  • How to estimate and interpret random coefficient models.
  • Why first-order autoregressive structures are usually unsatisfactory.
  • The difference between subject-specific coefficients and population-averaged coefficients, and why it matters.
  • How to do longitudinal analysis using ordered logit or multinomial logit.
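
For readers unfamiliar with the long and wide layouts mentioned in the list above, here is a minimal sketch in plain Python rather than Stata. The data and the helper functions `long_to_wide` and `wide_to_long` are hypothetical, written only for this illustration. The argument names `i` and `j` are chosen to echo the `i()` and `j()` options of Stata's reshape command, which identify the person and the time variable.

```python
# Hypothetical long-form data: one row per person per time point.
long_rows = [
    {"id": 1, "time": 1, "y": 10},
    {"id": 1, "time": 2, "y": 12},
    {"id": 2, "time": 1, "y": 20},
    {"id": 2, "time": 2, "y": 19},
]


def long_to_wide(rows, i="id", j="time", stub="y"):
    """One row per person, with columns y1, y2, ... (stub + time value)."""
    wide = {}
    for row in rows:
        record = wide.setdefault(row[i], {i: row[i]})
        record[f"{stub}{row[j]}"] = row[stub]
    return list(wide.values())


def wide_to_long(rows, i="id", j="time", stub="y"):
    """Inverse of long_to_wide: one row per person per time point."""
    long = []
    for row in rows:
        for name, value in row.items():
            if name != i and name.startswith(stub):
                long.append({i: row[i], j: int(name[len(stub):]), stub: value})
    return long


wide_rows = long_to_wide(long_rows)
print(wide_rows)
print(wide_to_long(wide_rows) == long_rows)
```

The wide form has one row per person (columns `id`, `y1`, `y2`), and converting it back reproduces the original long-form rows.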

In this seminar, we will use the following Stata commands: reg, reshape, xtreg, areg, mixed, xtset, xtgee, logit, xtlogit, clogit, melogit, meologit, nbreg, menbreg, lrtest, margins, marginsplot, hausman, xthybrid, and xtdpdml. Lecture notes using SAS and R are available on request from registered participants.


This seminar will use Stata for the many empirical examples and exercises. However, no previous experience with Stata is assumed. Lecture notes and exercises using SAS and R are also available on request. To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer with Stata installed (release 13 or higher; IC, SE, or MP versions are all acceptable). A power outlet and wireless access will be available at each seat.

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or its 30-day software return policy.


If you need to analyze longitudinal data and have a basic statistical background, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. It is also helpful to have some familiarity with logistic regression. But you do not need to know matrix algebra, calculus, or likelihood theory. 

Location, Format, and Materials

The class will meet from 9 am to 5 pm each day, with a 1-hour lunch break, at Jamaica Bay Inn, 4175 Admiralty Way, Marina del Rey, CA 90292.

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer printout, and many other useful features. This book frees participants from the distracting task of note taking. 

Registration and Lodging

The fee of $995.00 includes all seminar materials. 

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions 

Nearby hotel options include:

Jamaica Bay Inn (seminar location), 4175 Admiralty Way, Marina del Rey, CA 90292
Marina del Rey Marriott, 4100 Admiralty Way, Marina del Rey, CA 90292
Hilton Garden Inn Los Angeles Marina Del Rey, 4200 Admiralty Way, Marina del Rey, CA 90292
Foghorn Harbor Inn, 4140 Via Marina, Marina del Rey, CA 90292
Inn at Venice Beach, 327 W Washington Blvd, Venice, CA 90291

No reserved room blocks are currently available at these hotels. We recommend booking directly through each hotel’s website or checking online travel sites. Airbnb rentals may also be available in the vicinity.


1. Opportunities and challenges of panel data
        a. Basic data structure and notation
        b. Why do we want panel data?
        c. Problem of dependence
        d. Software considerations

2. Linear models
        a. Robust standard errors
        b. Generalized least squares
        c. Random effects models
        d. Fixed effects models
        e. Between-within (hybrid) models

3. Logistic regression models
        a. Robust standard errors
        b. Generalized estimating equations
        c. Subject-specific vs. population-averaged methods
        d. Random effects models
        e. Fixed effects models
        f. Between-within (hybrid) models

4. Methods for count data
        a. Poisson and negative binomial models
        b. Robust standard errors
        c. GEE
        d. Random effects
        e. Fixed effects
        f. Between-within (hybrid) models

5. Linear structural equation models
        a. Fixed and random effects in the SEM framework
        b. xtdpdml command
        c. Models for reciprocal causation with lagged effects


“I felt like my ability to model moved to the next level. Dr. Allison was very clear and the resources included are going to be referenced by me for decades to come! I look forward to my next Statistical Horizons class!”
  Albert Do, Yale University

“As an epidemiologist, this course has helped me to appreciate the depth and possibilities that are available with advanced stats.”
  Sebhat Erqou, University of Pittsburgh

“Excellent introduction to longitudinal analysis with different applications. Builds knowledge and competence to analyze data appropriately.”
  Israel Sánchez-Cardona, Carlos Albizu University

“Paul provides clear explanations for complex topics, which is reinforced by hands-on practice in Stata with syntax you can use in the future.”
  Samantha Farris, Brown University

“If you are like me, it is difficult to learn statistical methods from a book. These two days were better than weeks trying to glean the information from books. Try asking a question to a book!”
  James Nebus, Suffolk University

“The course provided a deep dive into longitudinal analysis and made me feel comfortable with swimming in that new ocean. Great details and well thought out lectures and examples.”
  Yochai Eisenberg, University of Illinois at Chicago 

“Best stats course on this topic that I have taken. For the first time, I feel comfortable trying out the methods with my own data. My background in longitudinal analysis was minimal and I was able to follow along well. Highly recommend.”
  Erika Bloom, Brown University

“The longitudinal data analysis course helped me immensely with my dissertation. I learned enough in 2 days to tackle issues I had been having with my analysis for weeks. I feel like it saved me time and headaches. I actually can’t wait to get back home and implement what I’ve learned.”
  Sarah Rutland, The University of Alabama at Birmingham