Longitudinal Data Analysis Using Stata

A 2-Day Seminar Taught by Paul Allison, Ph.D.



This course is currently full. If you would like to be added to the waitlist, please send us an email at info@statisticalhorizons.com.


The most common type of longitudinal data is panel data, consisting of measurements of predictor and response variables at two or more points in time for many individuals. Such data have two major attractions: the ability to control for unobserved variables and the ability to determine causal ordering.

However, panel data also pose a major difficulty: repeated observations on the same individual are typically correlated, which violates the usual assumption that observations are independent. There are four widely available methods for dealing with this dependence: robust standard errors, generalized estimating equations, random effects models, and fixed effects models. This course examines each of these methods in some detail, with an eye to discerning their relative advantages and disadvantages. Different methods are considered for quantitative and categorical outcomes.

This is a hands-on course with ample opportunity for participants to practice the different methods. We’ll cover the following Stata commands: reg, areg, xtreg, mixed, logit, ologit, clogit, mlogit, xtlogit, melogit, meologit, xtgee, xtpoisson, mepoisson, menbreg, sem, and reshape. 


This seminar will use Stata for the many empirical examples and exercises. However, no previous experience with Stata is assumed. Lecture notes and exercises using SAS are also available on request. To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer with Stata installed (release 13 or higher; IC, SE, or MP versions are all acceptable). A power outlet and wireless access will be available at each seat.

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or its 30-day software return policy.


If you need to analyze longitudinal data and have a basic statistical background, this course is for you. You should have a good working knowledge of the principles and practice of multiple regression, as well as elementary statistical inference. It is also helpful to have some familiarity with logistic regression. But you do not need to know matrix algebra, calculus, or likelihood theory. 

Location, Format, and Materials

The class will meet at the Boston Marriott Copley Place, 110 Huntington Avenue, Boston, MA 02116, from 9 am to 5 pm each day, with a 1-hour lunch break.

Participants receive a bound manual containing detailed lecture notes (with equations and graphics), examples of computer printout, and many other useful features. This book frees participants from the distracting task of note-taking.

Registration and lodging

The fee of $995.00 includes all course materials.

Refund Policy

If you cancel your registration at least two weeks before the course is scheduled to begin, you are entitled to a full refund (minus a processing fee of $50). 

Lodging Reservation Instructions 

A block of guest rooms has been reserved at the Boston Marriott Copley Place, 110 Huntington Avenue, Boston, MA 02116, where the seminar takes place, at a special rate of $269. To make a reservation, call 800-228-9290 or 617-236-5800 during business hours and identify yourself as part of the Statistical Horizons LLC group. For guaranteed rate and availability, you must reserve your room no later than Thursday, May 24, 2018.


1. Opportunities and challenges of panel data
        a. Basic data structure and notation
        b. Why do we want panel data?
        c. Problem of dependence
        d. Software considerations

2. Linear models
        a. Robust standard errors
        b. Generalized estimating equations
        c. Random effects models
        d. Fixed effects models
        e. Between-within (hybrid) models

3. Logistic regression models
        a. Robust standard errors
        b. Generalized estimating equations
        c. Subject-specific vs. population-averaged methods
        d. Random effects models
        e. Fixed effects models
        f. Between-within (hybrid) models

4. Methods for count data
        a. Poisson and negative binomial models
        b. Robust standard errors
        c. GEE
        d. Random effects
        e. Fixed effects
        f. Between-within (hybrid) models

5. Linear structural equation models
        a. Fixed and random effects in the SEM framework
        b. xtdpdml command
        c. Models for reciprocal causation with lagged effects


“Dr. Allison was able to convey an enormous amount of material in an accessible, practical way without sacrificing depth. Perfect complement to self-instruction on a topic.”
  William Parker, University of Chicago

“This course helped me realize all the information I can get from panel data and when to believe in my results or to keep looking for a better way to move forward. Thanks for being open to questions and for a great course!”
  Estibalitz Laresgoiti, Tecnológico de Monterrey

“I highly recommend this course for quantitative sociologists who are working with longitudinal data sets. Given an increase in the availability of large-scale longitudinal data sets in an Irish context (TILDA, GUI, Administrative Linked Data), developing the skills to analyze this data is essential. The course was challenging and provided a guide on how to approach the analysis of such data. I would highly recommend this course. Paul’s teaching approach is excellent, balancing the ability to cover a depth of materials on the topic of repeated measures while taking questions from participants in the class. This coupled with excellent course material made the trip from Ireland to Chicago worthwhile!”
  Delma Byrne, Maynooth University

“This course was approachable for an individual with minimal experience in longitudinal modeling. It was well organized and provided a succinct yet informative overview of seminal concepts in longitudinal modeling. Course materials were complementary to the lectures and manageable for the learners.”
  Kelly Brassil, University of Texas, MD Anderson Cancer Center

“Terrific course! I felt like we covered a semester’s worth of information in two days.”
  Jeffrey Bridge, Nationwide Children’s Hospital

“This was such an informative course, even for someone with a working knowledge of this material. The specific examples and clear comparisons of models (e.g. robust SE, GEE, RE, FE) were incredibly helpful. I found the course really solidified my understanding and gave me new ideas on how to examine change and analyze data to uncover new understanding of phenomena. Highly recommend this course!”
  Karen Lyons, Oregon Health & Science University

“Paul is extremely well organized and easy to follow. The pace was good, not too fast. The exercises helped us to apply the new concepts we learned. This course is definitely a must if you are dealing with data that is measured at various time points.”
  Sylvain Fiset, Université de Moncton

“This course was very helpful, especially appreciated the notes and documentations that came with the course.”
  David Wang, Biola University

“Really enjoyed this course. Well-organized materials and very informative.”
  Chen Chen, University of Southern Indiana

“For scholars with specific questions on longitudinal data analysis, this is the course for you! The course provides an introduction (or re-introduction) to the main ways to test longitudinal data in an easy-to-understand format. The course pace is fast but thorough. I got a lot out of this course.”
  Davia Downey, Grand Valley State University

“This is a broad, fast-paced class that is highly relevant to the social sciences, as well as to fields such as medical data. Both the theory and examples in the class cover a range of material that is useful to a broad audience.”
  Daniel Bliss, Illinois Institute of Technology