Longitudinal Data Analysis Using R

A 2-Day Seminar Taught by Stephen Vaisey, Ph.D.

The most common type of longitudinal data is panel data, consisting of measurements of predictor and response variables at two or more points in time for many individuals (or other units of observation). Panel data enable two major advances over cross-sectional data: 1) the ability to control for unobserved differences across units, and 2) the ability to investigate questions of causal ordering.

Because panel data violate the standard assumption of independent observations, researchers must choose a strategy to deal with (and, ideally, make use of) this non-independence. In this course we will cover four approaches:

  1. Robust standard errors
  2. Random effects models
  3. Fixed effects models
  4. “Between/within” models that combine fixed and random effects

We will cover each of these methods in some detail, considering their advantages and disadvantages. We will also consider different methods for quantitative and categorical outcomes.
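
To make the idea of controlling for unobserved differences across units concrete, here is a minimal sketch comparing a pooled regression slope with a fixed effects ("within") slope. The course itself uses R; this sketch is written in plain Python with no packages so it can be run anywhere. The toy panel, the variable names, and the helper functions `ols_slope` and `demean_by_unit` are all invented for illustration and are not taken from the course materials.

```python
from collections import defaultdict


def ols_slope(x, y):
    """Slope from a simple one-predictor least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def demean_by_unit(ids, values):
    """Subtract each unit's own mean from its observations (within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for unit, value in zip(ids, values):
        totals[unit] += value
        counts[unit] += 1
    return [value - totals[unit] / counts[unit] for unit, value in zip(ids, values)]


# Hypothetical panel: 3 units observed at 3 time points.
# y = unit_effect + 2 * x, with unit effects 0, 10, 20 that are correlated with x.
ids = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
y = [2.0, 4.0, 6.0, 18.0, 20.0, 22.0, 34.0, 36.0, 38.0]

# Pooled regression ignores the unit effects and absorbs them into the slope.
pooled = ols_slope(x, y)

# Within regression removes anything constant within a unit before estimating.
within = ols_slope(demean_by_unit(ids, x), demean_by_unit(ids, y))

print(f"pooled slope: {pooled:.2f}")
print(f"within slope: {within:.2f}")
```

In this constructed example the pooled slope is 5, while the within slope recovers the value of 2 used to generate the data, because demeaning removes the stable unit-level differences. Real data will include noise, and the standard errors require the adjustments discussed in the course.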

This is a hands-on course with opportunities for participants to practice the different methods using various R packages.


To participate in the hands-on exercises, you are strongly encouraged to bring a laptop computer. The vast majority of what you will learn in this course can be applied in any software package, but the seminar will mostly use R for empirical examples and exercises. To replicate the instructor’s workflow, you are strongly encouraged to come with R and RStudio already installed on your computer. No previous experience with R is needed, because all code will be provided. Although the course will be taught in R, complete Stata and SAS syntax is available upon request.

If you’d like to take this course but are concerned that you don’t know enough R, there are excellent online resources for learning the basics.

Course outline

1. Opportunities and challenges of panel data
2. Linear models
     a. Robust standard errors
     b. Generalized least squares with maximum likelihood
     c. Random effects models
     d. Fixed effects models
     e. Between-within models
3. Logistic regression models
     a. Robust standard errors
     b. Subject-specific vs. population-averaged estimates
     c. Random effects models
     d. Fixed effects models
     e. Between-within models
4. Extensions to count data models
5. Introduction to structural equation models for panel data


