Multilevel and Mixed Models with Stata and ChatGPT - Online Course
A 3-Day Livestream Seminar Taught by Stephen Vaisey
Wednesday, February 18 – Friday, February 20, 2026
10:00am-12:30pm
1:30pm-3:30pm
This seminar provides an intensive introduction to multilevel and mixed models, a class of regression models for data that have a hierarchical (or nested) structure. Common examples of such data structures are students nested within classrooms, patients nested within hospitals, or survey respondents nested within countries.
Using techniques that ignore this hierarchical structure (such as ordinary least squares) can lead to incorrect results because such methods assume that all observations are independent. Perhaps more important, using inappropriate techniques prevents researchers from asking substantively interesting questions about how processes work at different levels and how effects may vary across units in a population.
In addition to providing a solid foundation in using mixed models in Stata, this course will also equip you with a set of structured prompts to use with your Large Language Model (LLM) of choice. LLMs like ChatGPT can serve as invaluable “research assistants,” but they must be prompted skillfully to maximize their usefulness and avoid pitfalls. You will learn how to use ChatGPT to help design, estimate, and interpret your models and to understand their assumptions. Explicit discussion of LLM prompting will take up approximately 15-20% of course time.
Starting February 18, this seminar will be presented as a 3-day synchronous, livestream workshop via Zoom. Each day will feature two lecture sessions with hands-on exercises, separated by a 1-hour break. Live attendance is recommended for the best experience. But if you can’t join in real time, recordings will be available within 24 hours and can be accessed for four weeks after the seminar.
Closed captioning is available for all live and recorded sessions. Captions can be translated into a variety of languages, including Spanish, Korean, and Italian.
ECTS Equivalent Points: 1
More details about the course content
After introducing the key concepts of within and between variance, we will begin with simple multilevel variance components models that can tell us how much of the variance in a measure can be allocated to different levels of observation. We will then move on to mixed models (random effects models with fixed covariates) that allow us to ask how factors at different levels can influence the outcome.
Next, we will investigate how random coefficients and cross-level interactions can reveal hidden structure in our data and show how individual-level processes work differently in different contexts. We will also briefly consider how these techniques apply when we have repeated observations of individuals or other entities over time.
Although the course will focus primarily on the continuous outcome case, we will also cover how these models can easily be extended for use with categorical and limited dependent variables.
The seminar emphasizes hands-on understanding and draws on examples from across the social and behavioral sciences. After completing the course, you will know:
- The technical and substantive difference between fixed and random effects and how these terms relate to complete, partial, and no pooling estimators.
- What random intercepts and random slopes are and when to use each one.
- How to use cross-level interactions to investigate effect heterogeneity.
- How to combine the strengths of random-effects and fixed-effects approaches into a single “between-within” model.
- How to estimate these models and interpret the results with the assistance of LLMs.
Although these techniques apply to both clustered and longitudinal data, in the interest of time we will focus almost exclusively on the clustered case. For courses focused on longitudinal data analysis, check out Longitudinal Data Analysis Using R or Longitudinal Data Analysis Using Stata.
Computing
The majority of what you will learn in this course can be applied in any software package. This seminar will use the most recent version of Stata for empirical examples and exercises. (Nearly all commands will work in Stata 14+ as well.)
For LLM support, the instructor will use the most recent paid version of ChatGPT. However, most modern LLMs (e.g., Claude, Gemini) will be useful for understanding, modifying, and interpreting mixed models.
Basic familiarity with Stata is highly desirable, but even novice Stata users should be able to follow the presentation and do the exercises.
R notes and syntax are available upon request.
If you’d like to familiarize yourself with Stata basics before the seminar begins, we recommend following along with a “getting started” video.
Seminar participants who are not yet ready to purchase Stata could take advantage of StataCorp’s 30-day software return policy.
Who should register?
This course is for anyone who wants to learn to apply multilevel models to observational data. You should have a basic foundation in linear regression.
Seminar outline
Module 1: Preliminaries
- Course goals and the “big picture” of multilevel/mixed modeling
- Variance as the core idea: residual variance and variance “explained”
- Why multilevel models: separating within vs. between variability
- Common data structures: clusters (e.g., students-in-schools) vs. repeated measures (panels)
- Quantifying clustering with VPC/ICC (including worked examples)
- LLMs: providing context about your data
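For a two-level random-intercept model, the VPC (which equals the ICC in this simple case) is the share of total outcome variance that lies between clusters rather than within them. The Python sketch below is illustrative only: it assumes the two variance components have already been estimated, for example from an intercept-only model, and the function name and numbers are hypothetical. In Stata, `estat icc` after `mixed` reports this quantity directly.

```python
def vpc_two_level(var_between: float, var_within: float) -> float:
    """Variance partition coefficient for a two-level random-intercept model.

    var_between: variance of the cluster-level random intercepts (sigma_u^2)
    var_within:  variance of the individual-level residuals (sigma_e^2)
    """
    if var_between < 0 or var_within < 0:
        raise ValueError("variances must be non-negative")
    total = var_between + var_within
    if total == 0:
        raise ValueError("total variance must be positive")
    # Share of the total variance attributable to differences between clusters
    return var_between / total


# Hypothetical variance components from an intercept-only (null) model
print(vpc_two_level(var_between=2.0, var_within=8.0))  # 0.2
```

Here 20% of the variance lies between clusters; a value near zero would suggest that clustering matters little for this outcome.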
Module 2: Models with one dimension
- Pooling decisions: complete pooling vs. no pooling vs. partial pooling
- Partial pooling and shrinkage intuition (Empirical Bayes/BLUP ideas)
- Random effects vs. fixed effects: meaning in one dimension
- Building and interpreting a basic random-intercept model in Stata (mixed)
- A full working example (data description, distributions, country/sample-size issues)
- Visualizing group effects with caterpillar plots
- Basic model selection (BIC, with interpretation)
- LLMs: interrogating sample sizes and shrinkage estimates
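Partial pooling can be made concrete with the shrinkage formula for a random-intercept model with no covariates: each group's observed mean (the no-pooling estimate) is pulled toward the grand mean (the complete-pooling estimate) by an amount that depends on the group's size and the variance components. The sketch below assumes the variance components and grand mean are known; in practice they are estimated, which is what makes the predictions "empirical Bayes." All names and numbers are hypothetical. In Stata, comparable predictions of the random effects are usually obtained with `predict` and its `reffects` option after `mixed`.

```python
def shrunken_group_mean(group_mean, n, grand_mean, var_between, var_within):
    """Partial-pooling (BLUP-style) estimate of one group's mean.

    Assumes a random-intercept model with no covariates and known
    variance components.
    """
    # Reliability of the group's sample mean: close to 1 for large groups,
    # closer to 0 for small or noisy ones.
    reliability = var_between / (var_between + var_within / n)
    # Move from the grand mean toward the observed group mean in proportion
    # to the reliability.
    return grand_mean + reliability * (group_mean - grand_mean)


# Hypothetical: two groups with the same observed mean but different sizes
small = shrunken_group_mean(60.0, n=5, grand_mean=50.0, var_between=4.0, var_within=16.0)
large = shrunken_group_mean(60.0, n=200, grand_mean=50.0, var_between=4.0, var_within=16.0)
print(round(small, 2), round(large, 2))  # 55.56 59.8
```

Both groups have an observed mean of 60, but the five-observation group is pulled much further toward the grand mean of 50 than the 200-observation group.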
Module 3: Two dimensions
- Moving to two dimensions: random intercepts and random slopes
- When do you need variable slopes? Diagnostics and decision logic
- Practical modeling details: scaling predictors, quadratics, and interpretation of coefficients
- LLMs: model assumptions and interpretation; how (and whether) to “relax” model assumptions
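With a random intercept and a random slope for a predictor x, the between-cluster variance is no longer a single number: it equals Var(u0j + u1j·x) = σ²_u0 + 2σ_u01·x + σ²_u1·x², so it changes with x. One consequence relevant to scaling predictors is that the reported intercept variance is just this quantity at x = 0, so centering or rescaling x changes it. A minimal sketch with hypothetical variance components; note that in Stata's `mixed` the intercept-slope covariance is estimated only if you request it, for example with `covariance(unstructured)`.

```python
def level2_variance_at(x, var_intercept, var_slope, cov_intercept_slope):
    """Between-cluster variance Var(u0j + u1j * x) at predictor value x."""
    return var_intercept + 2.0 * cov_intercept_slope * x + var_slope * x ** 2


# Hypothetical estimates from a random-intercept, random-slope model
params = dict(var_intercept=1.0, var_slope=0.25, cov_intercept_slope=-0.2)

# With a negative covariance, the variance is smallest near
# x = -cov / var_slope = 0.8 and grows on either side of that point.
for x in (-2.0, 0.0, 2.0):
    print(x, round(level2_variance_at(x, **params), 3))
```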
Module 4: Multivariate models
- Adding predictors at Level 1 vs. Level 2
- Variance explained by level; reporting R²-style summaries
- Cross-level interactions as “manifest” moderators of random slopes
- RE vs. FE vs. correlated random effects (CRE)/between-within approaches
- Hausman logic and pitfalls
- LLMs: key prompts for assessing model accuracy and robustness
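The between-within (CRE) approach starts by splitting each level-1 predictor into its cluster mean (the between component) and the deviation from that mean (the within component); both then enter the mixed model. In the linear case, the coefficient on the within component matches the fixed-effects estimate. The Python sketch below shows only the data-preparation step, with hypothetical data and a hypothetical function name; in Stata the same step is commonly done with `bysort` and `egen ... = mean()`.

```python
from collections import defaultdict


def between_within(clusters, x):
    """Split a level-1 predictor into cluster means and within-cluster deviations."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for c, value in zip(clusters, x):
        totals[c] += value
        counts[c] += 1
    means = {c: totals[c] / counts[c] for c in totals}
    # Between component: each observation gets its cluster's mean
    between = [means[c] for c in clusters]
    # Within component: deviation from the cluster mean (sums to 0 per cluster)
    within = [value - means[c] for c, value in zip(clusters, x)]
    return between, within


# Hypothetical data: two clusters
clusters = ["A", "A", "A", "B", "B"]
x = [1.0, 2.0, 3.0, 10.0, 14.0]
between, within = between_within(clusters, x)
print(between)  # [2.0, 2.0, 2.0, 12.0, 12.0]
print(within)   # [-1.0, 0.0, 1.0, -2.0, 2.0]
```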
Module 5: Additional topics
- Three-level models: equation, Stata syntax, and three-level VPC
- Binary outcomes: mixed logistic regression (melogit), ICC/VPC issues, odds ratios, visualization, and CRE logit variant
- Presenting results: conventional table sequence (null → L1 → L2 → cross-level), plus plotting interactions and suggested references
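The three-level VPC extends the two-level formula by dividing each level's variance by the sum of all three. For a random-intercept logit model such as one fit with `melogit`, there is no level-1 residual variance on the probability scale, so a common convention (the latent-response approach) fixes it at π²/3 ≈ 3.29, the variance of the standard logistic distribution. A minimal sketch with hypothetical variance components and function names; in Stata, `estat icc` after `mixed` or `melogit` reports related quantities.

```python
import math

# Variance of the standard logistic distribution, used as the level-1
# variance under the latent-response convention for logit models
LOGISTIC_LEVEL1_VARIANCE = math.pi ** 2 / 3


def three_level_vpcs(var_level3, var_level2, var_level1):
    """Share of total variance at each level of a three-level random-intercept model."""
    total = var_level3 + var_level2 + var_level1
    return {
        "level3": var_level3 / total,
        "level2": var_level2 / total,
        "level1": var_level1 / total,
    }


def latent_vpc_logit(var_between):
    """Latent-response VPC for a two-level random-intercept logit model."""
    return var_between / (var_between + LOGISTIC_LEVEL1_VARIANCE)


# Hypothetical variance components
print(three_level_vpcs(var_level3=1.0, var_level2=3.0, var_level1=6.0))
# {'level3': 0.1, 'level2': 0.3, 'level1': 0.6}
print(round(latent_vpc_logit(1.0), 3))  # 0.233
```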
Module 6: Repeated measures data
- Extension of core ideas to panel data using a worked example
- LLMs: useful prompts for repeated measures data
Payment information
The fee of $995 includes all course materials.
PayPal and all major credit cards are accepted.
Our Tax ID number is 26-4576270.