Categorical Data Analysis

A 3-Day Remote Seminar Taught by
Trenton Mize, Ph.D.

Many behavioral, health, and social science questions, perhaps even most, involve outcome variables that are categorical. For example: Which political candidate will win the next election? How does a parent’s social class influence children’s educational attainment? How many publications does it take to receive tenure? Do men or women drink more alcoholic drinks? Is a vaccine effective at preventing disease? Answering these and countless other questions cannot be adequately accomplished with the linear regression model; it instead requires the more advanced techniques covered extensively in this seminar.

Categorical Data Analysis is a seminar in applied statistics that deals primarily with regression models in which the dependent variable is binary, nominal, ordinal, or count. Many common statistical tasks, including interpreting coefficients, calculating predictions, testing interaction effects, testing for mediation or making other cross-model comparisons, and assessing model fit, require a different approach for models with categorical dependent variables. The focus of the course is on interpretation and on learning to deal with the complications introduced by the nonlinearity of the models.
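To illustrate the kind of complication that nonlinearity introduces, here is a minimal sketch in Python (not course material; the seminar itself uses Stata). The coefficients b0 and b1 are hypothetical values chosen only for illustration. The sketch shows that in a logit model the effect of a one-unit change in x on the predicted probability is not constant, as it would be in linear regression, but depends on where x is evaluated.

```python
import math

# Hypothetical logit coefficients, chosen only for illustration.
B0 = -2.0  # intercept
B1 = 1.0   # coefficient on x


def logit_prob(x, b0=B0, b1=B1):
    """Predicted probability Pr(y = 1 | x) from a logit model."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def marginal_effect(x, b0=B0, b1=B1):
    """Derivative of Pr(y = 1 | x) with respect to x: b1 * p * (1 - p)."""
    p = logit_prob(x, b0, b1)
    return b1 * p * (1.0 - p)


# The coefficient b1 is the same at every x, but the effect on the
# probability scale changes with the value of x.
for x in (0.0, 2.0, 4.0):
    print(f"x = {x:.0f}  Pr(y=1) = {logit_prob(x):.3f}  "
          f"marginal effect = {marginal_effect(x):.3f}")
```

With these assumed coefficients, the marginal effect is largest (0.25) at x = 2, where the predicted probability is 0.5, and shrinks as the probability approaches 0 or 1. This is why a single coefficient cannot summarize the effect of a variable on the probability scale.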

Specific models considered include: probit and logit for binary outcomes; ordered logit/probit and the generalized ordered logit model for ordinal outcomes; multinomial logit for nominal outcomes; and Poisson, negative binomial, and zero-inflated models for counts.

Starting January 27, we are offering this seminar as a 3-day synchronous*, remote workshop for the first time. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each day will include a hands-on exercise to be completed on your own after the lecture session is over. Additional lab sessions will be held on Thursday and Friday afternoons, where you can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions.


The vast majority of what you will learn in this course can be applied in any software package. However, this seminar will mostly use Stata for empirical examples and exercises. To replicate the instructor’s examples in the course, you should have Stata already installed on your computer when the course begins. No previous experience with Stata is needed, however, because all necessary code will be provided.

The examples will use Stata version 17, but the exercises can also be completed with versions 14 through 16.

Seminar participants who are not yet ready to purchase Stata can take advantage of StataCorp’s free 30-day evaluation offer or its 30-day software return policy.

Resources to replicate the primary methods covered in the course in R will also be provided.

Who Should Register?

If you need to analyze categorical outcome data (i.e., binary, ordinal, nominal, or count dependent variables) and have a basic statistical background, this seminar is for you. The seminar is helpful for graduate students, applied researchers, faculty, and others who want to learn these methods for the first time, as well as for researchers who have some familiarity with the methods and want to learn the contemporary techniques now widely available for analyzing categorical data.

If you have a good working knowledge of linear regression, you are well-prepared for this seminar.

Seminar Outline

Day 1

     • Why can’t I use OLS for all dependent variables?
     • Nonlinear effects, interaction effects, and nonlinear interaction effects
     • Count dependent variables: Poisson and negative binomial models
     • Binary dependent variables: logit and probit models

Day 2

     • Interpreting categorical dependent variable models: coefficients, multiplicative effects, predictions, marginal effects, and visualizations
     • Zero-inflated count models
     • Nominal dependent variables: multinomial logit models
     • Ordinal dependent variables: ordered logit and probit, generalized ordered logit models

Day 3

     • Interaction / moderation for categorical models
     • Comparing predictions and effects across categorical models (e.g. mediation)
     • Absolute and comparative model fit for categorical models
     • Model diagnostics for categorical models