Econometrics: A Causal Approach

A 3-Day Remote Seminar Taught by
Nick Huntington-Klein

To see a sample of the course materials, click here.

This course offers a survey of econometrics. Econometrics is a broad category of data analysis that uses data to understand how the world works, even in cases where you can’t run an experiment. The seminar emphasizes practical understanding and use of these concepts rather than statistical proofs.

Over the course of three four-hour sessions, we will cover regression analysis, identification, and some common research designs.

Starting January 20, we are offering this seminar as a 3-day synchronous*, remote workshop for the first time. Each day will consist of a 4-hour live lecture held via the free video-conferencing software Zoom. You are encouraged to join the lecture live, but will have the opportunity to view the recorded session later in the day if you are unable to attend at the scheduled time.

Each day will include a hands-on exercise to be completed on your own after the lecture session is over. An additional lab session will be held Thursday and Friday afternoons, where you can review the exercise results with the instructor and ask any questions.

*We understand that scheduling is difficult during this unpredictable time. If you prefer, you may take all or part of the course asynchronously. The video recordings will be made available within 24 hours of each session and will be accessible for four weeks after the seminar, meaning that you will get all of the class content and discussions even if you cannot participate synchronously.

Closed captioning is available for all live and recorded sessions.


Regression is the primary tool that econometricians use to evaluate data. We’ll be going over how regression is used in econometrics, including the many ways econometricians grapple with the parts of the world we don’t understand yet – error terms. Identification is how econometricians link theory (economic theory or otherwise) to data, and determine not just the difference between correlation and causation, but more broadly whether our analysis is actually answering the question we want it to. Identification is a broad idea, but there are a few standard research designs that can help us a lot, and we’ll be covering modern developments in fixed effects, difference-in-differences, regression discontinuity, and instrumental variables.

Specific topics covered will be linear regression; heteroskedasticity-, autocorrelation-, and cluster-robust standard errors; identification; omitted variable bias; directed acyclic graphs; fixed effects; difference-in-differences; regression discontinuity; and instrumental variables. There will also be a brief overview of how machine learning is likely to change how econometrics is performed.

We will be focusing on the R programming language in the lab portions of the class. However, all materials will also be made available in Stata and Python, and assistance will also be available in those languages.


This is a hands-on course with instructor-led software demonstrations and guided exercises. These guided exercises will be primarily designed for the R language, and so you should use a computer with a recent version of R (version 4.0.0 or later) and RStudio (version 1.4 or later). However, if you prefer and do not mind deviating slightly from the guided exercise, all exercises will also be available for Stata (version 13 or later) and Python (version 3.7 or later).

If you’d like to use R for this course but don’t yet have much experience with the language, here are some excellent online resources for building your R skills.

Who Should Register?

You should take this course if you want to understand the how and why of econometric analysis of observational data. If you want to understand what these tools actually do and how they answer important questions, you should enroll. You should have a basic working knowledge of your language of choice (R, Stata, or Python). Extensive programming experience is not necessary.

This course does not require calculus or familiarity with statistical proofs. It is appropriate for researchers in the public or private sector, as well as students and faculty in academia, who have a working knowledge of statistics. Familiarity with linear regression is even better.

Seminar Outline

1. Linear regression
     • Theoretical and statistical models
     • Line-fitting
     • Interpreting regressions

2. Standard errors
     • Sampling variation in regression
     • Heteroskedasticity
     • Autocorrelation
     • Clustering

3. Identification
     • What is identification?
     • Causal diagrams
     • Back-door paths and omitted variable bias
     • Identification using control variables
     • Placebo tests

4. Common back-door designs
     • Fixed effects
     • Difference-in-differences
     • Estimation
     • Common problems and solutions

5. Common front-door designs
     • Instrumental variables
     • Regression discontinuity
     • Estimation
     • Common problems and solutions

6. Machine learning: a glimpse of the future