Discussion Forum for Treatment Effects Analysis

June 1, 2018 By Stephen Vaisey

This page is for participants in Stephen Vaisey’s online course “Treatment Effects Analysis.”  Please post any questions or comments you have about the course. Dr. Vaisey will respond to questions, but you should also feel free to respond to other people’s posts. 

110 Responses

  1. Alana says:

    Hello Dr. Vaisey and online class,
    I am excited to learn more about these methods to help me design better observational clinical studies in my work at the UIC College of Nursing. My first question is — where can I find the readings? Will you provide a link or a full reference?

  2. Mike Withers says:

    Hi, my name is Mike Withers. I work in the Management Department at Mays Business School, Texas A&M University. I teach a PhD seminar in research methods and would look to gain more insights into treatment effects analysis. Looking forward to the class.

  3. Blair Darney says:

    I’m a health services researcher and Asst Prof at Oregon Health & Science University School of Medicine in Portland OR. I need a refresher on matching and weighting and would like to learn more to improve causal inference using observational data/non-random assignments. I work on reproductive health in the US and Mexico.

  4. Jeff Conlin says:

    Hi Dr. Vaisey and fellow classmates,

    My name is Jeff Conlin. I’m a 3rd year PhD student at Penn State University studying mass communications. I’d like to improve my knowledge and skills related to propensity score and matching techniques. Broadly, I’m interested in science and risk-related media messages. Thanks and I’m looking forward to the course!


  5. Greg Petroski says:

    I am an applied statistician with the U. of Missouri School of Medicine. We see a lot of secondary data analyses and while I have used propensity methods a few times, I want a deeper understanding of the methods and a better appreciation for the strengths and limitations of the methods. Learning new things always sounds like fun but time is limited and so this class is a sort of self-inflicted enrichment opportunity.

  6. Ryan Mullins says:

    Hi everyone —

    My name is Ryan Mullins and I’m a marketing professor at Clemson University. I’m looking forward to the course and takeaways. These types of analyses are a growing trend in my field as top journals are expecting stronger causal explanations for proposed relationships. I hope to add these approaches to my toolkit so I can ask (and execute on) more impactful research questions!


  7. Wendy Zeitlin says:

    Hi! I’m an assistant professor at Montclair State University in New Jersey. My research is in the field of social work where it is typically difficult to have true experiments. I have participated in research in the past that has used propensity score analysis, but I only understand the most basic theory behind it, and I would like to know more!

  8. Alissa Knowles says:

    My name is Alissa and I am a fifth year graduate student at UC Irvine in the department of Psychology and Social Behavior. I primarily study adolescents who have been arrested and sanctioned by the juvenile justice system. Due to the observational nature of my research, random assignment is not feasible and I think these techniques will be very helpful to study the influence of varying degrees of justice system involvement on developmental outcomes.

  9. Bettye Apenteng says:

    Hello. My name is Bettye Apenteng. I am a faculty member at Georgia Southern University. I am taking this course to enhance my knowledge and skill set in treatment effects analysis.

  10. Samuel Opoku says:

    I am interested in this technique because I think it will be useful in my research work

  11. Dina Tell says:

    Hello, my name is Dina and I am a research assistant professor at Loyola University Chicago School of Nursing. I study the effects of psycho-social stress on immune function, stress response and potential health disparities. I am new to the treatment effects analysis and hope to use these techniques in my work as it applies to the observational nature of my research.

  12. Sarah Vidal says:

    Hi Everyone! My name is Sarah Vidal and I’m a senior analyst at Westat. My research interests include adolescent development, juvenile delinquency, and community corrections. I do a lot of evaluation research and would love to learn more about quasi-experimental approaches, including PSM.

  13. Dina Tell says:

    Dr. Vaisey,
    I am enjoying the Modules for this week!
    Any chance we can get copies of your slides that are used during your lectures?
    Thank you for considering this request,

  14. Esther Chan says:

    Hi everyone. My name is Esther Chan and I am a graduate student in the sociology department at Yale. I was in a course that spent one lecture on TEA and want to learn more about the technique.

  15. Ryan Wells says:

    Hello. Ryan Wells, College of Education at UMass Amherst. Lots of observational data, and trying to be sure I’m up on the latest ways to use them for causal estimation.

  16. Dan Flack says:

    Hi, all:

    I’m currently a graduate student at Drexel University and am looking into quasi-experimental program evaluations for a dissertation. I’m hoping to learn more about matching and causal inference when pure randomization isn’t feasible.

  17. Carlos Rios-Bedoya says:

    Hi everyone,

    My name is Carlos F. Rios-Bedoya and my training is in quantitative epidemiology. I am the Corporate Director of Scholarly Inquiry at McLaren Health Care in Michigan, where I supervise and monitor the research projects of over 500 DO residents. I’m hoping to learn more about this technique to offer it to our residents for their research projects.

  18. Ben Fisher says:

    My name is Ben Fisher and I’m an assistant professor of criminal justice at the University of Louisville. I’ve been casually introduced to a few elements of treatment effects analysis, but have had no formal training in it.

  19. David Miller says:

    I work at a public health agency in Virginia and am particularly interested in how treatment effects and counterfactuals are or might be applied to observational studies in public health/epidemiology.

  20. Marie Evertsson says:

    Hi all!
    I am a professor of Sociology at Stockholm University. I am looking forward to the course but will be busy and may lag behind a bit (these are vacation times here in Sweden and I have two kids to attend to).
    Do I understand it correctly if I can look at the videos throughout the course period (and that they are gone only thereafter)?

    Thanks! Look forward!

  21. Mark Deneau says:

    Hi all,
    Mark Deneau, physician at the University of Utah, taking this class as part of a career development grant to learn more about propensity score analysis, among other techniques. I research primary sclerosing cholangitis, a liver disease affecting children with inflammatory bowel disease, and I manage a large international consortium and research database.

  22. Lauren D'Arinzo says:

    Hello all,

    My name is Lauren D’Arinzo, I work in infectious disease epidemiology and antibiotic stewardship research at the Children’s Hospital of Philadelphia. We use a lot of observational clinical data in our work, so I am excited to learn about the relative strengths and limitations of treatment effect approaches for secondary analyses.

  23. Alice Welch says:

    Hello everyone!

    My name is Alice Welch, I am an epidemiologist and the Director of Evaluation for the Bureau of Alcohol and Drug Use at the NYC Department of Health and Mental Hygiene.

  24. Bryn King says:

    Hi All,

    My name is Bryn, and I’m an assistant professor in social work at the University of Toronto. Much of my work uses admin data from child welfare systems, where random assignment to core interventions is impossible, so I’m very interested in the topic/methods. Sorry for the late post, I’ve been sick for the last week.

  25. Irena Dushi says:

    Hi Everyone,
    My name is Irena, and I am an economist with the Social Security Administration. I am very interested to learn and apply these techniques/methods in my research when examining respondents with and without disability.
    Thanks and I look forward to learning from this class.

  26. Zachary Rowan says:

    Hi All –

    I’m currently a postdoc at the University of California, Irvine in the Department of Psychology and Social Behavior. I’ll be starting as an AP at Simon Fraser University in the Fall. I work with a lot of observational data and am exploring the best ways to estimate treatment effects and want to understand the latest in these methods!

  27. Ryan says:

    Hi Dr. Vaisey —

    I’ve just completed the first exercise in the class and thought it was great to work through. However, I’m actually conducting my work through R, rather than Stata, so I cannot access the Do files provided for the code. While I completed the exercise fine in R, it would be helpful to have some feedback on whether I’m getting the right estimates.

    Would you be able to provide some sort of answer key for those of us who aren’t using Stata? Or perhaps even a text/word/pdf of the Stata file if that includes the estimates?


    • Stephen Vaisey says:

      Ryan: here’s a link to the output for Exercise 1:


      Later on, for most (not all) of the estimators, you won’t actually be able to easily replicate the Stata output using R. The most used packages for R (e.g., matchit) implement very different defaults than the Stata -teffects- suite. There’s more to it than I can easily explain here, but the programs have very different defaults when it comes to PS and NN matching. For example, do we reuse matched controls? Do we take all equally good matches? The conventions aren’t well established enough in this area to completely standardize all these decisions. To get the exact Stata results for (e.g.) teffects psmatch would take a lot of extra coding in R.

      When we get to more advanced stuff (e.g., coarsened exact matching or entropy balancing), it will be possible to get directly comparable estimates again.

      You can get a 30-day trial version of Stata to run the Stata code. That’s what I’d recommend for the duration of the course just to compare and contrast the two.

  28. Ryan says:

    Thank you very much, this helps a lot. And I really appreciate you pointing out some of the differences b/w Stata and R for matching. I’ll definitely make a run using the Stata trial, and play around with R at the same time to replicate things on my end.

  29. Ashlee Barnes says:

    Good morning!

    It’s a pleasure to virtually meet you all. I am Ashlee Barnes and I am an assistant professor in criminal justice at Virginia Commonwealth University. I am taking this course because I am interested in evaluating the effectiveness of an intervention for young offenders and a quasi experimental design is the most feasible.

  30. Chris Campbell says:

    Hi Dr. Vaisey and fellow attendees,

    My name is Chris and I’m an assistant professor of criminology and criminal justice at Portland State University. I’ve been using PSM for a while in my research and am always interested in gaining more/different insight. I’m particularly interested in the use of estimators such as teffects. I’m looking forward to this.

    Best to you all!

  31. Dina says:

    Hi Dr. Vaisey,
    I have just completed the Day 1 – Part 3A (Module 5). Toward the end of the Module (slides 91 and 92) you have a histogram that demonstrates the logic of propensity scores for the example of race and income. You highlight the cases of the groups (in the ‘bins’) that should or could be compared, and I believe I understand the overall conceptual reasoning. However, we cannot see your pointer. I would like to know for sure that I understand and follow your example. Is there a way for you to provide us with those 2 slides and perhaps circle the bars you are referring to during the lecture?
    Any clarification would be much appreciated!

    • Stephen Vaisey says:

      Hi Dina,

      I see what you’re talking about. You do have access to all the slides already, so I assume you’re just looking for clarification about what I’m comparing.

      In all cases, I am referring to a comparison between a treatment “bin” and the control “bin” directly above it. That is, I’m comparing cases that have the same probability of treatment but are in different treatment statuses.

      Almost all bins have cases in both the treatment and control group. But, as I point out, the farthest right group among the treatment group has no comparison group (i.e., the height of the corresponding control bar directly above it is zero). And, similarly, the farthest left group among the controls should be matched to treated cases in the same region but the height of the bar directly below it is zero (i.e., no treated cases in that region).

      I hope that answers your question!
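
      As a rough illustration of the bin-by-bin comparison described above, here is a minimal Python sketch. The propensity scores are made up and the function names are hypothetical; none of it comes from the course slides. It counts treated and control cases in each propensity-score bin and flags bins that have cases in only one group.

```python
# Hypothetical propensity scores; not data from the course.
treated_ps = [0.35, 0.42, 0.55, 0.61, 0.78, 0.95]
control_ps = [0.05, 0.22, 0.31, 0.44, 0.58, 0.72]


def bin_counts(scores, n_bins=10):
    """Count scores falling in each of n_bins equal-width bins on [0, 1]."""
    counts = [0] * n_bins
    for p in scores:
        # A score of exactly 1.0 is placed in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


def support_report(treated, control, n_bins=10):
    """Label each bin by whether it holds treated cases, controls, or both."""
    t_counts = bin_counts(treated, n_bins)
    c_counts = bin_counts(control, n_bins)
    report = []
    for i, (t, c) in enumerate(zip(t_counts, c_counts)):
        if t > 0 and c > 0:
            status = "comparable"
        elif t > 0:
            status = "treated only"
        elif c > 0:
            status = "control only"
        else:
            status = "empty"
        report.append((i / n_bins, (i + 1) / n_bins, t, c, status))
    return report


report = support_report(treated_ps, control_ps)
for lo, hi, t, c, status in report:
    print(f"[{lo:.1f}, {hi:.1f}): treated={t} control={c} -> {status}")
```

      Bins labeled "treated only" or "control only" are the regions with no counterpart in the other group, which is the situation at the far right and far left of the histogram.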


  32. Dina says:

    Yes, thank you very much for this clarification.

  33. Variable selection for PS model says:

    What variables go into the propensity model? I have seen apparently conflicting advice on this. Some suggest variables that predict treatment selection, which seems obvious, but others argue that variables also predictive of the outcome are needed.

    Focusing strictly on predictors of treatment assignment risks including variables on which groups differ but that do not impact the outcome — essentially instruments.

    This might appear to be a silly question. If groups differ on a variable that doesn’t affect the outcome, then who cares! But sometimes we have a large pool of candidates and even the experts cannot confidently specify which variables drive assignment & which variables will not affect outcomes. The best one can get from the subject specialists is “well maybe”, and sample size issues aggravate the problem. Any thoughts on this?

    • Stephen Vaisey says:

      Hey there! Sorry for the delay in responding. I’ve been traveling all day.

      In real life, we can’t always know which variables should go into the treatment selection model compared to which variables don’t affect treatment selection but might be correlated with the outcome. So it’s better to be safe than sorry as long as you’re not using post-treatment variables in the treatment model. But other than that risk, your intuition is right that we generally end up using the “well maybe” standard.

      On the issue of “if groups differ on a variable that doesn’t affect the outcome then who cares?” you’re nearly right. But if the variable shares an unobserved common cause with the outcome, then the relationship will be confounded even if there’s no direct effect. So that back-door path needs to be closed one way or the other.

      Does that clarify this issue? Feel free to ask a follow up.

  34. Nasrin says:

    Dear Dr. Vaisey and other course participants,

    I am a researcher in health economics and would like to gain more insights into treatment effects analysis and to improve my knowledge with regard to matching techniques.


  35. AP says:

    Hi: This may be premature, but I have a question related to a project where I am looking at the effect of a mandate to improve arrests by testing DNA in ALL crimes reported after a certain date. Prior to that date, only some cases with DNA would be tested (for a variety of reasons). I have data on cases before and after mandate and would like to see if odds of arrest improve for cases with similar traits after mandate compared to those before mandate and not tested..so would I use time (before/after mandate) as a variable in the equation to generate propensity scores?

    • Stephen Vaisey says:

      AP: the problem is that after the mandate, the “propensity” becomes 1.0, which means you can’t find cases with a similar propensity from the previous period.

      Here’s one idea: generate a propensity score based only on the pre-mandate selection process. This will tell you how likely cases were to be DNA tested before it was mandatory. Then you can apply the coefficients from this model to the post-mandate case. Of course, you’ll only have treatment cases from the post-mandate observations, but this will still enable you to match similar cases without the issue of perfect prediction.

      • April says:

        OK Thank you

        • April says:

          Hi again:
          So just as a follow-up to your suggestion. I would use all pre-mandate cases to generate propensity scores, so how would I then apply those coefficients to post mandate cases?


          • Stephen Vaisey says:

            OK, let’s say you have a variable “postmandate” which is 1 after the mandate and 0 before, a “dnatest” variable that’s 1 if the case was tested, and covariates x1-x5.

            You could estimate the propensity score manually with:

            logit dnatest x1-x5 if postmandate == 0

            Then you can apply the predictions to all cases with:

            predict pscore, p

            This will apply the model coefficients to all cases, even those who weren’t used in the regression. This would give you the probability that a case would be tested in pre-mandate conditions (“pscore”). You could then supply this propensity score directly to any of the estimation commands.

            Hope that’s clear!
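
            To make the "apply the coefficients to all cases" step concrete, here is a small Python sketch of the arithmetic behind the prediction. The coefficient values, the two covariates, and the three cases are all hypothetical; in practice the coefficients would come from the logit fit on pre-mandate cases.

```python
import math

# Hypothetical coefficients, standing in for a logit fit on pre-mandate cases only.
intercept = -1.0
coefs = {"x1": 0.8, "x2": -0.5}

# Hypothetical cases; case 3 is post-mandate but has the same covariates as case 1.
cases = [
    {"id": 1, "postmandate": 0, "x1": 1.0, "x2": 0.0},
    {"id": 2, "postmandate": 0, "x1": 0.0, "x2": 2.0},
    {"id": 3, "postmandate": 1, "x1": 1.0, "x2": 0.0},
]


def predicted_probability(case):
    """Inverse-logit of the linear predictor built from the pre-mandate coefficients."""
    linear = intercept + sum(b * case[name] for name, b in coefs.items())
    return 1.0 / (1.0 + math.exp(-linear))


# The score is computed for every case, whether or not it was used to fit the model.
for case in cases:
    case["pscore"] = predicted_probability(case)
    print(case["id"], case["postmandate"], round(case["pscore"], 3))
```

            Because the score depends only on the covariates, a post-mandate case gets the same score as a pre-mandate case with identical covariate values.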

  36. Irena says:

    Hi Dr. Vaisey,

    Regarding Quiz #4, last question: “Under what condition would regression produce the same estimate of a causal effect as matching?” I think the answer is “When there is no heterogeneity in the treatment effect.” However, the correct answer from the quiz results indicates that it is option a, “When the ATET is greater than the ATEU.” Can you explain why? Based on the lecture notes or videos, I do not see where you may have indicated or explained it.

    Thank you

    • Stephen Vaisey says:

      Whoops. That is an error! You are right that the answer is “when there is no heterogeneity in the TE.” That must have been input incorrectly. Thanks!
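
      One way to see why heterogeneity is what matters is a small sketch in Python. It assumes a regression with a full set of stratum indicators, in which case the coefficient on treatment is a weighted average of stratum-specific effects with weights proportional to n * p * (1 - p), while the ATT weights strata by n * p. The strata below are hypothetical.

```python
# Each hypothetical stratum: (number of cases, share treated, treatment effect).
heterogeneous = [(100, 0.1, 1.0), (100, 0.5, 2.0), (100, 0.9, 3.0)]
homogeneous = [(100, 0.1, 2.0), (100, 0.5, 2.0), (100, 0.9, 2.0)]


def weighted_effect(strata, weight_fn):
    """Average the stratum effects using weights from weight_fn(n, p)."""
    weights = [weight_fn(n, p) for n, p, _ in strata]
    total = sum(weights)
    return sum(w * effect for w, (_, _, effect) in zip(weights, strata)) / total


def regression_estimate(strata):
    # Assumed setup: OLS with stratum indicators, so weights are n * p * (1 - p).
    return weighted_effect(strata, lambda n, p: n * p * (1 - p))


def att(strata):
    # Matching on the treated weights each stratum by its number of treated cases.
    return weighted_effect(strata, lambda n, p: n * p)


for label, strata in [("heterogeneous", heterogeneous), ("homogeneous", homogeneous)]:
    print(label, round(regression_estimate(strata), 3), round(att(strata), 3))
```

      When every stratum has the same effect, any set of weights returns that same number, so the two estimates agree. When effects differ across strata, the different weights pull the estimates apart.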

  37. Wendy says:

    I just did Quiz 5, and I have a question about why the answer I provided was wrong. The question asks: “Which of the following assumptions is NOT necessary for propensity-score matching to be valid?” I understand why A is the correct answer, but why is D not also correct: Cells must be empty because of sparse data.

    I understand your explanation in the example referred to empty cells, but wouldn’t this even apply when you have a small number of observations and the cells are not empty?

  38. Greg Petroski says:

    We often encounter nested data, for example patients within therapists, or surgeons within hospitals, and often the data is too thin to do a credible job of fitting a propensity model by each Level 2 unit. Any thoughts on how the ideas we’re discussing are best adapted to the nested data context?

    • Stephen Vaisey says:

      A lot depends on the specifics here, for example, how many treatment and control cases there are within each level 2 unit.

      Let’s take the easiest case: assume you have 30 or so cases in each of 30 or more clusters and there are some treatments and some controls in each cluster. The simplest thing to do here would be to estimate a p-score using random effects multilevel logit with random intercepts. You wouldn’t just use this propensity score, however. You’d use it to match treatment and control cases *within* each cluster.

      Matching cases within clusters reduces bias in an outcome model because it ensures that matched treatment and control cases share all observed and unobserved characteristics of their (shared) level 2 units. The logic here is similar (identical, really) to the way a fixed-effects model reduces bias.

      We would have to tweak various aspects of this if the data conditions changed. For example, if some smaller clusters had only treatment OR control cases, we’d either need to drop them (to possibly reduce bias) or match them to cases in level 2 units that are as similar as possible.

      A lot would depend on the specifics of the situation but that should give you some idea. My main point is that matching within clusters is strongly preferable when possible.

      • Greg Petroski says:

        That helps and once said seems obvious! But as a follow-up, suppose we are doing this entirely without canned packages. Are the random effects (probably just a random intercept) only used in creating the PS, or reused in the final analytical model? Or do we turn to GEE to account for the matched sets?

        • Stephen Vaisey says:

          Good questions. I would estimate the propensity model using a RE logit (so melogit probably in Stata). The random intercept will “cancel out” in the cluster-specific matching because every case in the same cluster will have the same random intercept. Then you’d want to exact match on cluster. So with -teffects nnmatch- (which you’ll get to starting this week), you could specify the manually estimated propensity score as the distance variable and enforce exact matching on clusters. That’s not the only way, but it’s one way.
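
          For anyone doing this without canned packages, the matching step might look roughly like the Python sketch below. It assumes the propensity scores have already been estimated, uses one nearest neighbor with replacement, and runs on made-up records; it is not meant to reproduce what -teffects nnmatch- does internally.

```python
# Hypothetical records: cluster id, treatment flag, estimated propensity score, outcome.
cases = [
    {"cluster": "A", "treated": 1, "pscore": 0.60, "y": 10.0},
    {"cluster": "A", "treated": 0, "pscore": 0.55, "y": 7.0},
    {"cluster": "A", "treated": 0, "pscore": 0.20, "y": 4.0},
    {"cluster": "B", "treated": 1, "pscore": 0.30, "y": 6.0},
    {"cluster": "B", "treated": 0, "pscore": 0.35, "y": 5.0},
    {"cluster": "B", "treated": 0, "pscore": 0.62, "y": 9.0},
]


def match_within_cluster(records):
    """Pair each treated case with the closest-pscore control in the same cluster."""
    pairs = []
    for t in (r for r in records if r["treated"] == 1):
        candidates = [
            r for r in records
            if r["treated"] == 0 and r["cluster"] == t["cluster"]
        ]
        if not candidates:
            continue  # no control available in this cluster, so the case is dropped
        best = min(candidates, key=lambda r: abs(r["pscore"] - t["pscore"]))
        pairs.append((t, best))
    return pairs


def att_from_pairs(pairs):
    """Mean treated-minus-control outcome difference across matched pairs."""
    diffs = [t["y"] - c["y"] for t, c in pairs]
    return sum(diffs) / len(diffs)


pairs = match_within_cluster(cases)
print(att_from_pairs(pairs))
```

          In this toy data the treated case in cluster A is actually closest in propensity score to a control in cluster B, so the exact-match-on-cluster restriction changes which control gets used.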

  39. Irena says:

    After watching Module 6 I am not allowed to proceed to Quiz 6, in other words Module 6 does not show the green check sign. Even after watching the video again, the same problem.

    What should I do?
    Thank you

  41. Ryan Wells says:

    For doubly robust strategies… is there any requirement that the X covariates for the regression need to be pre-treatment? I’ve gotten mixed advice on this…and also got advice that the X covariates MUST be the same variables as used in the propensity model (but I believe you stated that this is common, but not required.) Is it just whatever variables you would have used if this were a typical regression without having pre-processed the data?

    Any clarity you can add to variable selection for the regression portion of the doubly robust strategy would be helpful. Thanks!

    • Stephen Vaisey says:

      Good question!

      The X covariates don’t NEED to be pre-treatment but then what you’re getting may be a different sort of effect than the one you have in mind. If some X variables are mechanisms through which treatment D has its effects, then you will be underestimating the total effect of the treatment. After all, the total treatment effect is the sum of its direct and indirect effects.

      I wouldn’t say the X and S variables need to be the same. In practice, this may happen because we don’t have a clear sense of what’s S and what’s X so it’s safer (in one sense anyway) to let the variables play both roles.

      In sum, you want to think as clearly as you can about whether each variable plays a role in the selection process or in the outcome process and model things accordingly. So this isn’t much different than what you would attempt to do in a typical regression. Except now you have two things to model.

      I hope that helps. Feel free to follow up.

  42. Irena says:


    Quiz 10, question #3: Can you elaborate on the correct answer to this question. I thought exclusion of a variable that is related to selection may cause unbalance and thus it would be problematic for the propensity score approach. Hence, I thought the correct answer was option 4 instead of option 1.


    • Stephen Vaisey says:

      Hi Irena,

      The question is about which problems would be bad for PS matching but not necessarily for Mahalanobis distance matching. You’re right that #4 would be a problem for PS matching but it would ALSO be a problem for any form of matching. The reason #1 is correct is because functional form matters for PS matching in a way it doesn’t for Mahalanobis distance matching.

      Does that clarify the issue for you? Let me know.
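
      To show what Mahalanobis distance matching works with, here is a minimal two-covariate sketch in Python. The covariates (think of them as age and years of education) and all values are hypothetical. The distance is computed directly from the covariates and their covariance matrix, with no model of treatment assignment, which is why the functional form of a selection model does not enter.

```python
import math


def covariance_2d(points):
    """Sample variances and covariance for a list of (a, b) pairs."""
    n = len(points)
    mean_a = sum(p[0] for p in points) / n
    mean_b = sum(p[1] for p in points) / n
    var_a = sum((p[0] - mean_a) ** 2 for p in points) / (n - 1)
    var_b = sum((p[1] - mean_b) ** 2 for p in points) / (n - 1)
    cov_ab = sum((p[0] - mean_a) * (p[1] - mean_b) for p in points) / (n - 1)
    return var_a, cov_ab, var_b


def mahalanobis(p, q, cov):
    """Mahalanobis distance between two points, given a 2x2 covariance."""
    var_a, cov_ab, var_b = cov
    det = var_a * var_b - cov_ab ** 2
    da, db = p[0] - q[0], p[1] - q[1]
    # Quadratic form d' S^{-1} d, using the closed-form inverse of a 2x2 matrix.
    squared = (var_b * da * da - 2.0 * cov_ab * da * db + var_a * db * db) / det
    return math.sqrt(squared)


# Hypothetical sample: (age, years of education).
sample = [(25.0, 12.0), (35.0, 16.0), (45.0, 12.0), (55.0, 18.0)]
cov = covariance_2d(sample)

treated_case = (35.0, 16.0)
controls = [(45.0, 12.0), (55.0, 18.0), (25.0, 12.0)]
nearest = min(controls, key=lambda c: mahalanobis(treated_case, c, cov))
print(nearest)
```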

  43. Caitlin says:

    For Quiz 4, Question 4, how is the language in answer choice c different than d? (i.e. mediate vs. associate)

    • Stephen Vaisey says:

      Hi Caitlin. Welcome!

      Maybe I’m looking in the wrong place but I don’t see that language in Quiz 4, question 4. What’s the question exactly?

  44. Caitlin says:

    Hi all, I am an assistant prof of sociology at UC Davis. I was out of town the last two weeks so just starting now.

  45. Blair Darney says:

    I am curious about PS matching for 3+ groups. We did it using guidance from a paper from RAND, but found it hard to interpret. Sort of like how multinomial logit and ordered logit sometimes get so hard to describe that I fall back on 2 logistic models. Is there a disadvantage to doing 2 logistic PS models instead of the 3+ model? (We only want 2 groups compared to 1 group; we don’t care about the other pairwise comparison.) Thanks for your guidance!

  46. Bryn King says:

    I started late (illness and travel for the first two plus weeks) and am slowly getting caught up. I’m sure it’s noted somewhere (which I haven’t been able to find), but how long will the lectures be available? When does the course actually end, and when it ends, does my access to the videos end too?

    Thanks, and my apologies if this has been answered elsewhere.

  47. Ryan Wells says:

    Do you have any advice about using the teffects suite with multiply imputed data? Do the commands integrate well with the “mi” commands in Stata?

  48. Ben Fisher says:

    Related to Blair’s question about treatment as a multinomial variable, are there treatment effects approaches for treatments measured as continuous variables? For example, if we were studying the effect of the punitiveness of schools’ approaches to discipline (measured as a continuous variable) on bullying rates, how might the techniques we have been learning apply here?

    • Stephen Vaisey says:

      There are lots of ways to approach this general issue and not a ton of consensus. (Searching for “dose-response” approaches will get you a foothold.)

      One basic way to think about it is that you want to match cases on their *predicted* punitiveness (based on other characteristics) and then see how *actual* differences in punitiveness are associated with the outcome. The difficulty here is that you can’t easily divide the world into “treated” and “untreated” for matching.

      My student Andrew Miles (now a prof at Toronto) has a paper where he essentially does interval matching by grouping cases into bins with similar predicted values of the continuous treatment and then estimating fixed-effects regressions clustering at the bin level. That’s the spirit of what you’re trying to do, anyway.

      You can check out the paper here: http://journals.sagepub.com/doi/10.1177/0003122415591800

      If you are OK grouping the schools into, say, low, medium, and high levels of punitiveness, then you could just use a Stata command that allows for multivalued treatments. Sure, you’d be treating them as nominal but if you have enough data that would be OK.
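
      To give a feel for the binning idea, here is a minimal Python sketch. It assumes the predicted values of the continuous treatment have already been estimated, groups cases into bins of similar predicted values, and then computes the slope of the outcome on the actual treatment using only within-bin variation. The data and function names are hypothetical, and this is not the implementation from the paper (it does not handle standard errors, for one thing).

```python
def make_bins(cases, n_bins):
    """Sort by predicted treatment and cut into n_bins groups of similar size."""
    ordered = sorted(cases, key=lambda c: c["predicted"])
    size = -(-len(ordered) // n_bins)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def within_bin_slope(cases, n_bins):
    """Slope of outcome on actual treatment after demeaning both within each bin."""
    num = 0.0
    den = 0.0
    for group in make_bins(cases, n_bins):
        mean_t = sum(c["actual"] for c in group) / len(group)
        mean_y = sum(c["outcome"] for c in group) / len(group)
        for c in group:
            num += (c["actual"] - mean_t) * (c["outcome"] - mean_y)
            den += (c["actual"] - mean_t) ** 2
    return num / den


# Hypothetical schools: predicted punitiveness, actual punitiveness, outcome.
schools = [
    {"predicted": 1.0, "actual": 1.0, "outcome": 2.0},
    {"predicted": 1.1, "actual": 2.0, "outcome": 4.0},
    {"predicted": 3.0, "actual": 3.0, "outcome": 16.0},
    {"predicted": 3.2, "actual": 5.0, "outcome": 20.0},
]

binned = within_bin_slope(schools, n_bins=2)  # compares schools with similar predictions
pooled = within_bin_slope(schools, n_bins=1)  # one bin, i.e. an ordinary pooled slope
print(binned, pooled)
```

      In this toy data the pooled slope is much larger than the within-bin slope because schools with high predicted punitiveness also differ in their baseline outcome.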

  49. Blair Darney says:

    I work with Health economists and they are all about D-in-D for causal estimation. Do I understand correctly that the “ra” approach in module 14 is an extension of interactions/DinD because it allows interactions with each variable, not just one (tx var)? Just trying to think how best to communicate with economist colleagues – is this a substantive difference or a jargon difference?

    • Stephen Vaisey says:

      They are similar techniques in that both use a control group to estimate a counterfactual for the treatment group. That is, both ask “what would a treated case look like if it were a control case?” The difference is in the nature of the predictors. D-in-D uses time: what happened to the control group between T1 and T2? RA uses other respondent characteristics to make a prediction about what a treated case would have been like without treatment. But they are both using a counterfactual approach. Does that help?
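
      A small Python sketch may help when explaining the parallel to colleagues. Both functions build a counterfactual for the treated group and subtract it from what was observed: the first uses the control group's change over time, the second uses a line fitted to the controls' covariate. All numbers are hypothetical, and the RA function is a bare-bones one-covariate version for the effect on the treated, not a stand-in for the Stata command.

```python
def diff_in_diff(treated_pre, treated_post, control_pre, control_post):
    """Counterfactual for the treated = their baseline plus the control group's change."""
    counterfactual = treated_pre + (control_post - control_pre)
    return treated_post - counterfactual


def fit_line(xs, ys):
    """Ordinary least squares with one predictor; returns (intercept, slope)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return mean_y - slope * mean_x, slope


def regression_adjustment_att(treated, controls):
    """Counterfactual for each treated case = prediction from the control-group model."""
    intercept, slope = fit_line([x for x, _ in controls], [y for _, y in controls])
    effects = [y - (intercept + slope * x) for x, y in treated]
    return sum(effects) / len(effects)


# Hypothetical group means at two time points.
did = diff_in_diff(treated_pre=10.0, treated_post=18.0, control_pre=8.0, control_post=11.0)

# Hypothetical (covariate, outcome) pairs.
ra = regression_adjustment_att(
    treated=[(2.0, 7.0), (3.0, 9.0)],
    controls=[(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)],
)
print(did, ra)
```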

      • Blair Darney says:

        Yes. I get the counterfactual set-up is the same. The key distinction seems to be DnD is always about time – interacting time with Tx but not looking at Tx with other factors. Thanks!

  50. CP says:

    Hi Steve, I am through Module 7 and have a couple of questions:

    1) What if your main DV is binary or categorical instead of continuous (i.e. goes to college instead of birthweight)?

    2) and similarly, will PSM work if treatment group is not binary, e.g. a hypothetical program participation variable with three categories (didn’t participate, participated in online training, participated in in-person training) vs. just the binary mbsmoke? (I found “teffects multivalued” but am not sure you are going to go over that.)

    3) What do you do about other control vars that may predict the outcome but not the treatvar (e.g. downstream vars that might predict birthweight but not smoking)?

    4) Are you going to send us the code for generating the examples on the slides? You mentioned it in the video but I don’t see it on the website.

    • Stephen Vaisey says:

      1) That’s not a problem. Since the outcome is just a difference in means, you could convert that into an odds ratio or risk ratio if you wanted.

      2) You can do PSM with multivalued treatments. We’ll cover it briefly a bit later in the course but the main idea is that you estimate p-scores using a multinomial logistic regression.

      3) If you’re interested in estimating the effect of the treatment on the outcome, the fact that there are other variables that predict the outcome and are uncorrelated with the treatment won’t bias that estimate. These methods are focused on getting ONE treatment effect of interest.

      4) I think you’re right! I just added it to the course elements list toward the bottom. Thanks for catching that.
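
      On point 1, the conversion is simple arithmetic once you have the two proportions (the estimated outcome rate with and without treatment). A minimal Python sketch, with hypothetical college-going rates:

```python
def effect_measures(p_treated, p_control):
    """Express a difference in two proportions as a risk difference, risk ratio, and odds ratio."""
    risk_difference = p_treated - p_control
    risk_ratio = p_treated / p_control
    odds_treated = p_treated / (1.0 - p_treated)
    odds_control = p_control / (1.0 - p_control)
    return {
        "risk_difference": risk_difference,
        "risk_ratio": risk_ratio,
        "odds_ratio": odds_treated / odds_control,
    }


# Hypothetical proportions going to college with and without the treatment.
measures = effect_measures(p_treated=0.6, p_control=0.5)
for name, value in measures.items():
    print(name, round(value, 3))
```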

  51. Nicolas says:

    Dear Steve,

    I’m a part-time lecturer at Université Libre de Bruxelles and Monitoring and Evaluation Manager at Modus Vivendi. I have a lot of catching-up to do. I apologize but the end of the academic year was tough and I’ve been quite busy working in music festivals as well where Modus Vivendi does harm reduction.


  52. Nicolas says:

    Hi Stephen,

    This is probably off topic but I’m interested in using PS or alternatives covered in this course in cross-cultural research. For instance, in one of my studies, we compared Belgian and Italian participants and used SEM to test our hypotheses. However, and despite the fact we used the same recruitment strategies in both countries, the two samples differ (e.g., in terms of age and education). Can we apply the techniques you teach us in this course to other statistical techniques than teffects, for instance with SEM?

    Thanks in advance for your answer,

    • Stephen Vaisey says:

      Nicolas: there is no reason you couldn’t use any of the techniques we have discussed to make a comparison between (say) two countries. It would help you assess common support in a more straightforward way and would require fewer (or no) parametric assumptions in terms of your control variables. That said, you could (say) use something like entropy balancing and just weight the SEM analysis that way. This would give you a doubly robust approach that combines weighting and regression.
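
      To illustrate the weighting idea, here is a stripped-down Python sketch in the spirit of entropy balancing. It is an assumption-laden simplification: it balances only the mean of a single covariate, starts from uniform base weights, and finds the tilt parameter by bisection. Real implementations balance several moments at once. The ages are hypothetical.

```python
import math


def normalized_weights(lam, xs):
    """Weights proportional to exp(lam * x), computed stably and summing to 1."""
    exponents = [lam * x for x in xs]
    top = max(exponents)
    raw = [math.exp(e - top) for e in exponents]
    total = sum(raw)
    return [r / total for r in raw]


def balance_on_mean(xs, target_mean, lo=-10.0, hi=10.0, steps=200):
    """Find exponential-tilt weights whose weighted mean of xs equals target_mean."""
    if not min(xs) < target_mean < max(xs):
        raise ValueError("target mean must lie inside the range of the values")
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        weights = normalized_weights(mid, xs)
        weighted_mean = sum(w * x for w, x in zip(weights, xs))
        # The weighted mean rises with the tilt parameter, so bisection applies.
        if weighted_mean < target_mean:
            lo = mid
        else:
            hi = mid
    return normalized_weights((lo + hi) / 2.0, xs)


# Hypothetical ages in one sample, reweighted to match the other sample's mean age.
sample_ages = [20.0, 30.0, 40.0, 50.0]
other_sample_mean_age = 38.0

weights = balance_on_mean(sample_ages, other_sample_mean_age)
balanced_mean = sum(w * x for w, x in zip(weights, sample_ages))
print([round(w, 3) for w in weights], round(balanced_mean, 3))
```

      Weights like these could then be passed to whatever software fits the SEM, provided it accepts case weights.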

  53. Mark says:

    Towards end of module 3, on slide starting ‘Excursus: Three assumptions of these models’
    “these models” here implies the exact models, or also the other models we will use?

    In clinical medicine it would seem that the SUTVA is always violated when analyzing retrospective data because success or failure or side effects of a trial of therapy in one patient causes a clinician to reconsider treating or not in the next patient. Is such a thing ever accounted for?


    • Mark Deneau says:

      Sorry for the late question. I was unable to begin the course until now and am catching up…

      • Stephen Vaisey says:

        Mark: No problem on being late. Thanks for your questions.

        These three assumptions apply to all the models (and the non-parametric techniques) covered in the course.

        On your question about clinical medicine: it depends on what you mean by “retrospective.” If you mean that the analysis of ONE dataset might affect LATER practice, that wouldn’t be a violation of SUTVA. It might mean that the treatment effect would change because those being treated are now different than those who were treated in the last sample. If, for example, a treatment threshold gets relaxed based on a previous successful finding, future treatment effects might be lower if the original ATT was higher than the original ATC.

        If you meant that people are making decisions along the duration of data collection based on what happened to previous patients during the same data collection, then this issue is more complicated. There are still a couple of ways to deal with a situation like this. One would be random intercepts at the clinician level and perhaps even random slopes on time for each clinician in a propensity model (i.e., a random growth-curve model). This would allow clinicians to change their baseline probability of treatment assignment over the course of a study. You could conceivably add an exponential autoregressive term to a selection model like that as well.

  54. Alissa says:

    Hi Steve,

    Similar to other students, I have a question about matching when your treatment has more than 2 groups.

    I am working with data from a study using an accelerated cohort design (participants were between 13-17 at the first wave, 14-18 at the second wave, etc), and ultimately want to use the data to look at developmental trajectories (e.g., self-reported offending from 13-22, which would cover the full span of ages in the study).

    One problem is that I’m making the assumption that the 13-year-old’s data is representative of a participant who was 16 when he enrolled in the study. Research would suggest kids who commit crime at a very young age are likely quite different from kids who began committing crime during adolescence.

    I came across a paper that essentially used propensity weighting with cohort as the treatment to balance the different cohorts. The authors used a package in R called “TWANG,” which can now also be run through Stata. I’m trying to figure out whether there is some benefit to learning the code for TWANG rather than using one of the methods discussed in this course.

    Thanks in advance for your input!


    • Stephen Vaisey says:

      Hi Alissa,

      I don’t know much about TWANG though what I do know suggests the main difference is in the estimation of the p-score via boosted regression instead of logistic regression. I don’t know enough about your particular case to know whether this might provide an advantage. The spirit would certainly be the same in either case.

      I’m not sure what the treatment is in your question. Do you mean the study cohort or the birth year? In any case, you can use these techniques to make any groups comparable with each other on covariates. You could do something like this with pretty much any of the techniques but the easiest to describe would be making all the cohorts match the full-sample target of the covariate moments using -ebalance-.

      If you provide more information about your actual research question I might be able to provide a bit more targeted guidance.


      • Alissa says:

        Hi Steve,

        Thanks for your quick response.

        The broadest, simplest description of my research question is: does perceived life expectancy change across development?

        Because I plan to model perceived life expectancy using a latent growth curve framework, I’ll be reformatting the data to model this variable developmentally (in almost a piecewise fashion where some participants provide data from ages 13-17, whereas others provide data between 17-21, etc). My problem is that I’m making the assumption all of these cohorts can be considered part of the same developmental trajectory. Davis (2017; Effect of Victimization on Impulse Control and Binge Drinking among Serious Juvenile Offenders from Adolescence to Young Adulthood) offers a great description of this exact problem on page 6 of this article, in case that helps to clarify.

        You are correct that the treatment would be birth year, with the idea being that youth who enrolled in the study at younger ages may differ on important covariates from youth who enrolled at older ages. Ultimately I’d want to match the different cohorts on covariates that differ across these cohorts (parent education, age at first offense, etc.) before modeling perceived life expectancy.

        Hopefully that provides a bit more clarification! Thanks so much.


        • Stephen Vaisey says:

          Alissa: given what you describe, you can definitely use some technique to ensure that groups of respondents of different ages all match on time-constant or pre-observation characteristics. You wouldn’t want to do this for time-varying characteristics, however, because that would probably be soaking up some of the treatment effect you are after.

          Practically, you would do this by matching every cohort to some sort of target distribution. Ideally, the target would be the means (and variances) of the population to which you wanted to generalize. The -ebalance- function makes it easy to match a group to a target distribution. So you’d just want to do this separately for each group so that they all share the same covariate balance. There are other ways to do this, too, but this would be the most straightforward in my view. After balancing, you would just run the growth curve model with the weights.
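The balancing step described above can be sketched in Python. This is only an illustration of the general idea behind entropy balancing, not the -ebalance- implementation: the cohort labels, the two covariates (parent education and age at first offense), the simulated values, and the function names are all hypothetical, and the pooled sample means stand in for the population target. It balances means only.

```python
import numpy as np

def _weights(lam, Z):
    # Weights proportional to exp(lam . z_i), normalized to sum to 1.
    s = Z @ lam
    w = np.exp(s - s.max())
    return w / w.sum()

def _dual(lam, Z):
    # Convex dual objective: log(sum_i exp(lam . z_i)), computed stably.
    s = Z @ lam
    m = s.max()
    return m + np.log(np.exp(s - m).sum())

def balance_to_target(X, target, max_iter=50, tol=1e-10):
    """Return weights (summing to 1) whose weighted covariate means equal `target`."""
    Z = np.asarray(X, dtype=float) - np.asarray(target, dtype=float)
    lam = np.zeros(Z.shape[1])
    for _ in range(max_iter):
        w = _weights(lam, Z)
        grad = w @ Z  # weighted mean of the centered covariates
        if np.abs(grad).max() < tol:
            break
        hess = (Z * w[:, None]).T @ Z - np.outer(grad, grad)
        step = np.linalg.solve(hess, grad)
        t = 1.0  # halve the Newton step until the objective stops increasing
        while _dual(lam - t * step, Z) > _dual(lam, Z) and t > 1e-8:
            t /= 2
        lam = lam - t * step
    return _weights(lam, Z)

# Hypothetical data: three enrollment cohorts that differ on two
# pre-observation covariates (parent education, age at first offense).
rng = np.random.default_rng(0)
cohorts = {
    13: rng.normal(loc=[12.0, 11.0], scale=1.5, size=(300, 2)),
    15: rng.normal(loc=[12.5, 12.0], scale=1.5, size=(300, 2)),
    17: rng.normal(loc=[13.0, 13.0], scale=1.5, size=(300, 2)),
}

# Assumed target: pooled sample means (a stand-in for population means).
target = np.vstack(list(cohorts.values())).mean(axis=0)

# Balance each cohort separately against the same target.
weights = {age: balance_to_target(X, target) for age, X in cohorts.items()}

for age, X in cohorts.items():
    print(age, "raw means:", X.mean(axis=0).round(2),
          "weighted means:", (weights[age] @ X).round(2))
```

To balance variances as well, one option is to append the squared covariates as extra columns of `X`, with their target means appended to `target`. The resulting weights would then be supplied to the growth curve model.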

          Good luck!

  55. Blair Darney says:

    We have been playing with ebalance in our own data (where we are pooling 3 datasets and want to balance the samples) and get a message that variables have been dropped due to collinearity. What does this mean in the context of balancing/weighting? We took it as an indication that the samples were very different on these variables and balance was not possible. But I think I’m missing something…

    • Stephen Vaisey says:

      Without knowing any of the details of the data, I’m afraid there’s not much I can say. Maybe try asking the program just to match on the first moments (means). If it can’t do that, then there’s something you’re just not thinking of about your data. That is, you might be asking it to do something that doesn’t make sense.

  56. Ben Fisher says:

    I had a bit of a difficult time following the conversation about applying treatment effects to panel data. Is the gist that they’re not really appropriate? Or that matching doesn’t make sense, but maybe balancing does? Are there any readings you would suggest for further understanding how and/or when one might incorporate TEA into panel data?

    • Stephen Vaisey says:

      Ben: I really didn’t have time in the course to develop these ideas in any detail, so I just wanted to give a flavor in the lecture.

      The key thing to bear in mind is that we have a much better counterfactual with panel data than with cross-sectional data: the respondent herself in the other treatment status!

      The idea here is the same — we want a counterfactual comparison. But rather than comparing a treated case to another case similar on observables, we compare the respondent to herself in both statuses. This is a much better comparison because it rules out confounding on all time-constant characteristics (measured or not).

      So really all panel data analysis is attempting some kind of “treatment effects analysis,” but it doesn’t have to resort to covariate matching.
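A small simulation can illustrate why the within-person comparison is the better counterfactual. This is a sketch under strong assumptions that are not from the lecture: two periods, a constant treatment effect, one unobserved time-constant trait `u` that drives both treatment and outcome, and no time-varying confounding. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
true_effect = 2.0

# Unobserved, time-constant characteristic that affects treatment AND outcome.
u = rng.normal(size=n)

# Treatment status in two periods; people with higher u are treated more often.
d1 = (u + rng.normal(size=n) > 0.5).astype(float)
d2 = (u + rng.normal(size=n) > -0.5).astype(float)

# Outcomes in each period.
y1 = true_effect * d1 + 3.0 * u + rng.normal(size=n)
y2 = true_effect * d2 + 3.0 * u + rng.normal(size=n)

# Cross-sectional comparison in period 2: treated vs. untreated people.
# This is confounded by u because we cannot measure or match on it.
naive = y2[d2 == 1].mean() - y2[d2 == 0].mean()

# Within-person comparison: regress the change in y on the change in d.
# Differencing removes u (and anything else that is time-constant).
dy, dd = y2 - y1, d2 - d1
X = np.column_stack([np.ones(n), dd])
within = np.linalg.lstsq(X, dy, rcond=None)[0][1]

print(f"true effect:            {true_effect:.2f}")
print(f"cross-sectional (naive): {naive:.2f}")
print(f"within-person:           {within:.2f}")
```

In this setup the cross-sectional estimate is pushed well above the true effect by `u`, while the within-person estimate lands close to it even though `u` is never observed.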

      Here are a few reading suggestions to try to think about panel data in a potential outcomes framework:

      Blackwell, M. (2013). A Framework for Dynamic Causal Inference in Political Science. American Journal of Political Science, 57(2), 504–520.

      Chapter 11 of Counterfactuals and Causal Inference by Morgan and Winship.

      Check out the literature on marginal structural models (e.g., Williamson, T., & Ravani, P. (2017). Marginal structural models in clinical research: when and how to use them?)

  57. Mark says:

    In module 7, on slide 121, in a discussion of overfitting the regression models, you make reference to not adding variables downstream: ‘be careful not to control for the consequences of your treatment’. Can you give an example of what this means? What sort of variable would fit this criterion and not be appropriate for a regression model used to generate propensity scores?

    • Stephen Vaisey says:

      Good question! Here’s one example. Let’s say you wanted to know the effect of having a college degree on mental health. You’d want to control for things (or match on things) that predict why some people get a college degree and others don’t. But if you control for current income (which is, in part, a function of your education), then you’re controlling away one of the mechanisms that may mediate the relationship between a college degree and the outcome. That is, you’d be reducing some of the total treatment effect by closing down a pathway through which college would have its effects. Does that help?
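The college example above can be made concrete with a simulation. This is a sketch with made-up coefficients and a hypothetical pre-treatment confounder (`background`), assuming simple linear relationships. By construction, the total effect of college on mental health is 0.5 (direct) plus 0.6 × 1.0 (through income), or 1.1.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000

# Hypothetical pre-treatment confounder: predicts who gets a degree.
background = rng.normal(size=n)
college = (0.8 * background + rng.normal(size=n) > 0).astype(float)

# Income is a CONSEQUENCE of the treatment (college raises income by 1.0).
income = 1.0 * college + 0.5 * background + rng.normal(size=n)

# Mental health depends on college directly (0.5) and through income (0.6).
mental_health = (0.5 * college + 0.6 * income
                 + 0.4 * background + rng.normal(size=n))

def ols(y, *cols):
    """Least-squares coefficients with an intercept in position 0."""
    X = np.column_stack([np.ones(len(y)), *cols])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Controlling only for the pre-treatment confounder recovers the total effect.
total = ols(mental_health, college, background)[1]

# Also controlling for income closes the income pathway and leaves
# only the direct effect.
overcontrolled = ols(mental_health, college, background, income)[1]

print(f"college coefficient, pre-treatment controls only: {total:.2f}")
print(f"college coefficient, also controlling for income:  {overcontrolled:.2f}")
```

The first coefficient should land near the total effect of 1.1 and the second near the direct effect of 0.5. The same logic applies to the covariates that go into a propensity model.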

  58. Sarah says:

    This question may be a little too late, but do you have any thoughts about sensitivity analysis for outcomes of propensity-matched cases? A reviewer suggested that we conduct one to strengthen our PSM analysis, but I’ve only seen this method applied in logistic regression analysis (Stata’s mhbounds and sensatt modules, I believe, address this). Are you aware of any literature on sensitivity analysis for time-to-event models?
