Statistical Horizons Blog


Making Your First GitHub R Project

Increasingly, academic scholars, data scientists, and quantitative researchers are turning to GitHub to collaborate and to share data, code, and results. GitHub lets people host public and private “repositories” that make it easy to communicate research procedures and results. Underlying the GitHub architecture is the version control system Git, which provides further benefits to researchers.


An Update on The MatchIt Package in R

One of the things we hope to do at Code Horizons is help steer you toward the best tools to meet your needs. Here’s a guest post by Noah Greifer, a postdoctoral fellow at Johns Hopkins and the developer of WeightIt and cobalt. Noah has just completed a massive overhaul of the workhorse MatchIt package for matching in R. This post […]


Better Predicted Probabilities from Linear Probability Models

In two earlier posts on this blog (here and here), my colleague Paul von Hippel made a strong case for using OLS linear regression instead of logistic regression for binary dependent variables. When you do that, you are implicitly estimating what’s known as a linear probability model (LPM), which says that the probability of some […]


Introducing Code Horizons

We are thrilled to introduce a new addition to the Statistical Horizons family — Code Horizons. This new initiative emerges from a simple realization. When we do our research, most of the things we actually do are not things we learned in graduate school. These include writing scripts to parse textual data, creating effective visualizations, […]


How many imputations do you need?

When using multiple imputation, you may wonder how many imputations you need. A simple answer is that more imputations are better. As you add more imputations, your estimates get more precise, meaning they have smaller standard errors (SEs). And your estimates get more replicable, meaning they would not change too much if you imputed the […]


R Should Be Your Second Language (If It’s Not Already Your First)

When R first came out, around the year 2000, I was really excited. Here was a powerful, programmable statistical package that was free to anyone. I thought “This could revolutionize data analysis.” But when I gave it a test run, I quickly got discouraged. All the routine data management tasks seemed much harder in R […]


Asymmetric Fixed Effects Models for Panel Data

Standard methods for the analysis of panel data depend on an assumption of directional symmetry that most researchers don’t even think about. Specifically, these methods assume that if a one-unit increase in variable X produces a change of B units in variable Y, then a one-unit decrease in X will result in a change of […]


Instrumental Variables in Structural Equation Models

When I teach courses on structural equation modeling (SEM), I tell my students that any model with instrumental variables can be estimated in the SEM framework. Then I present a classic example of simultaneous causation in which X affects Y, and Y also affects X. Models like this can be estimated if each of the […]


For Causal Analysis of Competing Risks, Don’t Use Fine & Gray’s Subdistribution Method

Competing risks are common in the analysis of event time data. The classic example is death, with distinctions among different kinds of death: if you die of a heart attack, you can’t then die of cancer or suicide. But examples also abound in other fields. A marriage can end either by divorce or by the […]


Using “Between-Within” Models to Estimate Contextual Effects

In my courses and books on longitudinal data analysis, I spend a lot of time talking about the between-within model for fixed effects. I used to call it the hybrid model, but others have convinced me that “between-within” provides a more meaningful description. Last week my long-time collaborator, Paula England, asked me a question about […]
