#### Better Predicted Probabilities from Linear Probability Models

In two earlier posts on this blog (here and here), my colleague Paul von Hippel made a strong case for using OLS linear regression instead of logistic regression for binary dependent variables. When you do that, you are implicitly estimating what’s known as a linear probability model (LPM), which says that the probability of some […]

Read More »#### Introducing Code Horizons

We are thrilled to introduce a new addition to the Statistical Horizons family — Code Horizons. This new initiative emerges from a simple realization. When we do our research, most of the things we actually do are not things we learned in graduate school. These include writing scripts to parse textual data, creating effective visualizations, […]

Read More »#### How many imputations do you need?

When using multiple imputation, you may wonder how many imputations you need. A simple answer is that more imputations are better. As you add more imputations, your estimates get more precise, meaning they have smaller standard errors (SEs). And your estimates get more replicable, meaning they would not change too much if you imputed the […]

Read More »#### R Should Be Your Second Language (If It’s Not Already Your First)

When R first came out, around the year 2000, I was really excited. Here was a powerful, programmable statistical package that was free to anyone. I thought “This could revolutionize data analysis.” But when I gave it a test run, I quickly got discouraged. All the routine data management tasks seemed much harder in R […]

Read More »#### Asymmetric Fixed Effects Models for Panel Data

Standard methods for the analysis of panel data depend on an assumption of directional symmetry that most researchers don’t even think about. Specifically, these methods assume that if a one-unit increase in variable X produces a change of B units in variable Y, then a one-unit decrease in X will result in a change of […]

Read More »#### Instrumental Variables in Structural Equation Models

When I teach courses on structural equation modeling (SEM), I tell my students that any model with instrumental variables can be estimated in the SEM framework. Then I present a classic example of simultaneous causation in which X affects Y, and Y also affects X. Models like this can be estimated if each of the […]

Read More »#### For Causal Analysis of Competing Risks, Don’t Use Fine & Gray’s Subdistribution Method

Competing risks are common in the analysis of event time data. The classic example is death, with distinctions among different kinds of death: if you die of a heart attack, you can’t then die of cancer or suicide. But examples also abound in other fields. A marriage can end either by divorce or by the […]

Read More »#### Using “Between-Within” Models to Estimate Contextual Effects

In my courses and books on longitudinal data analysis, I spend a lot of time talking about the between-within model for fixed effects. I used to call it the hybrid model, but others have convinced me that “between-within” provides a more meaningful description. Last week my long-time collaborator, Paula England, asked me a question about […]

Read More »#### The Peculiarities of Missing at Random

I thought I knew what it meant for data to be missing at random. After all, I’ve written a book titled Missing Data, and I’ve been teaching courses on missing data for more than 15 years. I really ought to know what missing at random means. But now that I’m in the process of revising […]

Read More »#### In Defense of Logit – Part 2

In my last post, I explained several reasons why I prefer logistic regression over a linear probability model estimated by ordinary least squares, despite the fact that linear regression is often an excellent approximation and is more easily interpreted by many researchers. I addressed the issue of interpretability by arguing that odds ratios are, in […]

Read More »