#### R Should Be Your Second Language (If It’s Not Already Your First)

When R first came out, around the year 2000, I was really excited. Here was a powerful, programmable statistical package that was free to anyone. I thought “This could revolutionize data analysis.” But when I gave it a test run, I quickly got discouraged. All the routine data management tasks seemed much harder in R […]

#### Asymmetric Fixed Effects Models for Panel Data

Standard methods for the analysis of panel data depend on an assumption of directional symmetry that most researchers don’t even think about. Specifically, these methods assume that if a one-unit increase in variable X produces a change of B units in variable Y, then a one-unit decrease in X will result in a change of […]

#### Instrumental Variables in Structural Equation Models

When I teach courses on structural equation modeling (SEM), I tell my students that any model with instrumental variables can be estimated in the SEM framework. Then I present a classic example of simultaneous causation in which X affects Y, and Y also affects X. Models like this can be estimated if each of the […]

#### For Causal Analysis of Competing Risks, Don’t Use Fine & Gray’s Subdistribution Method

Competing risks are common in the analysis of event time data. The classic example is death, with distinctions among different kinds of death: if you die of a heart attack, you can’t then die of cancer or suicide. But examples also abound in other fields. A marriage can end either by divorce or by the […]

#### Using “Between-Within” Models to Estimate Contextual Effects

In my courses and books on longitudinal data analysis, I spend a lot of time talking about the between-within model for fixed effects. I used to call it the hybrid model, but others have convinced me that “between-within” provides a more meaningful description. Last week my long-time collaborator, Paula England, asked me a question about […]

#### The Peculiarities of Missing at Random

I thought I knew what it meant for data to be missing at random. After all, I’ve written a book titled Missing Data, and I’ve been teaching courses on missing data for more than 15 years. I really ought to know what missing at random means. But now that I’m in the process of revising […]

#### In Defense of Logit – Part 2

In my last post, I explained several reasons why I prefer logistic regression over a linear probability model estimated by ordinary least squares, despite the fact that linear regression is often an excellent approximation and is more easily interpreted by many researchers. I addressed the issue of interpretability by arguing that odds ratios are, in […]

#### In Defense of Logit – Part 1

In a recent guest blog, Paul von Hippel extended his earlier argument that there are many situations in which a linear probability model (estimated via ordinary least squares) is preferable to a logistic regression model. In his two posts, von Hippel makes three major points: Within the range of .20 to .80 for the predicted […]

#### When Can You Fit a Linear Probability Model? More Often Than You Think

In July 2015 I pointed out some advantages of the linear probability model over the logistic model. The linear model is much easier to interpret, and the linear model runs much faster, which can be important if the data set is large or the model is complicated. In addition, the linear probability model often fits […]

#### Causal Mediation Analysis

On April 21-22, 2017, I will be offering a seminar on causal mediation analysis in Philadelphia with Statistical Horizons. The course will cover very recent developments in this area. Mediation is about the mechanisms or pathways by which some treatment or exposure affects an outcome. Questions about mediation arise with considerable frequency in the biomedical […]
