#### The Peculiarities of Missing at Random

I thought I knew what it meant for data to be missing at random. After all, I’ve written a book titled Missing Data, and I’ve been teaching courses on missing data for more than 15 years. I really ought to know what missing at random means. But now that I’m in the process of revising […]

#### In Defense of Logit – Part 2

In my last post, I explained several reasons why I prefer logistic regression over a linear probability model estimated by ordinary least squares, despite the fact that linear regression is often an excellent approximation and is more easily interpreted by many researchers. I addressed the issue of interpretability by arguing that odds ratios are, in […]

#### In Defense of Logit – Part 1

In a recent guest blog, Paul von Hippel extended his earlier argument that there are many situations in which a linear probability model (estimated via ordinary least squares) is preferable to a logistic regression model. In his two posts, von Hippel makes three major points: Within the range of .20 to .80, the linear probability […]

#### When Can You Fit a Linear Probability Model? More Often Than You Think

In July 2015 I pointed out some advantages of the linear probability model over the logistic model. The linear model is much easier to interpret, and the linear model runs much faster, which can be important if the data set is large or the model is complicated. In addition, the linear probability model often fits […]

#### Causal Mediation Analysis

On April 21-22, 2017, I will be offering a seminar on causal mediation analysis in Philadelphia with Statistical Horizons. The course will cover very recent developments in this area. Mediation is about the mechanisms or pathways by which some treatment or exposure affects an outcome. Questions about mediation arise with considerable frequency in the biomedical […]

#### Teaching Stata in Bangladesh

When Paul Allison asked me if I wanted to teach a course in Bangladesh, my first reaction was confusion. Other than knowing a few basic facts about the place, I had spent little time thinking about the country and none at all imagining myself going there. And here, suddenly, was an opportunity to spend a […]

#### Linear vs. Logistic Probability Models: Which is Better, and When?

In his April 1 post, Paul Allison pointed out several attractive properties of the logistic regression model. But he neglected to consider the merits of an older and simpler approach: just doing linear regression with a 1-0 dependent variable. In both the social and health sciences, students are almost universally taught that when the outcome variable in […]

#### Don’t Put Lagged Dependent Variables in Mixed Models

When estimating regression models for longitudinal panel data, many researchers include a lagged value of the dependent variable as a predictor. It’s easy to understand why. In most situations, one of the best predictors of what happens at time t is what happened at time t-1. This can work well for some kinds of models, […]

#### Maximum Likelihood is Better than Multiple Imputation: Part II

In my July 2012 post, I argued that maximum likelihood (ML) has several advantages over multiple imputation (MI) for handling missing data: ML is simpler to implement (if you have the right software). Unlike multiple imputation, ML has no potential incompatibility between an imputation model and an analysis model. ML produces a deterministic result rather than […]

#### What’s So Special About Logit?

For the analysis of binary data, logistic regression dominates all other methods in both the social and biomedical sciences. It wasn’t always this way. In a 1934 article in Science, Charles Bliss proposed the probit function for analyzing binary data, and that method was later popularized in David Finney’s 1947 book Probit Analysis. For many […]
