Statistical Horizons Blog


Imputation by Predictive Mean Matching: Promise & Peril

Predictive mean matching (PMM) is an attractive way to do multiple imputation for missing data, especially for imputing quantitative variables that are not normally distributed. But, as I explain below, it’s also easy to do it the wrong way.  Compared with standard methods based on linear regression and the normal distribution, PMM produces imputed values […]

Getting the Lags Right

In my November and December posts, I extolled the virtues of SEM for estimating dynamic panel models. I argued that, by combining fixed effects with lagged values of the predictor variables, this approach offers the best option for making causal inferences with non-experimental panel data. It controls for all time-invariant variables, whether observed or not, […]

More on Causal Inference With Panel Data

This is a follow-up to last month’s post, in which I considered the use of panel data to answer questions about causal ordering: does x cause y or does y cause x?  In the interim, I’ve done many more simulations to compare the two competing methods, Arellano-Bond and ML-SEM, and I’m going to report some […]

Using Panel Data to Infer Causal Direction: ML vs. Arellano-Bond

Does x cause y or does y cause x? Virtually everyone agrees that cross-sectional data are of no use in answering this question. The ideal, of course, would be to do two randomized experiments, one examining the effect of x on y, and the other focused on the reverse effect. Absent this, most social scientists […]

Sensitivity Analysis for Not Missing at Random

When I teach my seminar on Missing Data, the most common question I get is “What can I do if my data are not missing at random?” My usual answer is “Not much,” followed by “but you can do a sensitivity analysis.” Everyone agrees that a sensitivity analysis is essential for investigating possible violations of […]

Problems with the Hybrid Method

For several years now, I’ve been promoting something I called the “hybrid method” as a way of analyzing longitudinal and other forms of clustered data. My books Fixed Effects Regression Methods for Longitudinal Data Using SAS (2005) and Fixed Effects Regression Models (2009) both devoted quite a few pages to this methodology. However, recent research […]

Free SAS!

UPDATE, 11 February 2021. SAS University Edition will soon be phased out. You will no longer be able to download it after Apr. 30, 2021, and it will end completely on Aug. 2, 2021. It’s being replaced by SAS OnDemand for Academics, which is now available. The user experience is very similar, but there are […]

Prediction vs. Causation in Regression Analysis

In the first chapter of my 1999 book Multiple Regression, I wrote “There are two main uses of multiple regression: prediction and causal analysis. In a prediction study, the goal is to develop a formula for making predictions about the dependent variable, based on the observed values of the independent variables….In a causal analysis, the […]

Listwise Deletion: It’s NOT Evil

At the 1998 Annual Meeting of the American Political Science Association, Gary King and three co-authors presented a paper titled “Listwise deletion is evil: What to do about missing data in political science.” The paper was later published under a different title in the American Political Science Review, but the original title has stuck in my head […]

Another Goodness-of-Fit Test for Logistic Regression

In my April post, I described a new method for testing the goodness of fit (GOF) of a logistic regression model without grouping the data. That method was based on the usual Pearson chi-square statistic applied to the ungrouped data. Although Pearson’s chi-square does not have a chi-square distribution when data are not grouped, it […]
