Statistical Horizons Blog


Problems with the Hybrid Method

For several years now, I’ve been promoting something I called the “hybrid method” as a way of analyzing longitudinal and other forms of clustered data. My books Fixed Effects Regression Methods for Longitudinal Data Using SAS (2005) and Fixed Effects Regression Models (2009) both devoted quite a few pages to this methodology. However, recent research […]


Free SAS!

Let me tell you about my favorite new toy, the SAS® University Edition, which was just released on May 28. It’s essentially free SAS for anybody who wants it, and it has the potential to be a real game changer. SAS has long had a reputation for being one of the best statistical packages around, but […]


Prediction vs. Causation in Regression Analysis

In the first chapter of my 1999 book Multiple Regression, I wrote “There are two main uses of multiple regression: prediction and causal analysis. In a prediction study, the goal is to develop a formula for making predictions about the dependent variable, based on the observed values of the independent variables…. In a causal analysis, the […]


Listwise Deletion: It’s NOT Evil

At the 1998 Annual Meeting of the American Political Science Association, Gary King and three co-authors presented a paper titled “Listwise deletion is evil: What to do about missing data in political science.” The paper was later published under a different title in the American Political Science Review, but the original title has stuck in my head […]


Another Goodness-of-Fit Test for Logistic Regression

In my April post, I described a new method for testing the goodness of fit (GOF) of a logistic regression model without grouping the data. That method was based on the usual Pearson chi-square statistic applied to the ungrouped data. Although Pearson’s chi-square does not have a chi-square distribution when data are not grouped, it […]
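The excerpt above doesn't show the new method itself, but the Pearson chi-square statistic it builds on is easy to sketch. For ungrouped binary data, each observation contributes (yᵢ − pᵢ)² / (pᵢ(1 − pᵢ)), where pᵢ is the fitted probability for case i. A minimal illustration with made-up fitted probabilities (this is the ordinary Pearson statistic, not the post's proposed test):

```python
# Pearson chi-square on ungrouped 0/1 data: sum of squared Pearson residuals.
# Note this is the raw statistic only; as the post says, it is NOT
# chi-square distributed when the data are ungrouped.
import numpy as np

def pearson_chi2_ungrouped(y, p):
    """Pearson chi-square for binary outcomes y and fitted probabilities p."""
    return float(np.sum((y - p) ** 2 / (p * (1 - p))))

# Toy data with hypothetical fitted probabilities.
y = np.array([0, 1, 0, 1, 1, 0])
p = np.array([0.2, 0.7, 0.3, 0.6, 0.8, 0.4])
print(pearson_chi2_ungrouped(y, p))
```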


Alternatives to the Hosmer-Lemeshow Test

In my post of March 2013, I pointed out some of the deficiencies of the Hosmer-Lemeshow test for goodness-of-fit (GOF) of logistic regression models. Most alarmingly, the p-values produced by the HL statistic can differ dramatically depending on the arbitrary choice of the number of groups. What I didn’t say in that post is that […]
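The group-count sensitivity is easy to see by computing the HL statistic yourself. Below is a minimal sketch on simulated data (not from the post): observations are sorted by predicted probability, split into g near-equal groups, and the chi-square statistic is compared to a chi-square distribution with the conventional g − 2 degrees of freedom. Rerunning with different values of g shows how the p-value moves with an arbitrary choice.

```python
# Hosmer-Lemeshow statistic on simulated data, varying the number of groups g.
# Illustrative sketch only; for simplicity it uses the true probabilities
# as "fitted" values rather than fitting a logistic model.
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow(y, p, g=10):
    """HL chi-square and p-value for binary outcomes y, fitted probs p."""
    order = np.argsort(p)
    y, p = y[order], p[order]
    stat = 0.0
    for idx in np.array_split(np.arange(len(y)), g):  # groups of (near) equal size
        n, obs1, exp1 = len(idx), y[idx].sum(), p[idx].sum()
        stat += (obs1 - exp1) ** 2 / exp1 + ((n - obs1) - (n - exp1)) ** 2 / (n - exp1)
    return stat, chi2.sf(stat, df=g - 2)  # conventional df = g - 2

rng = np.random.default_rng(0)
x = rng.normal(size=500)
p_true = 1 / (1 + np.exp(-(0.5 + x)))
y = rng.binomial(1, p_true)
for g in (6, 8, 10, 12):
    stat, pval = hosmer_lemeshow(y, p_true, g)
    print(f"g={g:2d}  HL={stat:6.2f}  p={pval:.3f}")
```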


Rejected? Keep at it.

Journal editors are people, just like the researchers who submit their work for publication. People make mistakes. Journal editors often fail to correctly forecast the impact of a paper submitted for publication, and they often reject submissions that end up well received when eventually published. As a journal editor myself, I get the occasional opportunity […]


Why I Don’t Trust the Hosmer-Lemeshow Test for Logistic Regression

The Hosmer-Lemeshow (HL) test for logistic regression is widely used to answer the question “How well does my model fit the data?” But I’ve found it to be unsatisfactory for several reasons that I’ll explain in this post. First, some background. Last month I wrote about several R² measures for logistic regression, which is one […]


What’s the Best R-Squared for Logistic Regression?

One of the most frequent questions I get about logistic regression is “How can I tell if my model fits the data?” There are two general approaches to answering this question. One is to get a measure of how well you can predict the dependent variable based on the independent variables. The other is to […]
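As one hedged example of the first approach (a predictive-power measure), here is McFadden's pseudo-R-squared, defined as 1 − lnL(model)/lnL(null), computed on toy data. The post compares several such measures; this is just one illustrative candidate, not necessarily the one it recommends.

```python
# McFadden's pseudo-R-squared for a logistic model, sketched on toy data.
# Assumes you already have fitted probabilities; the null benchmark is the
# intercept-only model, which predicts the overall mean of y for every case.
import numpy as np

def bernoulli_loglik(y, p):
    """Log-likelihood of 0/1 outcomes y under predicted probabilities p."""
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

def mcfadden_r2(y, p_fitted):
    p_null = np.full_like(p_fitted, y.mean())  # intercept-only benchmark
    return 1 - bernoulli_loglik(y, p_fitted) / bernoulli_loglik(y, p_null)

# Hypothetical outcomes and fitted probabilities.
y = np.array([0, 0, 0, 1, 1, 1], dtype=float)
p = np.array([0.1, 0.2, 0.2, 0.8, 0.8, 0.9])
print(round(mcfadden_r2(y, p), 3))
```

A model whose fitted probabilities track the outcomes closely pushes this measure toward 1; a model no better than the intercept-only fit yields 0.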


Everybody’s Gonna Have a Number

This month I’m going to take a break from statistical methods and focus, instead, on a particular statistic: citation counts. Recently I discovered Google Scholar Citations, and I was blown away. Introduced in November 2011, this web-based service has the potential to revolutionize the counting of citations to individual persons and their published works. And […]
