%macro predict_ldm (data=_last_, y=, x=, pred=yhat, out=predict_ldm);
/*********************************************************************
Version 1.0, 22 April 2020
MACRO PREDICT_LDM produces a data set containing predicted values for a
linear probability model that are never less than 0 or greater than 1.
It is an implementation of the method described in my blog post at
https://statisticalhorizons.com/better_predicted_probabilities. The method is
based on the assumption of a linear discriminant model (LDM) which implies
a logistic regression model. The LDM was first described by R.A. Fisher in 1936.
The macro does the following:
1. It uses PROC REG to estimate the linear regression of a binary Y variable
on a set of X variables, and generates predicted values in the usual way.
The default output from PROC REG is reported.
2. It then rescales the regression coefficients to correspond to a logistic
regression model.
3. It generates predicted values based on a logistic regression models with the
coefficients produced in step 2. These are written to a new data set that
contains all the variables in the original data set, plus two new variables:
--the usual linear predictions, which may be greater than 1 or less than 0.
--the transformed LDM predictions, which are bounded by 0 and 1.
The macro argument has five parameters, only two of which are required:
DATA is the name of the data set to be analyzed.
The default is the most recently created data set.
Y is the dependent variable, which must have values of 0 or 1.
X is a list of predictor variables.
PRED is the name of the variable that will contain the predicted values.
The default is YHAT. That will be the name for the usual linear
predictions. The LDM predictions will be contained in a variable
called YHAT_LDM.
OUT is the name of the output data set. The default is PREDICT_LDM.
Example of usage:
%predict_ldm(data=my.data, y=var1, x=var2 var3 var4, out=newdata)
Author: Paul D. Allison, Statistical Horizons LLC
allison@statisticalhorizons.com
*******************************************************************/
proc reg data=&data outest=a plots=none;
model &y=&x;
ods output fitstatistics=fit;
ods output anova=anova;
output out=_predict_ pred=&pred;
quit;
data &out;
if _n_=1 then merge a anova(where=(source='Error') keep=ss source df)
fit(where=(label1='Dependent Mean') rename=(cvalue1=ybar)
keep=cvalue1 label1);
n=df+1;
k=n/ss;
a = log(ybar/(1-ybar)) + k*(intercept-.5) + .5*(1/ybar-1/(1-ybar));
set _predict_;
logodds=k*(&pred-intercept)+ a;
&pred._ldm = 1/(1+exp(-logodds));
drop label1 source n k a df ss ybar logodds;
run;
%mend predict_ldm;