July 24, 2014 By Paul Allison
Let me tell you about my favorite new toy, the SAS® University Edition, which was just released on May 28. It’s essentially free SAS for anybody who wants it, and it has the potential to be a real game changer. SAS has long had a reputation for being one of the best statistical packages around, but also one of the most expensive. Last I checked, the starting price for a single-user license was around $10,000. Not surprisingly, virtually everyone who uses SAS gets their license through their employer or their university.
So why is SAS now offering the core of its product line for free? For many years, SAS has made tons of money selling software to big companies, but its popularity among academics has been steadily waning. The decline in academic market share has been especially steep in statistics departments, where R has now become the preferred programming environment. This has created a serious problem for the SAS business model because the students of today are the business analysts of tomorrow. If they graduate with no experience using SAS, they will be far less likely to insist that their companies pay for a very costly software package. And the many companies that currently use SAS are finding it increasingly difficult to find new hires with SAS skills.
SAS has made some previous attempts to solve this dilemma. Several years ago they released the SAS Learning Edition, which individuals could buy for around $100. But the functionality of that product was so limited that it was really only good for learning how to code in SAS. More recently, they introduced SAS On Demand, which enabled academic users to access SAS via a web server. I tried using this system for a couple of courses, but I found it way too cumbersome, both for me and for my students.
With the University Edition (UE), SAS has finally produced a winner. Here are some things I like about it:
- UE includes most of the SAS products that statistical analysts will need: BASE, STAT, IML, and ACCESS.
- It’s a completely local package and does NOT have to be connected to the Internet.
- UE can handle fairly large data sets (more on that later).
- When you sign on with an Internet connection, you are notified if an update is available. You can then update with the click of a button.
- The browser-based interface, called SAS Studio, is a snap to learn and use.
- SAS Studio will run in recent versions of all popular browsers, including Internet Explorer, Chrome, Safari, and Firefox.
- UE can run on Mac, Windows, and Linux machines.
- It runs smoothly and speedily, although not quite as fast as a regular installed version of SAS.
- And did I mention that it’s absolutely free for anyone who wants it?
The license agreement states that UE can be used “solely for your own internal, non-commercial academic purposes.” As far as I can tell, there’s nothing to prevent someone in a business setting from downloading, installing, and running UE. But business users should bear in mind that the SAS Institute is known for zealously protecting its intellectual property.
You’re probably wondering, what’s the catch? Well, there are a few things not to like, but they are relatively minor in my opinion:
- UE only installs on 64-bit machines with at least 1 gig of memory.
- UE doesn’t have SAS/ETS (econometrics & time series), SAS/OR (operations research) or SAS/QC (quality control). Most importantly, it doesn’t have SAS/GRAPH, although it does have ODS graphics. So you can’t use PROC GPLOT, but you can use PROC SGPLOT.
- If you’re not connected to the Internet, it can take up to two minutes to start up, compared to only 10 seconds if you are connected. Weird, huh?
- Installation can be a little tricky, so you need to follow all the instructions carefully.
- It took me nearly two hours to download UE, but that was over a not-so-speedy Wi-Fi connection.
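To make the SAS/GRAPH point in the list above concrete, here is a minimal sketch of a scatter plot drawn with ODS graphics. It assumes that the SASHELP.CLASS sample data set, which ships with Base SAS, is available in your UE session. The variables height and weight belong to that sample data set, not to anything discussed in this post.

```sas
/* ODS graphics: PROC SGPLOT works without a SAS/GRAPH license */
proc sgplot data=sashelp.class;
  scatter x=height y=weight;  /* one point per student in the sample data */
run;
```

A program that starts with PROC GPLOT instead would need SAS/GRAPH, which UE doesn't include.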
Now for a few details and suggestions. UE runs as a virtual machine, so you first need to download and install a free copy of Oracle’s VirtualBox software. (UE also runs with VMware Player or VMware Fusion, but those cost real money.) After downloading UE, you open VirtualBox and then install UE as a virtual machine. With VirtualBox still open, you can start up UE by pointing your web browser to http://localhost:10080. For more details, check out the FAQs on the SAS support site.
I was warned by a SAS tech support person that UE may not work on “very large” data sets. But it worked fine with the biggest data set that I have, which has 414,000 cases, 674 variables, and takes up 888 MB on my computer.
If you want to use existing SAS data sets and programs, the most straightforward approach is to copy them into a dedicated folder for UE. Alternatively, you can create a folder shortcut to your existing data sets, but the process is a bit tricky.
When I ran UE using a SAS data set that had been created by SAS 9.3 on a Windows machine, I got a warning in the Log window that the data set “is in a format that is native to another host, or the file encoding does not match the session encoding. Cross Environment Data Access will be used, which might require additional CPU resources and might reduce performance.” I’m guessing that this happens because VirtualBox creates a Linux environment for UE to run in. And SAS data sets in Windows are not identical to SAS data sets in Linux.
In any case, this difference in file formats can really slow things down. When I ran a logistic regression with five predictors on the aforementioned data set, it took 47 seconds of real time and 38 seconds of CPU time. My solution was to use a DATA step to copy the old data set into a new data set (presumably in UE’s preferred format). When I re-ran the logistic regression on the new data set, execution improved dramatically: real time declined to 18 seconds and CPU time to 6.5 seconds. By comparison, when I ran the same regression on my standard installed version of SAS 9.3, the real time was 12 seconds and the CPU time was 2 seconds. So UE is definitely slower than “real” SAS, but the difference seems tolerable for most applications.
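A minimal sketch of that workaround might look like the following. The library name, folder path, data set names, and variable names are all hypothetical placeholders, so substitute your own. The path assumes the old data set sits in a folder that UE can see.

```sas
/* Hypothetical folder holding the data set that was created under Windows */
libname old "/folders/myfolders/olddata";

/* Copy it into the WORK library; the copy is written in the session's own format */
data work.bigcopy;
  set old.bigdata;
run;

/* Re-run the model on the copy (y and x1-x5 are placeholder variable names) */
proc logistic data=work.bigcopy;
  model y(event='1') = x1 x2 x3 x4 x5;
run;
```

Because WORK is a temporary library, the copy disappears when the session ends. Writing the copy to a permanent library instead would save you from repeating this step each time.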
SAS Studio is the slick new interface for accessing SAS via a web browser. It’s designed not just for UE, but for any environment where users need to access SAS on a remote server. SAS Studio will be instantly familiar to anyone who has used the traditional SAS Display Manager with its editor window (now called Code), Log window, and Results window. As with PC SAS, you can have multiple program windows open in SAS Studio. But unlike PC SAS, each program window has its own Log and Results window. If you’re accustomed to using SAS on a PC, you can immediately start doing things the way you’ve always done them. However, there are lots of cool new features, most of which are easily learned by pointing and clicking on icons. For example, when you’re in the Results window, there are buttons that will save your output to an HTML file, a PDF file, or an RTF file.
Here’s a hint that you may find useful: by default, SAS Studio is in batch mode. That means that whenever you run a block of code, whatever is already in the Log and Results windows will get overwritten. If you want your results to accumulate, click on the “go interactive” icon in the Code window. You can also change your Preferences to start each session in interactive mode. The downside to the interactive mode is that temporary data sets and macros produced in one program window are not available to any other program window.
If you plan to use UE a lot, it’s worth investing some time to learn the ins and outs of SAS Studio. A good introductory article (22 pages) can be found here. Or click here for an 8-minute video tutorial. If total mastery is your thing, you can download the 300-page manual here.
So there you have it, free SAS in a (virtual) box. I would guess that at least 95% of the statistical analyses that I’ve done using SAS over the last 10 years could have been done with UE. That’s great news for potential users who don’t currently have access to SAS. But it must be a little scary for the SAS Institute. Will this free product cannibalize existing sales? Loss leaders are always risky, and it will be interesting to see how this plays out. Personally, I’m rooting for UE to be a big success, both for users and for SAS.