Data Science

First things first. I have never been employed by SAS and have no commercial relationship with the company. The opinions below are purely my own.

Take a look at any of the polls asking data scientists which tools they use. At the top of the list would probably be the R language, followed by Python. Next might be MATLAB, SPSS, Julia… and pretty far down the list would be SAS. You know, the statistical programming language that has a steep learning curve, can’t do machine learning, and has pedestrian graphics. Well, be prepared, as I’m about to debunk those myths and give you five reasons why SAS is what I use for my data science environment (notice I didn’t say “tool” or “solution”).

Slaying old dragons

The criticisms of SAS that I listed above were true a number of years ago. But SAS has done a remarkable job responding to the proliferation of big data and the need to manage it intelligently. For starters, it’s now possible to call R functions within SAS. In the near future it will be possible to include Python modules as well as those written in .NET languages. SAS has measurably improved its data mining, both within the statistics portion of SAS and within its Enterprise Miner platform. Finally, SAS has embraced visual analytics as an integral part of data science.

But what about learning SAS; isn’t it difficult? Not for data science projects, where a basic knowledge of SAS statements is all one needs. In fact, it’s much easier to begin using SAS in a meaningful way than it is with either Python or R.

Five reasons data scientists should use SAS

1. SAS is an environment, not a tool. It isn’t just a toolbox of machine learning techniques, a database engine, or a data pre-processor; it’s all of the above. In fact, I can take my data in its native format, import it into SAS, then go through the whole analytical process. Manipulating databases with or without SQL is a snap, and there are a number of procedures to help in pre-processing data. Of course SAS has loads of statistical procedures, all of which can be customized to do exactly what I want. If you have SAS Enterprise Miner then a whole cornucopia of techniques (including model comparison, boosting, and ensemble methods) is available. Want to do genetic algorithms? That’s possible in SAS also.

2. It’s been around longer than you’ve been living. Forty years and counting. The language syntax and the library of methods (“PROCs”) have been vetted extensively. Many major organizations (such as the Food and Drug Administration) use SAS. It’s been battle-tested and is as reliable as you could want.

3. It’s great for building ensembles. I particularly like to use more than one method for intractable problems. SAS Enterprise Miner lets you run multiple methods and arrive at a single prediction by averaging their outputs or taking a majority vote.

4. Terrific customer support. Python and R have great user communities. So does SAS. In addition, you can submit problems to SAS Institute directly and get a response within 24 hours. Their technical support staff has helped me on more occasions than I care to count.

5. Viya. OK, I’ve saved the best for last. SAS has completely revamped its architecture to enable cloud analytics. If you want to use legacy SAS then that’s possible. But Viya brings visual and data mining analytics to your desktop with minimal coding on your part. By running analytics in the cloud you can access and run your programs from any computer. Plus, the programs execute in a fraction of the time they previously took. For example, I logged on to Viya at home and ran a backpropagation neural network on 600,000 observations in 40 seconds!
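
For readers who haven’t met the two ensemble rules mentioned in reason 3, here is a minimal sketch of averaging and majority voting. It is plain Python, not SAS code, and it is not how Enterprise Miner implements them; the function names and the toy predictions are made up purely for illustration.

```python
from collections import Counter
from statistics import mean


def average_ensemble(predictions):
    """Average numeric predictions from several models, observation by observation.

    `predictions` is a list with one inner list per model; every inner list
    holds that model's prediction for each observation, in the same order.
    """
    return [mean(row) for row in zip(*predictions)]


def majority_vote(predictions):
    """Pick the most common class label per observation across models.

    If two labels tie, Counter.most_common returns the one seen first,
    so use an odd number of models for two-class problems.
    """
    return [Counter(row).most_common(1)[0][0] for row in zip(*predictions)]


# Hypothetical outputs from three models on four observations.
probabilities = [
    [0.75, 0.25, 0.50, 0.25],  # model A
    [0.50, 0.25, 0.75, 0.50],  # model B
    [1.00, 0.25, 0.25, 0.00],  # model C
]
labels = [
    ["yes", "no", "yes", "no"],   # model A
    ["yes", "no", "no", "no"],    # model B
    ["no", "no", "yes", "yes"],   # model C
]

averaged = average_ensemble(probabilities)
voted = majority_vote(labels)
print(averaged)
print(voted)
```

Averaging suits numeric outputs such as predicted probabilities, while majority voting suits class labels. Either way, a single model’s mistake on an observation can be outvoted by the others.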

Yes, I know Python and R are free. So if money is a major consideration, by all means use those tools. But if you’re in a university or corporation, then SAS should absolutely be your data science environment.