Suppose you’re about to undergo a complicated surgical procedure to remove an embolism in your carotid artery. The chief surgeon tells you that he has a new high-powered laser that is great at dissolving clots. Sounds great, right? But when you ask whether the operation is really going to take place in the hospital’s basement, without any other physicians, nurses, IV lines, or anesthesia available, you get a shocking answer: yes, but don’t worry, you’re getting the most advanced instrument for your situation.

Obviously, the above situation will never actually occur in the surgical realm. However, it’s not uncommon to see such a scenario play out in the data science sphere. It might go something like this: a biomedical engineer develops an innovative algorithm in R that (fill in the blanks), and it’s now being distributed on the CRAN package repository. While the new algorithm may be helpful to a certain extent, it doesn’t assist with data cleansing, pre-processing, visual analytics, data partitioning, or post-processing the results. For those, you’re on your own.

[Image: tool versus environment. Hospitals and data science both need more than a single tool.]

In a larger sense, we’re talking about the difference between a “tool” and an “environment”. A tool might be best of breed at what it does. Unfortunately, when it comes to actually being useful, relying on one is like trying to carry out surgery with a single instrument. An environment, by contrast, is an end-to-end solution that provides all of the tools a project needs. That’s why the notion of a polyglot has garnered attention. In brief, a polyglot approach combines various technologies in an attempt to create an environment: SQL for data munging, say, Python libraries for the machine learning algorithms, and R libraries for visualizing the results of a predictive model. It sounds good in theory. In practice, it can be hellacious trying to cobble all of these components together, as the sketch below suggests.
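To make the friction concrete, here is a minimal sketch of what such a polyglot pipeline often looks like in practice: SQL handles the munging, a Python library fits the model, and a file hand-off feeds a separate R plotting script. The database, table, and column names are hypothetical, purely for illustration.

```python
# Hypothetical polyglot pipeline: SQL munging -> Python model -> CSV hand-off to R.
import csv
import sqlite3

from sklearn.linear_model import LogisticRegression

# Step 1: SQL does the data munging (assumes a "patients" table already exists).
conn = sqlite3.connect("clinic.db")
rows = conn.execute(
    """
    SELECT age, systolic_bp, outcome
    FROM patients
    WHERE age IS NOT NULL AND systolic_bp IS NOT NULL
    """
).fetchall()
conn.close()

# Step 2: A Python library fits the predictive model.
X = [[age, bp] for age, bp, _ in rows]
y = [outcome for _, _, outcome in rows]
model = LogisticRegression().fit(X, y)
scores = model.predict_proba(X)[:, 1]

# Step 3: Write the scores to disk so a separate R script can visualize them.
with open("scores_for_r.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["age", "systolic_bp", "risk_score"])
    for (age, bp, _), score in zip(rows, scores):
        writer.writerow([age, bp, score])
```

Every boundary in that chain is another interface to test, document, and keep in sync: schema drift between the SQL and Python stages, version skew between libraries, and silent format errors in the CSV hand-off.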

That’s why I prefer an analytical ecosystem in which all of the parts are connected. This not only gives me a single source of code, it also makes it much easier to manage a project from start to finish. Further, with the major commercial analytical systems, one also gets the benefit of code that has been validated along the entire analytic pipeline.

An environment such as SAS Viya fits nicely with this concept. A user also doesn’t have to leave the friendly confines of the SAS Application Server: visual analytics, statistical techniques, database manipulation, and machine learning procedures are all there. And if a new algorithm is available only in Python, it can still be called from within Viya. This decreases development time substantially and makes it easier to repurpose those processes later on. Other environments that have gained followers include SPSS and Weka.
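On the interoperability point, one common pattern runs in the other direction as well: SAS ships an open-source Python package, swat, that lets Python code drive Viya’s CAS engine directly (and newer Viya releases can also submit Python from the SAS side via PROC PYTHON). Below is a minimal sketch assuming a reachable CAS server; the host, port, and credentials are placeholders.

```python
# Minimal sketch of Python <-> Viya interoperability using SAS's open-source
# "swat" client (pip install swat). Host, port, and credentials are placeholders.
import pandas as pd
import swat

# Connect to the CAS (Cloud Analytic Services) engine behind Viya.
conn = swat.CAS("cas-server.example.com", 5570, "username", "password")

# Push a local pandas DataFrame into CAS as an in-memory table.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "y": [2.1, 3.9, 6.2]})
tbl = conn.upload_frame(df, casout={"name": "demo", "replace": True})

# Run a built-in CAS action on the table, then pull the results back into
# Python, where any Python-only algorithm could take over from here.
conn.loadactionset("simple")
print(tbl.summary())

conn.close()
```

The appeal is that everything happens inside one connected session: no intermediate files, no hand-offs between disconnected runtimes, which is exactly the ecosystem advantage described above.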

So the next time you’re deliberating between a tool and an environment for your analytical needs, remember “Doctor’s Orders”.