Data Science Toolkit: SAS
By Kat Campise, Data Scientist, Ph.D.
The primary tools data scientists use for accessing, cleaning, and analyzing data, and for constructing predictive models, are SQL, Python, and R. Still, some employers use other analytical tools such as SAS because of their particular industries; the healthcare sector, including pharmaceuticals, tends to require SAS proficiency. The main takeaway here is that SAS is not a consistent benchmark for data science qualifications. You’ll be far more marketable (and, perhaps, more useful) as a data scientist if you master advanced math and statistics, build expertise in one or more industry sectors, and learn the traditional programming language triumvirate: SQL, Python, and R. Learn the essential components first; then, if a potential employer requires it or you’ve reached a high level of expertise in all of the aforementioned areas, expand your software and programming language skill set.
What is SAS?
SAS is one of the oldest analytics software packages; it was originally developed at North Carolina State University between the late 1960s and early 1970s. The initial objective for creating SAS was the analysis of agricultural data. Its evolution was, and continues to be, relatively slow compared to the speedy upgrades and library additions for R and Python. Over the 1970s, 1980s, and 1990s, the project grew into a company, the SAS Institute, and new statistical packages were incorporated. As with any software package that is expected to generate continual revenue for its developers, SAS gets regular upgrades from the SAS Institute (at least once per year since 2010). While SAS’s revenue continues to climb, it is neither the leading software for business intelligence (SAP is the top vendor with 16% of the market as of 2017) nor the statistical tool most preferred by individual analysts and data scientists. SAS does, however, offer users a plethora of products (over 200, in fact), including:
- Asset Performance Analytics
- Analytics for IoT
- Business Rules Manager
- Customer Intelligence Solutions
- Decision Management Solutions
Thus, SAS maintains a competitive advantage through a neatly packaged array of user options built on several essential elements: data access and management, reporting and graphics, business solutions, analytics, and visualization and discovery. SAS isn’t forthcoming about the cost of licensing its software (you need to contact the company for a quote), but prior reports on social media and other forums indicate that users should expect to pay $8,700 or more. Granted, all of these figures were sourced four to five years ago, so it’s reasonable to expect the cost to be higher now. You can, however, request a free demo, and SAS offers a University Edition that users can download for free.
How is SAS Different from Python and R?
SAS has its own programming language, which resembles SQL in some respects (though it isn’t generally considered SQL-like), and it ships with a graphical user interface (GUI). There are GUIs for Python and R, but SAS’s is built in rather than optional. To date, though, for everything you can do in Python and R for data science, SAS has similar, if not the same, capabilities. Honestly, the choice comes down to personal or industry-specific preferences and requirements. Since Python and R are open source, they are far less costly to deploy. Open source languages also tend to incorporate the most up-to-date techniques (machine learning, AI, deep learning, etc.) quickly, whereas SAS and other commercial software have a longer development lag. One of the SAS upsides is the dedicated support team that is available to answer user queries directly. R and Python do have community support channels, such as Quora, Stack Overflow, and Reddit, but the quality (and tone) of the responses is not as consistent as what you get from a centralized and highly organized tech support team.
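To make the comparison a little more concrete, here is a minimal Python sketch of a grouped summary, the kind of everyday task a SAS user would typically hand to a summary procedure such as PROC MEANS. The records and the `summarize` helper are hypothetical and exist only for illustration; the sketch uses only the Python standard library and is not meant as a definitive way to do this.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (treatment_group, measured_response).
# These values are made up for illustration only.
records = [
    ("placebo", 4.0),
    ("placebo", 6.0),
    ("drug", 7.0),
    ("drug", 9.0),
    ("drug", 11.0),
]


def summarize(rows):
    """Return {group: {"n", "mean", "std"}} with the sample standard deviation."""
    groups = defaultdict(list)
    for group, value in rows:
        groups[group].append(value)
    return {
        group: {
            "n": len(values),
            "mean": mean(values),
            # stdev needs at least two points; report NaN otherwise.
            "std": stdev(values) if len(values) > 1 else float("nan"),
        }
        for group, values in groups.items()
    }


summary = summarize(records)
for group, stats in sorted(summary.items()):
    print(f"{group}: n={stats['n']} mean={stats['mean']:.2f} std={stats['std']:.2f}")
```

In day-to-day work, most Python users would likely reach for a data-frame library rather than hand-rolled grouping, but the logic is the same: split the rows by group, then compute the statistics for each group.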
Where Can I Learn SAS?
As mentioned above, SAS doesn’t enjoy the same level of popularity as its statistical analytics cousins, Python and R. Because Python is also used in software development, it has broader applications beyond statistical analysis. R was designed explicitly for implementing statistical techniques, whether basic or advanced, but R enthusiasts are now venturing into software engineering with R as their programming language of choice. The fundamental message here is that there are far fewer resources for learning SAS. Learning avenues are available; they’re just a bit more challenging to find (especially if you’re a budget-conscious learner).
- YouTube: SAS has a YouTube Tutorial channel where learners who are interested in dipping their toe into the SAS pool can gain “how to” knowledge directly from the SAS Institute.
- SAS: SAS provides a selection of free e-learning resources that will take learners through the basics of SAS Programming, Data Management, Business Intelligence and Analytics, and SAS for Hadoop.
- Coursera: Wesleyan University offers a specialization in Data Analysis and Interpretation that allows learners to use SAS or Python to complete the assignments required for certification. A single course can either be audited for free, or the entire specialization can be completed for $49 per month.
- SASCrunch: SASCrunch Training offers a compendium of free resources for learning the ins and outs of SAS that incorporates static, step-by-step guides along with video tutorials and interactive learning modules. Beyond the SAS learning options indicated above, SASCrunch is one of the most comprehensive SAS learning tools available.
Prediction isn’t an easy task, and disruptive innovation is often a confounding variable when attempting to determine how an industry will evolve. One thing is for sure: data science tools will follow the evolutionary trajectory of the technology sector. SAS has a track record spanning 40-plus years, which makes it less likely that the software will fade into a mere memory of defunct technology. However, data scientists only have so many hours in a day to get their jobs done while also keeping up with the latest shiny tools promising to make their work easier and more efficient. There are many positives to learning SAS, and continuing education is highly encouraged, but if you are breaking into a data science career, you may need to save your SAS adventure for when a particular job lists the software as a minimum qualification.