Data Science Toolkit: SAS
By Kat Campise, Data Scientist, Ph.D.
It cannot be overemphasized that the primary tools data scientists use for accessing, cleaning, and analyzing data, along with constructing predictive models, are SQL, Python, and R. Yet some employers, because of their particular industries, use other analytical tools such as SAS (the healthcare sector, including pharmaceuticals, tends to require SAS proficiency). The main takeaway is that SAS is not a consistent benchmark for data science qualifications. You’ll be far more marketable, and perhaps more useful, as a data scientist if you master advanced math and statistics, develop expertise in one or more industry sectors, and learn the traditional programming language triumvirate: SQL, Python, and R. Learn the essential components first; then, if a potential employer requires it or you’ve reached the height of expertise in all of the aforementioned areas, expand your software and programming language skill set.
What is SAS?
In terms of analytics software, SAS is one of the oldest; it was originally developed between the late 1960s and early 1970s at North Carolina State University. The initial objective for creating SAS was the analysis of agricultural data. Its evolution was, and continues to be, relatively slow compared to the speedy upgrades and library additions for R and Python. Over the 1970s, 1980s, and 1990s, the project grew into the SAS Institute, and new statistical packages were incorporated. As with any software package that is expected to generate continual revenue for its developers, the SAS Institute releases upgrades regularly (at least once per year since 2010). While SAS’s revenue continues to climb, it is neither the leading software for business intelligence (SAP was the top vendor with 16% of the market as of 2017) nor the statistical tool most preferred by individual analysts and data scientists. Still, SAS provides users with a plethora of products (over 200, in fact), including:
- Asset Performance Analytics
- Analytics for IoT
- Business Rules Manager
- Customer Intelligence Solutions
- Decision Management Solutions
- Econometrics
How is SAS Different from Python and R?
SAS has its own programming language, which resembles SQL in places but is not a SQL dialect, and it ships with a graphical user interface (GUI). There are GUIs for Python and R, but SAS’s is built in rather than optional. To date, though, for everything you can do in Python and R, specifically in relation to data science, SAS has similar, if not the same, capabilities. Honestly, it comes down to personal or industry-specific preferences and requirements. Since Python and R are open source, they are far less costly to deploy. Open source languages also tend to incorporate the most up-to-date techniques (machine learning, AI, deep learning, etc.) quickly, whereas SAS and other proprietary software have a longer development lag.
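To make the "similar capabilities" point concrete, here is a minimal sketch of a grouped mean, a task SAS users would typically hand to PROC MEANS, written in plain Python. The records, the variable names, and the helper `mean_by_group` are hypothetical and invented purely for illustration; the SAS snippet in the comments is only a rough equivalent, not a tested program.

```python
from collections import defaultdict
from statistics import mean

# Rough SAS counterpart (an illustrative assumption, not taken from the article):
#   proc means data=records mean;
#     class group;
#     var score;
#   run;

# Hypothetical records, made up for this example.
records = [
    {"group": "treatment", "score": 7.0},
    {"group": "treatment", "score": 9.0},
    {"group": "control", "score": 4.0},
    {"group": "control", "score": 6.0},
]


def mean_by_group(rows, class_var, analysis_var):
    """Return the mean of analysis_var for each distinct value of class_var."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[class_var]].append(row[analysis_var])
    return {key: mean(values) for key, values in grouped.items()}


group_means = mean_by_group(records, "group", "score")
print(group_means)  # {'treatment': 8.0, 'control': 5.0}
```

In practice, most Python users would likely reach for a library such as pandas for this kind of summary; the standard-library version is used here only to keep the sketch self-contained.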
Where Can I Learn SAS?
As mentioned above, SAS doesn’t hold the same level of popularity as its statistical analytics cousins, Python and R. Because Python is also used in software development, it has broader applications than a pure statistical analysis tool. R was designed explicitly for implementing statistical techniques, whether basic or advanced, but R enthusiasts are now venturing into software engineering with R as their programming language of choice. The fundamental message here is that there are far fewer resources for learning SAS. Learning avenues are available; they’re just a bit more challenging to find (especially if you’re a budget-conscious learner).
- YouTube: SAS has a YouTube Tutorial channel where learners who are interested in dipping their toe into the SAS pool can gain “how to” knowledge directly from the SAS Institute.
- SAS: SAS provides a selection of free e-learning resources that will take learners through the basics of SAS Programming, Data Management, Business Intelligence and Analytics, and SAS for Hadoop.
- Coursera: Wesleyan University offers a specialization in Data Analysis and Interpretation that allows learners to use SAS or Python to complete the assignments required for certification. A single course can either be audited for free, or the entire specialization can be completed for $49 per month.
- SASCrunch: SASCrunch Training offers a compendium of free resources for learning the ins and outs of SAS, combining static, step-by-step guides with video tutorials and interactive learning modules. Among the options listed here, SASCrunch is one of the most comprehensive SAS learning tools available.