The Data Scientist’s Toolkit: R Programming Language
- Both basic and advanced statistics
- Data cleaning
- Data mining and scraping
- Machine Learning
Why Use R?
Let’s get this fact out of the way: choosing a programming language comes down in equal parts to personal preference and what an employer requires. If you are an expert in R but Corporation X uses Python, you’ll need to make the switch unless you can convince them otherwise or they are flexible about programming languages for data science. In fact, knowing how to perform the same (or similar) functions in both languages is ideal. But if you’re just getting started in data science and are new to programming, R is a great gateway for learning how to merge the worlds of statistics and programming:
- R has a robust community that is constantly developing new packages and maintains several user groups where newbies, intermediates, and experts can exchange ideas and support new R users. Rather than slogging through R “How to” blogs, Stack Exchange, and Stack Overflow, whose answers quickly go out of date because of frequent package updates and the release of brand-new packages, R users can direct their questions straight to the R community. There is, however, one exception: R-bloggers is an excellent resource for all levels of R users. Keep in mind that although R’s popularity is still growing, it is used primarily within the academic, healthcare, and government sectors (though many Google jobs call for R expertise). As a result, the better-known general programming resources aren’t as reliable for deepening your understanding of how R can be used within data science.
- R is completely free; it costs you nothing to download and get started. This is in contrast to statistical software such as SPSS, SAS, and Stata, which can cost hundreds if not thousands of dollars for a license. While some employers still use these programs, and becoming familiar with them is recommended, most data science courses, whether offered as massive open online courses (MOOCs) or through the increasing number of degree programs, use either R or Python for statistical analysis.
Where to Learn R
In this age of open source and self-directed learning, there is a dizzying array of resources for learning R. It really comes down to which learning method suits you best.
- Coursera, one of the largest MOOC providers, has a number of R programming and R-focused data science courses available. You can audit most courses for free, though when auditing you may or may not have access to the quizzes and peer review process that lead to earning a certificate. If you’re strapped for cash and would like a verified certificate to beef up your resume, you can apply for “Financial Aid” and take the entire course for free.
- EdX, another MOOC provider, also offers self-study courses such as Programming R for Data Science, Statistics and R, and Introduction to R for Data Science, among many others. EdX also offers course auditing, where you’ll have access to the video lectures and quizzes but won’t earn a certificate unless you pay for the course.
- DataCamp has a free Introduction to R learning module that takes you step by step through basic R functions. You can practice either online or via the DataCamp app (so you can learn on the go). As with Coursera and EdX, DataCamp limits the number of modules that you can access without cost.
- In addition to Nanodegree programs such as Data Scientist Foundation, Udacity offers a free Data Analysis with R course that focuses on exploratory data analysis (EDA).
- Pluralsight is yet another resource for kick-starting your R journey via its Try R module, which takes you through expressions, logical values, data frames, and how to apply what you’ve learned to real-world data.