R is a language and environment for statistical computing, data science, machine learning, and graphics. Many data scientists and researchers learn R as their first language for statistical data analysis. It is arguably the easiest programming language to learn as a data scientist, although programmers who come from a Python or Java background might find R confusing because of its distinct syntax. R does not rely heavily on if conditions or loops. Instead, it uses data structures like vectors, tables, matrices, and data frames to perform bulk transformations on data.
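To make the point about bulk transformations concrete, here is a minimal sketch in base R. The temperature values and variable names are made up for illustration; the point is only that arithmetic, conditions, and row selection each apply to a whole vector at once, with no explicit loop.

```r
# Hypothetical data: temperatures in Celsius for five days
celsius <- c(12.5, 18.0, 21.3, 9.8, 15.2)

# Vectorized arithmetic: every element is converted in one expression
fahrenheit <- celsius * 9 / 5 + 32

# Vectorized condition: ifelse() labels each element without a loop
label <- ifelse(celsius > 15, "warm", "cool")

# A data frame holds the related columns side by side
weather <- data.frame(celsius, fahrenheit, label, stringsAsFactors = FALSE)

# Logical indexing selects all matching rows in one step
warm_days <- weather[weather$celsius > 15, ]
print(warm_days)
```

A Python or Java programmer might reach for a for loop with an if statement inside it; in R, the vectorized form is usually shorter and is the style most courses teach first.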
Although Python is the most popular language in data science at the time of writing, R is still widely used. Plenty of companies, including tech sector leaders such as Microsoft, Google, and Twitter, continue to use R.
If you are just starting out learning a programming language, start with the language that appears most intuitive to you. Just like with human languages, learning one language can help you learn others faster, especially if they are related. So if that language is R, choose it. Machine learning and scientific computing libraries do exist in R, so do not think you will be excluding yourself by choosing R. In the long term, learning both Python and R will make you more adaptable and desirable for employment. Plus, it never hurts to continue learning!
The courses I have listed below are centered around learning R and its application to data science and statistics. Practical applications are the best way to learn a new programming language, so I have chosen courses that make them a priority. It is useful to have book companions alongside you as you work through your course of choice. The two books I recommend for companions are as follows:
- An Introduction to Statistical Learning – This book provides a great introduction to statistical learning methods, which are essential tools for gaining insight into vast and complex datasets. It is a very clear book with plenty of straightforward explanations. If you want to grasp the theory and mathematics behind machine learning techniques, this is the book to get.
- Practical Statistics for Data Scientists – This book is perfect as a formal statistics resource and reference. It gives examples of how statistical methods are applied in data science and explains their proper use in a modern and accessible way. It serves more as an introduction to a wide range of data science topics than as in-depth coverage of each one.
Now on to the online courses!
TL;DR
- Statistics with R Specialization – Duke University
- R Programming – Johns Hopkins University
- Data Science: R Basics – Harvard University
- Introduction to Data Analysis in R – Dataquest
Statistics with R Specialization – Mine Çetinkaya-Rundel, Coursera
Pricing: Free to audit, $39/month for certification
Course Material:
- Introduction to Probability and Data with R
- Inferential Statistics
- Linear Regression and Modeling
- Bayesian Statistics
- Statistics with R Capstone
This course offers a comprehensive set of modules on the key areas of statistics. It is particularly useful for students who need an introduction to using R for statistical analysis and inference. The strong point of this course is the Capstone project, which offers an excellent opportunity to use the skills and knowledge gained from the previous four modules.
R Programming – Johns Hopkins University, Coursera
Pricing: Free to audit, $39/month for certification
Course Material:
- Background, Getting Started and Nuts & Bolts
- Programming with R
- Loop Functions and Debugging
- Simulation & Profiling
This course forms part of the Johns Hopkins University data science specialization. It gives you entry-level knowledge, starting with installing and configuring the tools you need to set up an environment for R programming. It is a very practical course that provides working examples of statistical data analysis. The problem sets can be more difficult than the reading material suggests. I would recommend working through R for Data Science alongside the reading material to help close the gap between the course material and the problem sets.
Data Science: R Basics – Harvard University
Pricing: Free to audit, ~$40 for certification
Introduction to Data Analysis in R – Dataquest
Pricing: Free
This course is entirely free and serves as an introductory course to R. Dataquest provides a fully interactive learning experience, with no lecture videos. If you find you switch off during lectures and prefer to be active, Dataquest may be well suited to you. The course covers data frames, the basics of RStudio, and the different data types used in R. Once you have covered the basics of R programming, you should move on to the paid Dataquest intermediate R course.
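For a flavour of the topics an introductory course like this covers, here is a small sketch of R's basic data types and a data frame. It is not taken from the Dataquest material; the names and values are invented for illustration.

```r
# The basic atomic types in R
n <- 3.14   # numeric (double)
i <- 42L    # integer (note the L suffix)
s <- "R"    # character
b <- TRUE   # logical

# class() reports the type of each value
types <- c(class(n), class(i), class(s), class(b))
print(types)

# A data frame combines columns of different types (hypothetical data)
students <- data.frame(
  name = c("Ana", "Ben", "Cy"),
  age = c(23L, 31L, 27L),
  enrolled = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# str() summarizes the structure; sapply() checks each column's type
str(students)
col_types <- sapply(students, class)
print(col_types)
```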
Make Learning Programming Easy
Here are some top tips to use these courses and learn programming more effectively:
- Learn by doing. Always apply what you have learned. Just like with learning human languages, the more you practice, the more intuitive the syntax and concepts will be.
- Invest in the fundamentals, so that you can learn the more advanced concepts more quickly. Do not be tempted to skip past concepts that appear elementary.
- Write out code by hand to help ingrain the intent of the code and your understanding of the syntax.
- Seek out support, whether online or in-person. You will come across bugs or concepts you are not able to grasp, and finding help allows you to solve them and progress faster.
- Use multiple resources at once. While it can be easy to get buried under all the possible courses to go through, choose a few that you believe are well suited to your learning process. Grab a book companion to add more clarity to course materials.
- Limit how much sample code you only read. Instead, alter the sample code you find in the course material to increase your engagement and to help you understand how each piece of code works.
- When you are stuck on a difficult programming problem, take time away from it. Doing so will help freshen your eyes, boost your motivation, and give you a chance to seek advice. I go into more detail about the importance of taking breaks during work in my blog post titled “7 Best Tips For Remote Working For Data Scientists”.
Programming is an essential skill to have in our technology-driven world. Data scientists use programming to write algorithms, explore data, scale business solutions and visualize insights. Becoming comfortable with at least one widely used language will make performing the other facets of data science easier.
For further reading on common errors encountered when first learning R, go to the article: How to Solve R Error: $ operator is invalid for atomic vectors.
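As a quick illustration of the error named above, the sketch below triggers it on a named atomic vector and then shows two common ways around it. The `scores` vector is hypothetical, and `tryCatch()` is used only so the script keeps running after the error.

```r
# A named atomic vector (hypothetical scores)
scores <- c(alice = 90, bob = 85)

# scores$alice fails because $ does not work on atomic vectors.
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(scores$alice, error = function(e) conditionMessage(e))
print(msg)

# Fix 1: index by name with double square brackets
alice_score <- scores[["alice"]]

# Fix 2: convert to a list, which does support the $ operator
scores_list <- as.list(scores)
bob_score <- scores_list$bob
```

The `$` operator works on lists and data frames, so the error usually means the object is a plain vector rather than the structure you expected. Checking it with `class()` or `str()` is a good first step.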
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.