If you are just starting out, it is essential to realize that data science is an interdisciplinary field. No single tool or method defines it; instead, data scientists use a collection of tools and techniques to gain insight from data, drawing on the foundations of machine learning, data wrangling, software engineering, and visualization in harmony. To make your journey easier in the long run, you should learn the key concepts of statistical data analysis and machine learning. Doing so will allow you to get the most out of the more practical data science courses and absorb new ideas more readily. I have listed the more theoretical courses for your data science foundations on separate pages; use the “Online Courses” link here or the search bar to find them.
Although understanding theory is critical, data science proficiency requires frequent practice, so it is vital that you get your hands dirty with data problems as early and as often as possible. You should have a grasp of at least one programming language. I would suggest Python, as it is the most commonly used language in data science: the majority of scientific computing and machine learning libraries are written for Python, and its readable syntax means you can start writing your own code very quickly. You will find that most data science and machine learning courses use Python by default. A close second choice is R, and of course, the best option is to learn both.
I have selected these courses because I believe they have the perfect blend of theory and practice and explain concepts intuitively. Some of them let you view and use the course material for free, but you will have to pay to earn a certificate. A certificate from a reputable course allows you to showcase your new skills and boost your employability. Bear in mind that no single course will provide you with everything. To get as much exposure and practice as possible, I recommend taking several courses; doing so will also help you discover which course style suits you best.
- Applied Data Science with Python Specialization – University of Michigan
- Data Science Specialization – Johns Hopkins University
- Data Science A-Z: Real-Life Data Science Exercises Included – Kirill Eremenko & SuperDataScience Team
- CS109 Data Science – Harvard
Online courses run the risk of covering a wide range of topics without enough depth. To make the most of your online learning experience, pair your chosen course with one or more books that cover the material in more detail. Visit my blog post recommending the best books on the machine learning concepts that underpin all of the listed courses.
Applied Data Science With Python Specialization – University of Michigan
- Price: Free to audit or $49/month for certification and marked assignments
- Introduction to Data Science in Python
- Applied Plotting, Charting & Data Representation in Python
- Applied Machine Learning in Python
- Applied Text Mining in Python
- Applied Social Network Analysis
This specialization is easy to follow and places a significant focus on application. Each of the five courses runs for four weeks, so the entire specialization is estimated to take 20 weeks. It is an intermediate-level specialization that assumes you can write programs in Python and have a basic knowledge of statistics; if you need to learn the language first, go through my list of the best Python courses here. It is strongly project-driven, with theory kept to a minimum, so you can immediately apply the knowledge gained from the lectures and recommended reading materials. Discussion forums and mentors are available to help guide you through the courses.
This specialization would be particularly beneficial for those who are finishing a degree program and want to specialize in data science. If you do not have experience with machine learning, I recommend pairing it with a more theoretical course, such as Andrew Ng’s Machine Learning course, which I include on this page.
If you are looking for a course to build your knowledge of statistics, this specialization will not provide it. I therefore recommend having Statistical Learning as a book companion while you take the course.
Data Science Specialization – Johns Hopkins University
- Price: Free to audit or $49/month for certification and marked assignments.
- The Data Scientist’s Toolbox
- R Programming
- Getting and Cleaning Data
- Exploratory Data Analysis
- Reproducible Research
- Statistical Inference
- Regression Models
- Practical Machine Learning
- Developing Data Products
- Data Science Capstone
This specialization covers a broad data science curriculum and is rigorous in building and delivering a data science pipeline. However, given the breadth of topics covered, you should do supplementary reading in both statistics and machine learning alongside it. The specialization takes between six months and a year to complete. Its open-ended course projects are portfolio-worthy, so invest time in them to get the most out of it. The programming language used is R, so I would advise gaining some familiarity with the basics of the language beforehand. You can find the best courses to learn R here, and you can use Advanced R by Hadley Wickham as a book companion.
Data Science A-Z: Real-Life Data Science Exercises Included – Kirill Eremenko & SuperDataScience Team
- Price: £44.99, but there are typically deals.
- Data Preparation
This course contains over two hundred lectures and more than twenty hours of content. It is one of the best-structured series for data science and gives you the flexibility to select the modules you want. It works well as an introduction to data science and teaches you how to use Tableau for visualization, SQL for database management, SSIS (SQL Server Integration Services) for data integration, and Gretl for statistical inference. As the course is introductory and focused on applications, you should do additional reading on any topics you feel could have been explored further.
CS109 Data Science – Harvard
- Price: Free
Harvard CS109 is a well-known and highly rated university course for data science. The class material describes the “data science process” and breaks it up into five key concepts:
- Data Wrangling
- Data Management
- Exploratory Data Analysis
- Statistical Inference
Python is the language used for assignments and projects, so you should have a working knowledge of Python to make the most of the course material. Focus most of your time on solving the problem sets, which are all available in Jupyter notebook format: see cs109/content. You can also access material from previous years. CS109 becomes more intuitive as your understanding of statistics and your programming experience grow. I recommend Python for Data Analysis as a book companion for the problem sets.
Dataquest
- Price: Free to try. Basic plan $29/month, Premium plan $49/month
Dataquest offers an alternative to the traditional online learning experience. Instead of extensive lecture videos alongside problem sets, Dataquest operates like an interactive textbook. It aims to teach students to be autonomous through project-based learning, ensuring you work through concepts to build your understanding rather than skim through videos. All of the material and problem sets can be completed in your browser on the Dataquest learning interface. The data scientist course assumes a basic level of mathematics and Python and gradually builds your confidence and skills until you can perform statistical inference and use machine learning algorithms. You also have the opportunity to dive into relevant mathematics topics such as probability, calculus, and linear algebra. You will go through the fundamental machine learning algorithms, including:
- K-Means Clustering
- Decision Trees
- Neural Networks
- Linear and Logistic Regression
- K-Nearest Neighbours
In addition to the course, Dataquest has a blog with plenty of useful data science tips, which you can find here. Every subscription level gives you access to the Slack community, which is very active, and Premium subscribers can also get career counseling and CV advice. If you tend to switch off during lectures and want a fully interactive learning experience, Dataquest is one of the best options to choose from.
Datacamp
- Price: Free, Basic $25/month, Premium $250/year. Higher subscription tiers give you access to more courses, coding challenges, projects, and support.
Datacamp combines short video lessons with fill-in-the-blank exercises and projects. Upon signing up, you can access a course library containing over 340 courses, including:
- Introduction to Python
- Introduction to R
- Introduction to SQL
- Introduction to Data Engineering
- Data Science for Everyone
- Introduction to Tableau (visualization)
You can design your own curriculum and target the skills you want to build. The platform includes an instant feedback loop, called Practice mode, that lets you repeat exercises. The focus of Datacamp is on keeping exercises short and combining them with to-the-point video lessons. The paid plan includes access to the Slack community, and a community page is accessible to everyone. If you are a visual learner who is just starting your data science adventure, Datacamp will allow you to learn and advance quickly. However, once you have completed the courses, I would advise enrolling in a more in-depth course with a strong emphasis on programming.
How To Use These Courses
Each course has its strengths and weaknesses. What they all have in common is that not all of the material is explored in equal depth: some modules may leave you wanting more, and others will be too extensive for your level. To make the most of the available courses, combine several of them and use supplementary material where necessary to boost your knowledge. The key to online learning is autonomy. Think of book companions or lecture notes as friends you can refer to if you missed something during your online course. Do not expect to absorb everything passively; you will have to go beyond the curriculum of your chosen course or prepare beforehand. The more effort you put into finding out how you learn best and what level you are at, the more you will get out of the listed courses.
Here is a video on how to make the most of your online course.
Data science is a rapidly evolving and exciting field. To be an excellent data scientist, you will need to invest a substantial amount of time and be dedicated to learning across its full range of knowledge. If you are an academic building your knowledge of data science concepts, visit my post providing tips for academic readers who want to start a career in data science. You can also click here to access the best courses covering machine learning, Python, and R. Have fun while learning and enjoy your journey!