
Data Science Learning Resources

Jan 01, 2023 · 20 mins read

Data science is a popular field and garners a lot of interest. Whether you’re looking to become a data scientist or leverage data science skills in your current role, “where should I start?” and “what are some good books?” are common questions. Data science is also a broad field, with many skill areas to continue growing in your career. Given this, it can be helpful to put together a learning plan. In this article, we hope to share some worthwhile resources that we’ve used to build a data science foundation.

In an earlier article, we shared a list of skills needed to be an effective data scientist. In this one, we provide resources, including courses, books, and papers, to help develop those skills. In particular, we expand here on the technical skills section of the prior article, because business and domain resources depend on the particular domain context.

As you dive into data science courseware, you’ll see that it extends from a variety of academic departments, and that various fields of study have contributed to current techniques. Data science is a diverse, interdisciplinary field. Here is a Venn diagram to illustrate how it brings math, statistics, computer science, and domain knowledge together:

Adapted from and with credit to “Data science concepts you need to know! Part 1,” by Michael Barber, on Medium.com.

Covered topics

Here are the topic areas we cover in this article, starting at the core and working outward, in this conceptual view:

Statistical concepts and techniques

Data scientists must be familiar with statistics as they collect data and information and use it to investigate problems, analyze and forecast trends, conduct significance testing, design experiments, and inform business decision-making. Here are some good resources for building a foundation in statistics:

Programming languages (SQL, Python, R, and Kusto)

As a data scientist, you can choose from a number of programming languages that are useful in your work. SQL is essential to learn because it lets you query data stored in structured, relational databases. Python is particularly popular among data scientists today because of its wide range of uses across domains, such as data collection and cleaning, data visualization, Machine Learning, and Deep Learning. R is another popular language well suited to statistics, data analysis, and Machine Learning. Kusto Query Language (KQL) is the query language for Azure Data Explorer and related services, and it has a simplified syntax. Here are some training materials to learn these programming languages and their applications to data science:

SQL
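
To illustrate the point above that SQL lets you query data in a structured database, here is a minimal sketch that runs a SQL aggregation from Python using the standard-library `sqlite3` module. The `orders` table, its columns, and its rows are hypothetical and exist only for this example; a real project would connect to an existing database instead.

```python
import sqlite3

# Hypothetical in-memory table; the schema and rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 20.0), ("bob", 35.5), ("alice", 14.5), ("carol", 50.0)],
)

# Aggregate spend per customer, largest total first.
query = """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for customer, total in totals:
    print(f"{customer}: {total:.2f}")
```

The same `SELECT ... GROUP BY ... ORDER BY` pattern carries over to most relational databases, which is one reason SQL is worth learning early.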

Python

R

Kusto

Data analytics and forecasting

Data analytics and forecasting are fundamental tools for data scientists. As technology generates vast and growing amounts of data, analytics and forecasting are core steps to explore business opportunities, identify key trends, and find insights to enable data-driven decision making.
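
As a small illustration of what a forecast can look like in code, the sketch below implements simple exponential smoothing, one of many basic forecasting techniques and not necessarily the one a given resource would teach first. The function name, the `alpha` default, and the sales figures are assumptions made for this example.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """Return a one-step-ahead forecast using simple exponential smoothing.

    Each new observation is blended with the running level; a higher alpha
    weights recent observations more heavily.
    """
    if not series:
        raise ValueError("series must contain at least one observation")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")

    level = series[0]  # start the level at the first observation
    for value in series[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Made-up monthly sales figures, used only to exercise the function.
monthly_sales = [100, 110, 120, 130]
forecast = exponential_smoothing_forecast(monthly_sales, alpha=0.5)
print(f"Next-month forecast: {forecast}")
```

Because this method only tracks a level, it lags behind a steadily rising series like the one above; methods that model trend and seasonality are usually a better fit for such data.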

Machine Learning and Deep Learning

Machine Learning includes algorithms that parse data, learn from that data, and then apply what they’ve learned to make informed decisions. Deep Learning is a subset of Machine Learning that uses multi-layer neural networks to learn useful representations directly from data, which reduces the need for hand-crafted features. The following are helpful resources to grow skills in Machine Learning and Deep Learning.

Machine Learning foundations

  • Course [Machine Learning by Stanford University Coursera](https://www.coursera.org/learn/machine-learning): Andrew Ng’s popular course that introduces supervised and unsupervised learning, as well as Machine Learning best practices. (Offered to the public through Coursera and to Stanford students as part of the university’s curriculum.)
  • Course CalTech: Machine Learning (Yaser Abu-Mostafa): This course is introductory but less friendly for novices. It covers most of the same topics as Ng’s course, but more deeply and with a more theoretical approach, and is recommended for students and practicing data scientists.
  • Course Reinforcement Learning: Set of four courses covering various fundamental concepts and includes a hands-on project.
  • Course LinkedIn: Machine Learning and AI Foundations
  • Course LinkedIn: Become a Machine Learning Specialist
  • Book Deep Learning with Python
  • Course [Improving your Model Performance — ML Strategy (1) Coursera](https://www.coursera.org/lecture/machine-learning-projects/improving-your-model-performance-4IPD6)
  • Article [What is Predictive Model Performance Evaluation by divya singh Medium](https://medium.com/@divyacyclitics15/what-is-predictive-model-performance-evaluation-8ef117ae0e40)
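
The idea described at the start of this section, that an algorithm learns from data and then applies what it has learned, can be sketched with one of the simplest supervised models: a straight line fitted by ordinary least squares. This is a minimal sketch in plain Python; the function names and the study-hours data are hypothetical, and in practice you would more likely use a library such as scikit-learn.

```python
def fit_line(xs, ys):
    """Learn slope and intercept from data using ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned parameters to a new, unseen input."""
    slope, intercept = model
    return slope * x + intercept


# Made-up training data: hours studied versus test score.
hours = [1, 2, 3, 4, 5]
scores = [3, 5, 7, 9, 11]

model = fit_line(hours, scores)   # "learn from that data"
prediction = predict(model, 6)    # "apply what they've learned"
print(f"Predicted score for 6 hours: {prediction:.1f}")
```

The split between a fitting step and a prediction step is the same pattern that most Machine Learning libraries expose, whatever the underlying model.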

Deep Learning

ML Ops, data engineering and more

Experimentation and causal inference

Experimentation and causal inference are designed to identify causal relationships among variables. Given the importance of understanding causal drivers to ensure the right data-driven decisions, these techniques have been gaining increased adoption among data scientists in the industry. These resources provide great learning opportunities on these topics:

Experimentation
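
To make the idea of a controlled experiment concrete, here is a minimal sketch of how the result of an A/B test on conversion rates might be checked with a two-proportion z-test. The function name and the visitor and conversion counts are hypothetical, and the test assumes independent samples that are large enough for the normal approximation to hold.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Pooled rate under the null hypothesis that both variants convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("standard error is zero; the test is undefined")

    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Made-up experiment: control (A) versus treatment (B), 1,000 visitors each.
z, p_value = two_proportion_z_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference is unlikely to be due to chance alone, but a sound conclusion also depends on choices made before the experiment runs, such as sample size and randomization.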

Causal inference

Data visualization and communication

Data science organizations often partner with stakeholder teams throughout an organization. Communicating data science deliverables is an important step in maximizing their impact, whether through presentations, data visualizations, or written communications, and whether presented to a business or technical audience. Here are some resources to help with this:

Data visualization

Here is a range of books, courses and papers on data visualization techniques and approaches that you can incorporate into your work. Also see the data visualization articles on the Data Science at Microsoft online publication.

Communication and public speaking

Below are some resources for presentation training and scientific writing, as well as an organization you can join for further practice.

  • Course [Presentation Skills LinkedIn Learning](https://www.linkedin.com/learning/paths/develop-your-presentation-skills?u=3322)
  • Course [Public Speaking LinkedIn Learning](https://www.linkedin.com/learning/topics/public-speaking?u=3322)
  • Course [Scientific Writing Coursera](https://www.coursera.org/learn/sciwrite)
  • Community Toastmasters

Communities, podcasts, datasets, and events

As you continue to learn, here are some great spaces where you can exchange ideas with others and hear about their experiences with data science in practice. We’ve included opportunities to engage in online communities, participate in hands-on events, leverage publicly available datasets, listen to data science podcasts, and attend relevant conferences. We also recommend GitHub and Jupyter Notebooks as great ways to share your work and collaborate with others.

Communities

Countless data science meetups and communities exist. Here are a few where you can engage with other data scientists on relevant topics:

Podcasts

For those who prefer learning via audio, the following podcasts are great options:

Hands-on events

These can be a great place to learn about new tools, hone your skills, and uncover best practices in the data science domain.

  • Kaggle Competitions: Kaggle allows users to work with other data scientists and Machine Learning engineers and to enter competitions that solve data science challenges.
  • Women in Data Science Datathons: A global event that encourages more women to enter the field of data science.

Datasets

The best way to learn data science is to practice with different projects. You can search and download free datasets online using the following resources.

  • Kaggle datasets: Kaggle has one of the largest dataset libraries online. The data is free and you can also upload your own datasets there.
  • KDnuggets datasets: KDnuggets maintains a good collection of datasets that are free and can be used for learning data science.
  • Data is Plural: A weekly newsletter of useful and interesting datasets.
  • TidyTuesday: A weekly data project aimed at the R ecosystem.

Conferences

Conferences can be a great way to learn from others’ experiences, get exposure to new ideas, and gain additional perspective. Here are some to explore:

  • NeurIPS: The purpose of the Neural Information Processing Systems annual meeting is to foster the exchange of research on neural information processing systems in their biological, technological, mathematical, and theoretical aspects.
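  • SIGKDD: The ACM Special Interest Group on Knowledge Discovery and Data Mining, the main professional association for data mining and knowledge discovery, which organizes the annual KDD conference.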
  • ICML: The International Conference on Machine Learning is the leading international academic conference in this subject area.
  • CVPR: CVPR is the premier annual computer vision event comprising the main conference and several co-located workshops and short courses.
  • ACL: The Association for Computational Linguistics (ACL) is the international scientific and professional society for people working on problems involving natural language and computation.
  • SIGIR: The annual SIGIR conference is the major international forum for the presentation of new research results and the demonstration of new systems and techniques in the broad field of information retrieval (IR).
  • MLSys: The Conference on Machine Learning and Systems targets research at the intersection of systems and Machine Learning.

Original article:

https://medium.com/data-science-at-microsoft/data-science-learning-resources-193ccf6fafb