Embarking on a career in data science is an exciting challenge that requires a lot of initiative and a desire to learn and apply knowledge quickly. For research scientists, there is an emphasis on experimentation and scientific discovery. Their methods and objectives are more nuanced than those of data scientists who focus more on the business solution. In this post, I aim to shed light on vital practices and habits for efficient and productive research. However, these tips will also help any data professional who requires scientific rigour in their work.
Table of contents
- Summary
- Formulate a Robust Research Hypothesis
- Write A Literature Review For Every Research Question
- Present Your Work Outside of Your Expertise Level
- Use Illustrations to Describe Complex Ideas
- Avoid Tunnel Vision When Doing Analysis
- Bounce Ideas Off Your Colleagues
- Develop a Culture of Research
- Ask For Help When You Need It
- Be Assured In Your Work
- Concluding Remarks
Summary
- Formulate a robust research hypothesis.
- Write a literature review for your research hypothesis.
- Present your work often and treat your presentations like lab book entries.
- Use diagrams and graphs to explain ideas; you can never have too many illustrations.
- Always refer back to the “why” of your experiment, frame analysis around your hypothesis, and stay on topic.
- Find others to bounce ideas off. Vocalize your thoughts often.
- Develop a culture of research; read relevant papers.
- Do not be afraid to say “I do not know,” but pre-empt possible questions about your work.
- When conveying your work, be confident and assured.
Formulate a Robust Research Hypothesis
A research scientist’s day-to-day work involves formulating and working through experiments to achieve some scientific discovery. A scientific experiment starts with a research hypothesis, which is simply a statement that introduces a research question and proposes an expected result.
The hypothesis must be testable, which boils down to three criteria that must be met:
- The hypothesis can be proven true.
- The hypothesis can be proven false.
- The results of testing the hypothesis must be reproducible.
Ensure that you describe the problem that you are trying to solve. You can write the hypothesis as an if-then statement, e.g., “If I take this action, then I expect this outcome.” In every scientific experiment, there is a relationship between an independent variable and a dependent variable. The analysis involves changing the independent variable and assessing the effect of that change on the dependent variable. By clearly defining these variables, your hypothesis will be compelling and founded on testable outcomes.
An example of a testable hypothesis is:
- Athletes who attend training perform better at races than those athletes who skip training. This is testable because you can compare race times for athletes who do and do not skip training. These results can be reproduced by another researcher.
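To make the testable example above concrete, here is a minimal sketch of how such a comparison could be run. The race times are invented purely for illustration, the function name `permutation_test` is hypothetical, and a permutation test is only one reasonable choice of method; it is not prescribed anywhere in this post. Training attendance plays the role of the independent variable and race time the dependent variable.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """One-sided permutation test.

    Asks how often a random relabelling of the pooled data produces a
    gap (mean of second group minus mean of first group) at least as
    large as the one actually observed.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = mean(group_b) - mean(group_a)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[n_a:]) - mean(pooled[:n_a])
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Hypothetical race times in seconds (made-up data, not real results)
attended_training = [52.1, 50.8, 51.5, 49.9, 50.3, 51.0]
skipped_training = [54.2, 53.8, 55.1, 52.9, 54.6, 53.3]

observed_diff, p_value = permutation_test(attended_training, skipped_training)
print(f"Mean gap: {observed_diff:.2f} s, p-value: {p_value:.4f}")
```

A small p-value suggests the observed gap would be unlikely if training made no difference. Fixing the random seed means another researcher running the same code on the same data gets the same answer, which speaks to the reproducibility criterion.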
An example of an untestable hypothesis is:
- It does not matter whether or not you attend training as an athlete. This hypothesis cannot be tested because no outcome has been proposed for skipping training. “It does not matter” is not a quantifiable or testable statement.
Before you start your experiments, ask these questions to identify weaknesses in your hypothesis:
- Is the hypothesis clear?
- Does the hypothesis introduce the research topic?
- Are the independent and dependent variables identified clearly?
- Is the hypothesis testable?
- Is there a clear “If-then” statement that can be made from the hypothesis?
Write A Literature Review For Every Research Question
The famous saying from the physicist Isaac Newton goes, “If I have seen further, it is by standing upon the shoulders of giants.” To progress research, you need to have a deep understanding of the previous discoveries in your field. A literature review is a survey of scholarly articles about a particular issue, area of research, or theory, to summarise and evaluate these works in relation to the research problem under investigation.
By doing a thorough literature review, you will be able to trace the intellectual progression of the field and identify where gaps exist in how a problem has been researched to date. With a literature review, you are aiming to place each contribution in the context of the research problem raised in your hypothesis, and to identify new ways to interpret prior research and the relationships between the works.
New research needs to be defensible; a literature review enables you to locate your research within the context of existing literature and build the case for why your work is a valid addition. There are multiple types of literature reviews, but the reviews most pertinent for research-focused data scientists are:
- Methodological review: focuses on the methods of analysis or techniques used to answer a research question.
- Systematic review: consists of an overview of existing evidence for a research question that uses standardized methods to assess relevant research critically.
- Theoretical review: examines the corpus of theory accumulated regarding an issue, concept, theory, or phenomenon. This review establishes existing theories, the relationships between them, and to what degree the existing theories have been investigated.
Explore sources that are contrary to your perspective and discuss the strengths and weaknesses of each source. Try to avoid only listing and summarizing one reference after another. Organize the review into a narrative, with sections that present themes and emerging trends.
Present Your Work Outside of Your Expertise Level
The defining skill of a data scientist is telling a story around data. Within your data science team, it is likely easier to present your research because there will be a common understanding of jargon and a similar level of technical knowledge. The real challenge is in presenting to non-experts in your company who have no background knowledge of machine learning or statistical data analysis. Being an effective communicator allows your fantastic ideas and research to flow from your head to the entire company, which can lead to innovation and advancement of existing solutions. Practice summarising your research by asking several questions:
- What pain-point am I addressing for the company?
- What am I doing differently to anyone else with the same problem?
- How will the conclusions from my research resolve the pain-point?
- Is the proposed solution an augmentation of the current solution or an entirely new one?
- What further steps can I take to improve on this research?
If you home in on these questions when making your presentations, you will cut out most of the technical jargon, which is noise to the non-expert. By framing your research around the business solution, it becomes instantly interpretable and enables you to build a compelling data story. Aim to present your work at the company level every one to three months. You can do this in the form of a “Show and Tell.” Presenting frequently will keep your colleagues engaged with your research, prevent you from feeling isolated, and help keep science a core part of the company’s business strategy.
If you have frequent meetings with your manager, e.g., weekly, you should document your weekly findings in a slideshow. Slideshows are effective for summarising your thoughts and methods, but more importantly, they can be used as a lab book. In each slideshow include aim, method, results, and conclusion sections. If you do not complete an experiment within the time between two meetings, simply use the same aim and method from the previous presentation with updated results. Over time you will build an extensive log of investigations and conclusions. This log can be used as the foundation for a formal publication and as a guide for future experiments. A detailed set of presentations will ensure you do not repeat the same experiments and limit divergent experiments.
Use Illustrations to Describe Complex Ideas
You will be able to convey far more through visualization than by text or lines of code. Scientific concepts are explained better through diagrams and graphs. Using visuals to highlight patterns in data or technically advanced concepts allows non-experts to gain insight and makes your research conclusions actionable. In discussions, diagrams serve to materialize thought processes and remove the chance of misinterpretation. You will often see in research papers diagrams describing models, data flow, and algorithms. When presenting your work, aim to have a visual in the first few minutes of your presentation to captivate your audience and deliver your main points.
There are many tools available for data scientists to construct diagrams and draw graphs. Here are some examples:
- Tableau: free and paid options
- QlikView: free and paid options
- Matplotlib (Python): free
- Seaborn (Python): free
- ggplot2 (R): free
- ggvis (R): free
- D3 (JavaScript): free
- Plotly (Python, R, JavaScript): free
- Datawrapper (no programming language): free and paid options
Get comfortable with one of the free tools for your preferred language. The most common are Matplotlib, Plotly, and ggplot2. For diagrams, most presentation tools (OpenOffice, Keynote) will have all the shapes necessary. For specialist diagram-building tools, I recommend using Lucidchart or Draw.io.
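As a rough illustration of how little code a first plot needs, here is a minimal Matplotlib sketch. The data are made-up race times used only for demonstration, and the labels and figure size are arbitrary choices rather than recommendations.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical race times in seconds (made-up data for illustration)
groups = {
    "Attended training": [52.1, 50.8, 51.5, 49.9, 50.3, 51.0],
    "Skipped training": [54.2, 53.8, 55.1, 52.9, 54.6, 53.3],
}

fig, ax = plt.subplots(figsize=(6, 4))

# One box per group; boxes are drawn at x positions 1 and 2 by default
ax.boxplot(list(groups.values()))
ax.set_xticks([1, 2])
ax.set_xticklabels(list(groups.keys()))

# Label the axes so the plot can be read without the surrounding text
ax.set_ylabel("Race time (s)")
ax.set_title("Race times by training attendance")
fig.tight_layout()
```

Calling `fig.savefig("race_times.png")` afterwards writes the figure to disk so it can be dropped into a slideshow; the file name here is just a placeholder.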
Avoid Tunnel Vision When Doing Analysis
Once an analysis is underway, it is easy to start “tumbling down the rabbit hole” and finding interesting, offshoot experiments. These experiments can add richness and defensibility to your research, but you should remain focused on the primary testable outcomes. Before performing a new experiment, ask yourself whether it adds anything to the narrative of your research, i.e., whether it fits in with your research question. Depending on the time scale of your analysis, you may want to limit offshoot experiments to two or three. If the experiment is outside the scope of the research question, you can save it for when you have answered the research question and include it as an appendix result.
Bounce Ideas Off Your Colleagues
Research can be a lonely endeavor, especially if your colleagues are not in your field of expertise. Scientific discovery thrives on discussion and peer review. Sharing your work and your research problems with colleagues will help you accurately formulate your ideas and foster confidence in your work. Being asked questions about your research regularly forces you to look at your work from a different perspective and may highlight something you have missed. Research discussion tests the validity of your experiments, making them more defensible. By bouncing ideas off your colleagues, you will stay in the practice of telling a coherent story.
If you do not have other data scientists in your team, find other data professionals such as data analysts or engineers. You can also use online forums to discuss ideas, for example, Kaggle and Reddit.
Develop a Culture of Research
Your reading and understanding of research should go beyond obligatory literature reviews. Aim to be in a constant state of learning and develop a culture of research. By doing so, you will stay up to date on the state-of-the-art in your field and also will be more fluent in relevant techniques. Try to read at least one paper in your field a week. You can use Papers With Code to find state-of-the-art and trending papers. Once you have the routine of one paper a week, aim to add a second paper reading in another field of interest. Make notes on every paper you read so that you can express the main points of the paper intuitively. You can find a comprehensive method for reading research papers here. If you are in a team of data scientists, make a habit of holding meetings where you take turns presenting and discussing papers.
Ask For Help When You Need It
Data scientists are in the position of “expert” for data handling tasks and for discussing business solutions. Even with a broad knowledge base, no single person knows everything there is to know about a problem. The most confident and productive data scientists actively seek help and work alongside colleagues who have complementary skills. Before starting on your career, you should have some idea of your strengths and weaknesses and recognize those in your company who can help you. You can find my in-depth discussion of the different types of data professionals and how they complement each other in my post titled “Key Differences Between Data Scientist, Research Scientist, and Machine Learning Engineer Roles.”
When discussing research, you may be asked a difficult question. Know that it is perfectly fine to say: “I do not know,” but you should always follow that up with “I will find out by doing x investigation.” By staying proactive in this situation, you are taking the path of discussion and solution building rather than an interrogation. Before you have an important meeting where you are sharing results, pre-empt questions that might arise, and attempt to answer them.
Be Assured In Your Work
You may be required to provide insight into how a product feature could be improved, implement a strategy, or deploy a new algorithm. Having confidence when involved in such discussions will assure your team that your proposed steps are the correct ones to take. Confidence can be tricky to build. As a practical step, set up regular meetings with others in your team, e.g., a stand-up where you discuss your day-to-day tasks. You will stay in the practice of presenting action items and defending them. Always take steps to cross-check and validate your results before sharing them.
Concluding Remarks
Research scientists have a nuanced perspective within a company. They often serve as the source of innovation for product development but can also be entirely shielded from the business solution. Their exposure to product development can also vary depending on the research question explored. To maintain the flow of ideas and the integration of research with the business narrative, scientists need to be effective communicators and strategists. They must build sound, testable experiments, and, at the same time, map the discoveries to potential use cases. It can be tempting to isolate yourself within your research and only poke your head up when asked about a specific pesky algorithm. Discussing your ideas on different levels of expertise will make your research defensible and communicable. In these final remarks, I want to emphasize building a culture of research. As a research-focused data scientist, you make science integral to the company – removing the reliance on gimmicks and buzzwords – and continue the passion for scientific discovery.
Thank you for reading! I hope this post was helpful to you. If you are not yet in research-focused data science but would like to be, see my post “7 Best Tips to Get A Data Science Job From Scratch” to get you on the right path. For more advanced career advice for research scientists see my post titled “Six Essential Tips After Two Years as a Research Scientist.” Be sure to share this post and put your thoughts down in the comments below. See you in the next post!
Suf is a senior advisor in data science with deep expertise in Natural Language Processing, Complex Networks, and Anomaly Detection. Formerly a postdoctoral research fellow, he applied advanced physics techniques to tackle real-world, data-heavy industry challenges. Before that, he was a particle physicist at the ATLAS Experiment of the Large Hadron Collider. Now, he’s focused on bringing more fun and curiosity to the world of science and research online.