Research scientists have a unique position within a company. They bridge the gap between pure research and the application and explainability of data science solutions. They require a broad understanding of existing research and techniques, and they write algorithms and design prototypes to test hypotheses. At the time of writing, I have been a research scientist for two years. On this anniversary, I thought it would be useful to share six lessons I have learned that have helped make my work more productive and rewarding. This list is written from the perspective of a research scientist, with research scientists as the primary beneficiaries. However, anyone can adopt these ideas, particularly those on other data science-related career paths.
Share Ideas and Help Others in Your Team
I work within a Science team; we primarily focus on developing research and testing hypotheses to inform algorithm-driven solutions for fault localisation on information technology networks. The team is self-contained and, to a significant degree, operates independently of the typical engineering work cycles and product delivery timelines. When a research idea is explored and the resulting work can facilitate a business solution, the team becomes more porous. Information flows more regularly between Science and Engineering to build the pipeline from research idea to prototype to minimum viable product to product.
In general, science teams tend to be isolated, so it is beneficial to ensure ideas flow smoothly and regularly within the group. You can do this first by setting up a Slack channel for your team. If one already exists but is not very active, share interesting articles or papers to start conversations. Set up weekly informal meetings with your team where you update each other on your work. These meetings will give you a relaxed environment to share ideas and tackle problems you find. You will also be able to help your colleagues, whether with coding problems, with next steps in their research or simply by asking questions, all of which is very satisfying and helps build momentum within the team.
Research can be a lonely endeavour, and as a scientist, you should do all within your means to make research within your team a collaborative effort. As a research scientist in a post-2020 world, you are likely spending part if not all of your time working remotely, which makes team-building and boosting morale both more necessary and more challenging. For tips on how to make remote working life easier, go to my blog post titled “7 Best Tips For Remote Working For Data Scientists”.
Learn Engineering/Applied Skills Outside of Your Wheelhouse
Research scientists often rely heavily on the data science toolkit of Python, R and their related libraries. For the best tutorials on Python and R, go to the Online Courses section. These tools are more than enough for writing algorithms and developing machine learning research. However, if there is a need to build scalable applications based on your research, it can be useful to add a few more tools to your kit.
Python is the lingua franca of data science and machine learning. However, Java is still the go-to language for building large-scale applications and is widely used by organisations worldwide. If you’re prioritising Python for research and development, you should learn how to prototype your research so that it can be tested in a demo environment, similar to what is expected in web applications. You can use Spring Boot to build applications that are quick and easy to deploy.
Cloud-native solutions are becoming the norm, so you should also learn about containerisation and CI/CD. Although these concepts lie more in the realm of software engineering and applied science, having a solid idea of how to piece together a pipeline from a local machine to a cloud instance will enable you to be a “good neighbour” to the engineering team and ensure the prototype you deliver plugs into the typical ecosystem for cloud-based technology. Here is a rough idea of how a pipeline would look and the relevant technology for each step:
- Build a containerised Java Application (Quarkus).
- Create a Docker Image with the relevant environmental variables and libraries. (Docker).
- Create a chart archive and push to a remote repository (Docker, Helm, Gradle).
- Construct a cloud instance (Jenkins, Kubernetes).
- Install the chart archive on the cloud instance and run the application (Helm, Kubernetes).
If your company is cloud-native or transitioning to being cloud-native, it is highly likely to use the technologies listed above. Get up to speed with them by going through the relevant tutorials.
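To make the pipeline steps listed above a little more concrete, here is a minimal Python sketch that assembles the Docker and Helm commands for the image and chart stages without running them. Every value in PipelineConfig (registry, image tag, chart path, release name, namespace) is a hypothetical placeholder, and the helm push step assumes an OCI-compatible chart registry and a recent Helm 3 release. Building the Java application and provisioning the cloud instance are left out, as they depend heavily on your company's setup.

```python
import shlex
from dataclasses import dataclass


@dataclass
class PipelineConfig:
    # All values are hypothetical placeholders; replace them with your own.
    image: str = "registry.example.com/science/prototype:0.1.0"
    chart_dir: str = "charts/prototype"
    chart_archive: str = "prototype-0.1.0.tgz"
    chart_repo: str = "oci://registry.example.com/charts"  # assumes an OCI registry
    release: str = "prototype"
    namespace: str = "science-demo"


def build_commands(cfg: PipelineConfig) -> list[list[str]]:
    """Return the CLI command for each stage, in order, without executing it."""
    return [
        # Build the Docker image from the Dockerfile in the current directory.
        ["docker", "build", "-t", cfg.image, "."],
        # Push the image to the remote registry.
        ["docker", "push", cfg.image],
        # Package the Helm chart directory into a chart archive (.tgz).
        ["helm", "package", cfg.chart_dir],
        # Push the chart archive to the remote chart repository.
        ["helm", "push", cfg.chart_archive, cfg.chart_repo],
        # Install the chart archive on the cluster as a named release.
        ["helm", "install", cfg.release, cfg.chart_archive,
         "--namespace", cfg.namespace],
    ]


if __name__ == "__main__":
    # Dry run: print the commands so they can be reviewed before use.
    for command in build_commands(PipelineConfig()):
        print(shlex.join(command))
```

Printing the commands rather than executing them keeps the sketch safe to run on any machine. Once you are happy with them, the same lists can be passed to subprocess.run, or copied into whichever CI/CD job your engineering team already uses.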
Structure Your Year Around Important Conferences
Research scientists are focused on producing new research and publishing their findings. Publications are a crucial part of the scientific endeavour. They can provide further understanding within a given field and allow ideas on specific problems to be shared within the community.
Machine learning conferences typically come in two flavours: industrial application and theoretical. Industrial application conferences publish work exploring the application of machine learning techniques to industry problems. The innovation can come from machine learning providing a novel way to solve a problem or augmenting an existing solution. A paper for an industry application conference will contain some machine learning theory, but only as background. On the other hand, a machine learning theory conference will publish work focused on developments in machine learning techniques, with more appreciation of the mathematics and statistical theory behind the development. The technique’s practical application may be highlighted in an analysis or further work section, but will not be the publication’s crux.
Depending on how focused you are on theory versus application, for a given calendar year, choose one or two conferences that align with your research area and accept shorter (six-page) papers as a short-to-medium term goal, and a larger conference that accepts longer papers (eight or more pages). By doing this, you will frame your work and have clear, motivating targets to aim for. Granted, it is not guaranteed that your work will bear fruit in time for specific conference dates, but it is a good idea to know the key machine learning conferences. Aim to submit to a conference at least once a year.
Here are some examples of international, renowned machine learning conferences:
- International Conference on Machine Learning (ICML)
- Conference on Neural Information Processing Systems (NeurIPS, formerly abbreviated NIPS)
- International Joint Conference on Neural Networks (IJCNN)
Depending on your research domain, you will have a specific list of conferences beyond the large, international ones. Use IEEE’s search engine to find those most relevant to you.
Prioritise Clear Presentation of Your Work
Following on from the publication tip, make sure you document every step of your research. Doing so will make your life easier when you come to structure your analysis and results to write a paper. Although we tend to be in front of computers as research scientists, we should treat our desktop or laptop as a laboratory. Every experiment we do should be logged, and each research question or answer that arises should be documented. This will give your work a strong narrative, allowing you to trace back to critical points in your research. This is vital not only for writing papers but also for relaying information to your colleagues and supervisors.
One way of making a “lab report book” is to present your recent work, whether to your peers, your supervisor or both, at least once a week with presentation slides. The slides will force you to document your findings, whilst adding chronology to your work. I found that when I was fully involved in a piece of work, I knew every aspect of the code and results, but after moving on to a new piece of work, those details quickly faded from memory. Presentation slides will help jog your memory and allow you to go back and continue where you left off if needed.
Documentation also includes your code repositories. While writing documentation is the bane of every software developer’s existence, it will reduce the pain of trying to understand what a particular variable or few lines of logic do months after writing them. A good practice is to imagine you have the memory of a goldfish and add a comment to variable definitions and code blocks as you go. More broadly, you should keep an ongoing document that describes what your software does as you develop it. Research scientists will often have the crucial task of “knowledge transfer” to perform, which involves handing over an extensive body of work, a combination of theory and software, to an individual or team and bringing them up to speed on how to use it.
Documenting the work as you go will make the transfer task easier for you and improve your understanding of your research’s intricacies.
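As one possible way to keep the experiment log described above, here is a minimal Python sketch that appends each experiment as a single line to a JSON Lines file. The function names (log_experiment, read_log), the field names and the example question are all hypothetical choices for illustration, not a prescribed format; a notebook, wiki page or experiment-tracking tool would serve the same purpose.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_experiment(log_path, question, parameters, outcome):
    """Append one experiment record to a JSON Lines file and return it."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "question": question,      # the research question being tested
        "parameters": parameters,  # settings needed to reproduce the run
        "outcome": outcome,        # what you found, in a sentence or two
    }
    # Append mode keeps earlier entries, so the file preserves chronology.
    with Path(log_path).open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(entry) + "\n")
    return entry


def read_log(log_path):
    """Load every record back, oldest first."""
    with Path(log_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Demo in a temporary directory; the question and values are made up.
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "lab_log.jsonl"
        log_experiment(
            path,
            question="Does a smaller learning rate stabilise training?",
            parameters={"learning_rate": 0.001, "seed": 42},
            outcome="Placeholder outcome: fill in after the run.",
        )
        for record in read_log(path):
            print(record["timestamp"], "|", record["question"])
```

Because each entry carries a timestamp and the file is only ever appended to, the log doubles as the chronology for your weekly slides and for any later knowledge transfer.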
Read Broadly Around Your Research Area
Research scientists may be asked to conduct a symposium on a given research area. Doing so requires a broad and comprehensive literature survey and the ability to condense the prominent developments in the domain. Make a habit of reading around your research area and making a note of important papers. You can use Mendeley to keep a running tally of your reading. You can use Papers With Code to find relevant papers and order them by most recent, trending or most impactful (in terms of GitHub stars). You can use ResearchGate to find citation chains for relevant papers. Citations will give you an insight into the current developments and allow you to trace back to earlier, influential papers.
When you start your reading, store the papers in a spreadsheet together with a few other columns that ask questions to help you organise the papers. These questions could include “Is this the current state-of-the-art?”, “Is the method described the only way to solve this problem?” and “Is the focus on theory or practical application?” Tagging papers like this will allow you to stratify your reading and discuss it fluently.
Outside of symposium requests, you should still keep up to date with the most recent developments in your domain, and do the same for at least one other related area of machine learning that interests you. One of my research areas is natural language processing, but I also read around computer vision and topological data analysis.
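If you prefer to keep the reading spreadsheet described above as a plain CSV file, here is a minimal Python sketch of how the tagging columns might look. The column names, the helper functions (add_paper, filter_papers, to_csv) and the paper titles are all hypothetical placeholders; adapt the questions to whatever helps you organise your own reading.

```python
import csv
import io

# One column per tagging question, alongside basic bibliographic fields.
FIELDS = ["title", "year", "state_of_the_art", "only_known_method", "focus"]


def add_paper(rows, title, year, state_of_the_art, only_known_method, focus):
    """Append a tagged paper to the reading list."""
    if focus not in {"theory", "application"}:
        raise ValueError("focus must be 'theory' or 'application'")
    rows.append({
        "title": title,
        "year": year,
        "state_of_the_art": state_of_the_art,    # current state-of-the-art?
        "only_known_method": only_known_method,  # only way to solve the problem?
        "focus": focus,                          # theory or practical application?
    })


def filter_papers(rows, **tags):
    """Return the papers whose tags match every keyword argument."""
    return [row for row in rows
            if all(row[key] == value for key, value in tags.items())]


def to_csv(rows):
    """Serialise the reading list as CSV text for any spreadsheet tool."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    reading_list = []
    # Placeholder entries, not real papers.
    add_paper(reading_list, "Placeholder paper A", 2020, True, False, "theory")
    add_paper(reading_list, "Placeholder paper B", 2019, False, False, "application")
    add_paper(reading_list, "Placeholder paper C", 2021, True, True, "application")

    # Stratify the reading: state-of-the-art papers focused on application.
    for paper in filter_papers(reading_list, state_of_the_art=True,
                               focus="application"):
        print(paper["title"])

    print(to_csv(reading_list))
```

Filtering on the tag columns is what lets you pull out, for example, only the application-focused state-of-the-art papers when preparing a symposium talk.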
Embrace the Uncertainty of Research
When you start an experiment, there are rarely clear-cut answers that can be plucked out of the ether. Often you will start with one premise, which slowly evolves as you ask more questions. Eventually, you will have a series of questions to which you’ve found an answer, which will form your research narrative. However, there are no guarantees that your journey of answering questions will result in a positive conclusion, nor that the answers you get will match your assumptions. Aim to embrace this uncertainty and be malleable in your approach to research.
Practically, you can do this by creating “mini-crossroads” as you progress. Have one or two questions that you can answer over a period of two weeks to a month, and whose answers will lead you either to progress or to pivot. These mini-crossroads will help guide the next steps and allow you to determine whether or not you have reached a dead end. Arriving at a crossroads and deciding whether to press on or choose another path takes practice and objectivity. We can often become tied to a particular method, model or dataset and want to get the right results out of it. It is useful to make the decision with your supervisor and share it with others in your team for an objective view.
The more flexible you are with research, the broader your analysis and the richer your narrative will be. Bear in mind that an experiment’s course can span several months to a year depending on priority and other demands, so for research scientists it is typically more productive to take the slower, steadier approach than to try to whizz through a rigid analysis.
Concluding Remarks
Thank you for reading to the end of this post. Hopefully, these tips can help make your day-to-day as a research scientist easier and more rewarding. Are you a research scientist? What experiences have helped you progress and learn more in your field? Please share them in the comments below. If you are interested in becoming a research scientist and want to learn more, go through my post titled “Key Differences Between Data Scientist, Research Scientist, and Machine Learning Engineer Roles”. If you are embarking on your research scientist journey, go through my post titled “9 Best Tips For Early Career Research Focused Data Scientists” to get a head start in your career. Please share this post with anyone you think will benefit from it, and I will see you in the next post!