
IDBS Blog | 28th March 2024

The importance of recording zero in the digital age of science


By Nathalie Batoux, Product Manager (Discovery and Innovation) Digital Products Development, IDBS

The discovery, or invention, of the number zero centuries ago had a profound impact on humanity, and zero continues to play a crucial role in today’s digital age. Beyond its role in binary encoding, zero is indispensable across scientific disciplines and in everyday life. The same is true of the "zeros" of experimental science: recording negative or failed results alongside positive ones is essential for a complete understanding of the scientific process. This article explores the value of negative results and how proper data management of all results supports your digital maturity journey towards machine learning (ML) and scientific advancement.

The importance of comprehensive reporting

Traditionally, scientific publications have focused on successful experiments, rarely delving into the details of what did not work and the possible reasons why. However, recording negative results and the specific parameters involved is vital for effective knowledge sharing and future research endeavors. While positive results facilitate replication, detailed records of failures enable valuable learning experiences for both individuals and ML systems.

Scientists’ publication bias

Scientists often publish their work with a positive spin, highlighting what has worked and downplaying what has not. This bias is evident in external publications such as patents, but it is also present in internal reports within organizations. The focus of these publications is mainly on explaining how to make something work or highlighting the efficiency of a process. Negative results and the parameters or conditions that led to them are rarely shared in detail. This trend may stem from a fear of showcasing failures or an emphasis on presenting only successful outcomes.

The value of negative results

While reporting positive results is essential for reproducing experiments, it is less valuable for learning and for ML applications. Scientists and research teams learn from their failures and negative outcomes, but this knowledge stays with individuals unless it is thoroughly recorded and made available. When a new researcher attempts the same experiment without access to previous failures, it often takes longer and more attempts to achieve the desired results. Like human learning, machine learning benefits from records of negative results, which give a more complete picture of experiments and study designs.

Machine learning and negative results

Machine learning needs datasets that include negative results to provide useful insights, predictions and study design assistance. Models trained solely on positive outcomes are poorly equipped for real-world scenarios, where failures and negative outcomes are common. To build robust ML models, negative results and their associated parameters must be included in the training datasets. This allows a model to learn from unsuccessful attempts and make more accurate predictions in practical applications.
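To make the point concrete, here is a minimal Python sketch that uses the simplest possible predictive model, an empirical success rate for each set of conditions (the maximum-likelihood estimate of a per-condition Bernoulli model), as a stand-in for a real ML model. The solvent names and outcomes are invented for illustration, and `success_rates` is a hypothetical helper, not part of any product. When failures are never recorded, every condition looks like a guaranteed success, so the model cannot distinguish good conditions from bad ones.

```python
from collections import defaultdict


def success_rates(records):
    """Estimate P(success | conditions) as successes / attempts per condition.

    records: iterable of (conditions, succeeded) pairs, where conditions is any
    hashable description of the experiment (here a hypothetical solvent name).
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for conditions, succeeded in records:
        attempts[conditions] += 1
        successes[conditions] += int(succeeded)
    return {c: successes[c] / attempts[c] for c in attempts}


# Hypothetical runs: (solvent, did the reaction succeed?)
all_runs = [
    ("DMF", True), ("DMF", False), ("DMF", False), ("DMF", False),
    ("THF", True), ("THF", True), ("THF", True), ("THF", False),
]

# What a positives-only record looks like: the failures were never written down.
positives_only = [run for run in all_runs if run[1]]

print(success_rates(positives_only))  # {'DMF': 1.0, 'THF': 1.0}
print(success_rates(all_runs))        # {'DMF': 0.25, 'THF': 0.75}
```

Real ML models are far more sophisticated, but they share the same limitation: they cannot learn what makes an experiment fail from data in which the failures are missing.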

Transitioning to comprehensive digital recording

Thoroughly recording negative results and associated parameters in digital systems offers several advantages. First, it accumulates knowledge from unsuccessful attempts, allowing researchers to learn from past mistakes and avoid repeating them. Second, comprehensive recording makes it possible to build robust ML training datasets, leading to more accurate predictions and insights.

While it is impossible to document past experiments comprehensively in retrospect, the digital age presents opportunities for meticulous recording from now on. Digital systems offer purpose-built templates and integrated measuring instruments that make it easy to document, consistently, both the critical parameters and those that may not initially have seemed critical. Automation further enhances the process by allowing multiple parameters to be observed and measured simultaneously. Capturing exceptions and deviations from established procedures is crucial, as seemingly insignificant details can yield valuable knowledge when analyzed collectively.
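As an illustration of what a purpose-built template might enforce, the Python sketch below defines a hypothetical `ExperimentRecord`. The field names and `Outcome` labels are assumptions for this example, not the schema of any particular system. It shows three ideas from the paragraph above: a failed run is a first-class outcome, parameters are mandatory whatever the result, and deviations from the procedure get their own field instead of being lost in free text.

```python
import json
from dataclasses import asdict, dataclass, field
from enum import Enum


class Outcome(Enum):
    SUCCESS = "success"
    FAILURE = "failure"  # a negative result is recorded, not discarded


@dataclass
class ExperimentRecord:
    """Hypothetical template for one experimental run."""

    experiment_id: str
    parameters: dict  # every parameter, whether or not it seems critical
    outcome: Outcome  # plain strings such as "failure" are coerced below
    deviations: list = field(default_factory=list)  # departures from the procedure

    def __post_init__(self):
        if not self.parameters:
            raise ValueError("parameters must be recorded for every run, including failed ones")
        # Outcome(...) raises ValueError for labels outside the agreed vocabulary.
        self.outcome = Outcome(self.outcome)

    def to_json(self):
        """Serialize to JSON with the outcome stored as its string label."""
        data = asdict(self)
        data["outcome"] = self.outcome.value
        return json.dumps(data, sort_keys=True)


record = ExperimentRecord(
    "EXP-0042",
    {"temperature_c": 80, "solvent": "THF"},
    "failure",
    ["stirring stopped for 5 min"],
)
print(record.to_json())
```

Rejecting records with no parameters is a deliberate choice in this sketch: a failure without its conditions cannot be learned from, by people or by models.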

Building resilient datasets for machine learning

Recording experimental data in digital systems creates robust and resilient datasets for ML training. However, ensuring the data follows the FAIR principles (Findable, Accessible, Interoperable and Reusable) is crucial for success. Well-trained ML algorithms accelerate discovery and minimize the repetition of failed or similar experiments, saving valuable time and resources.

In fact, in recent years, drug repurposing has gained traction as a strategy for identifying new uses for existing approved drugs. By revisiting and expanding on previously recorded data, researchers can uncover valuable insights and potential alternative therapeutic applications.

That said, accessing negative results can be challenging, as they are often buried in internal company databases or isolated systems. Overcoming these obstacles by making negative data easily accessible empowers data scientists to focus on more productive tasks such as algorithm creation.

Accelerating drug repurposing

Recording and documenting negative results, failed experiments and unsuccessful studies is as important as capturing successful outcomes. Both humans and machines derive valuable insights from unsuccessful attempts, contributing to scientific progress and the advancement of ML. Embracing digital systems, promoting accessibility and adhering to comprehensive reporting practices will pave the way for more effective knowledge sharing, faster discoveries, and cost savings in research and development.

The role of data accessibility and findability

Accessibility and findability of data are crucial factors in successful ML training. While positive results are often readily available in publications and reports, finding and accessing reliable negative results is critical for building high-quality training datasets. Negative results are likely to be found in internal company data, provided that all parameters and results have been thoroughly recorded. However, locating and formatting this data can be a time-consuming task for data scientists, diverting their focus from more productive work such as creating ML algorithms.
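The Python sketch below shows the kind of harmonization work this involves. Records exported from two hypothetical internal systems label outcomes differently, so the labels are first mapped to a common vocabulary and then negative results matching given conditions are pulled out. The alias table, record fields and function names are all assumptions for illustration. Unmapped labels raise an error instead of being silently skipped, since a discarded record may well be a negative result.

```python
# Hypothetical mapping from the labels used by different systems to one vocabulary.
OUTCOME_ALIASES = {
    "success": "success", "pass": "success", "ok": "success",
    "failure": "failure", "fail": "failure", "no reaction": "failure",
}


def normalise(record):
    """Return a copy of record with a harmonized 'outcome' label."""
    label = record["outcome"].strip().lower()
    if label not in OUTCOME_ALIASES:
        raise ValueError(f"unmapped outcome label: {record['outcome']!r}")
    return {**record, "outcome": OUTCOME_ALIASES[label]}


def find_negative_results(records, **conditions):
    """Yield normalized failed runs whose fields match every given condition."""
    for record in map(normalise, records):
        if record["outcome"] != "failure":
            continue
        if all(record.get(key) == value for key, value in conditions.items()):
            yield record


# Hypothetical exports from two internal systems.
system_a_export = [
    {"id": "A1", "solvent": "THF", "outcome": "FAIL"},
    {"id": "A2", "solvent": "THF", "outcome": "pass"},
]
system_b_export = [
    {"id": "B7", "solvent": "THF", "outcome": "No reaction"},
    {"id": "B8", "solvent": "DMF", "outcome": "failure"},
]

negatives = list(find_negative_results(system_a_export + system_b_export, solvent="THF"))
print([r["id"] for r in negatives])  # ['A1', 'B7']
```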

The impact on time and cost savings

By embracing comprehensive recording practices, the scientific community can save time and money in research and development. Thoroughly documented negative results spare researchers from repeating failed experiments. ML algorithms trained on datasets that include negative results can make more accurate predictions, reducing the effort wasted on experiments that are already known to fail. This not only accelerates scientific progress but also reduces waste in drug discovery, materials science, and other research domains.


About the author

Nathalie Batoux


Nathalie left the lab and joined IDBS in 2005, driven by her passion for helping scientists with workflows and informatics tools that expedite their research.

At IDBS, Nathalie is a Product Manager for Discovery and Innovation. In her role, she regularly interacts with customers and researchers to gain an understanding of their data management needs and drives the development of platform capabilities to provide a solution.  

Nathalie is an organic chemist by training and started her career as a post-doctoral researcher in chemistry working in the domain of nucleoside and dinucleotide analogues. 

