The Urgency to Be F.A.I.R.
In 2018 alone, thousands of petabytes of data were collected. But all too often, scientific data is not used beyond its original purpose. In fact, a study conducted by PLOS indicated that the supporting data for only about 20% of published papers is deposited in a scientific repository. The rest just sits on drives or computers, gathering digital dust, which makes it a challenge for other scientists to access and use.
When you think of the effort it takes to generate and curate data, it hurts to see that data barely used and then filed away improperly. Why don’t scientists use data to its full potential? In this article, we will discuss the blockers to data use and re-use and explain why the R&D industry urgently needs to remove them.
Don’t underestimate the role of data
The R&D sector is built on data; it is a leading producer and consumer of data and relies on its analysis to develop innovative products and services. But when it comes to sharing that data, there are certain requirements to consider: is it easily findable using common search tools? Can other researchers easily access the data and its associated metadata? Are they able to use the data in multiple ways, so that they can compare, analyze, and integrate it using common terms and formats? Is it re-useable? In short, is it F.A.I.R. (findable, accessible, interoperable, re-useable)? Barend Mons, professor at Leiden University Medical Center in the Netherlands, co-lead of GO FAIR and the International Science Council’s committee on data, puts it best: “It is irresponsible to support research but not data stewardship.”
Data is our most valued asset; each data point forms a piece of the puzzle needed to innovate. Think of all the exciting and potentially breakthrough findings that go undiscovered just because data has not been handled correctly. Let’s take the pharmaceutical industry as an example. Discovering a drug is difficult because it is challenging to predict its properties, such as its efficacy and toxicity in the human body. To maximize the chance of discovering a drug and of re-purposing molecules for new treatments, access to all available data is crucial.
Reusing data has also become more important for developing predictive models and for learning from experience and mistakes. This is particularly relevant at a time when no single organization can, on its own, deliver the value needed in this digital and connected world.
It’s difficult for researchers to properly share their data
New ways of communicating with digital tools have broken down traditional boundaries, giving rise to more partnerships, and companies should be able to share information to make informed decisions. That is where the challenge lies: organizations cannot find data, access it easily, or combine it with other data. As a result, the data cannot be re-used.
A 2018 report by the European Commission estimated that issues with data reuse cost the EU about €10 billion a year in the academic sector alone, and another €16 billion in lost opportunities to innovate. When you include the cost of the reproducibility problem, these costs rise significantly.
According to PLOS, even when researchers say their data is ‘fully available on request,’ others looking to use that data often come up against a wall, unable to access the data sets.
In fact, only one in ten data sets was accessible and reliable for use, even when requested from the author directly. The study found this could be because the researcher could not be reached, did not want to share their data, or because the data was simply lost or unavailable. What we need is for data sharing policies to ensure data is not only available but also reliable and accessible long-term.
If you’re not F.A.I.R., you’re losing money
Putting data in a repository supports the F.A.I.R. principles. Repositories provide clear and persistent identifiers, expert collection and curation, proper landing pages, and support for data citation. The result is clear and consistent information that is easy to find, access, use in multiple ways (whether by a researcher or a machine), and reuse. With this level of detail and information, it is also much easier to make connections with other related studies and draw informed conclusions.
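To make the repository features described above more concrete, here is a minimal sketch in Python of what a machine-readable record for a deposited data set might contain, and how a script could flag what is missing. Everything in it is an assumption made for illustration: the `DatasetRecord` fields, the `fair_gaps` helper, and the example values are hypothetical, and the check is a rough reading of the four principles, not an official F.A.I.R. assessment.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical, minimal description of a deposited data set."""
    title: str
    identifier: str = ""      # persistent identifier, e.g. a DOI
    landing_page: str = ""    # URL where people and machines reach the record
    data_format: str = ""     # open, commonly used format such as CSV
    license: str = ""         # terms under which the data may be re-used
    keywords: list = field(default_factory=list)  # shared, searchable terms


def fair_gaps(record: DatasetRecord) -> list:
    """Return the principles for which basic information is missing."""
    gaps = []
    if not record.identifier or not record.keywords:
        gaps.append("findable")
    if not record.landing_page:
        gaps.append("accessible")
    if not record.data_format:
        gaps.append("interoperable")
    if not record.license:
        gaps.append("re-useable")
    return gaps


# Example values below are made up for illustration.
deposited = DatasetRecord(
    title="Compound toxicity screen",
    identifier="doi:10.1234/example",
    landing_page="https://repository.example.org/datasets/42",
    data_format="text/csv",
    license="CC-BY-4.0",
    keywords=["toxicity", "in vitro"],
)
on_a_drive = DatasetRecord(title="results_final_v3")

print(fair_gaps(deposited))   # no gaps reported
print(fair_gaps(on_a_drive))  # every principle flagged
```

A file that only has a name, like the second record, fails on every count, which is roughly the situation of data left gathering digital dust on a drive.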
Mons also stresses that if data is collected, curated, and treated properly, researchers will have a lot more time to conduct research. The way things are done now, PhD students spend 80% of their time fixing formatting issues and minor errors so that data will be suitable for analysis. That is a waste of both time and talent. In monetary terms, by his estimate, 400 such students would be equivalent to 200 full-time employees.
This issue reaches beyond research and into industry, where organizations are encouraged to form partnerships and break down silos to deliver the value required. So, what happens when an organization’s data is not F.A.I.R.? Currently, many pharma organizations’ internal databases are stuck in silos and are less F.A.I.R. than data in the public sector. One reason is that historically (and until just recently), data was generated for a specific project or study, without considering the possibility of re-using it. Given that predictive modelling and learning from past experiments depend on reusing data, this scenario is less than ideal and needs to change, urgently.
This is where the F.A.I.R. guiding principles will have the most impact: at the start of the data journey, when data is collected and stored. F.A.I.R. data not only improves what is shared outside the company; it also helps prevent internal data silos and makes scientists’ lives easier. The more F.A.I.R. the data, the higher its quality and the better the results it will generate.
Adopting F.A.I.R. won’t be easy, but remains critical
Industry has responded quickly and begun adoption. Large companies including Janssen, Bayer, Novartis and Roche have embarked on F.A.I.R. projects, striving for good data management.
Now, because F.A.I.R. literature is full of technicalities on standards, metadata, and data management best practice guidelines, you may be thinking, “this is an IT problem.” But no, going F.A.I.R. has an impact on everyone, from the scientist working at the bench to the end consumer. It is a big part of the digital transformation that companies are implementing to stay competitive.
Most will agree that the future leaders of every industry will be the first to leverage the power of data science, artificial intelligence (AI), and machine learning. But are you ready? Is your data correctly captured, contextualized, and curated? In other words, is it machine-ready?
For example, the Journal of Biomedical Semantics published a study presenting a machine learning model, built using the Euretos AI Platform, that could predict whether a particular drug would be effective in the treatment of a specific disease. Access to public data resources enables this model to predict with 78% accuracy, which is 12% more accurate than previous ‘state-of-the-art’ models. Your next blockbuster may already be waiting in your drawer, if only you were organized enough to see it!
Making data F.A.I.R. presents the industry, and by extension the world, with a tremendous opportunity. What are you waiting for? The sooner you start, the closer you will be to more life-changing innovations.
Not sure where to start? We can help.