Replicating science: $28 billion is wasted every year in the US alone
Why? Because studies can’t be replicated.
If a study in science can’t be replicated, is it still viable?
Scientific studies are notoriously difficult to reproduce. A study published in PLOS Biology estimated that about $28 billion is spent every year on research that can’t be replicated. And that’s in the United States alone. Extrapolate those figures across the world and we have a real problem.
In recent years, as computing power has increased, cloud software has been widely adopted and data sets have grown, it has become more and more evident that scientists are unable to generate the same results, even while using the same data sets. If studies cannot be replicated and the same conclusions reached, it undermines the credibility of scientists, and of science itself.
The repercussions of this could be serious. If the results can’t be trusted, then the very nature of science and the scientific process becomes questionable. At a time when new technologies like machine learning and artificial intelligence are emerging, that doubt encourages people to question the value of these powerful and potentially life-changing technologies and can instill a level of mistrust. In this article, we’ll look at why this issue is so widespread and how we can address it.
The problem will continue to worsen without action
This ‘replication crisis’ is not a new problem. In fact, it has been a pervasive issue in the social sciences for decades. An article by Jerry Adler, ‘The Reformation: Can Social Scientists Save Themselves?’, published in Pacific Standard, covered the topic in great detail. But this pattern of irreproducible studies is by no means confined to the social sciences; it is also a major issue in the pharmaceutical industry.
In 2005, John Ioannidis, a professor of health research and policy at Stanford University, wrote a paper that first brought this problem to the attention of the scientific community. Published in the journal PLoS Medicine, his paper, ‘Why Most Published Research Findings Are False’, shone a spotlight on methodologies, biases and study design failings. He concluded that “Simulations show that for most study designs and settings, it is more likely for a research claim to be false than true.”
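To make that conclusion concrete, here is a minimal sketch of the simplest form of the positive predictive value (PPV) relation from Ioannidis’s paper: PPV = (1 − β)R / (R − βR + α), where R is the pre-study odds that a tested relationship is true, α is the false-positive rate and 1 − β is the statistical power. The function name and the example numbers are illustrative assumptions, not figures from the paper, and the sketch leaves out the paper’s extra terms for bias and for multiple competing teams.

```python
def positive_predictive_value(prior_odds, power, alpha=0.05):
    """Chance that a 'significant' finding is actually true.

    prior_odds: R, odds that a tested relationship is real before the study
    power:      1 - beta, chance of detecting a real relationship
    alpha:      chance of a false positive when no relationship exists
    Simplified form only: no bias term, single research team.
    """
    beta = 1.0 - power
    return (power * prior_odds) / (prior_odds - beta * prior_odds + alpha)


# Illustrative inputs (assumptions): 1 true relationship per 10 tested.
well_powered = positive_predictive_value(prior_odds=0.1, power=0.8)
under_powered = positive_predictive_value(prior_odds=0.1, power=0.2)

print(f"PPV with 80% power: {well_powered:.2f}")
print(f"PPV with 20% power: {under_powered:.2f}")
```

Under these assumed inputs, the under-powered design gives a PPV below 0.5, which is the situation the quote describes: a claimed finding is more likely to be false than true.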
His paper had a powerful impact, encouraging companies to take another look at their work. In 2011, pharma giant Bayer found that only a quarter of studies were reproducible. That same year, Glenn Begley, who was head of the oncology division at biopharmaceutical company Amgen at the time, set out to reproduce the results of 53 papers published in the previous decade that formed the foundation of oncology. Even using datasets identical to the originals, Begley could only replicate the results of six of them.
Irreproducibility cannot be ignored
Replication of work is the cornerstone of the scientific process – the finding needs to be a pattern for it to be confirmed.
A single result could be a mistake, or a fluke. Receive the same result under identical conditions a second time and it could still be passed off as a coincidence, or maybe biased. But a third time and we’re in business.
This principle is so ingrained in science that it is part of the laboratory guidelines taught to all budding scientists: in pharmaceuticals, it takes at least three consecutive batches to validate a process. The exact number of batches depends on the level of risk involved in manufacturing. If little is known about the process, it stands to reason that more statistical data is needed to prove the process is consistent enough to meet quality requirements.
Scientists can’t gain insight from a single data point, and two points simply draw a straight line. You need a minimum of three batches to validate, and three is generally the number labs stick to. And why not more? Although regulatory bodies, such as the Food and Drug Administration (FDA) in the US, do not specify a maximum number of batches to validate, running batches is expensive and time-consuming, so most labs follow the guidelines.
Methods are there for a reason
How has this become such a widespread issue? As is often the case, there are a variety of reasons: poor methods, convoluted protocols, and sometimes even misconduct.
Increasingly, researchers begin their studies without a proper hypothesis and can end up grasping at straws to find ‘meaningful correlations’ in the data. There is often a reasonably good chance of finding a statistically significant p-value: the larger the data set, the more likely it is that a small pattern produced by chance will appear significant rather than being recognized as a random event.
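A small simulation can illustrate the effect. The sketch below is an assumption-laden toy, not a method from any of the studies mentioned: it correlates many columns of pure random noise with a random outcome and counts how many look ‘significant’ at the 0.05 level. All function names, sample sizes and the seed are hypothetical, and the p-value uses a Fisher z-transform normal approximation rather than an exact test.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def approx_p_value(r, n):
    """Two-sided p-value for r via the Fisher z-transform (approximation)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_spurious(n_samples=100, n_variables=1000, alpha=0.05, seed=0):
    """Count noise variables that appear correlated with a noise outcome."""
    rng = random.Random(seed)
    outcome = [rng.gauss(0, 1) for _ in range(n_samples)]
    hits = 0
    for _ in range(n_variables):
        noise = [rng.gauss(0, 1) for _ in range(n_samples)]
        if approx_p_value(pearson_r(noise, outcome), n_samples) < alpha:
            hits += 1
    return hits


hits = count_spurious()
print(f"{hits} of 1000 pure-noise variables passed p < 0.05")
```

Because every variable here is noise by construction, each ‘significant’ correlation is a false positive. With a 0.05 threshold, roughly one test in twenty is expected to pass by chance, so the more variables a study screens without a prior hypothesis, the more such patterns it will find.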
In his paper, Ioannidis says he is concerned that researchers try to find patterns in the data, using machine learning to find a hypothesis, instead of starting out with one. The result is an approach that requires little to no validation.
There could be several factors that contribute to this, including publication bias, errors in experiments, not using statistical methods correctly, and inadequate machine learning techniques. But these all have one thing in common: scientists are spotting patterns in the data that do not match the real world.
The pressure to produce useful studies is still on
Today, we have numerous tools to help us collect and analyze enormous amounts of data. We have the opportunity to get it right from the get-go, and the freedom to decide how we collect, organize, analyze and interpret that data.
With the ability to collect and access reams of data comes an increased need for proper methodologies. The challenge that remains is to design a method that fits a hypothesis and test it with the data gathered, or to use the appropriate statistical methods when the number of hypotheses is large.
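The article does not say which statistical methods are meant. As one hedged illustration, the sketch below implements two standard multiple-testing corrections, Bonferroni and Benjamini-Hochberg, in plain Python. The function names and the example p-values are assumptions chosen for illustration.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values at or below alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure (controls false discovery rate).

    Sort the p-values, find the largest rank k whose p-value is at or below
    (k / m) * alpha, and reject every hypothesis ranked k or better.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


# Made-up p-values from five hypothetical tests.
p_values = [0.5, 0.035, 0.004, 0.025, 0.015]

print("uncorrected:       ", [p < 0.05 for p in p_values])
print("Bonferroni:        ", bonferroni(p_values))
print("Benjamini-Hochberg:", benjamini_hochberg(p_values))
```

Bonferroni is the stricter of the two and guards against any false positive at all, at the cost of missing real effects. Benjamini-Hochberg instead limits the expected share of false positives among the findings reported, which is often a more practical trade-off when the number of hypotheses is large.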
Take the Bayer study. Although the company could replicate only about 25% of the studies it examined, it found that the results it was able to reproduce proved robust, an excellent marker that a study had clinical potential.
It has been suggested that scientists can use data mining techniques to find those studies that are most likely reproducible. But to do so requires a data set to mine. More replication studies must be conducted to build a database and streamline the process in the future.
But for now, scientists must continue trying to replicate existing studies – testing them for their reproducibility and robustness.
Or, they can plan a study the right way from the start, using software to build methods, catch deviations before it’s too late, and collect data with context so it can be accessed and interpreted with ease. Most importantly, the right scientific informatics platform can validate results, so that studies can be replicated without having to be redone because of mistakes.