Experts convened at the National Academies of Sciences, Engineering, and Medicine to examine how the uptake of real-world evidence will impact medical product development, recommending that producers and consumers of evidence focus on maintaining the transparency, accountability and reproducibility of results.
Attendees agreed that, above all, transparency was key. The reasoning behind all design decisions should be clear and available for anyone to look at, they said.
“Are other people able to access this data source and reproduce the work? Or do people have free access to all the transformations that happened between the source dataset and the analytic dataset?” asked workshop series co-chair Greg Simon, a research professor at the University of Washington and head of the Mental Health Research Network.
“I’d make a strong case that for credibility, that’s absolutely foundational,” Simon said, adding that without a study’s code and algorithms being made available for public review, confidence in the scientific findings could falter.
Still, simply posting the algorithms online isn’t always helpful; not everyone can parse raw, complex computer code. Once trials are thoroughly described — including any adjustments for characteristics of the population, or changes made after updates to electronic health records or other software, for example — reproducing the results becomes far more straightforward.
Sponsors should incorporate an audit trail with version control, and set up firewalls between those performing the analysis and the rest of the research team, the workshop recommended. Studies using more than one data source should report point estimates and confidence intervals for each source individually, to illustrate variability in results.
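The per-source reporting idea can be sketched in a few lines. This is a minimal illustration, not any study's actual method: the source names and event counts below are invented, and the normal-approximation interval is just one common choice for a proportion's confidence interval.

```python
import math

# Hypothetical per-source results: (events, patients) pairs are
# illustrative numbers, not from any real study.
sources = {
    "claims_db_a": (42, 1000),
    "ehr_network_b": (55, 1200),
    "registry_c": (18, 400),
}

def risk_with_ci(events, n, z=1.96):
    """Point estimate and 95% normal-approximation CI for a proportion."""
    p = events / n
    se = math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Report each source separately rather than only a pooled figure,
# so variability across sources stays visible.
for name, (events, n) in sources.items():
    p, lo, hi = risk_with_ci(events, n)
    print(f"{name}: {p:.3f} (95% CI {lo:.3f}-{hi:.3f})")
```

Reporting the three intervals side by side, rather than collapsing them into a single pooled estimate, is what exposes heterogeneity between data sources.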
Real-time monitoring should be segmented as well. “You don’t want to mix data streams. You want to see discontinuities,” Simon said. “You want to continuously monitor the quality of data streams so you see when something breaks.”
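Simon's point about watching each stream separately for breaks can be illustrated with a toy monitor. Everything here is an assumption for demonstration: the stream names, the weekly record counts, and the simple "drop below half the running average" rule are all invented, not Sentinel's or anyone's actual quality check.

```python
# A minimal sketch of per-stream quality monitoring.
def find_discontinuities(counts, threshold=0.5):
    """Flag periods whose record count falls more than `threshold`
    (as a fraction) below the running mean of earlier periods."""
    breaks = []
    for i in range(1, len(counts)):
        baseline = sum(counts[:i]) / i
        if baseline > 0 and counts[i] < baseline * (1 - threshold):
            breaks.append(i)
    return breaks

# Streams are checked separately, never mixed, so a break in one
# source is not masked by healthy volume in the others.
streams = {
    "site_a_ehr": [100, 98, 103, 99, 12, 10],   # hypothetical feed break
    "site_b_claims": [200, 210, 195, 205, 199, 202],
}
for name, counts in streams.items():
    print(name, find_discontinuities(counts))
```

Running this flags weeks 4 and 5 of the first stream while the second stream passes cleanly; had the two streams been summed into one series, the drop would have been far less visible.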
When it comes to validation, the industry needs to move away from comparing the results of a real-world study using observational treatment assignment to a randomized, controlled clinical trial, and showing they came up with the same result, Simon said.
“We’re deciding if the new method worked based on the answer it got — we need to have a way of deciding that upfront,” he added, citing the example used by former FDA Commissioner Robert Califf of drawing a bullseye around a hole in the side of a barn, after the hole is already there.
In addition, new data-gathering and analysis tools should be validated not only against available tools, but against what’s happening in the real world.
“We have to do both,” said Califf, now vice chancellor for health data science at Duke University. “If we don’t ground it to the old tool, you can’t win the argument in the regulatory space or in the clinical review space, for example.”
“Sometimes our gold standard may not be so gold,” said Jesse Berlin, vice president and global head of epidemiology at Johnson & Johnson, who suggested the development of an open library of real-world data definitions and validated algorithms where all stakeholders could contribute. Common tools could bolster transparency, quality and efficiency in evidence generation.
“It should be illegal to write a custom piece of code to do a study,” said David Madigan, dean of the faculty of arts and sciences and professor of statistics at Columbia University. “It’s really crazy in this day and age. We should be using validated tools, not building things from scratch.”
“Building things from scratch is riddled with the potential for errors and is just intrinsically non-reproducible,” Madigan said, citing a published paper that said it “adjusted for age.”
“Okay, so you try to reproduce that, and you go into the database and there’s somebody in there with an age of -3,” he said. “So what did they do with that? …The level of irreproducibility that we’re living with right now is unacceptable. The custom crafting of code is one of the root causes.”
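Madigan's "age of −3" example is, at bottom, about undocumented cleaning decisions. A hedged sketch of the alternative — making each decision explicit and auditable — might look like the following; the bounds and rule names are illustrative assumptions, not any validated tool's actual logic.

```python
# A sketch of explicit, auditable data cleaning: every record gets a
# recorded action, so a reader can see exactly what happened to an
# implausible value like age -3 instead of guessing.
def clean_age(age, lower=0, upper=120):
    """Return (value, action). Out-of-range ages are nulled and
    labeled, not silently dropped or silently kept."""
    if age is None:
        return None, "missing"
    if age < lower or age > upper:
        return None, f"out_of_range:{age}"
    return age, "kept"

ages = [34, -3, 67, 250, None]
for value, action in (clean_age(a) for a in ages):
    print(value, action)
```

Because the rule and its bounds live in one named, testable function rather than being scattered through ad hoc scripts, "adjusted for age" becomes something another team can actually reproduce.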
However, as the field of pharmacoepidemiology has evolved in the past few years, tools such as the FDA’s Sentinel Initiative have grown with it, said Richard Platt, professor and chair of the department of population medicine at Harvard Pilgrim Health Care Institute.
Sentinel can now fully reproduce previous, large-scale, real-world data studies that relied on custom-built code, if only because nothing else was available at the time, he said.
“I think at the end of the day, it’s not only better, but substantially cheaper,” said Platt, who also serves as Sentinel’s principal investigator. “We’re now in a position where we can do better, clearly reproducible studies, using tools that can be as extensively vetted much faster and at a much lower cost than the old-fashioned way.”
To lower costs further, researchers and sponsors should decide where randomization and blinding are absolutely necessary, and examine the trade-offs when enrolling tens of thousands of patients, they said. Randomizing treatment won’t lead researchers away from the right answer, “but the problem is it’s often crazy expensive, it takes a hugely long time, it’s a giant hassle, and patients and providers don’t like it — but besides that, what’s not to like?” said Simon.
While blinding of outcome assessments and analyses is important, blinding patients and providers may be less so, and at times unnecessary, he said. Requiring the treatment experience to be identical can obscure the truth of a product’s use in real-world settings.
And in such settings, sponsors should be prepared for study participants to behave in unplanned ways — that is, as they would outside of a controlled trial environment.
As an example, a study examining the use of lithium to prevent repeated suicide attempts within the Veterans Affairs health system struggled with recruitment, with about 30 percent of potential patients being excluded for taking other medications.
The researchers argued to their institutional review board that these patients were necessary to mirror a real-world population, and they were allowed to be included with additional monitoring. In addition, unexpected events such as pregnancies should not be treated as protocol violations, but as something patients normally encounter in life.
These safety monitoring events can even be exploited as a benefit, said Michael Horberg, director of HIV and AIDS research at Kaiser Permanente.
In a major study of HIV pre-exposure prophylaxis treatment, or PrEP, patients were counseled on safer sex and advised to use condoms, but the trial still resulted in over 120 pregnancies. None of those pregnancies, however, involved transmission of HIV, helping to demonstrate the therapy’s effectiveness.
“When studying things that sometimes involve undesirable and stigmatized behaviors... unless we welcome in the way real world works, we’re never going to answer the question,” added Simon.
The academies’ third workshop, scheduled for July, will focus on approaches for operationalizing the collection and use of real-world evidence, including ways to supplement traditional clinical trials and challenges for incorporating its use into health systems and product development.