DIA Conference: Researchers Work to Balance Data Speed and Accuracy in Complex Studies
As clinical trials grow more complex and technology offers more ways to acquire and process the data generated during studies, researchers face a growing tension between delivering data for evaluation quickly, often with a goal of near-real-time availability, and ensuring that the data is accurate and free from anomalies.
Data monitoring committees (DMCs) have a key role to play in this process, particularly in helping oversee data quality to ensure study integrity, often by flagging data anomalies, Anna Kettermann, a mathematical statistician in CDER’s Office of Biostatistics, noted during a session last week at DIA’s 2020 Virtual Global Annual Meeting.
But central data monitoring and stronger statistical process control during a trial could mitigate problems that arise before the data reaches the DMC, giving the DMC a clearer picture of the data and helping it better detect and address underlying data quality issues, she said.
As DMCs see it, their job is to make recommendations during the course of a clinical study about whether the study should continue or whether the protocol or operations should change, said Janet Wittes, president of WCG Clinical’s Statistics Collaborative. Because DMCs are largely concerned about bias in data, they care a great deal about how quickly data is provided, because “if the data that it sees is six months old, what is advised isn’t very helpful. That is why it wants to see very speedy, up-to-date data,” she said.
Statistical reporting groups (SRGs) are important players in this process, Wittes said. These groups receive data from study databases and use it to develop reports for the DMCs. For the reports to be meaningful, the database must be clean, and the time needed to clean it can delay the data reaching the SRG and, in turn, the DMC.
“To me, that is not best practice,” Wittes said. “Best practice is to give the data as quickly as you can, but let the SRG use defensive programming to remove bad data and correct errors. The goal of the SRG is to take data known to be somewhat ‘dirty’ and prepare a report in a way that is clinically sensible so the DMC can make recommendations about what should happen with the study.”
“In managing the tension between data cleanliness and speed, we tend to prefer speed,” Wittes said, adding that this could mean that data anomalies never get flagged for the sponsor. “A good SRG is focused on getting the committee information to make recommendations. Defensive programming prevents [anomalies] from being in the report. We don’t tell the sponsor about data anomalies; we assume the sponsor will fix obvious errors in the data cleaning process.”
But that doesn’t always happen. Wittes noted that her company conducted a study with a DMC in which it ran final clinical research data through a routine analysis. That review showed that 30 percent of the data was missing primary outcome data. For instance, the number of episodes in a work week added to the number over a weekend did not add up to the reported total for the full week. A solution, she suggested, is for SRGs to do defensive programming, which helps DMCs by providing cleaner data.
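The kind of consistency check Wittes describes can be sketched in Python. This is a minimal illustration of defensive programming, assuming a hypothetical record layout (`WeeklyRecord`, with weekday, weekend and weekly-total episode counts) and a hypothetical `screen_records` helper; neither comes from the panel or from any SRG’s actual code. Records that are missing counts, or whose weekday and weekend counts fail to sum to the weekly total, are set aside with a reason instead of crashing the report or silently entering it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklyRecord:
    # Hypothetical record layout for illustration; real trial databases differ.
    subject_id: str
    weekday_episodes: Optional[int]
    weekend_episodes: Optional[int]
    weekly_total: Optional[int]


def screen_records(records):
    """Split records into (usable, flagged) without raising on bad input.

    A record is flagged if any count is missing or negative, or if the
    weekday and weekend counts do not sum to the reported weekly total.
    """
    usable, flagged = [], []
    for rec in records:
        counts = (rec.weekday_episodes, rec.weekend_episodes, rec.weekly_total)
        if any(c is None for c in counts):
            flagged.append((rec, "missing count"))
        elif any(c < 0 for c in counts):
            flagged.append((rec, "negative count"))
        elif rec.weekday_episodes + rec.weekend_episodes != rec.weekly_total:
            flagged.append((rec, "weekday + weekend != weekly total"))
        else:
            usable.append(rec)
    return usable, flagged


# Hypothetical example data.
records = [
    WeeklyRecord("001", 3, 1, 4),     # consistent
    WeeklyRecord("002", 2, 2, 5),     # parts do not add up to the total
    WeeklyRecord("003", None, 0, 0),  # missing weekday count
]
usable, flagged = screen_records(records)
print(len(usable), [(r.subject_id, why) for r, why in flagged])
# → 1 [('002', 'weekday + weekend != weekly total'), ('003', 'missing count')]
```

Returning each flagged record with its reason, rather than dropping it silently, keeps the report’s denominators traceable.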
Nechama Katan, director of data science at Pfizer, however, estimated that typically only about 3 percent of total data is changed. About half of those changes result from source document verification, she added.
“Data may be messy, but it’s due to the process, not the data itself,” she said. “If we can control the process, we can control the data. That doesn’t mean trying to control every little step, but to control the big picture.”
Statistical process control (SPC) is one way to exert the necessary control, and thus improve data accuracy, at speed, Katan suggested. Originally developed for manufacturing, SPC can be used in many other arenas, including healthcare. Like the risk-based monitoring approach, SPC aims to identify risks to quality. It does this by tracking pre-identified metrics over time, identifying atypical behaviors, conducting a root cause analysis and taking appropriate action to resolve that root cause. The SPC team must then verify that the resolution successfully addressed the data issue.
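The steps of tracking a pre-identified metric over time and identifying atypical behavior can be illustrated with a Shewhart individuals (XmR) chart, one common SPC tool, sketched here in Python. The article does not say which charts or metrics Katan’s team uses: the metric (a hypothetical weekly query rate), the baseline values and the conventional 3-sigma limits are assumptions for illustration. Root cause analysis and corrective action, the later steps, are outside the code.

```python
from statistics import mean


def xmr_limits(baseline):
    """Lower limit, center line and upper limit for an individuals (XmR) chart.

    Sigma is estimated from the average moving range divided by d2 = 1.128,
    the standard constant for moving ranges of two consecutive points.
    """
    if len(baseline) < 2:
        raise ValueError("need at least two baseline points")
    center = mean(baseline)
    moving_ranges = [abs(b - a) for a, b in zip(baseline, baseline[1:])]
    sigma = mean(moving_ranges) / 1.128
    return center - 3 * sigma, center, center + 3 * sigma


def out_of_control(values, lcl, ucl):
    """Indices of points falling outside the control limits."""
    return [i for i, v in enumerate(values) if v < lcl or v > ucl]


# Hypothetical metric: queries raised per 100 data fields, one value per week.
baseline = [4.0, 5.0, 4.5, 5.5, 4.0, 5.0]
new_weeks = [4.8, 5.2, 9.1, 4.6]

lcl, center, ucl = xmr_limits(baseline)
print(round(lcl, 3), round(center, 3), round(ucl, 3))  # → 2.007 4.667 7.326
print(out_of_control(new_weeks, lcl, ucl))              # → [2]
```

A flagged week is a prompt to investigate the process that produced it, not a correction to the data point itself.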
“We don’t fix data points, but want to make sure whatever a site did to create an error is not repeated,” Katan said. Her point was that just as data may be processed in near-real-time, so can data cleaning and error identification, which can increase both the accuracy of the data and the speed with which it is processed. SPC can leverage remote monitoring tools and central monitoring teams for maximum efficiency, she noted.
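One way a central monitoring team might apply SPC across sites, rather than over time, is a p-chart comparing each site’s error proportion with the pooled proportion. This is a minimal Python sketch under stated assumptions: the per-site counts and the `flag_sites` helper are hypothetical, and nothing in the article says this is how Katan’s team works. Sites above their limit become candidates for root cause follow-up at the site, in keeping with fixing processes rather than individual data points.

```python
from math import sqrt


def flag_sites(site_counts, z=3.0):
    """Flag sites whose error proportion exceeds a p-chart upper limit.

    site_counts maps site id -> (errors, fields_checked). The center line is
    the pooled error proportion across all sites; each site gets its own
    limit because its sample size differs. Only the upper limit is checked.
    """
    total_errors = sum(e for e, _ in site_counts.values())
    total_n = sum(n for _, n in site_counts.values())
    p_bar = total_errors / total_n
    flagged = {}
    for site, (errors, n) in site_counts.items():
        ucl = p_bar + z * sqrt(p_bar * (1 - p_bar) / n)
        rate = errors / n
        if rate > ucl:
            flagged[site] = (rate, ucl)
    return p_bar, flagged


# Hypothetical counts: (fields with errors, fields checked) per site.
sites = {"A": (6, 400), "B": (5, 350), "C": (30, 420), "D": (4, 380)}
p_bar, flagged = flag_sites(sites)
print(round(p_bar, 4), {s: round(r, 3) for s, (r, _) in flagged.items()})
# → 0.029 {'C': 0.071}
```

Giving each site a limit based on its own sample size keeps small sites from being flagged merely because their rates are noisier.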
And Kenneth Getz, deputy director and research professor at the Tufts Center for the Study of Drug Development, pointed to potential technological solutions. He noted that case report forms (CRFs) account for a declining share of total data, as more data is collected via specialty labs, wearable technology and mobile apps, and much of it is collected, compiled and cleaned remotely. The clinical research industry’s response to the COVID-19 pandemic has accelerated this trend, and “we expect a lot of this real-world data to increase its contribution to the clinical data we are used to relying on,” Getz said.
Reduced complexity and streamlined, risk-based procedures can help keep data flowing quickly to the DMC while also improving the accuracy of the data presented for DMC review, as can the use of machine learning and AI-assisted approaches to high-volume data management.
Although speeding the availability of data while ensuring that accurate, high-quality data is provided to FDA for review creates tension in the process, the panel agreed that both factors are important to generating high-quality data in a clinical trial. “You cannot compromise either accuracy or speed. We need both,” Kettermann said.
Nechama Katan, director of data science at Pfizer, agreed that “the faster they can address systemic weaknesses, the faster they can improve data. Speed and accuracy are not trade-offs. By waiting and letting it sit, you end up with accuracy problems because you can’t fix systemic problems.”