Last week Vice President Joe Biden launched the first open access data system for sharing cancer tumor and clinical data—a step he called “critical” in the White House’s Cancer Moonshot and Precision Medicine Initiative efforts to accelerate the pace of cancer drug discovery.
Named the National Cancer Institute’s (NCI) Genomic Data Commons (GDC), the new data platform is poised to speed drug development by “increasing the pool of researchers who can access data and decreasing the time it takes for them to review and find new patterns,” said Vice President Biden.
The GDC will centralize, standardize and make accessible genomic data from large-scale programs of the NCI, including The Cancer Genome Atlas (TCGA) and its pediatric equivalent, Therapeutically Applicable Research to Generate Effective Targets (TARGET). Together, the two data sets contain tumor genome sequences and clinical data from more than 14,000 patients.
In addition, the GDC will “house data from a number of newer NCI programs that will sequence the DNA of patients enrolled in NCI clinical trials,” said Louis M. Staudt, M.D., Ph.D., NCI. “These datasets will lead to a much deeper understanding of which therapies are most effective for individual cancer patients.”
“We hope that GDC will be a place where we increasingly learn about genetics and cancer,” said Staudt. “Drug companies have increased awareness of mutant alleles. If they see a hotspot in a gene that is mutated, they can see if the drug works against it.”
Consolidating the data in a readily searchable fashion will significantly speed the process of zeroing in on a particular gene for study, added Staudt. The GDC will allow researchers to access data in minutes that previously would have taken months to pull from storage in separate silos. The system uses state-of-the-art computational tools for analysis. Without an advanced system like the GDC, “any complicated question about TCGA data would require extensive bioinformatics infrastructure and support, and would take months to assemble,” said Staudt.
“Discoveries that would have taken millions of dollars, many years and required integrated diverse teams in the past can essentially be done in days with the GDC,” said Robert Grossman, M.D., Ph.D., professor of medicine and director of the Center for Data Intensive Science at the University of Chicago, where the GDC was developed. “GDC is taking the same approach to cancer data that Google is taking to advertising.”
Access to large datasets costs money, said Staudt. By crowdsourcing high-value data for the public domain, the GDC will essentially democratize data, said Grossman. In fact, the GDC, which took more than two years to develop, is a major advance in the Obama Administration’s push to open up data, particularly in support of health.
“It’s a step in the right direction,” said Daniel Castro, director of the Center for Data Innovation. “It’s in line with what we’re seeing in Europe.” Another data-sharing platform—the European Open Science Cloud Initiative—was announced last month.
“In some regards, the U.S. is still behind the curve,” said Castro. Europe ranks as the global leader in data-sharing efforts, he said. In the U.S., there are a few federally funded initiatives on a few platforms. In contrast, the European Initiative outlines a more comprehensive plan for data sharing among multiple stakeholders including both government and private industry. The European Commission described it as “a blueprint for cloud-based services and world-class data infrastructure” for Europe’s 1.7 million researchers and 70 million science and technology professionals to “ensure science and public services reap the benefits of the big data revolution.”
“The GDC is a great step forward and a big goal, but not big enough,” said Greg Koski, M.D., Ph.D., co-founder and president of the Alliance for Clinical Research Excellence and Safety. “The GDC is very limited in comparison of the full capabilities that could be realized. We need to connect data across multiple databases.” The GDC includes data from more than 14,000 patients, whereas companies such as Surescripts maintain data from millions of patients, he said.
“This is a NCI-driven initiative that the government can’t do by itself,” he said. Development of deep knowledge requires “large-scale, robust public-private partnerships overseen by a neutral third party. The technology is there to do more. We just haven’t developed the will.”
In the future, NCI hopes that the platform will expand to substantially impact drug discovery. “The explanatory power of the data in the GDC will grow over time as data from more patients are included, and ultimately the GDC will accelerate our efforts in precision medicine,” said Douglas Lowy, M.D., NCI acting director.
Over the next two years, plans call for the rapid development of new capabilities, such as sophisticated tools for integrating histological data and CT scan cancer imaging. In addition, the GDC will generate data that can be used to study relapse after therapy and the development of drug-resistant alleles, according to Staudt.
“With each new addition, the GDC will evolve into a smarter, more comprehensive knowledge system that will foster important discoveries in cancer research,” said Staudt. “I view the launch as the end of the beginning.”
This article was reprinted from Volume 20, Issue 23, of CWWeekly, a leading clinical research industry newsletter providing expanded analysis on breaking news, study leads, trial results and more.