Broad Institute of MIT and Harvard is teaming up with Google Genomics to explore how to break down major technical barriers that increasingly hinder biomedical research. The collaboration aims to address the need for computing infrastructure to store and process enormous datasets, and to create tools to analyze such data and unravel long-standing mysteries about human health.
As a first step, Broad Institute’s Genome Analysis Toolkit, or GATK, will be offered as a service on the Google Cloud Platform, as part of Google Genomics. The goal is to enable any genomic researcher to upload, store and analyze data in a cloud-based environment that combines the Broad Institute’s best-in-class genomic analysis tools with the scale and computing power of Google.
GATK is a software package developed at the Broad Institute to analyze high-throughput genomic sequencing data. GATK offers a wide variety of analysis tools, with a primary focus on genetic variant discovery and genotyping as well as a strong emphasis on data quality assurance. Its architecture, processing engine and computing features make it capable of taking on projects of any size.
GATK is already available for download at no cost to academic and nonprofit users. In addition, business users can license GATK from the institute. To date, more than 20,000 users have processed genomic data using GATK.
The Google Genomics service will provide researchers with a powerful, additional way to use GATK. Researchers will be able to upload genetic data and run GATK-powered analyses on Google Cloud Platform, and may use GATK to analyze genetic data already available for research via Google Genomics. GATK as a service will make best-practice genomic analysis readily available to researchers who don’t have access to the dedicated compute infrastructure and engineering teams required for analyzing genomic data at scale. An initial alpha release of the GATK service will be made available to a limited set of users.
“Large-scale genomic information is accelerating scientific progress in cancer, diabetes, psychiatric disorders and many other diseases,” said Eric Lander, president and director of Broad Institute. “Storing, analyzing and managing these data is becoming a critical challenge for biomedical researchers.”
Broad Institute plans to continue to support and upgrade GATK for all users, both on site and on the cloud, and will continue to offer the software directly. Academic and nonprofit users will continue to have free access to GATK just as they do today through broadinstitute.org/gatk. Business users will continue to be able to license GATK through the Broad directly. Offering GATK on the Google Cloud Platform gives users another option, one that could eliminate the need for labs to develop additional computing infrastructure on site.
Broad Institute is a founding host institution of the Global Alliance for Genomics and Health (GA4GH), which was established in 2013 to build a shared framework to enable genomic and clinical data sharing while ensuring data privacy and security as genomic research continues to evolve. Google joined the Alliance in early 2014. Services available through the Broad and Google collaboration will be specifically designed to align with existing and emerging GA4GH standards.
In keeping with the Broad’s mission to foster openness and innovation, this collaboration will be non-exclusive. Broad and Google will each continue to engage with other community members on genomic projects to empower research worldwide.