1000 Genomes Project data available on Amazon Cloud

Monday, April 2, 2012 02:51 PM

The world's largest set of data on human genetic variation, produced by the international 1000 Genomes Project, is now publicly available on the Amazon Web Services (AWS) cloud, according to the National Institutes of Health (NIH).

The public-private collaboration demonstrates the kind of solutions that may emerge from the Big Data R&D Initiative announced last week by the White House Office of Science and Technology Policy.

"The explosion of biomedical data has already significantly advanced our understanding of health and disease. Now we want to find new and better ways to make the most of these data to speed discovery, innovation and improvements in the nation's health and economy," said NIH director Francis S. Collins, M.D., Ph.D. Collins was among agency leaders speaking in support of the initiative at the launch event.

The Big Data initiative will initially engage at least six federal science agencies, including the NIH, the National Science Foundation, the Department of Defense and the Department of Energy, committing more than $200 million to a collaborative effort to develop core technologies and other resources needed by researchers to manage and analyze enormous data sets.

Among the NIH components participating in the Big Data initiative are the National Human Genome Research Institute (NHGRI) and the NIH National Center for Biotechnology Information (NCBI) — a division of the National Library of Medicine. NHGRI played a lead role in organizing and funding the international 1000 Genomes Project. NCBI, along with the European Bioinformatics Institute of Hinxton, England, began making 1000 Genomes Project data freely available to researchers in 2008.

Since the project's launch in 2008, the data set has grown enormously. At 200 terabytes (the equivalent of 16 million file cabinets filled with text, or more than 30,000 standard DVDs), the current 1000 Genomes Project records are a prime example of a data set that has become so massive that few researchers have the computing power to use it.
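
The DVD comparison can be sanity-checked with quick arithmetic. The sketch below is only a back-of-envelope estimate: it assumes decimal units (1 terabyte = 10^12 bytes) and a single-layer DVD capacity of 4.7 gigabytes, neither of which is stated in the article, and the function name `dvds_needed` is purely illustrative.

```python
import math

# Assumptions (not from the article): decimal units and 4.7 GB single-layer DVDs.
TERABYTE = 10**12  # bytes
GIGABYTE = 10**9   # bytes

dataset_bytes = 200 * TERABYTE   # data set size quoted in the article
dvd_bytes = 4_700_000_000        # assumed capacity of one single-layer DVD (4.7 GB)


def dvds_needed(total_bytes, disc_bytes):
    """Return the number of whole discs needed to hold total_bytes."""
    return math.ceil(total_bytes / disc_bytes)


print(dvds_needed(dataset_bytes, dvd_bytes))
```

Under these assumptions the result is roughly 42,500 discs, which is consistent with the "more than 30,000" figure quoted above.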

To help solve that problem, AWS has just posted the 1000 Genomes Project data for free as a public data set, providing a centralized repository on the Amazon Simple Storage Service. The data can be seamlessly accessed through services such as Amazon Elastic Compute Cloud and Amazon Elastic MapReduce, which provide organizations with the highly scalable resources needed to power the big data and high-performance computing applications often needed in research. Researchers pay only for the additional AWS resources they need to further process or analyze the data.

The public-private collaboration to store the data in the AWS cloud allows any researcher to access and analyze the data at a fraction of the cost it would take for their institution to acquire the needed internet bandwidth, data storage and analytical computing capacity.

"Improving access to data from this important project will accelerate the ability of researchers to understand human genetic variation and its contribution to health and disease," said NHGRI director Eric D. Green, M.D., Ph.D. NHGRI is a major funder of the 1000 Genomes Project, along with Wellcome Trust of London and BGI-Shenzhen of China.

Cloud access also enables users to analyze the data much more quickly, because it eliminates download time and lets users run their analyses over many servers at once. "Putting the data in the cloud provides a tremendous opportunity for researchers around the world who want to study large-scale human genetic variation but lack the computer capability to do so," said Richard Durbin, Ph.D., co-director of the 1000 Genomes Project and joint head of human genetics at the Wellcome Trust Sanger Institute in Hinxton, England.

Paul Flicek, D.Sc., co-leader of the 1000 Genomes Project Data Coordination Center (DCC), added that the new venue "fulfills a central goal of the 1000 Genomes Project to make the data as widely available as possible to accelerate medical discoveries."
