U.S. research institutions collaborate to share, standardize neuroscience databases
The Allen Institute for Brain Science, based in Seattle, Wash., the California Institute of Technology, the New York University School of Medicine, the Howard Hughes Medical Institute (HHMI) in Maryland and the University of California, Berkeley are collaborating on a project aimed at making databases about the brain more usable and accessible for neuroscientists—a step seen as critical to accelerating the pace of discoveries about the brain in health and disease.
With funding from GE, the Kavli Foundation, the Allen Institute for Brain Science, the HHMI and the International Neuroinformatics Coordinating Facility (INCF), the year-long project will focus on standardizing a subset of neuroscience data, making this research simpler for scientists to share.
This is the first collaboration launched by Neurodata Without Borders, a broader initiative with the goal of standardizing neuroscience data on an international scale, making it more easily sharable by researchers worldwide.
Image file formats such as JPEG or TIFF store the digital information captured when a photo is taken with a mobile phone, allowing it to be shared with anyone who has a computer; neuroscience has no comparable data standard. Developing such a standard, unified data format would enhance the ability of brain researchers worldwide to share and combine their research results. This would not only drive progress in neuroscience, but also encourage the validation of existing results and create vital new collaborations with other fields.
Recent, rapid technical advances mean that neuroscientists are generating data that are quantitatively and qualitatively different from anything that came before. But the languages, or formats, they use to capture those data (as well as the software tools they use to access and analyze them) vary from laboratory to laboratory—and sometimes even within a laboratory. This lack of uniformity makes it challenging to share and integrate experimental data—the raw material of science—and to mine and extract the most value from them.
The need for a common data format in neuroscience is made more urgent by the rise of large-scale collaborative projects, such as the Brain Research through Advancing Innovative Neurotechnologies (BRAIN) Initiative in the U.S.
"These new initiatives are going to produce masses of data, but if it isn't interchangeable and comparable, it's just not going to be useful," said Christof Koch, chief scientific officer at the Allen Institute.
On a practical level, scientific publishers and granting agencies, such as the NIH, are moving toward mandating data sharing as a requirement for funding.
"This is following on other efforts at openness in science," said Markus Meister, a professor of biology at Caltech whose research group is supplying experimental data to the project. "The idea is that the material or resources that were developed with government funding or published in a journal have to be made available. But for neurophysiology data, there is no organized mechanism for doing that at the moment."
The initial one-year program focuses on a subset of neuroscience data: cell-based neurophysiology data, which are sought after by theorists building models of how the brain works. The partners will work with software developers and vendors to establish an open format that can store electrical and optical recordings of neural activity and, importantly, the conditions under which an experiment was performed, such as how brain activity was recorded, how the animal was behaving at certain time points, and its species, sex and age. These metadata are often lost, yet without them the research results are meaningless.
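To illustrate why bundling metadata with recordings matters, here is a minimal sketch in Python of storing a trace together with its experimental conditions. The field names and JSON structure are purely hypothetical, not the format the project will define:

```python
import json

# Hypothetical record: a short voltage trace plus the conditions under
# which it was acquired. Field names are illustrative only.
recording = {
    "data": {
        "voltage_mV": [-70.1, -69.8, -55.2, 30.4, -72.0],  # sampled trace
        "sampling_rate_hz": 20000,
    },
    "metadata": {
        "recording_method": "whole-cell patch clamp",
        "behavior": "animal running on treadmill",
        "species": "Mus musculus",
        "sex": "female",
        "age_days": 90,
    },
}

# Serializing data and metadata as one object keeps them from being
# separated and lost, which is the failure mode described above.
serialized = json.dumps(recording)
restored = json.loads(serialized)
```

Because the metadata travel inside the same file as the measurements, a downstream researcher can recover the experimental context without contacting the original laboratory.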
This metadata problem poses an enormous challenge, said Friedrich Sommer, a theoretical neuroscientist at UC Berkeley who oversees an existing repository, CRCNS.org, where the neurophysiology datasets of Neurodata Without Borders will be stored and shared. UC Berkeley is coordinating Neurodata Without Borders with staff from the Allen Institute.
As Sommer explains, once a data format has been selected and extended, the neurophysiology datasets will be translated into the new common language and shared with the broader neuroscience community through the repository. Lastly, "application programming interfaces" (APIs) will be developed to allow researchers to use the common format for their own data with ease.
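A rough sketch of what such an API might offer, in Python: a writer that refuses to save a recording without its required metadata, and a matching reader. Function names, required fields, and the JSON-backed storage are hypothetical; the real interfaces have yet to be designed:

```python
import json

def write_recording(path, voltages, metadata):
    """Store a recording together with its required metadata.

    Rejects the write if essential experimental conditions are missing,
    so data and context cannot be separated. (Illustrative rule only.)
    """
    required = {"recording_method", "species", "sex", "age_days"}
    missing = required - metadata.keys()
    if missing:
        raise ValueError(f"missing required metadata: {sorted(missing)}")
    with open(path, "w") as f:
        json.dump({"voltage_mV": voltages, "metadata": metadata}, f)

def read_recording(path):
    """Load a recording; returns (voltages, metadata)."""
    with open(path) as f:
        record = json.load(f)
    return record["voltage_mV"], record["metadata"]

meta = {"recording_method": "extracellular", "species": "Mus musculus",
        "sex": "male", "age_days": 60}
write_recording("example_recording.json", [-70.0, -69.5, -55.0], meta)
volts, loaded_meta = read_recording("example_recording.json")
```

The design point is that validation lives in the shared API rather than in each laboratory's ad hoc scripts, which is what makes the resulting files comparable across groups.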
To get to that point, Neurodata Without Borders is calling on the neuroscience community to get involved. "We want to solicit the best ideas for the data format, so we are inviting researchers to look at the datasets which are now shared in their current format at CRCNS.org. Our hope is to engage the community to contribute ideas or propose their own data format for consideration," said Sommer.
The most promising approaches to a common data format will be discussed, tested and extended at Neurodata Without Borders Hackathons, the first of which will be held in late November, to drive the rapid development of innovative software tools.
"The project has an aggressive timeline, but in a year's time, the goal is to come up with a standard for neurophysiology data that we can agree on. We may not get it 100% right for 100% of researchers, but we'll make a very good attempt," said Karel Svoboda, group leader at HHMI's Janelia Research Campus and a data-provider to Neurodata Without Borders. "Then, by buying into the data format ourselves—by explicitly moving our data into the format and making them available, we'll set an example of how it could be done, and hopefully have others in the neuroscience community follow in our footsteps."
"With the emergence of large-scale brain initiatives around the world, data reuse and sharing becomes more important than ever. This project will facilitate neuroscience collaboration at a global scale," said Sean Hill, scientific director, INCF.