Dataverse Joins NIH in Increasing Access to Biomedical Data

January 26, 2022

The Dataverse Project at the Institute for Quantitative Social Science (IQSS) is joining the Office of Data Science Strategy (ODSS) at the National Institutes of Health (NIH) and five other data repositories in launching a new data curation, sharing, and interoperability initiative. Through this collaboration, the Generalist Repository Ecosystem Initiative (GREI), the Dataverse Project plans to facilitate access to NIH-funded data by building on the existing Harvard Dataverse Repository. The goal of GREI is to supplement NIH's existing domain-specific repositories by expanding the NIH data ecosystem into additional repositories, so that researchers can more easily and effectively find and share data from NIH-funded studies.

The Dataverse Project will expand its services and teams–including the UX/UI, development, research computing, and data curation and management teams–using recently awarded NIH funding. The funding is designed to be flexible, and will allow the Dataverse Project team to work with NIH and the other collaborating repositories to determine the highest priority areas of focus and impact, and then quickly build new workflows to support biomedical researchers. Some areas being explored are:

  • Supporting very large datasets by integrating metadata records in the repository with the data in research computing storage, allowing data to be discovered in the Harvard Dataverse and viewed, explored, and analyzed directly in the research computing environment;
  • Increasing support for biomedical and cross-domain metadata standards and controlled vocabularies–taking advantage of the Dataverse Project’s extensive support for metadata standards and additional custom metadata;
  • Facilitating researchers’ efforts to share and publish their entire workflows or containers that describe the main transformations and analysis of the data, following the FAIR (Findable, Accessible, Interoperable, and Reusable) principles;
  • Improving the existing harvesting functionality in the Dataverse software, which is based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) standard, and coordinating with other repository packaging standards to share or move metadata and data;
  • Standardizing usage metrics within the repository and across other repositories and coordinating the implementation of the metrics with other repositories, so that the values and assumptions are comparable;
  • Supporting the sharing and discovery of sensitive data using privacy-preserving tools such as those from the OpenDP project at IQSS;
  • Improving the user experience and user interface (UX/UI) and the Application Programming Interfaces (APIs) for depositing, viewing, and accessing data and metadata in the repository; and
  • Providing curation services, outreach, and training for managing and sharing NIH-funded research assets in the repository; the Dataverse team at IQSS will provide overall guidance and support in UX/UI efforts, technical design, and implementation.

Launched in 2006, the Harvard Dataverse Repository is powered by the Dataverse open-source software, which was developed at the Institute for Quantitative Social Science at Harvard University. Of the 76 Dataverse repositories now deployed worldwide, the Harvard Dataverse Repository is the largest, with more than 100,000 datasets containing about 1 million files available to users, who continue to share, explore, cite, and analyze data every day.

The Dataverse Project was founded by IQSS Director Gary King in 2006. Its software platform provides a preservation and archival infrastructure, and allows researchers to share, keep control of, and get recognition for their data through an easy-to-access web browser interface.

For more information about the Dataverse Project and its platform, visit: https://dataverse.org/

The Office of Data Science Strategy leads implementation of the NIH Strategic Plan for Data Science through scientific, technical, and operational collaboration with the institutes, centers, and offices that comprise the National Institutes of Health. More information about the Generalist Repository Ecosystem Initiative (GREI) can be found in the recent ODSS announcement on the initiative.