The Dataverse Project at IQSS received a two-year grant (2018-2020) from the Alfred P. Sloan Foundation to expand Dataverse’s data sharing features and practices.
In recent years, it has become increasingly common for scientific journals to recommend or require that authors share the data accompanying a publication. The growing adoption of data sharing policies, together with the growth of public data repositories, has increased the number of research datasets that are findable and accessible, a change in which the Dataverse Project has played a key part. However, with only a few exceptions, most datasets are not yet truly interoperable and reusable. With the current Sloan Foundation grant, the Dataverse team, under the leadership of IQSS Chief Data Science & Technology Officer Mercè Crosas, will focus on improving the quality of shared data by offering tools and services that address this gap.
The Dataverse Project’s grant-funded efforts target three main goals. The first is to establish multi-tiered curation services that will support independent researchers as well as larger organizations. This service model is intended to emulate that of the Odum Institute at the University of North Carolina at Chapel Hill, which offers three tiers of free and fee-based curation services.
Dataverse’s second goal is to create infrastructure and software tools that make reproducibility verification easier. Researchers and reviewers will have the option to conduct the process in the cloud without needing to create and maintain their own development environment. Depending on their needs and budget, researchers will also have the option to use Encapsulator (an open source tool created by a Harvard team led by outgoing professor Margo Seltzer) or Code Ocean (a fee-based service), both of which will be integrated into the Harvard Dataverse.
Code Ocean is a computational research platform that employs Docker technology to execute code in the cloud. The platform does two key things: it integrates the metadata, code, data and dependencies into a single ‘compute capsule’, ensuring that the code will run, and it does this in a single web interface that displays all inputs and results. Within the platform, it is possible to develop, edit or download the code, run routines, and visualize, save or download output, all from a personal computer. Users or reviewers can upload their own data and test the effects of changing parameters or modifying the code.
Finally, as its third goal, Dataverse will expand the metrics it makes available to researchers, using resources from the Make Data Count project at the California Digital Library to give researchers more information about how others reuse their data. To do this, Dataverse will develop new ways of tracking interactions with a researcher’s data, create APIs to send metrics to DataCite, and enhance dataset pages to show the data impact information available from the Make Data Count APIs.
This project will expand on work that the Dataverse Project has previously accomplished with the help of Sloan Foundation funding, including the PKP-Dataverse Integration Project and the more recent Scholarly Communication Project.
The Dataverse Project is an open source web repository that enables users to share, cite, and analyze research data. For more information, visit Dataverse.org.