IQSS Data Science releases Dataverse 4.0

May 27, 2015

The Harvard Dataverse has been upgraded to Dataverse 4.0, with many new and reworked features. The IQSS Data Science team, led by Director of Data Science Merce Crosas, worked directly with users to produce the latest version of the Dataverse, which was released in April 2015 and is available for all to use.

The Dataverse is an open source web application developed at IQSS in collaboration with the Harvard Libraries and Harvard University Information Technology (HUIT). The Dataverse makes it easy for users to share their data with others and to replicate others' work, while researchers, data authors, publishers, data distributors, and affiliated institutions all receive appropriate credit.

The Dataverse has a growing and active development and user community. There are currently ten installations of the Dataverse software platform, supporting institutional data repositories as well as repositories open to all researchers worldwide. Harvard's own Dataverse is open to the world, with more than 1,000 dataverses (virtual archives or containers of datasets), more than 58,000 datasets and 270,000 files, and more than 1.3 million downloads.

Crosas explains that the many changes in the new Dataverse can be summed up in three major categories.

A face lift: This includes a new user experience and user interface. Dataverse 4.0 also introduces a process for gathering user feedback iteratively and for conducting formal usability testing to validate the changes in user experience.

A body lift: Structural improvements include upgraded open source technology; a robust, extensible enterprise platform (Java EE 7, with the GlassFish 4.1 application server); leading-edge UI frameworks (PrimeFaces, Bootstrap); and the Solr search platform.

A mind lift: With version 4.0, Crosas explains, the Dataverse defines data publishing rigorously: data must be discoverable, accessible, reusable, and interoperable. Dataverse 4.0 follows standards and best practices for research data management, publishing, and preservation in order to provide a feature-rich, open source repository platform for publishing, citing, and archiving data. New features introduced with Dataverse 4.0 include:

  • Data citation generated automatically, compliant with the Joint Declaration of Data Citation Principles and the data citation standard proposed by Altman and King (2007).
  • Rich metadata to improve discovery and reusability of data, with faceted search across three levels: citation metadata, scientific domain-specific metadata, and file metadata.
  • Data publishing workflows, supporting multiple (and configurable) roles and permissions to contribute, collaborate, review, curate, and publish a dataset.
  • Reformatting of tabular data files (statistical files such as SPSS, Stata, and R files) for preservation, converting the data to a preservation format that is independent of the software originally used to create the file.
  • Interoperability through extensive APIs to deposit, search, and access data, allowing Dataverse to be integrated with data visualization and analysis tools such as TwoRavens, a web tool for exploring and analyzing data using Zelig and R.
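To illustrate what searching a Dataverse through its APIs can look like, here is a minimal sketch in Python. It assumes a Search API endpoint at `/api/search` accepting the `q`, `type`, `per_page`, and `start` query parameters, as described in the Dataverse API guides; exact parameters may vary between Dataverse versions. The helper name `build_search_url` and the use of Harvard's installation as the default base URL are illustrative assumptions. The sketch only builds the request URL and sends nothing over the network.

```python
from urllib.parse import urlencode

# Assumption: Harvard's installation is used as an example base URL;
# other Dataverse installations are assumed to expose the same path.
BASE_URL = "https://dataverse.harvard.edu"


def build_search_url(query, result_type=None, per_page=10, start=0, base_url=BASE_URL):
    """Build a Search API URL (hypothetical helper; no request is sent)."""
    # Assumed parameters: q (search terms), per_page (page size),
    # start (offset of the first result for paging).
    params = {"q": query, "per_page": per_page, "start": start}
    if result_type is not None:
        # Assumed values for "type": "dataverse", "dataset", or "file".
        params["type"] = result_type
    # urlencode percent-encodes values and turns spaces into "+".
    return f"{base_url.rstrip('/')}/api/search?{urlencode(params)}"


if __name__ == "__main__":
    print(build_search_url("election", result_type="dataset", per_page=5))
```

The resulting URL could then be fetched with any HTTP client (for example `urllib.request.urlopen`) and the JSON response parsed; keeping URL construction separate makes it easy to page through results by incrementing `start`.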

For more information on the Dataverse, please visit Dataverse.org or read more at the IQSS Data Science team website.