Roadmap: The Dataverse Project

The Strategic Goals of the Dataverse Project are our highest-level guide. These goals are to:

  • increase adoption (users, dataverses, datasets, installations, journals)
  • develop capability to handle sensitive, large-scale, and streaming data
  • expand data and metadata features for existing and new disciplines
  • expand archival and preservation features
  • increase interoperability through implementation of standards
  • increase contributions from the open-source development community
  • improve UX and UI
  • continue to increase the quality of the software

Throughout the year, we'll identify big steps that we can take to focus on one or more of these goals. These big steps are detailed on our Roadmap, below.

The big things the development team and the community are working on right now are shown in the Implementation section. In the Planning/Design section, you can see those strategic items that we've prioritized and are designing and testing with the community. In the Future section, you'll see the things that we'd like to work on but haven't yet prioritized. A list of Recent Releases is available at the bottom of the page. 

This roadmap focuses only on the big initiatives. We're always working on smaller bug fixes and enhancements. If you'd like to see everything that the Dataverse Team and Community are working on right now, check out the Project Board on GitHub.

Implementation

Q1 2020 Python Installer (Released in Dataverse 4.19)

  • A new installer written in Python will replace the old installer written in Perl. This will make the installer more maintainable and will make it easier for community members to contribute.
  • GitHub Issue #3937

Q1 2020 OpenID Connect (Released in Dataverse 4.19)

  • By adding basic support for any OpenID Connect (OIDC)-compliant authentication provider, Dataverse installations will be able to integrate with authentication providers simply by loading a configuration file, without touching the codebase. Previously, each authentication provider had to be added through a code change, and installations had to maintain a fork.
  • GitHub Issue #5974

Q1 2020 Multiple Dataverse Storage Locations (Released in Dataverse 4.20)

  • Dataverse installations will be able to add and manage a specific S3 storage location for each dataverse. This allows installation administrators to better allocate and track storage, especially in installations supporting multiple institutions.
  • GitHub Issue #6485

Q1 2020 Direct S3 Upload for Large Files (Released in Dataverse 4.20)

  • Users will be able to upload files directly to S3 storage instead of sending data through Dataverse over HTTP, which is inappropriate and unreliable for large files. Scripts for transferring large data into Dataverse already exist, but users will now be able to upload large files using the more familiar, user-friendly methods on the dataset page.
  • GitHub Issue #6489

Q2 2020 Simplified Add Data, Linking, and Dataverse Creation Workflows

  • Making it easier to create dataverses and link datasets will help researchers create and curate custom collections of data.
  • GitHub Issues #5874, #5890, #5615

Q2 2020 Updated Privacy Policy and Terms of Use

  • The Harvard Dataverse will review and update the privacy policy and the terms of use. 
  • GitHub Issue #26

Q2 2020 Homepage Visualization

  • The IQSS Dataverse team is working with the Harvard Library to create a homepage visualization that shows the growth of a dataverse installation and the activity and connections between the datasets. This will be made open source for the community to use and edit.
  • GitHub Issue #5603

Q2 2020 Trusted Remote Storage Agent (TRSA) Integration for Sensitive and Large Data

  • TRSAs allow data providers to create metadata records in Dataverse for research data that is too large or sensitive to deposit into Dataverse. Researchers can discover the metadata in Dataverse and be directed to the appropriate steps or automated workflows to access the data itself. Additional information is available from cyberimpact.us.
  • GitHub Issue #5213
  • Check out the Code in Progress

Planning/Design

Q2 2020 Redesigned, Scalable Dataset and File Pages

  • As we add features to Dataverse, we're finding that we need to revisit our Dataset and File pages. We're working on a more modular, scalable, accessible, and responsive experience that will be informed by both present and future use cases.
  • Mockups
  • GitHub Issue #3404

Q3 2020 Capsulation and Packaging for Replication Objects

Future

Q3 2020 Citations for Dataverses

  • Researchers will be able to cite dataverses as well as datasets and files.
  • GitHub Issue #6112

Q3 2020 Embargo

  • Authors will be able to create dataset metadata in Dataverse and set up a timed-release process for the data itself.
  • GitHub Issue #4052

Q4 2020 Code Deposit and Expanded Software Metadata, Sync from GitHub

  • Code can already be deposited into Dataverse. We'll add code-specific metadata, an updated workflow, and a way to automatically sync GitHub repositories into Dataverse as researchers make changes.
  • GitHub Issues #2739, #5372

Q4 2020 Green, Blue, Yellow DataTags Support

  • Through an integration with DataTags, researchers will be able to deposit and share datasets that contain sensitive information up to the Yellow level.
  • GitHub Issue #871

2021 Orange DataTags Support

  • Through an integration with DataTags, researchers will be able to deposit and share datasets that contain sensitive information up to the Orange level.
  • GitHub Issue #871

Recent Releases

  • 4.20 Multiple S3 Stores, Direct S3 Upload 4/1/2020
  • 4.19 OpenID Connect (OIDC), Python Installer 1/22/2020
  • 4.18 File Preview, Microsoft Login 11/14/2019
  • 4.17 Dataset Level Explore Tools, Performance Enhancements 10/3/2019
  • 4.16 Metrics Redesign and Make Data Count Support, HTML Codebook Exports, Harvesting Improvements 8/28/2019
  • 4.15.1 Performance Enhancements, Variable Metadata Edit APIs 7/10/2019
  • 4.15 Sorting and Filtering Files in a Dataset, Better Recognition and Categorization of Files 6/14/2019
  • 4.14 OpenAIRE-compliant Exports, Expanded Analytics Options 5/10/2019
  • 4.13 File Hierarchy Support, File Metadata Edit APIs 4/22/2019

Last updated 4/1/2020