Blog

The Scholars Portal Dataverse team is happy to announce that the Dataverse File Previewers tool has been installed and enabled for Scholars Portal Dataverse users.

This new tool allows users to view certain file types, like images and audio/video files, directly in their web browser without needing to download the file. Restricted files can be previewed only by users who have permission to download them.

File previews can be opened by clicking the "Explore" button in the list of files in a dataset, or by clicking "Explore" on a file landing page.

After clicking "Explore", a new tab will open in your browser to preview the file. Depending on the file type, you may see an image, text, or a playable audio/video file.


A preview of an image file

The following file types can be previewed in Scholars Portal Dataverse:

  • Text and document files: txt, html, pdf
  • Audio files: mp3, mpeg, wav, ogg
  • Image files: gif, jpeg, png
  • Video files: mp4, ogg, quicktime
  • Statistical formats: Stata, R
  • Hypothesis annotations
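
As a rough illustration of the list above, the supported types can be treated as a lookup table. The sketch below is not part of the File Previewers tool: the names `PREVIEWABLE` and `categories_for` are made up for this example, and it covers only the entries that the list names by a short label.

```python
# Hypothetical lookup built from the list above; not code from the previewers project.
PREVIEWABLE = {
    "Text and document files": {"txt", "html", "pdf"},
    "Audio files": {"mp3", "mpeg", "wav", "ogg"},
    "Image files": {"gif", "jpeg", "png"},
    "Video files": {"mp4", "ogg", "quicktime"},
}


def categories_for(label: str) -> list[str]:
    """Return every category in the list above that names this label."""
    wanted = label.strip().lower()
    return [category for category, labels in PREVIEWABLE.items() if wanted in labels]


if __name__ == "__main__":
    print(categories_for("ogg"))   # listed under both audio and video
    print(categories_for("docx"))  # not in the list, so nothing is returned
```

Returning a list rather than a single category keeps the fact that "ogg" appears under both audio and video.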

The File Previewers tool was developed by the Qualitative Data Repository at Syracuse University. More information about the tool is available on the project's GitHub repository.

The Scholars Portal Dataverse team is pleased to announce the completion of the development project “Dataverse for the Canadian Research Community,” funded by CANARIE’s program for Research Data Management software tools.

The 18-month project involved innovative development of the Dataverse platform to better support Canadian researchers in depositing and sharing data within national and international collaborations. The development work addressed four key themes: data curation, authentication, scalability, and large-file support (described below).


Data Curation

The Data Curation Tool (DCT) is a modular application, launched from within Dataverse, that allows users to create and edit variable-level metadata for tabular files (e.g., SPSS, R, Excel, CSV). The aims of the DCT are to improve data curation workflows within Dataverse, to make data easier to reuse, and to support the application of standards and best practices using the Data Documentation Initiative (DDI) metadata standard.

Other features include the ability to view summary statistics and charts about the data. User edits are saved back to Dataverse and can be exported outside the platform.

Special thanks to the University of Ottawa for the French translations.

Deliverables

  • The DCT was launched in Scholars Portal Dataverse on October 31, 2019.
  • Code openly available in GitHub 
  • Hosted webinar (see recording)
  • Published blog post on features and functionality

Authentication

Scholars Portal configured Dataverse to work with Shibboleth for institutional single sign-on through the Canadian Access Federation (CAF), an identity management service for Canadian research institutions run by CANARIE. Dataverse requires each user’s email, first name, last name, and affiliation, which are released under the Research and Scholarship (R&S) entity profile.
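
To make the attribute requirement concrete, here is a minimal sketch of checking a login for the four values listed above. The attribute keys (`email`, `first_name`, `last_name`, `affiliation`) and the function name are placeholders chosen for this example; they are not the names used by Shibboleth, CAF, or Dataverse.

```python
# Placeholder keys for the four values Dataverse requires (assumed names).
REQUIRED_ATTRIBUTES = ("email", "first_name", "last_name", "affiliation")


def missing_attributes(released: dict[str, str]) -> list[str]:
    """List the required attributes that are absent or blank in the released set."""
    return [
        name
        for name in REQUIRED_ATTRIBUTES
        if not released.get(name, "").strip()
    ]


if __name__ == "__main__":
    login = {"email": "researcher@example.ca", "first_name": "Sam"}
    print(missing_attributes(login))  # names of the values still needed
```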

The benefits of using the CAF's R&S include ease of collaboration between Dataverse as the service provider and institutions as the identity providers. CAF’s vetting process ensures secure and trustworthy exchange of identity information.

For Dataverse users, this integration results in a simpler log-in process with one less username and password to manage.

Deliverables

  • Launched in Scholars Portal Dataverse on October 31, 2019
  • 14 institutions now participating
  • Planning upcoming webinar with Portage and CANARIE on the benefits of joining CAF's R&S profile
  • Documentation in Scholars Portal Dataverse Guide coming soon

Scalability

Scholars Portal connected Dataverse to cloud storage by hosting files in a test cluster of the Ontario Library Research Cloud (OLRC).

The aim is to optimize system architecture for scalable use and to leverage an existing, distributed Canadian data storage network.

Deliverables

  • Developed in test environment with an innovative design
  • Plan to upgrade to cloud storage for Scholars Portal Dataverse (end of 2020)

Large-file support

Scholars Portal developed proof-of-concept integration with Globus as a large-file transfer mechanism. Dataverse users would run Globus Connect Personal (free software) and have a Globus account to upload/download files to/from Dataverse.

Our testing demonstrated robust transfers up to 100 GB in size and up to 38,000 files. We are continuing to collaborate and consult with Harvard's IQSS Dataverse team to bring this proof-of-concept development work into the core code.

Deliverables

  • Developed in test environment
  • Code available in GitHub
  • Plan to launch in Scholars Portal Dataverse (early 2021)
  • Blog post with demo of Globus deposit coming soon



Acknowledgements

Thank you to CANARIE for the grant funding that made this project possible.

The development work was a collaborative effort by the Scholars Portal Team:

  • Direction/Organization
    • Kate Davis - PI
    • Amaz Taufique - Technical Lead
    • Amber Leahey - co-PI (on leave)
    • Meghan Goodchild - Project Manager
    • Kaitlin Newson
  • Developers
    • Jayanthy Chengan
    • Sunil Manikonda
    • Victoria Lubitch
  • Systems Support
    • Bikram Singh
    • Sohaib Anwar
    • Carlos McGregor
    • Dawas Zaidi

Special thanks for input and feedback:

  • Lee Wilson (Portage)
  • Danny Brooke, Gustavo Durand, and Tania Schlatter (IQSS Harvard)
  • Jim Meyers
  • Felicity Tayler and Pierre Leblanc (University of Ottawa)


The Scholars Portal Dataverse team has been hard at work preparing for our upgrade to version 4.19, up from version 4.17. We're happy to announce that this upgrade has been successfully completed and is now available.

Some of the new features in this upgrade include:

  • Improved descriptive text and internationalization on the login page
  • Expiry dates for API tokens
  • Allowing for Make Data Count metrics to be collected (but not yet displayed in the front-end)
  • Moving the institution name to the front of the subject line for support emails
  • New Dataset Author Identifiers options in metadata (DAI, ResearcherID, Scopus)
  • User notifications and emails are now sent when a tabular file ingest completes (e.g., xlsx, csv, sav)
  • Accessibility improvements for keyboard support and screen reader users

There are also several bug fixes in this release, including:

  • Fixed thumbnail display issues on the first page load
  • Resolved issue where French characters in SPSS files may be garbled on ingest

Harvard’s complete release notes are available for each version at the following links:

A special thank you to Alex Cooper (Queen’s University), Martine Gagnon (Université Laval), Ève Paquette-Bigras (Université de Montréal), Doug Brigham (University of British Columbia), and Yayo Umetsubo (University of Toronto Mississauga) for volunteering their time to help us test this release.



The Scholars Portal Dataverse team has been hard at work on the new Dataverse Data Curation Tool as part of our CANARIE RDM grant project. Development on this project is being led by Victoria Lubitch, Programmer/Analyst at Scholars Portal.

The Data Curation Tool (DCT) allows data owners and curators to create and edit variable-level metadata for any tabular file in a dataset. Users can access this tool as a modular application once they’ve uploaded a tabular file (e.g., SPSS, R, Excel, CSV) to a dataset in Dataverse.


The Data Curation Tool



Similar to tools like SPSS, the DCT allows users to view summary statistics about their data, add variable information like 'Interviewer Instructions' or 'Notes', create variable groups, and indicate weighting variables.


Summary Statistics in the DCT



Variable editor in the DCT



Once edits have been completed and saved back to Dataverse, these changes can then be downloaded as an XML file or exported to a codebook.
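
For readers who want to work with the downloaded XML, the sketch below reads variable names and labels using Python's standard library. It assumes the export follows the DDI Codebook 2.5 layout (a `var` element with a `name` attribute and a `labl` child). The sample document and the function name are invented for this example, so check the structure against a real export before relying on it.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for DDI Codebook 2.5; verify against an actual export.
NS = {"ddi": "ddi:codebook:2_5"}

# Invented sample standing in for a downloaded file.
SAMPLE = """<codeBook xmlns="ddi:codebook:2_5">
  <dataDscr>
    <var ID="v1" name="age"><labl>Age of respondent</labl></var>
    <var ID="v2" name="weight"><labl>Survey weight</labl></var>
  </dataDscr>
</codeBook>"""


def variable_labels(xml_text: str) -> dict[str, str]:
    """Map each variable name to its label (empty string if no label is present)."""
    root = ET.fromstring(xml_text)
    labels = {}
    for var in root.iterfind(".//ddi:var", NS):
        labl = var.find("ddi:labl", NS)
        text = labl.text if labl is not None and labl.text else ""
        labels[var.get("name")] = text.strip()
    return labels


if __name__ == "__main__":
    for name, label in variable_labels(SAMPLE).items():
        print(f"{name}: {label}")
```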


Example of a codebook in Dataverse



Usability testing sessions were recently completed with five participants, who worked through a series of tasks and helped us identify areas where the tool's user experience could be improved. We’re now working on translating the tool into French, with translations provided by the University of Ottawa.

A demo of this tool is available online, and the code can be accessed on GitHub. The Data Curation Tool will be launched with the next Scholars Portal Dataverse upgrade, currently scheduled for the end of October, and will be available for community testing soon.

If you have any comments or suggestions, contact us at dataverse@scholarsportal.info. If you would like to see all the updates and have a SpotDocs account, click the "Watch this blog" button on the top right corner of the page to receive notifications.

Background

Welcome to our Scholars Portal Dataverse blog, where we will be sharing news and updates about the Dataverse platform and service, including development work. Our first blog post provides an update about the development project "Dataverse for the Canadian Research Community"! This project is funded by CANARIE's RDM grant program and led by Scholars Portal and University of Toronto Libraries, with support from CARL and Portage. 

We're currently about halfway through our 18 months of development work (October 2018-March 2020).

The aim of the grant is to enhance Dataverse to address the needs of a broad range of researchers in Canada through improved scalability, improved integrations with Canadian cloud storage and authentication providers, and better support for data curation workflows. These three areas of development are described further below and will be discussed in more detail in future blog posts. 


Scalability

The goals of the first leg of the project include:

  • Optimize system architecture for scalable use
  • Connect to existing Canadian cloud data storage environments
  • Support large files in upload/download contexts


Planned deliverables:

  • Develop and test connections to Swift object storage, such as the Ontario Library Research Cloud (OLRC)
  • Support Globus endpoints with file access mediated outside of Dataverse application
  • Develop large-file upload utility to support deposit of larger file sizes (2GB+) into Dataverse

Authentication

The goals of the second leg of the project include:

  • Integrate with Canadian authentication infrastructure
  • Streamline login workflows

Planned deliverables:

  • Integrate Dataverse with CAF Shibboleth Login for single sign-on
  • Investigate further integration with ORCID to support linking research outputs


Data Curation

The goals of the third leg of the project include:

  • Enhance multi-disciplinary support for data curation
  • Enable users to adopt metadata standards and best practices

Planned deliverables:

  • Data Curation Tool, a modular application integrated within Dataverse that would allow users to create and edit variable-level metadata of tabular data files to aid in data re-usability

Status update

Our Project Timeline & Deliverables roadmap is included below. We have completed our first two deliverables and are currently working on the third.

For our first deliverable, connecting Dataverse with Swift as the primary storage service, we stood up a test instance of Dataverse connected to the OLRC. The SP team tested upload and download functionality, as well as the integrity of stored files, with a variety of file types and sizes, along with other functionality core to Dataverse. This type of configuration would allow us to more easily scale the system, add storage resources, and run the platform more optimally.
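
One common way to check the integrity of stored files is to compare a checksum taken before upload with one taken after download. The sketch below shows that general technique. It is an assumption about how such a check could be done, not a description of the SP team's test procedure, and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Compute a SHA-256 digest by reading the file in chunks, so large files fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(original: Path, downloaded: Path) -> bool:
    """Return True when the two files have identical SHA-256 digests."""
    return sha256_of(original) == sha256_of(downloaded)
```

Reading in fixed-size chunks matters for the larger test files, since the whole file never has to be held in memory at once.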

We have also successfully configured Dataverse to work with Shibboleth for single sign-on using the University of Toronto as the test case. We are now initiating a pilot project with interested institutions to test out new sign-up and login workflows. More details to come in another blog post.

Currently, we are working on completing our third deliverable - developing the Data Curation Tool. We presented the DCT prototype at NADDI (link to slides) and at the Dataverse Community Meeting (link to slides). Feel free to test out the Data Curation Tool - Prototype and stay tuned for a future blog post describing the development of this tool.

In the fall, we will start to focus on the large-file support and storage connection pieces of the project.

We will be sharing more details about these deliverables and the development work in upcoming blog posts! If you have any comments or suggestions, please feel free to contact us at dataverse@scholarsportal.info. If you would like to see all the updates and have a SpotDocs account, click the "Watch this blog" button on the top right corner of the page to receive notifications.


Project Timeline & Deliverables


The Scholars Portal Dataverse is a repository for research data collected by individuals and organizations affiliated with Ontario universities. It is open to anyone in the world to deposit, and has over 78 Dataverses, with over 500 studies deposited to date. 

To update our existing infrastructure, we are excited to announce that we will be upgrading to Dataverse 4 over the summer and fall of 2016. The upgrade will improve the overall look and feel of the Scholars Portal Dataverse, and offer new features for data management.

Fall: Release

On September 30, 2016, Scholars Portal will release a new version of Dataverse. The service will be down from September 26 to 30 in order to properly upgrade and test the new system.

If you have questions about this upgrade, please contact dataverse@scholarsportal.info.