
FAIR data management and DISQOVERability

Posted by Ontoforce Team on Mar 4, 2022 1:15:39 PM

This is a guest blog post by Maarten Coonen, Data Architect @ DataHub, Maastricht UMC+, and Maastricht University about FAIR data management.

At DataHub Maastricht, we are providing data management services to research groups in both the Maastricht University Medical Center and the life sciences faculty of Maastricht University. Our role is that of a data broker who enables data reuse by researchers in the hospital, the university, and beyond. Our solution is currently (June 2018) serving a research community of approximately 170 users and managing 48 TiB of data.

We work according to the FAIR principles (Findable, Accessible, Interoperable, Reusable) to make the data available to all stakeholders in the best possible way.

In our DataHub implementation, this entails the following actions:

  • Each data set registered and stored in iRODS is given a unique persistent identifier (PID) [1] [FINDABLE]

  • Metadata is structured and enriched with knowledge from ontologies using EBI’s Ontology Lookup Service (OLS) [FINDABLE + INTEROPERABLE + REUSABLE]

  • The metadata is registered in iRODS and indexed in DISQOVER [FINDABLE]

  • Data sets can be retrieved by their PID and metadata via an HTTP landing page. Metadata remain accessible even after the data have been deleted [ACCESSIBLE]
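To make the FINDABLE and ACCESSIBLE actions above more concrete, here is a minimal Python sketch of a PID registry. Its landing page serves both a human-readable view and a machine-readable JSON-LD view, and it keeps the metadata after the data are deleted. This is not DataHub's implementation: the `Registry` class, the placeholder handle prefix `12345`, and the use of schema.org `Dataset` terms for the JSON-LD are assumptions made for illustration. Only the `https://hdl.handle.net/` proxy URL pattern reflects how handles are commonly resolved.

```python
import json
import uuid
from dataclasses import dataclass, field

# Hypothetical placeholder prefix; a real deployment would use the
# prefix assigned by its PID provider (e.g. ePIC).
HANDLE_PREFIX = "12345"


@dataclass
class DatasetRecord:
    pid: str
    title: str
    # Ontology term IRIs used to annotate the data set (e.g. found via OLS).
    subjects: list[str] = field(default_factory=list)
    data_available: bool = True


class Registry:
    """Hypothetical in-memory stand-in for a PID + metadata registry."""

    def __init__(self) -> None:
        self._records: dict[str, DatasetRecord] = {}

    def register(self, title: str, subjects: list[str]) -> str:
        # Mint a unique identifier. "Persistent" is the registry's promise
        # never to reuse or drop it, not a property of the string itself.
        pid = f"{HANDLE_PREFIX}/{uuid.uuid4()}"
        self._records[pid] = DatasetRecord(pid, title, list(subjects))
        return pid

    def delete_data(self, pid: str) -> None:
        # Only the data are removed; the metadata record is kept.
        self._records[pid].data_available = False

    def landing_page(self, pid: str, machine_readable: bool = False) -> str:
        rec = self._records[pid]
        if machine_readable:
            # JSON-LD with schema.org terms for machine consumers.
            return json.dumps({
                "@context": "https://schema.org",
                "@type": "Dataset",
                "@id": f"https://hdl.handle.net/{rec.pid}",
                "name": rec.title,
                "keywords": rec.subjects,
            })
        # Plain-text view for human readers.
        status = "available" if rec.data_available else "deleted (metadata retained)"
        return f"{rec.title}\nPID: {rec.pid}\nData: {status}"
```

In this sketch, registering a data set returns its PID; after `delete_data(pid)`, `landing_page(pid)` still returns the title and PID and simply reports the data as deleted.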



“It’s key that data sets are both human and machine-readable.”

 

 

A LINKED DATA CLOUD

Performing these actions enables our data sets to be part of a massive decentralized linked data cloud. The DISQOVER technology is used to traverse this cloud, in our case comprising:

  • Research project data in iRODS

  • Multiple on-premise research databases

  • Electronic Medical Records databases

  • Over 140 public data sources



The data from public sources, in fact, come with DISQOVER, the semantic knowledge platform we use to search through all the data. Coupling in-house data with public data sources via DISQOVER's data federation is crucial here, as it greatly broadens our view of the data. With DISQOVER, results from public and private sources that would otherwise have to be collected or searched separately can be aggregated in a single search, which makes end users more efficient. DISQOVER thus brings semantic search to a wide research community. Key to this are its user-friendliness and intuitive user interface.
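The idea of aggregating hits from public and private sources in one search can be illustrated with a small Python sketch that merges results referring to the same entity IRI. This is a generic illustration of linked-data federation, not DISQOVER's internal mechanism or API. The `Source` signature, the `federated_search` function, and the "first source wins" merge rule are all hypothetical choices.

```python
from collections import defaultdict
from typing import Callable, Iterable

# A "source" is any callable that takes a query string and yields
# (entity_iri, source_name, fields) tuples. This signature is hypothetical.
Source = Callable[[str], Iterable[tuple[str, str, dict]]]


def federated_search(query: str, sources: list[Source]) -> dict[str, dict]:
    """Query every source and merge hits that refer to the same entity IRI."""
    merged: dict[str, dict] = defaultdict(lambda: {"sources": set(), "fields": {}})
    for source in sources:
        for iri, source_name, fields in source(query):
            entry = merged[iri]
            entry["sources"].add(source_name)
            # Later sources only add missing fields; they never overwrite.
            for key, value in fields.items():
                entry["fields"].setdefault(key, value)
    return dict(merged)
```

Merging on a shared IRI is what makes the ontology enrichment step pay off: an in-house record annotated with the same term IRI as a public record ends up in the same result entry.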

HOW IT ALL WORKS

The data process workflow consists of five major steps:

  1. Data and metadata, captured in various source systems, are first centralized and managed in iRODS.

  2. All metadata passes through a staging environment and is semantified in a series of Extract, Transform, Load (ETL) steps.

  3. The semantified metadata (RDF) is loaded into DISQOVER.

  4. End users search the DISQOVER front end for the most interesting insights and data sets.

  5. A persistent identifier (ePIC handle.net system) links out to a landing page, from which the actual data set can be downloaded via the iRODS cloud browser or a WebDAV connection.
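Step 2 (semantification) can be sketched as a transform from flat metadata fields to RDF. The Python below emits N-Triples using Dublin Core terms (`http://purl.org/dc/terms/`) with only the standard library. The field-to-predicate mapping and the rule that treats `http(s)://` values as IRIs are assumptions, not DataHub's actual ETL. A production pipeline would more likely use an RDF library such as rdflib and a richer vocabulary.

```python
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping from source-system field names to RDF predicates.
FIELD_TO_PREDICATE = {
    "title": DCTERMS + "title",
    "creator": DCTERMS + "creator",
    "subject": DCTERMS + "subject",
}


def escape_literal(text: str) -> str:
    # N-Triples string escaping for backslashes, quotes and line breaks.
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(subject_iri: str, metadata: dict[str, object]) -> list[str]:
    """Transform one flat metadata record into N-Triples lines."""
    triples = []
    for key, value in metadata.items():
        predicate = FIELD_TO_PREDICATE.get(key)
        if predicate is None:
            continue  # unmapped fields are skipped in this sketch
        values = value if isinstance(value, list) else [value]
        for item in values:
            text = str(item)
            # IRI-like values (e.g. ontology terms) become IRI nodes;
            # everything else becomes a plain string literal.
            if text.startswith(("http://", "https://")):
                obj = f"<{text}>"
            else:
                obj = f'"{escape_literal(text)}"'
            triples.append(f"<{subject_iri}> <{predicate}> {obj} .")
    return triples
```

Here the data set's resolvable PID URL serves as the RDF subject, so the loaded triples point back to the landing page reached in step 5.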

 

EXPANDING OUR SEARCH


Systems like DISQOVER generate their greatest impact through reach and accessibility. Within Maastricht UMC+, the following groups are already actively using DISQOVER:

And this continues to expand. All Maastricht University and Maastricht University Medical Center staff can simply access our local DISQOVER instance online. After logging in with their institutional account through SURFconext, they can start exploring the public and in-house data sets. Access to some data is, of course, restricted according to the user's authorization level.

 

Want to know more about how we use DISQOVER within DataHub? Do reach out to me.

ABOUT DATAHUB

DataHub is a cross-organizational initiative within Maastricht UMC+ that helps researchers from both the academic hospital and the university. We provide an institutional repository for research data that is more than just a data archive. We continuously improve our services to provide added value to researchers who want to do more with their data. https://datahub.mumc.maastrichtuniversity.nl/

ABOUT FAIR

The FAIR principles (Findable, Accessible, Interoperable, Reusable) are a set of 15 guiding principles for proper research data management and data stewardship. Originating from a workshop held in the Netherlands in 2014, these principles have since attracted growing interest from researchers, publishers, funding bodies, and government agencies worldwide. A key aspect of the FAIR principles is providing human- and machine-readable representations of data sets in order to achieve semantic interoperability.

ABOUT iRODS

iRODS stands for ‘Integrated Rule-Oriented Data System’. It is open-source data management software that links unstructured data to metadata and is used for distributed storage and data management automation.
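iRODS attaches metadata to data objects as attribute-value-unit (AVU) triples. The toy Python catalog below mimics that model so that "linking unstructured data to metadata" can be seen in a few lines. It is a hypothetical in-memory stand-in, not the iRODS API, and the object paths are made up. Against a real iRODS server, the python-irodsclient package provides a comparable `obj.metadata.add(attribute, value, unit)` call on data objects.

```python
from collections import defaultdict
from typing import NamedTuple, Optional


class AVU(NamedTuple):
    attribute: str
    value: str
    unit: Optional[str] = None


class MetadataCatalog:
    """Toy stand-in for an iRODS-style catalog: AVUs attached to object paths."""

    def __init__(self) -> None:
        self._avus: dict[str, list[AVU]] = defaultdict(list)

    def add(self, path: str, attribute: str, value: str,
            unit: Optional[str] = None) -> None:
        # Attach one AVU to the object at `path`; the data itself is untouched.
        self._avus[path].append(AVU(attribute, value, unit))

    def get(self, path: str) -> list[AVU]:
        # .get() avoids creating an empty entry for unknown paths.
        return list(self._avus.get(path, []))

    def find(self, attribute: str, value: str) -> list[str]:
        """Return the paths of objects carrying the given attribute/value pair."""
        return sorted(
            path for path, avus in self._avus.items()
            if any(a.attribute == attribute and a.value == value for a in avus)
        )
```

Because the unit is a separate field, numeric metadata such as a slice thickness can carry "mm" without being folded into the value string.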

 

[1] A persistent identifier (PI or PID) is a long-lasting reference to a document, file, web page, or other object.
