ONTOFORCE at the Bio-IT World 2017 FAIR Hackathon
In May 2017, Bio-IT World hosted its first hackathon on FAIR data. As ONTOFORCE, we helped organize the event and devoted a team to the competition on ‘Aligning a dataset to FAIR principles’. Here’s our report.
FAIR data is gaining ground across different scientific fields. But what does it mean and how will it affect your business?
FAIR is a set of guidelines to make data Findable, Accessible, Interoperable and Reusable. The FAIR principles emerged in January 2014, out of a Lorentz Workshop held in Leiden, the Netherlands. FAIR is neither a norm nor a format. It is a set of recommendations to easily search, find, integrate and reanalyze datasets. The objective? To assist humans and machines in their searches.
- Findable : A dataset (i) contains meta-data, (ii) is referenced with global, unique, persistent identifiers and (iii) is in a searchable resource.
- Accessible : A protocol is available to access data and meta-data, with or without authentication. Meta-data are persistent.
- Interoperable : Data are provided with a common language, with usage of identifiers, ontologies and qualified references.
- Reusable : Data come with provenance, a usage license and the attributes that matter to stakeholders in the domain.
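To make the four criteria concrete, here is a minimal Python sketch of how a dataset description might be checked against them. The field names (`identifier`, `metadata`, `indexed_in`, `access_protocol`, `vocabularies`, `provenance`, `license`) and the `fair_report` helper are assumptions made for illustration; FAIR itself prescribes no particular schema or tool.

```python
from typing import Any, Dict


def fair_report(dataset: Dict[str, Any]) -> Dict[str, bool]:
    """Return a rough yes/no per FAIR letter for a dataset description.

    Every key looked up here is hypothetical; adapt them to your own schema.
    """
    return {
        # Findable: meta-data, a persistent identifier, and a searchable index
        "findable": bool(
            dataset.get("metadata")
            and dataset.get("identifier")
            and dataset.get("indexed_in")
        ),
        # Accessible: a stated protocol for retrieving the data
        "accessible": bool(dataset.get("access_protocol")),
        # Interoperable: at least one shared vocabulary or ontology referenced
        "interoperable": bool(dataset.get("vocabularies")),
        # Reusable: provenance and a license are both present
        "reusable": bool(dataset.get("provenance") and dataset.get("license")),
    }


example = {
    "identifier": "https://example.org/dataset/123",  # made-up identifier
    "metadata": {"title": "Example variant dataset"},
    "indexed_in": ["example-catalogue"],
    "access_protocol": "https",
    "vocabularies": [],  # no ontologies referenced yet
    "provenance": "Exported from an example lab system",
    "license": None,  # no license stated
}

report = fair_report(example)
for letter, ok in report.items():
    print(f"{letter}: {'ok' if ok else 'needs work'}")
```

A checklist like this only flags missing pieces; deciding whether an identifier is truly persistent or a vocabulary is appropriate still takes human judgment.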
The scientific community (academic institutions, industries…) has gathered large and precious datasets over the last few decades. However, we are having a hard time fully extracting the knowledge and value from this data, as we’re not always aware of its existence, we don’t really know how to access and/or process it, or we don’t know in which context it could be placed.
FAIR guidelines provide everyone with a set of rules to transform existing datasets or produce new ones, in such a way that extraction, linkage, integration and reanalysis of the data become easier.
The hackathon in action
ONTOFORCE co-organized the Bio-IT World FAIR Hackathon 2017 along with the Cambridge Healthtech Institute and the Dutch Techcentre for Life Sciences. More than 170 data scientists, developers and domain experts signed up for the event. After an introduction to the FAIR principles, the different teams started to work on a dataset of their choice with the aim of aligning it to the FAIR principles.
The ONTOFORCE team toiled on the pediatric oncology dataset from the pediatric portal of Foundation Medicine. More specifically, we looked into gene variant identification in pediatric cancer samples. The team worked on raising the dataset’s FAIR level: standardizing formats, assigning unique identifiers, adding references to standards as well as ontologies and loading the dataset into DISQOVER, all in just one day.
The progress of the different teams was evaluated by a jury of experts: Tom Plasterer (US Cross-Science Director, R&D Information at AstraZeneca), Helena F. Deus (Director of Disruptive Technologies at Elsevier), Myles Axton (Chief Editor of Nature Genetics), and Filip Pattyn (Scientific Lead & Product Manager at ONTOFORCE).
This work earned the ONTOFORCE team second prize. First prize went to the team that worked on ClinVar, a renowned database of clinical variants developed at NCBI.
Good to know: as a user, you will soon be able to access this data in our public DISQOVER platform.
Keep up the good work
A core activity at ONTOFORCE is to continually import more new data into DISQOVER by…
- Converting data into a standard format
- Assigning and linking out to globally unique identifiers
- Capturing meta-data
As such, we apply the FAIR guidelines on a daily basis to a very wide variety of datasets. ONTOFORCE fully supports the FAIR principles, as they make the data integration process more efficient and maximize the potential for knowledge extraction.
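The three import steps listed above can be sketched in a few lines of Python. This is an illustration only, not the actual DISQOVER import pipeline: the `to_fair_record` helper, the `EX:` gene identifiers, the `https://example.org/` namespace and the sample values are all made up for the example.

```python
import json
from datetime import date

# Hypothetical lookup from free-text gene labels to identifiers.
# The "EX:" identifiers are placeholders, not real ontology terms.
GENE_IDS = {"gene a": "EX:0001", "gene b": "EX:0002"}

# Assumed namespace used to build globally unique identifiers.
BASE_URI = "https://example.org/variant/"


def to_fair_record(row, source, license_name, retrieved):
    """Convert one raw row into a standardized record with an ID and meta-data."""
    gene_label = row["gene"].strip()
    return {
        # Step 2: assign a globally unique identifier to the record itself
        "id": BASE_URI + row["sample"] + "/" + row["variant"],
        # Step 2: link out to an identifier for the gene (None if unknown)
        "gene": GENE_IDS.get(gene_label.lower()),
        "gene_label": gene_label,
        "variant": row["variant"],
        # Step 3: capture meta-data about where the row came from
        "metadata": {
            "source": source,
            "license": license_name,
            "retrieved": retrieved.isoformat(),
        },
    }


raw = {"sample": "S1", "gene": " Gene A ", "variant": "V42"}  # made-up row
record = to_fair_record(raw, "example export", "CC-BY-4.0", date(2017, 5, 1))

# Step 1: serialize to a standard format (JSON here, as one possible choice)
print(json.dumps(record, indent=2))
```

Unknown gene labels map to `None` rather than raising an error, so unmapped rows can be reviewed later instead of silently dropped.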
So next time you produce or transform a dataset, make sure to play it FAIR.
Discover more FAIR data in DISQOVER now!
The hackathon winners
Pictures of the hackathon by Bio-IT World.