Public Data Update 1.18

19 January 2021 | version 1.18

This document contains the release notes for Public Data Update 1.18.

Improvements to clinical studies

The data of the Clinical Study canonical type has been thoroughly updated.

These are the most notable changes:

  • The facet “Study Record Type” can be used to distinguish between “Protocols” and “Trials” as individual instances. Here, a protocol refers to the protocol record (Clinical Trial Applications (CTAs), third-country files) for a specific country where the trial was authorized. This information is available via the ‘EU Clinical Trials Register’ (CTR) for European Union Member States as well as for European Economic Area countries. A trial is a clinical study record retrieved from one of the data sources mentioned below and can contain information about one or more protocol records.
  • The data sources clinicaltrials.gov, EudraCT and WHO ICTRP provide the main content of the Clinical Study canonical type and are updated weekly.
  • Clinical studies originating from the WHO have new URI prefixes. The original URI prefix is now only used for studies registered in clinicaltrials.gov. The previous URIs are still available but will be removed in the following data update.
  • Clinical study facets are consolidated across data sources and protocols. These facets are now single-valued for improved filtering and analytics:
    • The categorical facets Study Type, Status, Phase, Observational Model, Intervention Model, Masking, Allocation, Gender
    • The numerical facets Minimum Age, Maximum Age, Enrolled People
    • The date facets Registration Date, Start Date, Primary Completion Date, Completion Date
  • Improved linking to other canonical types:
    • Clinical Study – Person: better relation type coverage, such as “Study Chair”, “Overall Officer” and “Investigator”.
    • Clinical Study – Organization: unified “Lead Sponsor” labels for the top 50 sponsors.
    • Clinical Study – Active Substance: improved annotation of “Tested Substance” property using natural language processing on unstructured data such as the study title.
    • Clinical Study – Disease: improved annotation of “Condition” property using natural language processing on unstructured data such as the study title.
  • Removal of obsolete facets. These facets are currently empty and will be removed in a future data update:
    • “Verification Date” was omitted in favour of the other date facets that are used more.
    • “Country of Sponsor” only contained limited data and was poorly annotated.

DrugBank data removal

Due to changes in the licensing and terms of use of DrugBank (https://www.drugbank.com/), this data source is no longer part of the DISQOVER public data system (https://www.disqover.com), the DISQOVER Remote Data Subscription (RDS) and DISQOVER federation data services. This results in the following changes:

  • Approximately 7500 unique Active Substance and Chemical instances are removed, and several facets, property values and concepts are removed from the remaining instances.
  • Approximately 8000 Patent instances are no longer available.
  • Several links between Active Substance instances and instances of other canonical types are removed.
  • The Active Substance – Protein associations canonical type instance count is reduced by approximately 66% and has empty values for several facets and properties.
  • The Active Substance – Variant associations canonical type is empty and will be removed in a future data update.

If possible, this gap in data availability will be filled in by one or more other publicly available data sources in future data releases.

Changes within other canonical types

  • Cell line – The values of the “Type” facet now link to Uncategorized instances.
  • Person – Major update of the ORCID data source.
  • Publication – New hierarchical facet “MeSH classification”.

Changes to the data configuration

The following changes affect dashboard builders who use federated public data.

Changes within canonical types

The addition or removal of properties and facets within canonical types requires reviewing all relevant templates and updating the affected ones.

  • Clinical Study – Dashboard: Existing Clinical Study facets have been made single-valued to improve filtering and analytics.
  • Publication – Dashboard and instance popout: The new hierarchical facet “MeSH Classification” can be added. It contains Publication MeSH Descriptor links to the MeSH classification tree.

Both the improvements made to Clinical Studies and the removal of the DrugBank data source have resulted in facets and properties that no longer contain data or are duplicated by other facets or properties. These will be removed in a future data update:

Data Changes 1.18

The following property has been removed from the configuration:

  • Patent – The duplicate “Synonym” property has been removed.

 

DISQOVER templates

Using new features of DISQOVER version 6.02, new search page templates were created to improve the user experience on https://www.disqover.com. These templates are available for use.

Changes to the federated data integration

The following changes affect pipeline builders who integrate federated public data. Both the removal of the DrugBank instances and the improvements to the Clinical Study canonical type have an impact on federated pipelines.

 

Active Substance and Chemical instances that originated only from the DrugBank data source have been removed. DrugBank URIs have been removed from the remaining instances, which can still be linked using other identifiers such as their PubChem compound identifier.

  • URI scheme of the removed instances:

https://identifiers.org/drugbank/<drugbank_id>

Patents that originated only from DrugBank are no longer present. DrugBank URIs have been removed from the remaining Patent instances, which can still be linked using their EPO identifier.

  • URI scheme of the removed instances:

https://identifiers.org/google.patent/<patent_id>

  • URI scheme of the EPO patents:

https://ns.ontoforce.com/datasets/epo/patent/<patent_id>
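One way for pipeline builders to act on the removed URI schemes listed above is to scan existing link data for URIs that fall under them. The following is a minimal sketch and not part of DISQOVER: the function name `split_removed_uris` and the example identifiers are hypothetical, and it assumes that every URI under the two removed prefixes is no longer available.

```python
from typing import Iterable, List, Tuple

# Prefixes copied from the URI schemes of the removed instances listed above.
REMOVED_URI_PREFIXES = (
    "https://identifiers.org/drugbank/",
    "https://identifiers.org/google.patent/",
)


def split_removed_uris(uris: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split URIs into (kept, removed) based on the removed prefixes.

    Assumption: any URI under a removed prefix is gone after this update.
    """
    kept: List[str] = []
    removed: List[str] = []
    for uri in uris:
        # str.startswith accepts a tuple and matches any of its prefixes.
        if uri.startswith(REMOVED_URI_PREFIXES):
            removed.append(uri)
        else:
            kept.append(uri)
    return kept, removed


if __name__ == "__main__":
    # The identifiers below are placeholders, not real records.
    kept, removed = split_removed_uris([
        "https://identifiers.org/drugbank/DB00000",
        "https://ns.ontoforce.com/datasets/epo/patent/EP0000000",
    ])
    print("still usable:", kept)
    print("needs relinking:", removed)
```

URIs reported as removed would then need to be relinked through another identifier, such as the PubChem compound identifier or the EPO identifier mentioned above.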

The URI scheme for Clinical Study has changed. The previous URIs are deprecated and are currently still available as alternative URIs. They will be removed in a future data update. The new URI scheme depends on the study record type of the clinical study and on the data source of the study.

  • Deprecated URI schemes:

https://ns.ontoforce.com/datasets/who/<study_id>

https://identifiers.org/clinicaltrials/<study_id>

  • New URI schemes for Protocols:

https://ns.ontoforce.com/instance/clinicaltrials/<NCT_id>

https://ns.ontoforce.com/instance/euclinicaltrials/<EudraCT_id>

https://ns.ontoforce.com/instance/whoclinicaltrials/<study_id>

  • New URI schemes for Trials:

https://ns.ontoforce.com/instance/clinical_study/<study_id>

https://identifiers.org/clinicaltrials/<NCT_id>

https://identifiers.org/euclinicaltrials/<EudraCT_id>
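The Clinical Study URI schemes above can be told apart by prefix, which may help when auditing a federated pipeline for deprecated URIs. The sketch below is illustrative only: the function name `classify_clinical_study_uri` and the returned labels are hypothetical. It also makes two assumptions that the release notes do not state explicitly: that an NCT identifier is "NCT" followed by eight digits, and that a URI under `https://identifiers.org/clinicaltrials/` whose identifier is not an NCT identifier belongs to the deprecated scheme.

```python
import re

# Assumption: NCT identifiers are "NCT" followed by eight digits.
NCT_ID = re.compile(r"NCT\d{8}")

# Prefixes copied from the URI schemes listed above.
PROTOCOL_PREFIXES = (
    "https://ns.ontoforce.com/instance/clinicaltrials/",
    "https://ns.ontoforce.com/instance/euclinicaltrials/",
    "https://ns.ontoforce.com/instance/whoclinicaltrials/",
)
TRIAL_PREFIXES = (
    "https://ns.ontoforce.com/instance/clinical_study/",
    "https://identifiers.org/euclinicaltrials/",
)
# This prefix appears in both the deprecated and the new Trial schemes.
CT_GOV_PREFIX = "https://identifiers.org/clinicaltrials/"
DEPRECATED_WHO_PREFIX = "https://ns.ontoforce.com/datasets/who/"


def classify_clinical_study_uri(uri: str) -> str:
    """Return 'protocol', 'trial', 'deprecated' or 'unknown' for a URI."""
    if uri.startswith(PROTOCOL_PREFIXES):
        return "protocol"
    if uri.startswith(TRIAL_PREFIXES):
        return "trial"
    if uri.startswith(CT_GOV_PREFIX):
        study_id = uri[len(CT_GOV_PREFIX):]
        # Assumption: only NCT identifiers remain valid under this prefix.
        return "trial" if NCT_ID.fullmatch(study_id) else "deprecated"
    if uri.startswith(DEPRECATED_WHO_PREFIX):
        return "deprecated"
    return "unknown"
```

URIs classified as deprecated are still available as alternative URIs for now, so a report of them can serve as a to-do list before they are removed in a future data update.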

 
