Bringing down the virtual walls around hospital data

Integrating an organization’s internal data is challenging. It requires strategic vision from the organization’s leadership and an open mindset at every level. In our experience, such a solution has a greater chance of success if stakeholders notice the benefits quickly and easily. Small, pragmatic initiatives that use few data sources and solve a limited number of use-cases can demonstrate how things can improve for the whole organization.

The power lies in using data structuring and standardization that can be initiated anywhere in the organization. If the basic ideas of the semantic web and linked data are followed, these independent initiatives can be merged together into an internal, overall, linked data information network with the potential of linking to public data sources.

A hospital setting

Take the example of a hospital. Patient information there is stored in Electronic Health Records (EHR) or Electronic Medical Records (EMR). Ideally, this should contain or be linked to a patient’s medical history, medication, allergy and immunological status, billing information, biosamples (stored in biobanks), lab test results, radiology images, personal and administrative information, health insurance details and, last but not least, the status of consents given by that patient (or the patient’s representatives). After all, patients have the right to access their own data and some of them may want to share (parts of) that data with others in order to help research initiatives. Connecting this information, however, is not straightforward within the currently used EHR solutions.


Scattered data

In reality, most of this patient data is scattered across several hospital databases and applications that don’t interact or exchange information. While most of this data is an integral part of critical hospital workflows, changing these workflows requires a change of mindset and the introduction of new software and technologies. That is no small feat, as there’s no generally accepted data exchange format for EHR data that meets the requirements of easy semantic integration and data linking. The currently used Health Level 7 (HL7) data exchange format hasn’t changed in years despite several known flaws. New initiatives like Fast Healthcare Interoperability Resources (FHIR) are promising, but these haven’t become official standards yet (1).

But more is possible! At ONTOFORCE, we use a more pragmatic approach of data dumps or data connectors to transform data into a more generalized format with annotated metadata. By using the Resource Description Framework (RDF) data model, data becomes immediately ready for data linking and semantic web technologies (2).
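As a rough illustration of what such a transformation can look like, the sketch below turns one mock patient record into RDF triples serialized as N-Triples. It is not ONTOFORCE's actual pipeline: the `example.org` namespace, the predicate names, and the record fields are all assumptions made up for this example.

```python
# Minimal sketch: turn a flat record into RDF triples (N-Triples syntax).
# The namespace and predicate names below are hypothetical.
BASE = "http://example.org/hospital/"


def escape_literal(value):
    """Escape characters that are special inside an N-Triples string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def record_to_ntriples(record, id_field="patient_id"):
    """Return one N-Triples line per non-empty field of the record."""
    subject = f"<{BASE}patient/{record[id_field]}>"
    lines = []
    for field, value in record.items():
        if field == id_field or value in (None, ""):
            continue  # the ID becomes the subject; empty fields are skipped
        predicate = f"<{BASE}vocab/{field}>"
        lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


mock_record = {"patient_id": "P001", "sex": "male", "birth_year": 1950, "allergy": ""}
for line in record_to_ntriples(mock_record):
    print(line)
```

Every value is written as a plain string literal here to keep the sketch short. A real conversion would typically map values to typed literals or to URIs from shared vocabularies, since shared identifiers are what make linking across sources possible.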

A series of pilot setups

We are currently evaluating a series of pilot setups in hospitals where our semantic data search and integration platform DISQOVER is used to link patient data from an EHR system to other internal sources, like a lab test system or a biobanking application. Note that the complete setup is driven by an informed consent management system. Here’s how it works.

DISQOVER is installed on a local server where user security roles can be applied on different levels. This allows refined access per type of role (e.g. doctor, researcher, …) to specific parts of the data or to relationships between data. The installation is also configured so that federation (3) enriches the local data with data from 120+ public data sources available in the central DISQOVER installation (

We began the project by setting up a small demo environment with mockup data to demonstrate the potential of linked hospital data through DISQOVER. The data exported from an EHR underwent a semantic conversion before being loaded into DISQOVER. Only data of patients who consented to having their biosamples used in studies was included. In addition, data about the storage of the related biosamples and about the next-generation sequencing experiments conducted on these samples was integrated. This setup addresses a number of use-cases, two of which we explain in more detail below.
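The consent-driven selection described above can be sketched roughly as follows. The record layout, the `consent_biosample_research` flag, and the sample fields are assumptions invented for this mock example, not the demo's real schema.

```python
# Mock data: all field names and values are hypothetical.
patients = [
    {"patient_id": "P001", "consent_biosample_research": True},
    {"patient_id": "P002", "consent_biosample_research": False},
    {"patient_id": "P003", "consent_biosample_research": True},
]
biosamples = [
    {"sample_id": "S10", "patient_id": "P001", "freezer": "F2", "ngs_run": "RUN-7"},
    {"sample_id": "S11", "patient_id": "P002", "freezer": "F2", "ngs_run": None},
    {"sample_id": "S12", "patient_id": "P003", "freezer": "F5", "ngs_run": "RUN-9"},
]


def consented_dataset(patients, biosamples):
    """Keep consenting patients only and attach their biosamples."""
    consenting = {
        p["patient_id"] for p in patients if p.get("consent_biosample_research")
    }
    dataset = {pid: [] for pid in sorted(consenting)}
    for sample in biosamples:
        if sample["patient_id"] in consenting:
            dataset[sample["patient_id"]].append(sample)
    return dataset


for patient_id, samples in consented_dataset(patients, biosamples).items():
    print(patient_id, [s["sample_id"] for s in samples])
```

Filtering on consent before anything is loaded means that data from non-consenting patients never reaches the integrated view, rather than being hidden afterwards.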

Use-case 1

Imagine you’re a physician at a hospital and you would like an overview of all your male patients who were over 60 at the time of a consultation at which you prescribed warfarin. Having selected these patients, you would like an overview of all clinical indications for which warfarin was prescribed. Because all indications are translated into SNOMED CT terms, a widely used clinical terminology that is mapped to other disease classifications, DISQOVER can take you, on the spot, to all the relevant literature annotated in public data sources as related to these terms. Similarly, other links to public data direct you to proteins known to be related biomarkers or drug targets, or to clinical studies that mention the indications.
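Expressed as plain filtering logic, the selection in this use-case looks roughly like the sketch below. It runs on made-up records in ordinary Python rather than in DISQOVER, and the field names are hypothetical. Indications are shown as text labels; real data would carry SNOMED CT concept identifiers instead.

```python
from datetime import date

# Mock prescriptions; every field name and value here is made up.
prescriptions = [
    {"patient_id": "P001", "sex": "male", "birth_date": date(1950, 3, 14),
     "consult_date": date(2016, 5, 2), "drug": "warfarin",
     "indication": "atrial fibrillation"},
    {"patient_id": "P002", "sex": "female", "birth_date": date(1948, 7, 1),
     "consult_date": date(2016, 6, 9), "drug": "warfarin",
     "indication": "deep vein thrombosis"},
    {"patient_id": "P003", "sex": "male", "birth_date": date(1975, 1, 20),
     "consult_date": date(2016, 6, 9), "drug": "warfarin",
     "indication": "pulmonary embolism"},
    {"patient_id": "P004", "sex": "male", "birth_date": date(1940, 11, 5),
     "consult_date": date(2016, 8, 30), "drug": "warfarin",
     "indication": "deep vein thrombosis"},
]


def age_on(birth_date, on_date):
    """Age in whole years on a given date."""
    had_birthday = (on_date.month, on_date.day) >= (birth_date.month, birth_date.day)
    return on_date.year - birth_date.year - (0 if had_birthday else 1)


def indications_for(prescriptions, drug, sex, min_age):
    """Distinct indications for `drug` among patients of `sex` older than `min_age`."""
    return sorted({
        p["indication"]
        for p in prescriptions
        if p["drug"] == drug
        and p["sex"] == sex
        and age_on(p["birth_date"], p["consult_date"]) > min_age
    })


print(indications_for(prescriptions, "warfarin", "male", 60))
```

Note that age is computed at the consultation date, not today's date, which matches the question the physician is asking.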

Use-case 2

Now, imagine you’re a translational science researcher working with patient data in a specific study. The patient IDs you’re using are pseudonyms that cannot be traced back to the original patient identifiers in the EHR. In contrast, a physician who has a doctor-patient relationship with the patient can see the full patient record.

As a researcher, in DISQOVER you only get the limited clinical information of the patient data that you are authorized to see as well as the related biosamples of that patient from the biobank – again with pseudonymized IDs – and the links to experiments that you have conducted with these samples.
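One common way to produce study pseudonyms of this kind is a keyed hash. The sketch below uses HMAC-SHA256 from Python's standard library. It is an assumption about how such IDs could be generated, not a description of how DISQOVER or a hospital's EHR does it. The key, the study labels, and the field names are hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret held only by the party doing the pseudonymization.
# Without this key, a pseudonym cannot be recomputed from a patient ID.
SECRET_KEY = b"replace-with-a-randomly-generated-secret"


def pseudonymize(identifier, study):
    """Derive a stable, study-specific pseudonym from an identifier."""
    message = f"{study}:{identifier}".encode("utf-8")
    digest = hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()
    return digest[:16]  # shortened for readability in this sketch


def researcher_view(record, study):
    """Replace identifiers and keep only a limited set of clinical fields."""
    return {
        "pseudonym": pseudonymize(record["patient_id"], study),
        "diagnosis": record["diagnosis"],
        "sample_ids": [pseudonymize(s, study) for s in record["sample_ids"]],
    }


record = {
    "patient_id": "P001",
    "name": "Jane Doe",
    "diagnosis": "atrial fibrillation",
    "sample_ids": ["S10", "S12"],
}
print(researcher_view(record, study="STUDY-A"))
```

Because the study label is part of the hashed message, the same patient gets a different pseudonym in each study, so records cannot be joined across studies by ID alone. Whether that separation is wanted depends on the consent given and on the study design.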

In our demo, we show the possibility to create links to the results of a targeted resequencing experiment on the patient samples. Variants located in specific gene locations are identified by using a standardized annotation for the alleles. This information can be integrated with functional annotation data and links to diseases affected by the variants from ClinVar, dbSNP and other data sources with information about genomic variants.


Patient privacy guaranteed

With DISQOVER, one of our objectives is to streamline the process of making better use of the valuable data stored in EHRs, while fully respecting every patient’s privacy rights. Although privacy and data security shouldn’t be bottlenecks to integrating and linking data, they often still are. By adhering to the right quality standards and privacy principles, the value inside EHRs and related data sources can be unlocked. The results are better-informed decisions by physicians and new insights for researchers.

The first challenge is usually breaking down the data silos within a specific setting, like a hospital. Challenge two is even greater: bringing down the virtual walls across institutions while transparently managing patient consents. In an ideal world, the data goes to wherever the patient goes. We aren’t there yet, but we are making headway.

Care to know more about these use-cases? Contact us!

Want to check out what DISQOVER can do for you? Check out our free license.


(1) Fast Healthcare Interoperability Resources.

(2) Resource Description Framework.

(3) With ‘federation’, data is kept in its original location, queried locally while results from the various databases are presented in a unified, directly usable manner.

Picture ‘Doctor’ by Hamza Butt via Flickr
