Bring epidemiology data and disease genes in closer contact

Posted by Ontoforce Team on Mar 17, 2022 8:31:48 PM

A blog by Filip Pattyn, Scientific Lead at ONTOFORCE

These discussions can’t arise when a term is chosen from an ontology, where every term comes with a precise definition by default and its relationships with closely related terms are made explicit (e.g. ‘type 2 diabetes’ is a more specific form of the general term ‘diabetes’ and has sibling terms like ‘type 1 diabetes’). Most of us know the difference between type 1 and type 2 diabetes, but it doesn’t end there once you dive deeper into the ‘diabetes sea’.

Healthcare professionals widely use the ‘Systematized Nomenclature of Medicine Clinical Terms’ (SNOMED CT) terminology in medical reporting. SNOMED CT is very extensive, which makes it suitable for describing a medical situation in full. The major drawback is that you can lose yourself in its multilevel hierarchy of terms. Diabetes mellitus, the official collective term, covers more than 100 different kinds, organized in at least 3 sublevels in SNOMED CT alone. The situation is similar in ORDO and ICD-10, which list more than 45 rare types and more than 180 diabetes varieties respectively. A hierarchical representation that lets you select a parent term together with all its children, grandchildren, and so on could be part of the solution to the problem.
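To make the idea of "select a parent and everything below it" concrete, here is a minimal sketch in Python. It assumes the hierarchy is available as a simple parent-to-children mapping. The term labels, the `CHILDREN` dictionary, and the `select_subtree` function are all made up for illustration and do not reflect the actual content or API of SNOMED CT, ORDO, or ICD-10.

```python
from collections import deque

# Toy hierarchy: parent term -> direct children.
# Labels are illustrative placeholders, not real ontology entries.
CHILDREN = {
    "diabetes mellitus": [
        "type 1 diabetes",
        "type 2 diabetes",
        "rare diabetes (placeholder)",
    ],
    "rare diabetes (placeholder)": [
        "rare subtype A (placeholder)",
        "rare subtype B (placeholder)",
    ],
}


def select_subtree(root, children):
    """Return the root term plus all of its descendants, breadth-first."""
    seen = {root}
    ordered = [root]
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            # Real ontologies can give a term several parents,
            # so skip anything that was already collected.
            if child not in seen:
                seen.add(child)
                ordered.append(child)
                queue.append(child)
    return ordered


if __name__ == "__main__":
    for term in select_subtree("diabetes mellitus", CHILDREN):
        print(term)
```

Selecting the top-level term returns every term beneath it in one go, which is exactly what you want before looking up data for "all types of diabetes".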

Searching traditionally


Assume you want to find out the average age of onset for all types of diabetes and link this with the known associated genes and variants. Why? Well, you want to investigate the potential of a predictive screening test for diabetes using a relevant gene panel. How do you start? First, capture all diabetes-related diseases by diving into a disease classification like the ones described above. Orphanet is one of the online data sources that provide epidemiological information for the rare kinds of diabetes, so grab the data from there. You can complete this for the more common forms of diabetes with the aid of the Genetics Home Reference (GHR). With this at hand, you proceed to get the genes and related variants known to be involved in diabetes. DisGeNET is a good example of a source that can help you make the link between genes and diseases. It also contains data about disease-variant associations. Stitching all of this together won’t be easy, because these sources tend to use different disease classifications. In addition, you only get a snapshot in time. After a while your information is outdated, unless you redo the complete process.
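The stitching step is where most of the manual effort goes. As a rough sketch, assume you have exported onset data keyed by one source's disease identifiers, gene associations keyed by another source's identifiers, and a cross-reference table between the two. Every identifier, value, and name below is invented for illustration and is not taken from Orphanet, GHR, or DisGeNET.

```python
# Hypothetical export from source A: disease id -> average age of onset (years).
ONSET_BY_A = {"A:001": 12.0, "A:002": 45.0, "A:003": 30.0, "A:004": 8.0}

# Hypothetical export from source B: disease id -> associated genes.
GENES_BY_B = {"B:x1": ["GENE_1"], "B:x2": ["GENE_2", "GENE_3"]}

# Hypothetical cross-reference between the two classifications.
A_TO_B = {"A:001": "B:x1", "A:002": "B:x2", "A:003": "B:x9"}


def link_onset_to_genes(onset, genes, mapping):
    """Join onset data and gene data through the cross-reference table.

    Returns the linked rows plus the source-A ids that could not be linked,
    so gaps in the mapping stay visible instead of disappearing silently.
    """
    rows = []
    unmapped = []
    for a_id, age in sorted(onset.items()):
        b_id = mapping.get(a_id)
        if b_id is None or b_id not in genes:
            unmapped.append(a_id)
            continue
        rows.append((a_id, b_id, age, genes[b_id]))
    return rows, unmapped


if __name__ == "__main__":
    rows, unmapped = link_onset_to_genes(ONSET_BY_A, GENES_BY_B, A_TO_B)
    for row in rows:
        print(row)
    print("could not be linked:", unmapped)
```

In this toy data two of the four diseases fall through the cracks: one has no cross-reference at all and the other maps to an identifier with no gene data. Both problems come back every time a source publishes a new release.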

Searching semantically


Our Linked Data platform DISQOVER can help to link the different disease classifications. We aim to bring together and link disease-related data from different sources. Our goal is to create an integrated dataset that covers key aspects of such diseases and to display these in a disease-centric way. You could compare it with Wikipedia, but automated, striving to be more complete, and kept in sync with the data sources.

Disease-related population and genetics data come together in DISQOVER. You can browse through extensive classifications and select a subtree of disease terms, sometimes specified in more detail than you might expect. This operation captures all the diseases in the subtree, and our user interface lets you navigate from there to all related data types, such as genes or variants.




