Connected Data landscapes: a Self-Service Knowledge Platform at Amgen

One and a half years ago we started to work with Amgen to help them improve the way they search and retrieve research data. They were looking for a solution that could aggregate and interlink their internal research data, enrich it with public data and provide an appealing, user-friendly interface for end-users.

Initially, the focus was on internal data from early research in drug development and, more specifically, gene-centric data. We started with a proof-of-concept project, for which DISQOVER, our semantic data aggregation, linking and search platform, was installed on site.

Inside out, outside in

First, we trained the Information Sciences team at Amgen to configure DISQOVER. After that, they started to add further internal data sources with increasing independence. All of this was done to solve a number of specific use cases based on search questions involving internal data.

In the next phase, interest grew at Amgen in linking their internal data to public data, so we activated the data federation functionality. This allows them to connect securely to our public DISQOVER version, which contains an ever-growing number of integrated and linked public data sources. In addition, links to other internal Amgen applications were added, which extended the usability of their local DISQOVER installation. They branded this installation as ‘Gene Knowledge Discovery’ or ‘GKD’. Amgen users can now search for specific data such as internal projects or experiments and link through to other applications to re-analyze experimental datasets (see Fig. 1).

Fig. 1: Overview of how a researcher at Amgen can use GKD (Gene Knowledge Discovery) to find research data and links to internal applications and related public data sources.

The Information Sciences team at Amgen supervises the configuration of GKD itself by managing the DISQOVER integration ontology. This ontology determines which Canonical Types are made visible to the users. Each Canonical Type is a collection of similar concepts that are potentially linkable to concepts in other Canonical Types. Each data source can contribute data to several Canonical Types, which results in aggregated information per concept (see Fig. 2).

Fig. 2: Start page of GKD with all available Canonical Types represented as tiles. Canonical Types that could contain internal data have a small Amgen logo in the top-left corner. On the bottom right, an example is given: which data sources are aggregated for the human gene ‘PCSK9’?
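To make the idea of aggregation per concept more tangible, here is a minimal Python sketch. It is not DISQOVER code and does not reflect how the platform is implemented. The source names, record fields and the `ontology_visibility` setting are all hypothetical and only illustrate how several data sources could contribute to one concept within a Canonical Type.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration:
# (data source, Canonical Type, concept identifier, properties)
RECORDS = [
    ("internal_projects", "Gene", "PCSK9", {"internal_project": "PROJ-001"}),
    ("public_gene_db", "Gene", "PCSK9", {"organism": "Homo sapiens"}),
    ("public_gene_db", "Gene", "APOB", {"organism": "Homo sapiens"}),
    ("internal_assays", "Experiment", "EXP-42", {"target": "PCSK9"}),
]


def aggregate(records):
    """Group records by (Canonical Type, concept) and merge their properties."""
    aggregated = defaultdict(lambda: {"sources": [], "properties": {}})
    for source, canonical_type, concept, properties in records:
        entry = aggregated[(canonical_type, concept)]
        if source not in entry["sources"]:
            entry["sources"].append(source)
        # Later sources overwrite earlier ones on key clashes (a simplification).
        entry["properties"].update(properties)
    return dict(aggregated)


def visible_types(aggregated, ontology_visibility):
    """Keep only concepts whose Canonical Type is marked visible.

    `ontology_visibility` is a hypothetical stand-in for the integration
    ontology: a mapping from Canonical Type name to True/False.
    """
    return {
        key: value
        for key, value in aggregated.items()
        if ontology_visibility.get(key[0], False)
    }


if __name__ == "__main__":
    concepts = aggregate(RECORDS)
    print(concepts[("Gene", "PCSK9")]["sources"])
    print(sorted(visible_types(concepts, {"Gene": True})))
```

In this sketch, asking which sources are aggregated for the gene ‘PCSK9’ is a single dictionary lookup, and hiding a Canonical Type is a matter of changing one configuration entry rather than touching the data.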

The purpose of GKD is to provide researchers with a powerful tool to find information fast and efficiently. A search can be envisioned as a data journey from concept A to concept B via a meaningful path, independent of the underlying data source (see Fig. 3).

Fig. 3: Example of a search path through linked data.
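The notion of a data journey from concept A to concept B can be sketched as a path search over linked concepts. The Python snippet below is an illustration only, assuming a simple in-memory link table. The concept identifiers and links are invented, and breadth-first search is our choice for the example, not a description of how DISQOVER resolves links.

```python
from collections import deque

# Hypothetical links between concepts, invented for illustration.
LINKS = {
    "gene:PCSK9": ["protein:PCSK9", "experiment:EXP-42"],
    "protein:PCSK9": ["disease:hypercholesterolemia"],
    "experiment:EXP-42": ["project:PROJ-001"],
    "disease:hypercholesterolemia": ["clinical_study:STUDY-7"],
}


def find_path(links, start, goal):
    """Breadth-first search: return a shortest link path from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in links.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


if __name__ == "__main__":
    print(find_path(LINKS, "gene:PCSK9", "clinical_study:STUDY-7"))
```

The point of the sketch is that the path is expressed purely in terms of concepts and links. Which data source supplied each link does not appear in the search at all.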

Such a search is carried out in the DISQOVER interface (see Fig. 4, steps 1 to 4), where users can also inspect their search strategy in detail and share or reuse a previous one (see Fig. 4, step 5).

Fig. 4: Screenshots of the DISQOVER data search interface (steps 1 to 4) and of the search strategy view (step 5).

GKD has been in production for some time now, and its usage is growing steadily. We have already received some great feedback from our collaborators at Amgen (see Fig. 5).

Fig. 5: Feedback from Amgen collaborators on GKD.

We are looking forward to continuing this project in close collaboration with a core team of Information Sciences experts at Amgen, supervised by Arun Nayar, Larry Rodriguez, Dan Gschwend and Wolfgang Hoeck. They are in close contact with the end-users and provide us with feedback on how to further improve the usability of DISQOVER, from both a functional and a data perspective. We, at ONTOFORCE, will continue our efforts to improve the data quality standards of public data sources. Currently, one of the focus points for new functionality is improving the search capabilities, for example through search expansion with synonyms and, if desired, an extension to chemical structure search.


These slides were presented at the Bio-IT World Conference 2017 in the track ‘Software Applications & Services’ and are available on SlideShare.
