Maximizing the full potential of clinical data: lessons from Boehringer Ingelheim

Lessons from Boehringer Ingelheim’s journey toward strategic clinical data reuse.

ONTOFORCE team
27 May 2025 4 minutes

Life sciences organizations conducting clinical trials accumulate vast amounts of data over time, with each clinical trial generating significant quantities of patient data, observations, and findings. Historically, this data often remained siloed, underutilized beyond the initial reporting requirements. Now, forward-thinking companies recognize the untapped potential within their historical clinical trial datasets. By effectively leveraging this legacy data, organizations can significantly enhance their research capabilities, accelerate scientific discovery, and optimize resource utilization.

ONTOFORCE invited Karsten Quast, Data Domain Owner at Boehringer Ingelheim, to speak at our recent webinar focused on reusing clinical data and cohort building. As Boehringer Ingelheim will soon be operational with a platform powered by ONTOFORCE’s DISQOVER to drive clinical data reuse and cohort building, we wanted to learn more about:

  • Their approach to reusing clinical data to build cohorts
  • The value of creating a holistic overview of available legacy clinical data
  • Which IT investments are necessary to make these processes not only possible, but efficient

During the webinar, Karsten highlighted Boehringer Ingelheim’s ReSource project. The main objective of the project is to harmonize vast volumes of historical clinical trial data dating back to 1998. To that end, the ReSource project was launched to:

  • Extract data from Boehringer’s clinical trial data management systems and legacy data storage
  • Reconcile and harmonize the most valuable data
  • Provide a cohort selection user interface
  • Create data products for subsequent analysis

Based on the insights Karsten shared about the ReSource project during the webinar, this blog expands on a few lessons learned about reusing clinical data.

Lesson 1: Not all legacy clinical data is valuable

The volume of clinical trial data continues to grow exponentially. Technological advancements, along with more comprehensive trial designs, contribute to increasingly complex and voluminous data sets. Data from clinical trials now frequently encompass detailed patient demographics, vital sign data, treatment data and patient-reported outcomes, various efficacy measures, imaging results, omics data, and more.

Today, it’s estimated that Phase II and III protocols involve an average of 263 procedures per patient, supporting approximately 20 distinct endpoints, reflecting the expanding scope of what trials aim to measure and analyze. As a result, Phase III trials now produce an average of 3.6 million data points, a threefold increase compared to late-stage trials conducted just a decade ago.

With these massive amounts of data points, it’s no surprise that when it comes to reusing clinical data, not all data is relevant or valuable. To effectively leverage legacy clinical data, consideration should be given to which data is most relevant and valuable for the specific reuse applications. For example, administrative or logistical metadata, such as timestamps for trial monitoring visits, are essential during the active trial management phase. However, they typically hold little analytical value for subsequent analyses aimed at therapeutic insights, cohort selection, or predictive modeling.

As Karsten pointed out, it’s best to avoid becoming overwhelmed by the effort of harmonizing and reconciling all legacy clinical data. Within the context of Boehringer’s ReSource project, this means that only certain data is included for reuse. Specifically, the project focuses on selected Study Data Tabulation Model (SDTM) domains, such as demographics, adverse events, lab results, and concomitant drugs.
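To make the idea of restricting reuse to selected SDTM domains concrete, here is a minimal Python sketch. It is purely illustrative and does not represent Boehringer’s ReSource implementation or DISQOVER’s pipeline. The allow-list, the `select_domains` helper, and the sample records are all hypothetical; only the domain codes (DM, AE, LB, CM, SV) and variable names such as USUBJID follow standard SDTM naming.

```python
# Hypothetical allow-list of SDTM domain codes treated as high-value for reuse:
# DM = demographics, AE = adverse events, LB = lab results, CM = concomitant medications.
HIGH_VALUE_DOMAINS = {"DM", "AE", "LB", "CM"}


def select_domains(datasets, allowed=HIGH_VALUE_DOMAINS):
    """Keep only the datasets whose domain code is in the allow-list."""
    return {
        code: rows
        for code, rows in datasets.items()
        if code.upper() in allowed
    }


# Invented sample study: three domains, one of which (SV, subject visits)
# is assumed here to be out of scope for reuse.
study = {
    "DM": [{"USUBJID": "S1-001", "AGE": 54, "SEX": "F"}],
    "AE": [{"USUBJID": "S1-001", "AEDECOD": "Headache"}],
    "SV": [{"USUBJID": "S1-001", "VISITNUM": 1}],
}

selected = select_domains(study)
print(sorted(selected))  # ['AE', 'DM']
```

Keeping the allow-list as explicit data rather than scattering it through the code makes the scoping decision easy to review and to extend when another domain proves valuable.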

Data ingestion and harmonization can be time- and resource-intensive processes. Focusing only on high-value data helps to ensure that neither is wasted. To further address the challenges of ingesting and harmonizing complex and voluminous legacy clinical data, DISQOVER provides an out-of-the-box SDTM pipeline, enabling users to seamlessly ingest their own SDTM-formatted data. With this feature, researchers can quickly ingest legacy datasets into the DISQOVER platform to accelerate the process of generating actionable insights from previously underutilized clinical trial information.

Lesson 2: Clinical data reuse can power multiple applications and directions

Reusing legacy clinical trial data unlocks a vast reservoir of untapped knowledge that can drive innovation and efficiency across pharmaceutical research and development. When effectively utilized, this historical data not only streamlines research processes but also enhances the accuracy and depth of scientific inquiries. By minimizing redundant efforts and rapidly enabling precise cohort selections, clinical data reuse empowers researchers to test hypotheses more efficiently, design more robust clinical trials, or uncover hidden insights that inform new therapeutic strategies.

To tap into their underlying reservoir of knowledge, Boehringer’s ReSource project focuses on leveraging the full potential of historical data, supporting evidence generation through multiple angles, including:

  • Gaining deeper understanding of diseases and being able to better characterize patients regarding disease etiology, diagnosis, and prognosis.
  • Supporting pre-clinical development to obtain a better understanding of translatability of animal models into the human situation.
  • Better understanding therapies and finding new therapeutic concepts.
  • Supporting the development of new therapeutic and diagnostic products.

During the webinar, Karsten highlighted these use cases, which emphasize the power of clinical data reuse in fostering new pathways to scientific insight. By unlocking previously underutilized historical datasets, researchers can approach scientific questions from multiple directions, significantly enhancing their ability to innovate, refine therapies, and accelerate impactful research outcomes.

Lesson 3: A user interface should encourage legacy data exploration and understanding

For Boehringer, unlocking the full potential of clinical data requires easy access to the data, along with the ability to explore it. As Karsten mentioned, the ReSource project is focused on:

Making existing clinical trial data easily and quickly available for reuse

and

Providing a holistic exploratory view of this data, in compliance with regulatory, legal, and ethical requirements.

Thanks to DISQOVER’s intuitive user interface (UI) with powerful data visualizations and interactions, achieving these goals is straightforward. No prior knowledge of previous trials or extensive data science experience is needed to extract value from the platform. DISQOVER enables users to intuitively navigate all existing clinical data ingested in the platform, effortlessly defining and refining cohort criteria through dynamic visualizations and filtering options. By presenting data clearly and interactively, users feel encouraged and empowered to explore the legacy data thoroughly, leading to deeper insights and more informed decisions.
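For readers who want a feel for what defining and refining cohort criteria amounts to underneath a visual interface, the following Python sketch filters subjects across two SDTM-style domains. It is an assumption-laden illustration, not DISQOVER’s API or the ReSource implementation: the `build_cohort` function, the criteria, and the sample records are hypothetical, and only the variable names (USUBJID, AGE, SEX, AEDECOD) follow standard SDTM naming.

```python
def build_cohort(dm_rows, ae_rows, min_age, ae_term):
    """Return sorted subject IDs that meet an age threshold and
    have at least one adverse event with the given coded term."""
    # Subjects with the adverse event of interest, taken from the AE domain.
    subjects_with_event = {
        row["USUBJID"] for row in ae_rows if row["AEDECOD"] == ae_term
    }
    # Combine with the demographic criterion from the DM domain.
    return sorted(
        row["USUBJID"]
        for row in dm_rows
        if row["AGE"] >= min_age and row["USUBJID"] in subjects_with_event
    )


# Invented sample records spanning two hypothetical studies (S1, S2).
dm = [
    {"USUBJID": "S1-001", "AGE": 54, "SEX": "F"},
    {"USUBJID": "S1-002", "AGE": 37, "SEX": "M"},
    {"USUBJID": "S2-001", "AGE": 61, "SEX": "M"},
]
ae = [
    {"USUBJID": "S1-001", "AEDECOD": "Headache"},
    {"USUBJID": "S1-002", "AEDECOD": "Headache"},
    {"USUBJID": "S2-001", "AEDECOD": "Nausea"},
]

cohort = build_cohort(dm, ae, min_age=50, ae_term="Headache")
print(cohort)  # ['S1-001']
```

Refining the cohort then means changing `min_age` or `ae_term` and rerunning the filter, which is the step an interactive interface would perform on the user’s behalf.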

Boehringer Ingelheim’s journey toward strategic clinical data reuse

You can learn more about Boehringer Ingelheim’s journey toward strategic clinical data reuse and their ReSource project by watching the webinar recording. Specifically, learn more about how data is harmonized, how Boehringer ensures rigorous compliance and data protection when reusing data, and the dynamic user interface powered by DISQOVER. Watch the recording now.

With their comprehensive ReSource project, Boehringer Ingelheim exemplifies how pharmaceutical organizations can maximize the immense potential of legacy clinical data, transforming previously siloed datasets into powerful resources driving therapeutic innovation and breakthroughs.