How can knowledge graphs and LLMs bring more value for drug development?

LLMs and generative AI are impacting the life sciences industry’s evolution and pushing the boundaries on what’s possible, especially when it comes to how users work and interact with a knowledge graph.

26 June 2023 7 minutes

The life sciences industry generates an enormous amount of data, ranging from genomics and proteomics to clinical trials, drug interactions, and more. Efficiently connecting, managing, and making sense of this data is crucial for all professionals working within the field. Knowledge graphs have emerged as a powerful tool for organizing and analyzing complex data relationships. With recent advancements in large language models (LLMs) and generative AI, working with knowledge graphs in the life sciences industry could become more efficient and accessible for users of all backgrounds, ultimately driving value throughout the drug development life cycle. 

As the provider of a data platform operating on knowledge graph technology for the life sciences industry, ONTOFORCE is excited about the potential of leveraging LLMs and generative AI when working with knowledge graphs. We foresee these technologies impacting the life sciences industry’s evolution and pushing the boundaries on what’s possible, especially when it comes to how users work and interact with a knowledge graph. In this article we’ll discuss the use of LLMs and knowledge graphs for the life sciences industry and how we anticipate their use evolving as the capabilities of LLMs and generative AI grow.  

Large language models for life sciences 

A large language model is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and large data sets like websites, articles, and books to understand, summarize, generate, and predict new content.  

OpenAI’s ChatGPT is perhaps the most well-known LLM at present. It has gained popularity for its ability to generate human-like text and to handle a variety of tasks, such as answering questions, translating languages, writing essays, and more. It’s thus known as a generative AI system. LLMs are just one type of generative AI.

LLMs built specifically for the life sciences industry are also starting to gain popularity. The Stanford Center for Research on Foundation Models (CRFM) trained a 2.7 billion parameter GPT on biomedical data from PubMed to create BioMedLM. DRAGON is another biomedical language model, also released by the CRFM team in a separate effort. The model is pre-trained on PubMed abstracts and an expert-curated biomedical knowledge graph.

Microsoft’s BioGPT is another domain-specific LLM. BioGPT is trained on millions of previously published biomedical research articles, which enables it to perform tasks such as answering questions, extracting relevant data, and generating text relevant to biomedical literature.

How do knowledge graphs and LLMs work together? 

For the life sciences industry, generative AI and LLMs (domain-specific and not) already have many application areas across the entire drug development timeline on their own, and they are becoming more and more common. LLMs are especially impactful for prediction tasks, such as predicting the properties and interactions of molecules or predicting potential safety concerns during the early stages of drug discovery, and for optimization tasks, such as helping researchers conduct more efficient literature reviews.

Knowledge graphs on their own are also an essential tool for life sciences companies. They support organizations in organizing, analyzing, and deriving insights from complex data. On top of this, knowledge graphs enable improved data integration, knowledge representation, discovery of hidden relationships, and support decision-making processes across various phases of drug development, from basic research to clinical and regulatory applications. 

One could think that the rise of LLMs means knowledge graphs will sooner or later be made obsolete and, in a sense, replaced by them. A full inventory of what’s possible with each technology makes it clear that this is unlikely: both systems have advantages and disadvantages that make them better suited to certain applications. For example, while LLMs are powerful at understanding and prediction, they are not designed to store knowledge or to allow this knowledge to be corrected or governed, making them rather slow and inaccurate when it comes to retrieving specific knowledge. Knowledge graphs, on the other hand, are reliable for producing (and reproducing) specific, factual results while providing full transparency on how a query returned that result. While neither could replace the other, combining the two technologies opens up the possibility of getting the best of both worlds.

Here are some examples of what can be accomplished (or what could be accomplished as the technology grows) when leveraging an LLM with a knowledge graph within a pharmaceutical organization:

  1. Question answering: LLMs can be trained to understand natural language queries and retrieve relevant information from knowledge graphs. By combining the capabilities of LLMs with the rich relationships and semantic connections captured in knowledge graphs, pharmaceutical researchers can obtain precise answers to specific questions, enabling faster information retrieval and decision-making.
  2. Data exploration: Knowledge graphs enable exploratory data analysis by capturing relationships and connections between different data elements. LLMs can leverage these connections to provide insights and recommendations based on patterns and correlations found within the graph. This can help identify potential target-drug associations, drug-drug interactions, or new research areas based on existing knowledge.
  3. Knowledge curation and expansion: LLMs can assist in the curation and expansion of knowledge graphs. By analyzing large amounts of textual data, LLMs can extract relevant information and populate or update the knowledge graph with new entities, relationships, and attributes. This enables the continuous growth and enrichment of the knowledge graph, enhancing its utility in drug discovery and development efforts. 
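
To make the question answering example more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how DISQOVER or any particular product works. The graph is a toy list of triples with placeholder entities (drug_a, protein_x, disease_y are not real biomedical facts), and question_to_pattern is a hypothetical stand-in for the LLM call: it only recognises one question shape so that the example stays runnable. The point it shows is the division of labour: the language model turns a question into a structured pattern, and the graph returns exact, reproducible matches.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Toy graph with placeholder entities (illustrative only, not real facts).
GRAPH = [
    Triple("drug_a", "inhibits", "protein_x"),
    Triple("drug_b", "inhibits", "protein_x"),
    Triple("protein_x", "associated_with", "disease_y"),
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return every triple that fits the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


def question_to_pattern(question):
    """Hypothetical stand-in for an LLM that maps a question to a pattern.

    A real system would prompt a model here. This stub handles a single
    question shape: "Which drugs inhibit <target>?"
    """
    prefix = "which drugs inhibit "
    q = question.strip().rstrip("?").lower()
    if q.startswith(prefix):
        return {"predicate": "inhibits", "obj": q[len(prefix):]}
    raise ValueError("question shape not supported by this stub")


def answer(question):
    """Translate the question, query the graph, and return matching subjects."""
    pattern = question_to_pattern(question)
    return [t.subject for t in match(GRAPH, **pattern)]


print(answer("Which drugs inhibit protein_x?"))
```

Because the answer comes from an explicit pattern match rather than from generated text, the triples behind each result can be shown to the user, which is the transparency that knowledge graphs contribute to the combination.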


LLMs will improve how non-experts use knowledge graphs 

While knowledge graphs and LLMs can be used together in many ways, what is perhaps most promising is how LLMs will change the way users interact with knowledge graphs. Namely, we anticipate that non-expert users will be empowered to accomplish more tasks, more efficiently, when working with an LLM-augmented knowledge graph.

“With DISQOVER, one of our main goals is to democratize data access,” says Bérénice Wulbrecht, ONTOFORCE VP of Solution Enablement. “The platform already enables every user - from data lovers, data scientists, to more basic data consumers - to search and explore data on the knowledge graph. We believe that LLMs will enable this ability even further so users can reap even more benefits when working with knowledge graphs.”

Working with and maintaining a knowledge graph requires certain skills. Depending on the specific knowledge graph technology an enterprise is using and depending on what a user would like to do, the necessary skillset differs from user to user. Some users may only need to extract information from a knowledge graph. In such a case, a more technical skillset is often not needed, assuming the user has the necessary domain expertise, and that the knowledge graph technology offers an intuitive enough user interface. We often refer to these types of users as end users. 

For life sciences companies, end users usually include researchers and scientists who use knowledge graphs to explore relationships between genes, proteins, diseases, drugs, trials, assays, etc. to uncover hidden connections and discover new patterns and correlations. This helps in identifying potential drug candidates, predicting drug-drug interactions, and optimizing the entire drug development pipeline. 

Data engineers and data scientists are often more advanced, “expert” knowledge graph users who play a critical role in the construction, maintenance, and utilization of a knowledge graph and therefore possess more technical knowledge, in addition to domain expertise. These profiles are often responsible for gathering data from various sources that could be used to populate the knowledge graph, cleaning or converting data for the knowledge graph, ingesting data into the graph, and defining the data model and ontology(ies) for the knowledge graph.

Due to a lack of technical expertise, non-expert users (namely end users) often need to rely on expert users to accomplish tasks like data integration into the knowledge graph, adjusting pipelines, advanced data analysis, or building certain dashboards. Without the proper data in the knowledge graph or the proper visualizations, non-expert users might make decisions based on partial information. Or, if they wait for assistance from an expert user, they will most likely operate on a delay, which has a further impact across the business. On the flip side, assisting non-expert users can be time-consuming for advanced users and can take time away from other tasks where their advanced skills are also needed.

With the help of generative AI and LLMs, users who were previously unable to conduct various tasks due to technological barriers could take some of these tasks over. Here are a few examples of what could be possible with an augmented knowledge graph: 

  • A scientist (end user) could upload a patient's medical history or complementary information (that is not yet part of the knowledge graph) to the AI system and have it automatically link that to and introduce it in the knowledge graph. This could allow a user to potentially extend the graph with relevant information that data scientists may not have thought of yet.  
  • A researcher (end user) would like to validate the hypothesis that some protein kinases are associated with a disease. The researcher searches for relevant proteins and asks the AI-enhanced knowledge graph platform to extract the data and run a network analysis (shortest path, causal network analysis) to validate potential connections to the disease. This enables the end user to run advanced analytics independently, without the support of an advanced user.
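
The shortest path analysis mentioned in the second example can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the edges are placeholder data (kinase_1, pathway_p, disease_y, and so on are invented names, not real biomedical relationships), the graph is treated as undirected and unweighted, and breadth-first search is used as one simple way to find a shortest path. It does not cover causal network analysis.

```python
from collections import deque

# Placeholder edges (illustrative only, not real biomedical relationships).
EDGES = [
    ("kinase_1", "pathway_p"),
    ("pathway_p", "disease_y"),
    ("kinase_1", "gene_g"),
    ("gene_g", "phenotype_q"),
    ("phenotype_q", "disease_y"),
    ("kinase_2", "gene_h"),
]


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (node, node) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def shortest_path(adjacency, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    if start not in adjacency or goal not in adjacency:
        return None
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sort neighbours so the result is deterministic across runs.
        for neighbour in sorted(adjacency[node]):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


adjacency = build_adjacency(EDGES)
print(shortest_path(adjacency, "kinase_1", "disease_y"))
print(shortest_path(adjacency, "kinase_2", "disease_y"))
```

In this toy graph, kinase_1 reaches disease_y in two hops through pathway_p, while kinase_2 has no connection at all and the function returns None. In an augmented knowledge graph, the end user would only phrase the request; choosing and running an analysis like this one would be handled by the platform.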

These examples reflect the crux of what an AI-augmented knowledge graph can do for knowledge graph users at the enterprise level: non-expert users accomplish tasks faster without depending on others, which in turn allows advanced users to focus on more complex, higher-value tasks. All in all, this improves resource management and drives a faster time to value for life sciences companies.

Accelerating drug development with an augmented knowledge graph 

An augmented knowledge graph enabling a faster time to value is what really excites us here at ONTOFORCE. Why? ONTOFORCE's mission is to help life sciences companies accelerate drug development for improved patient outcomes by championing the power of data. With the ability to automate and expedite processes in place thanks to AI, we envision a future in which all knowledge graph users are empowered to efficiently and effectively access what they need, right when they need it. 

“Self-service is a major tenet underpinning the DISQOVER platform,” says Martin Robbins, Head of Product at ONTOFORCE. “With AI, and LLMs in particular, enabling the next level of self-service for knowledge graph users, we see this creating huge potential for companies, improving data management practices, and marking a major shift in how data is accessed and utilized.”

Automating knowledge graph configuration, expediting dashboard creation, and accelerating data ingestion, among other things, make the necessary information actionable faster. In the life sciences industry, every bit of time saved and every incremental improvement in decision-making can make all the difference for patients.

ONTOFORCE is committed to playing a role in driving the evolution of knowledge graphs augmented by AI, especially as that relates to accelerating drug development to drive faster and better treatments for improved patient outcomes. 


Interested in learning more about how to leverage AI and knowledge graphs?

Check out our recent webinar on improving clinical trial design with NLP, LLMs, and knowledge graphs. Our experts shed light on how these new technologies can revolutionize the way clinical trials are designed, making the process more efficient, accurate, and patient-centric.
