BLOG

Building on FAIR data: the semantic layer that turns principles into trusted AI

From FAIR aspiration to FAIR infrastructure. What does it actually take to make it real in a life sciences organization?

ONTOFORCE team
25 February

The Pistoia Alliance FAIR for Pharma community recently published a three-part series making the case that Trusted AI is inseparable from Trusted Data, and that Trusted Data means FAIR data. As a sponsor member and active participant in that community, ONTOFORCE contributed to articulating that argument. The series sets out the blueprint with clarity: govern your data, standardize your vocabularies, enrich your metadata, build a collaborative culture. We stand fully behind it.

From the vantage point of deploying knowledge graph technology inside pharmaceutical, biotech, and research organizations for 15 years, we want to add a nuance: the semantic infrastructure, the layer that sits between the framework and the outcome, is what makes FAIR principles actually function in the messy, legacy-laden reality of enterprise life sciences data.

Why life sciences organizations struggle to close the FAIR data gap

FAIR adoption in life sciences has made real progress. Governance frameworks are being established. Data stewardship roles are being created. Standards committees are active across most large organizations. And yet, data scientists still report spending the majority of their time wrangling data rather than analyzing it. Researchers in drug discovery still struggle to access what their colleagues in clinical development already know. Safety teams still reconstruct data provenance manually when regulators ask how a model reached its conclusions.

This is not a failure of FAIR as a framework. It is a reflection of how hard it is to bridge the gap between policy and practice without the right infrastructure underneath. A governance mandate that says "hypertension must be defined consistently" cannot, by itself, resolve the fact that one system stores it as HTN, another uses ICD-10 code I10, and a third captured it as free text in a clinical notes field. For FAIR to deliver on its promise, organizations need a layer that actively does the work of reconciliation across every source, in real time, without requiring a full replacement of legacy systems.

How knowledge graphs make FAIR data work in pharma and life sciences

A knowledge graph, grounded in domain ontologies, is what closes that gap. It does not replace the governance frameworks, the stewardship roles, or the metadata standards. Rather, it makes them operational. By sitting above existing systems and harmonizing representations through shared semantics, it resolves synonyms and identifiers, surfaces hidden relationships between genes, compounds, diseases, and clinical outcomes, and makes data discoverable by both humans and machines in a way that no amount of policy documentation can achieve on its own.
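To make the mechanism concrete, here is a toy triple store with a synonym table standing in for an ontology-backed knowledge graph. All identifiers and facts are invented for illustration; a real deployment would use curated ontologies and a proper graph database.

```python
from collections import defaultdict

# Synonym table: both surface forms resolve to one canonical identifier.
SAME_AS = {"Aspirin": "CHEMBL25", "acetylsalicylic acid": "CHEMBL25"}

# Toy knowledge graph as subject-predicate-object triples.
triples = [
    ("CHEMBL25", "inhibits", "PTGS2"),
    ("PTGS2", "associated_with", "Inflammation"),
]

index = defaultdict(list)
for subject, predicate, obj in triples:
    index[subject].append((predicate, obj))

def resolve(name: str) -> str:
    """Collapse synonyms onto one canonical identifier."""
    return SAME_AS.get(name, name)

def neighbors(entity: str) -> list:
    """All facts about an entity, whichever synonym the caller used."""
    return index[resolve(entity)]

# Both names reach the same facts, so hidden relationships surface
# regardless of which vocabulary a source system happened to use.
print(neighbors("Aspirin"))
print(neighbors("acetylsalicylic acid"))
```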

The impact on AI trustworthiness is direct and measurable. A benchmark study demonstrated this clearly: combining knowledge graphs with large language models improved accuracy on domain-specific question-answering tasks from 16% to 54%, and accuracy increased to 72% when ontology-based validation checks were added. Structured knowledge reduces hallucinations, strengthens reasoning, and improves the explainability of results. That distinction matters enormously in a regulated environment. The question a regulator asks is not just "is this model accurate?" but "can you show us why it reached this conclusion, and what data it relied on?" A knowledge graph with FAIR-aligned metadata makes that answer straightforward rather than forensic.

FAIR data culture in pharma: why infrastructure makes the difference

One of the most important arguments in the article series is the need for a cultural shift, from data as departmental property to data as a shared organizational asset. We have seen this transformation happen, and it is real. But from deployment experience, we would add something the framework alone cannot deliver: culture change sticks when infrastructure makes good data behavior feel easy, not just important.

Researchers do not protect their data because they are inherently territorial. They protect it because sharing it has historically created friction: reformatting, re-explaining, and reconciling conflicting definitions across incompatible systems. When a platform removes that friction, the behavioral shift follows. When discovering and reusing a colleague's dataset becomes as fast and reliable as querying your own, the collaborative data culture that governance programs aspire to builds itself from the bottom up.

A practical roadmap for FAIR data implementation in life sciences

The organizations that make the most progress on FAIR transformation share a common approach: they anchor the investment in a specific, high-value use case, usually tied to AI, and work backwards from there. As the article series mentions, safety signal detection, clinical trial cohort optimization, and target-to-indication expansion are all concrete examples. Rather than launching enterprise-wide initiatives that stall under their own weight, these organizations start narrow, prove the value visibly, and let success drive the next phase of adoption.

In practice, this means three things:

  • Choose two or three critical data domains (e.g. targets, compounds, adverse events) and apply ontology alignment and persistent identifiers there first.
  • Connect those domains through a semantic layer that bridges existing systems rather than replacing them.
  • Deliver a visible productivity win for researchers, such as a cross-domain query that took days now taking minutes. Let the value speak before asking for the culture change.
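The three steps above can be sketched end to end. In this deliberately simplified example, two hypothetical source systems (a discovery database and a safety database) are aligned on one persistent gene identifier, bridged by a thin semantic layer, and queried across domains; every system name, field, and identifier here is illustrative.

```python
# Two hypothetical source systems with incompatible local vocabularies.
discovery_db = [{"target": "PTGS2", "compound": "CHEMBL25"}]
safety_db = [{"gene_symbol": "COX-2", "adverse_event": "GI bleeding"}]

# Step 1: ontology alignment. Both local names map to one persistent ID.
ALIGNMENT = {"PTGS2": "ENSG00000073756", "COX-2": "ENSG00000073756"}

# Step 2: a semantic layer normalizes records without touching the sources.
def normalized():
    for row in discovery_db:
        yield {"gene": ALIGNMENT[row["target"]], "compound": row["compound"]}
    for row in safety_db:
        yield {"gene": ALIGNMENT[row["gene_symbol"]],
               "adverse_event": row["adverse_event"]}

# Step 3: a cross-domain query that previously required manual reconciliation.
def facts_for(gene_id: str) -> list:
    return [record for record in normalized() if record["gene"] == gene_id]

# One query now spans discovery and safety data.
print(facts_for("ENSG00000073756"))
```

The point of the sketch is the shape of the work, not the code: alignment tables live in ontologies, the normalization layer is the knowledge graph, and the cross-domain query is the visible productivity win.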

Part three of the series offers more insight into these steps.

Building toward the next generation of AI

The case for FAIR data is compelling today. It becomes even more urgent as AI moves from assisting decisions to operating autonomously through agentic systems that plan, reason, and act across multiple data sources and workflows without continuous human oversight.

Agentic AI raises the bar for data quality. A system navigating a complex drug discovery workflow autonomously will need to assess data provenance on the fly, determine fitness for purpose, integrate sources under strict quality constraints, and produce a transparent rationale for every action it takes. A knowledge graph with rich ontological structure and FAIR-aligned metadata is the architecture that makes this safe and scientifically defensible.
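As one illustration of what "assess provenance on the fly" could look like, here is a sketch of a gate an autonomous agent might apply before using a dataset. The metadata fields, license values, and acceptance rule are all assumptions made for the example, not a standard.

```python
from dataclasses import dataclass

@dataclass
class DatasetMetadata:
    """Hypothetical FAIR-aligned metadata attached to a dataset."""
    source: str
    license: str
    last_validated: str       # ISO date of the last stewardship review
    provenance_chain: list    # ordered processing steps, e.g. ["raw", "curated"]

def fit_for_purpose(meta: DatasetMetadata, allowed_licenses: set) -> bool:
    """Reject datasets with no recorded provenance or an incompatible license."""
    return bool(meta.provenance_chain) and meta.license in allowed_licenses

trusted = DatasetMetadata("LIMS-A", "CC-BY-4.0", "2024-01-15", ["raw", "curated"])
untraceable = DatasetMetadata("shared-drive", "unknown", "2019-06-01", [])

print(fit_for_purpose(trusted, {"CC-BY-4.0"}))
print(fit_for_purpose(untraceable, {"CC-BY-4.0"}))
```

Checks like this are only possible when the metadata exists and is machine-readable, which is precisely what a FAIR-aligned knowledge graph provides.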

Organizations building strong FAIR foundations today are positioning themselves to move with confidence as AI capabilities continue to evolve. The investment compounds. Every domain made truly interoperable, every dataset enriched with provenance-rich metadata, every researcher empowered to share and discover data without friction: these are the building blocks of an organization that can trust its AI.


This post builds on the Pistoia Alliance FAIR for Pharma three-part series on Trusted AI and FAIR Data. ONTOFORCE is a member of the Pistoia Alliance FAIR for Pharma community.