Data Ingestion Engine

DISQOVER is equipped with a powerful and ultra-fast data ingestion engine. As a data scientist you can import, manipulate, link and integrate your own data into DISQOVER, using an innovative visual pipeline environment.

Building the indexed knowledge graph

DISQOVER stores data in an indexed knowledge graph and imports the source data via a configurable data ingestion process. During this process, you can standardize, integrate and link data from a variety of siloed sources.

Illustration: the DISQOVER indexed knowledge graph


Visual pipeline building

When configuring the integration of your source data in DISQOVER, you manage the data-ingestion process by building a visual pipeline from a wide range of powerful, reusable components. There is no need to write extensive code: compared to a conventional approach relying on RDF and SPARQL, fewer specialized skills are needed and development time is reduced, while the same level of power and flexibility is retained.

A visual pipeline makes it easier to communicate the choices made during data integration, resulting in increased transparency and auditability, and a lower chance of error. Stakeholders with only basic IT knowledge can understand, review, challenge and contribute to the data integration process.
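The idea of composing an ingestion process from reusable components can be sketched in code. The following is a minimal, hypothetical illustration of the pattern; the class and component names are illustrative and are not DISQOVER's actual API.

```python
# Hypothetical sketch: an ingestion pipeline composed of reusable components.
# Each component transforms a list of records; the pipeline chains them.

class Component:
    """A named, reusable pipeline step that transforms a list of records."""
    def __init__(self, name, fn):
        self.name = name
        self.fn = fn

    def __call__(self, records):
        return self.fn(records)

class Pipeline:
    """Runs components in order, feeding each one's output to the next."""
    def __init__(self, *components):
        self.components = components

    def run(self, records):
        for component in self.components:
            records = component(records)
        return records

# Two typical steps: standardize values, then deduplicate records.
standardize = Component("standardize", lambda rs: [
    {**r, "name": r["name"].strip().title()} for r in rs
])
dedupe = Component("dedupe", lambda rs: list({r["name"]: r for r in rs}.values()))

pipeline = Pipeline(standardize, dedupe)
result = pipeline.run([{"name": "  ada lovelace "}, {"name": "ADA LOVELACE"}])
# result: [{"name": "Ada Lovelace"}]
```

Because each step is a self-contained, named unit, the same structure that a visual editor renders as boxes and arrows can be reviewed step by step by non-developers.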

Illustration: data integration with a DISQOVER pipeline


Ultra-fast, scalable and efficient

DISQOVER’s data ingestion engine uses a unique proprietary technology to efficiently process extensively linked big data, relying on a partially denormalized triple store with column-oriented storage. The engine is designed for ultra-fast bulk linking and inferencing. Each action is executed as a sequence of full table scans, leveraging fast sequential block I/O and temporary in-memory indexes. As a result, on equivalent hardware, DISQOVER can integrate and link data into a semantic knowledge graph much faster than conventional technology, such as relational databases or triple stores.

Example performance comparison: linking 18 million authors to 28 million publications, with 280 million author/publication links (hardware: Intel® Core™ i9, 6 cores, 32 GB RAM).
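The general technique described above, resolving links with full table scans and a temporary in-memory index rather than per-row lookups, is essentially a hash join. Below is a minimal sketch of that pattern using the author/publication example; the data layout is illustrative and this is not DISQOVER's actual implementation.

```python
# Illustrative sketch of bulk linking via full table scans and temporary
# in-memory indexes (a hash join). Not DISQOVER's actual implementation.

def bulk_link(authors, publications, links):
    # Scan 1: build a temporary in-memory index of author id -> name.
    author_index = {a["id"]: a["name"] for a in authors}
    # Scan 2: build a temporary in-memory index of publication id -> title.
    pub_index = {p["id"]: p["title"] for p in publications}
    # Scan 3: stream through the link table once, resolving both sides.
    return [
        (author_index[a_id], pub_index[p_id])
        for a_id, p_id in links
        if a_id in author_index and p_id in pub_index
    ]

authors = [{"id": 1, "name": "Author A"}, {"id": 2, "name": "Author B"}]
pubs = [{"id": 10, "title": "Paper X"}]
links = [(1, 10), (2, 10)]
print(bulk_link(authors, pubs, links))
# [('Author A', 'Paper X'), ('Author B', 'Paper X')]
```

Each table is read sequentially exactly once, which is what makes the approach amenable to fast sequential block I/O at scale.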

Bi-directional lineage analysis

The data ingestion engine tracks data dependencies throughout the entire pipeline. As a result, you can see which source data field(s) contributed to every information field in the DISQOVER database. Conversely, for any source data field, you can see every field in DISQOVER that it contributes to.
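Bi-directional lineage can be modeled as a dependency graph that is queryable in both directions. The sketch below is a hedged illustration of the concept; the field names and the `LineageGraph` class are hypothetical, not DISQOVER's internals.

```python
# Hypothetical sketch of bi-directional lineage: record which source fields
# feed each output field, then query forward or backward.

from collections import defaultdict

class LineageGraph:
    def __init__(self):
        self.forward = defaultdict(set)   # source field -> output fields
        self.backward = defaultdict(set)  # output field -> source fields

    def record(self, source_field, output_field):
        self.forward[source_field].add(output_field)
        self.backward[output_field].add(source_field)

    def sources_of(self, output_field):
        """Backward lineage: which source fields contributed to this field?"""
        return self.backward[output_field]

    def targets_of(self, source_field):
        """Forward lineage: which output fields does this source feed?"""
        return self.forward[source_field]

lineage = LineageGraph()
lineage.record("pubmed.author_name", "Author.name")
lineage.record("orcid.display_name", "Author.name")
lineage.record("pubmed.author_name", "Publication.author_label")

sorted(lineage.sources_of("Author.name"))
# ['orcid.display_name', 'pubmed.author_name']
sorted(lineage.targets_of("pubmed.author_name"))
# ['Author.name', 'Publication.author_label']
```

Maintaining both directions at record time makes either question a constant-time lookup, rather than a graph traversal at query time.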


Illustration: dependency analysis in a DISQOVER pipeline

Read on

Read more about DISQOVER's other technologies. Next, we discuss integrating public data via federation.