Release Notes DISQOVER 6.00

7 July 2020 | version 6.00

This document contains the release notes for DISQOVER version 6.00 and instructions to upgrade. Please make sure you have read them before updating to or installing this new release.

1 Introduction

The strategic vision for DISQOVER 6.00 product development relies on six pillars: Superior UI/UX; Interactive Search, Links & Visual Analytics; Highly Customizable; High-Quality Public Data Integration; Data Ingestion Pipelines; and Plugin Architecture.

Superior UI/UX is key to DISQOVER. We aim to provide a general-purpose tool with an expert layer that is attractive and engaging, interactive and near-real-time. A continuously improving user interface and more intuitive data integration further contribute to a seamless user experience and a growing number of solvable use cases.

By enabling interactive search, links & visual analytics, DISQOVER facilitates interactive data journeys that rely on semantic and linked data concepts but shield their complexity. Views and navigational journeys are highly customizable to the end user’s expectations and use cases. With self-service customization from the UI, users can prepare views for their peers.

We envision DISQOVER not as a standalone tool, but rather as part of a customer’s solution ecosystem. The plugin architecture gives customers the possibility to extend the product with custom functionality, possibly integrating third-party tools.

To allow users flexible integration with proprietary data, DISQOVER offers a comprehensive, integrated and linked data set for high-quality public data integration. With the data ingestion pipeline, we enable self-service data ingestion with provenance attribution, a focus on advanced QC and the possibility of a multi-tier data architecture.

NEW FEATURES:

  • The Customizable Search Page can be tailored as a start page to initiate different types of searches and configured with widgets, layout and text elements.
  • Three new dashboard data visualizations are added, driven by several use cases. The Multilevel Choropleth and Dot Distribution Map are new interactive geographical visualizations. A Parallel Coordinate Plot shows the values of instances for multiple numerical facets.
  • Individual instances retrieved via different search paths can be stored and managed in a Collection. They can be shared with other users and used for filtering actions in search activities. Users can collaborate on research via stored collections of search results. Collections can then be used to filter dashboards.
  • When adding or updating data in a stable ingestion pipeline, Incremental Data Ingestion is now available as an alternative to a full re-run.
  • Remote Data Subscription extends the potential of DISQOVER federation and allows one to exchange data, harmonized and prepared for direct ingestion in DISQOVER setups.
  • Grouping components into Pipeline Segments is an aid to managing extensive data ingestion pipelines.
  • External data access or manipulation plugins can be added as custom components in data ingestion pipelines.
  • A new Authorization model provides more granularity and more traceability to control access to functionalities and data.

IMPROVEMENTS:

  • Keyword highlighting:
    – A summary of highlighted keyword hits is shown at the top of an instance detail view.
    – Synonym keyword hits are highlighted in the instance.
  • Instance properties:
    – Property names in the instance list and instance detail view can be configured via the editor.
    – Property descriptions are accessible via the information button in the instance detail view.
  • Subinstances can contain links to other instances.
  • Separator lines and static text boxes can be added to dashboards and the newly added Customizable Search Page.
  • Data provenance of filtering or analytical widgets can be shown via the information button.
  • Analytical graphs can be individually restricted to visualize only filtered data.
  • Extended interactivity with scatterplot visualizations allows setting the axis range and connecting dots.
  • Search Extension:
    – Data type names are displayed together with the data type icons in the synonym extension panel.
    – The search path visualizes the use of extended synonym search.
  • Data Ingestion Engine:
    – More component options have been converted from text boxes to drop-down boxes.
    – More component options offer suggestions via drop-down boxes.
    – Improved information display such as the class circle on top of a component.
    – Possibility to browse for files in the ingestion folder on the DISQOVER server via the data source definition and importer components.
    – Improved description of options and execution warnings.
    – Possibility to copy multiple selected components.
    – New importer capable of handling Excel file formats.
    – Possibility to infer multiple predicates in one component.
    – New function to convert a JSON object into a string (a short conceptual sketch follows this list).
  • The upper limit on following links, previously restricted to the first 4 000 instance results, is increased to the first 20 000 instance results. This limit is configurable via the admin console.
  • Easier access to local log files for admin users.
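
To illustrate the new JSON-object-to-string conversion in concept only, the short Python snippet below shows an equivalent serialization with the standard library; it is not the DISQOVER component itself, and the example record is invented.

    import json

    # Illustrative only: a nested JSON object serialized into a single string value,
    # similar in spirit to what the new data ingestion component produces.
    record = {"gene": "TP53", "synonyms": ["p53", "LFS1"], "taxon": 9606}
    as_string = json.dumps(record, sort_keys=True)
    print(as_string)  # {"gene": "TP53", "synonyms": ["p53", "LFS1"], "taxon": 9606}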

2 Key Features

2.1    Customizable Search Page

The Customizable Search Page can be used to direct first-time users to predefined use cases and more experienced users to the full knowledge graph. Buttons can be added to open a predefined search template, potentially built as a combination of filtering and linking steps to solve a specific use case, or to open another search page (see Figure 1). Advanced users can benefit from a visualization of the full ontology graph, which automatically restricts itself to the search terms added to the search bar (see Figure 2). Direct links to data types via clickable tiles can be extended with tiles that link directly to a preset dashboard. In addition, user views can now be fully managed via the admin console. This eases the configuration of customizable search pages for specific user personas.

Figure 1: Search page with predefined links to find domain experts via different routes. On the top right is a button that leads to another search page.

 

Figure 2: Visualization of a subgraph of the complete ontology graph, restricted via keyword search.

2.2    Extended Analytics

The Multilevel Choropleth and Dot Distribution Map are new interactive geographical visualizations. The former shows counts or numerical values at different geographical levels, while the latter shows the location of instances on a map; the dots can be colored or resized according to other instance property values (see Figures 3 and 4).

A Parallel Coordinate Plot shows the values of instances for multiple numerical facets (see Figure 5).
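
As a conceptual illustration only (not DISQOVER code), the Python sketch below draws a parallel coordinate plot for a few hypothetical instances with three invented numerical facets, using pandas and matplotlib.

    import pandas as pd
    import matplotlib.pyplot as plt
    from pandas.plotting import parallel_coordinates

    # Hypothetical instances and facet names, invented purely for illustration.
    data = pd.DataFrame({
        "instance": ["chemical A", "chemical B", "chemical C", "chemical D"],
        "molecular_weight": [180.2, 151.2, 206.3, 120.4],
        "logP": [1.2, 0.5, 3.1, -0.3],
        "melting_point": [135.0, 169.0, 158.0, 211.0],
    })

    # Each instance becomes one line crossing the three parallel (facet) axes.
    parallel_coordinates(data, class_column="instance")
    plt.ylabel("facet value")
    plt.show()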

Figure 3: Choropleths of Europe and the United States of America.

 

Figure 4: Dot Distribution Map with coloring and the option to filter for occurrences in a defined area.

 

Figure 5: Parallel Coordinate Plot visualizing the connections between three properties of chemicals for a set of instances.

2.3    Incremental Data Ingestion

Instead of the default full re-run, an incremental pipeline run only takes into account data that has changed since the last run. This performance improvement makes it possible to configure DISQOVER for near-real-time updating. In addition to the performance gain, there is no downtime for end users of the platform during an incremental run.
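
As a loose conceptual sketch only (not the actual DISQOVER implementation), an incremental run can be pictured as processing just the records whose content changed since the previous run, for example by comparing content fingerprints; all names below are invented for illustration.

    import hashlib
    import json

    def content_hash(record: dict) -> str:
        """Stable fingerprint of a record's content."""
        return hashlib.sha256(json.dumps(record, sort_keys=True).encode()).hexdigest()

    def incremental_run(records: list[dict], previous_hashes: dict[str, str]) -> list[dict]:
        """Return only the records that are new or changed since the last run."""
        changed = []
        for record in records:
            key = record["id"]                      # hypothetical primary-key field
            fingerprint = content_hash(record)
            if previous_hashes.get(key) != fingerprint:
                changed.append(record)
                previous_hashes[key] = fingerprint  # remember state for the next run
        return changed

Under such a scheme, adding 500 annotations to a dataset of 26 million genes touches only a tiny fraction of the data, which is in line with the drop from a 90-minute full run to a 3-minute incremental run reported in Figure 6.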

Figure 6: Example of performance improvements during incremental data ingestion of a dataset with 26 million gene entries. The updated version is extended with 500 new annotations to existing genes. In a predefined deployment, a default full ingestion of such a dataset takes 90 minutes while incremental data ingestion only takes 3 minutes.

2.4    Remote Data Subscription

Remote Data Subscription extends the potential of DISQOVER federation and allows one to exchange data, harmonized and prepared for direct ingestion in DISQOVER setups.

When data is ingested locally via Remote Data Subscription, all DISQOVER features work equally well with all data. This lifts the limitation of a federation setup with mixed private/public data. In addition, no search terms leave the customer’s LAN, which mitigates potential information security issues. Data scientists can control how and when data is integrated and can validate updates prior to production usage (see Figure 7).

It is possible to create different DISQOVER deployments for entities within an organization or a consortium of organizations where each entity has responsibility over their own data. The ingested data of those deployments can then in turn be combined across deployments (see Figure 8).
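
Purely as a hypothetical sketch of the publisher/consumer roles described above (the URL, folder path and function name are invented and do not reflect a DISQOVER API), a consumer setup could fetch the harmonized export of a publisher setup into its local ingestion folder, where it can be validated before a production run.

    from pathlib import Path
    from urllib.request import urlopen

    # Hypothetical placeholders, not DISQOVER defaults.
    PUBLISHER_EXPORT_URL = "https://disqover-a.example.org/exports/latest.tar.gz"
    LOCAL_INGESTION_FOLDER = Path("/data/ingestion/remote_subscription")

    def fetch_remote_export() -> Path:
        """Download the publisher's harmonized export into the local ingestion folder."""
        LOCAL_INGESTION_FOLDER.mkdir(parents=True, exist_ok=True)
        target = LOCAL_INGESTION_FOLDER / "latest.tar.gz"
        with urlopen(PUBLISHER_EXPORT_URL) as response, open(target, "wb") as out:
            out.write(response.read())
        return target  # a data scientist can validate this export before a production run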

Figure 7: Data workflow between two DISQOVER setups where setup A is a data publisher and setup B is a data consumer in a Remote Data Subscription configuration.

 

Figure 8: Potential data workflow in a setup with multiple DISQOVER deployments.

2.5    Segments

Segments allow a user to structure complex data ingestion pipelines by grouping components together. This helps to retain an overview of large pipelines (see Figure 9).

Figure 9: A data ingestion pipeline with two segments.

3 Upgrade Instructions

Upgrading DISQOVER from version 5.20.x to version 6.00 can be achieved by updating the disqover-installer package through yum or apt, and running the disqover-update command.
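
For illustration only, the two documented steps could be scripted as below; the package-manager invocation is an assumption for an apt-based host (use yum on RPM-based systems), while the disqover-installer package and the disqover-update command come from the instructions above.

    import subprocess

    # Assumption: apt-based host; on an RPM-based host the first two steps would use yum instead.
    subprocess.run(["apt-get", "update"], check=True)
    subprocess.run(["apt-get", "install", "--only-upgrade", "disqover-installer"], check=True)

    # Documented step: run the DISQOVER update command after upgrading the package.
    subprocess.run(["disqover-update"], check=True)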

To upgrade from an earlier version, you may need to run additional commands or to upgrade to version 5.20.x first. DISQOVER deployments configured with an external PostgreSQL host or with custom Docker containers are required to follow additional guidelines. More details are available in Chapter 3 of the DISQOVER 6.00 System Administrator Manual or via support at support@ontoforce.com.

Try the free Community Edition or upgrade to DISQOVER 6.00 Enterprise

Experience the DISQOVER 6.00 Community Edition right now:

  • Create a free account
  • Enjoy unlimited access to public data
  • Access ~150 data sources
  • Create your own dashboards and share them with peers

Contact us to unlock the full DISQOVER experience with the ability to link internal and third-party data sources to create a true data ecosystem.

Try the free Community Edition