DISQOVER - Data sources


Data sources

Our knowledge tool DISQOVER connects your internal data with numerous data sources in one easy-to-use platform.

Numerous trusted data sources

FP7 is the short name for the Seventh Framework Programme for Research and Technological Development. This was the EU's main instrument for funding research in Europe, running from 2007 to 2013. FP7 was designed to respond to Europe's employment needs, competitiveness, and quality of life.

The Anatomical Therapeutic Chemical (ATC) classification system divides active substances into groups according to the organ or system on which they act and their therapeutic, pharmacological and chemical properties. Drugs are classified at five levels: fourteen main groups (1st level), each with pharmacological/therapeutic subgroups (2nd level); the 3rd and 4th levels are chemical/pharmacological/therapeutic subgroups, and the 5th level is the chemical substance. Together with the Defined Daily Dose (DDD), the ATC classification system serves as a tool for exchanging and comparing data on drug use at international, national or local levels.
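Because each ATC level extends the code of the level above it, all five levels can be read directly off a full 7-character code (one letter, two digits, two letters, two digits). The minimal sketch below assumes that standard WHO code layout; the function name `atc_levels` is our own illustration and not part of any DISQOVER API.

```python
import re

# A full ATC code: letter, 2 digits, letter, letter, 2 digits (e.g. A10BA02).
ATC_PATTERN = re.compile(r"[A-Z]\d{2}[A-Z]{2}\d{2}")

# Length of the code prefix that identifies each level, 1st to 5th.
ATC_LEVEL_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code: str) -> list[str]:
    """Return the level 1-5 prefixes of a full ATC code, broadest first."""
    code = code.strip().upper()
    if not ATC_PATTERN.fullmatch(code):
        raise ValueError(f"not a 7-character ATC code: {code!r}")
    return [code[:length] for length in ATC_LEVEL_LENGTHS]


if __name__ == "__main__":
    # A10BA02 is the ATC code of metformin; print its prefix at each level.
    for level, prefix in enumerate(atc_levels("A10BA02"), start=1):
        print(f"level {level}: {prefix}")
```

With this layout, finding every substance in a given 3rd- or 4th-level group reduces to a prefix match on the code.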

The Antibody Registry provides identifiers for antibodies used in publications. It lists commercial antibodies from numerous vendors, each assigned a unique identifier. Unlisted antibodies can be submitted by providing the catalog number and vendor information.

The BRCA Exchange website is a product of the BRCA Challenge of the Global Alliance for Genomics and Health. It provides information on catalogued BRCA1 and BRCA2 genetic variants. By default, it shows variants that have been curated and classified by an international expert panel, the ENIGMA consortium, to assess their pathogenicity (associated disease risk). Optional settings allow the user to look at unclassified variants.

Cellosaurus is a knowledge resource on cell lines that aims to describe all cell lines used in biomedical research. Reference: Bairoch A. The Cellosaurus, a cell line knowledge resource. J. Biomol. Tech. (2018) 29:25-38. DOI: 10.7171/jbt.18-2902-002, PMID: 2980532

Chemical Entities of Biological Interest (ChEBI) is a freely available dictionary of molecular entities focused on 'small' chemical compounds.

ChEMBL is a database of bioactive, drug-like small molecules. It contains 2-D structures, calculated properties (e.g. logP, molecular weight, Lipinski parameters) and abstracted bioactivities (e.g. binding constants, pharmacology and ADMET data). The data are abstracted and curated from the primary scientific literature and cover a significant fraction of the structure-activity relationships (SAR) and discovery of modern drugs.

ClinicalTrials.gov provides free access to information on clinical studies for a wide range of diseases and conditions. Studies listed in the database are conducted in 175 countries.

ClinVar archives reports of the relationships between medically important variants and phenotypes. It records human variation, interpretations of the relationship of specific variations to human health, and the supporting evidence for each interpretation. Each ClinVar record (RCV identifier) represents an aggregated view of interpretations of the same variation and condition from one or more submitters. Submissions for individual variation/phenotype combinations (SCV identifier) are also collected and made available separately. This collection references the Record Report, based on the RCV accession.
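The RCV/SCV relationship described above is essentially a group-by: every individual submission for a variation/condition pair is folded into one aggregate record for that pair. The sketch below only mimics that relationship; ClinVar itself assigns the real accessions, and the `Submission` fields and the placeholder IDs are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Submission:
    """One submitted interpretation, analogous to an SCV record (illustrative fields)."""
    scv: str
    variation: str
    condition: str
    interpretation: str


def aggregate_by_variation_and_condition(submissions):
    """Group submissions so each (variation, condition) pair forms one RCV-like record."""
    records = defaultdict(list)
    for submission in submissions:
        records[(submission.variation, submission.condition)].append(submission)
    return dict(records)


if __name__ == "__main__":
    # Placeholder identifiers, not real ClinVar accessions.
    subs = [
        Submission("SCV-1", "var-A", "condition-X", "Pathogenic"),
        Submission("SCV-2", "var-A", "condition-X", "Likely pathogenic"),
        Submission("SCV-3", "var-A", "condition-Y", "Uncertain significance"),
    ]
    for (variation, condition), group in aggregate_by_variation_and_condition(subs).items():
        print(variation, condition, [s.interpretation for s in group])
```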

The Cooperative Patent Classification (CPC) is a patent classification system, developed jointly by the European Patent Office (EPO) and the United States Patent and Trademark Office (USPTO). It is based on the previous European classification system (ECLA), which itself was a version of the International Patent Classification (IPC) system. The CPC patent classification system has been used by EPO and USPTO since 1st January 2013.

The Community Research and Development Information Service (CORDIS) is the European Commission's primary source of results from the projects funded by the EU's framework programmes for research and innovation (FP1 to Horizon 2020).

DailyMed provides information about marketed drugs, including labels (package inserts) submitted to the U.S. Food and Drug Administration (FDA). The website provides a standard, comprehensive, up-to-date look-up and download resource for medication content and labeling as found in medication package inserts. The drug labels are the most recent versions submitted to the FDA and currently in use; they may include, for example, strengthened warnings undergoing FDA review or minor editorial changes. These labels have been reformatted to make them easier to read.

The dbSNP database is a repository for single-base nucleotide substitutions and short deletion and insertion polymorphisms. The dataset loaded in DISQOVER includes all variants with a cross-reference from SwissVar, DisGeNET, and Provean. In addition, it covers all variants of a manually curated subset of genes from the COSMIC database that are somatically mutated and causally implicated in human cancer.

dbVar is NCBI's database of genomic structural variation. It houses variation data generated mostly by published studies of various organisms. Variants are typically 50 nucleotides or longer.

DrugCentral provides information on active ingredients (chemical entities), pharmaceutical products, drug mode of action, indications and pharmacologic action.

The ENZYME database is a repository of information related to the nomenclature of enzymes.

The European Patent Organisation is an intergovernmental organisation that was set up on 7 October 1977 on the basis of the European Patent Convention (EPC) signed in Munich in 1973. It has two bodies: the European Patent Office and the Administrative Council, which supervises the Office's activities.

EudraCT (European Union Drug Regulating Authorities Clinical Trials) is the European Clinical Trials Database of all clinical trials of investigational medicinal products with at least one site in the European Union commencing 1 May 2004 or later.

The Evidence Ontology (ECO) is a controlled vocabulary of terms that describe scientific evidence in the realm of biological research. ECO can be used to document both the evidence that supports a scientific conclusion and how that conclusion was recorded by a scientist, whether a person or a computer.

The Experimental Factor Ontology (EFO) provides a systematic description of many experimental variables available in EBI databases. It combines parts of several biological ontologies, such as anatomy, disease and chemical compounds. The scope of EFO is to support the annotation, analysis and visualization of data handled by the EBI Functional Genomics Team.

ExPORTER provides access to RePORTER data files that include information on research projects funded by the National Institutes of Health (NIH), Centers for Disease Control and Prevention (CDC), Agency for Healthcare Research and Quality (AHRQ), Health Resources and Services Administration (HRSA), Substance Abuse and Mental Health Services Administration (SAMHSA), and U.S. Department of Veterans Affairs (VA), as well as publications, patents, and clinical studies citing support from these projects. The data are separated into four major categories: Projects, Publications, Patents, and Clinical Studies. There are also “Link Tables” that can be used to establish the many-to-many relationships between projects and their publications. To keep the project files to a manageable size, abstracts are stored in their own “Project Abstract” files and all other project information is in the “Project Data” files.
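A link table holds one row per (publication, project) pair, so resolving the many-to-many relationship means indexing it in both directions. The sketch below assumes a CSV link table with `PMID` and `PROJECT_NUMBER` columns; those column names and the sample rows are hypothetical and should be checked against the actual ExPORTER files.

```python
import csv
import io
from collections import defaultdict

# Hypothetical link-table excerpt: one row per (publication, project) pair.
LINK_TABLE = """PMID,PROJECT_NUMBER
1001,PROJ-A
1001,PROJ-B
1002,PROJ-A
"""


def index_links(csv_text):
    """Build both directions of the many-to-many project/publication relationship."""
    publications_by_project = defaultdict(set)
    projects_by_publication = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        publications_by_project[row["PROJECT_NUMBER"]].add(row["PMID"])
        projects_by_publication[row["PMID"]].add(row["PROJECT_NUMBER"])
    return publications_by_project, projects_by_publication


if __name__ == "__main__":
    pubs, projects = index_links(LINK_TABLE)
    print(sorted(pubs["PROJ-A"]))    # publications citing support from PROJ-A
    print(sorted(projects["1001"]))  # projects cited by publication 1001
```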

The FDA Adverse Event Reporting System (FAERS) is a database that contains adverse event reports, medication error reports and product quality complaints resulting in adverse events that were submitted to FDA. The database is designed to support the FDA's post-marketing safety surveillance program for drug and therapeutic biologic products.

Federal Information Processing Series (FIPS) codes are standardized numeric or alphabetic codes issued by the American National Standards Institute (ANSI) to ensure uniform identification of geographic entities in the United States.

FRIS-researchportal

The Gene Ontology project provides a controlled vocabulary to describe gene and gene product attributes in any organism.

GeneRIF (Gene Reference into Function) provides short, user-submitted descriptions of gene function, offering a quick reference to the biological significance of a wide range of genes.

The HGNC (HUGO Gene Nomenclature Committee) provides an approved gene name and symbol (short-form abbreviation) for each known human gene. All approved symbols are stored in the HGNC database, and each symbol is unique. HGNC identifiers refer to records in the HGNC symbol database.

The Genome Aggregation Database (gnomAD) is a resource developed by an international coalition of investigators, with the goal of aggregating and harmonizing both exome and genome sequencing data from a wide variety of large-scale sequencing projects, and making summary data available to the wider scientific community. The dataset loaded in DISQOVER includes all variants from the gnomAD VCF file that carry the PASS filter, covering a subset of manually curated genes that are somatically mutated and causally implicated in human cancer, taken from the COSMIC database (http://cancer.sanger.ac.uk/cosmic/curation).
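In a VCF file, the seventh column (FILTER) reads `PASS` when a record passed all quality filters. A selection like the one described above can be sketched as a PASS check followed by a gene-region check. This is a minimal illustration, not the actual DISQOVER loading pipeline: the gene names and coordinates are hypothetical, and only the variant start position (POS) is compared with each region.

```python
def pass_variants_in_genes(vcf_lines, gene_regions):
    """Yield (gene, variant_id, position) for PASS variants inside the given gene regions.

    gene_regions maps a gene name to (chromosome, start, end), 1-based and inclusive.
    """
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information (##) and header (#CHROM) lines
        fields = line.rstrip("\n").split("\t")
        chrom, pos, variant_id, filter_status = fields[0], int(fields[1]), fields[2], fields[6]
        if filter_status != "PASS":
            continue  # drop records that failed one or more filters
        for gene, (region_chrom, start, end) in gene_regions.items():
            if chrom == region_chrom and start <= pos <= end:
                yield gene, variant_id, pos
                break


if __name__ == "__main__":
    # Synthetic records and a hypothetical gene region, for illustration only.
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
        "1\t150\tv1\tA\tG\t50\tPASS\t.",
        "1\t160\tv2\tC\tT\t10\tAC0\t.",
        "1\t300\tv3\tG\tA\t60\tPASS\t.",
    ]
    regions = {"GENE_X": ("1", 100, 200)}
    print(list(pass_variants_in_genes(lines, regions)))
```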

Global Research Identifier Database (GRID) is a database of educational and research organizations worldwide.

The Global Unique Device Identification Database (GUDID) contains key device identification information submitted to the FDA about medical devices that have Unique Device Identifiers (UDI).

The GWAS Atlas is a comprehensive database designed to aggregate and visualize data from genome-wide association studies, aiming to improve understanding of the genetic links to various traits and diseases. The GWAS Catalog, by contrast, is a curated resource that compiles results from published genome-wide association studies, offering detailed information on SNP-trait associations, methodologies, and findings to support research in genetics and epidemiology.

HomoloGene is a system for automated detection of homologs among the annotated genes of several completely sequenced eukaryotic genomes.

The Horizon 2020 Research and Innovation programme was the EU's main instrument for funding research in Europe from 2014 to 2020. H2020 implemented the Innovation Union, a Europe 2020 flagship initiative aimed at securing Europe's global competitiveness.

HSDB (Hazardous Substances Data Bank) contains comprehensive, peer-reviewed toxicology data for about 5,000 chemicals.

The Disease Ontology has been developed as a standardized ontology for human disease with the purpose of providing the biomedical community with consistent, reusable and sustainable descriptions of human disease terms, phenotype characteristics and related medical vocabulary disease concepts.

The Human Phenotype Ontology (HPO) aims to provide a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as atrial septal defect. The HPO is currently being developed using the medical literature, Orphanet, DECIPHER, and OMIM. HPO currently contains approximately 11,000 terms and over 115,000 annotations to hereditary diseases. The HPO also provides a large set of HPO annotations to approximately 4000 common diseases.

The Human Protein Atlas (HPA) is a Swedish-based initiative aimed at mapping all the human proteins in cells, tissues, and organs using integration of various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and bioinformatics.

The National Center for Health Statistics (NCHS), the Federal agency responsible for use of the International Statistical Classification of Diseases and Related Health Problems, 10th revision (ICD-10) in the United States, has developed a clinical modification of the classification for morbidity purposes. The ICD-10 is used to code and classify mortality data from death certificates, having replaced ICD-9 for this purpose as of January 1, 1999. ICD-10-CM is the replacement for ICD-9-CM, volumes 1 and 2, effective October 1, 2015.

The International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM) is based on the World Health Organization's Ninth Revision, International Classification of Diseases (ICD-9). ICD-9-CM is the official system of assigning codes to diagnoses and procedures associated with hospital utilization in the United States. The ICD-9 was used to code and classify mortality data from death certificates until 1999, when use of ICD-10 for mortality coding started.

The IDMP-Ontology provides a structured framework to define and organize data related to the Identification of Medicinal Products (IDMP), enhancing global medicinal product management and safety through standardized terminology and relationships.


Inxight Drugs is a comprehensive drug database that offers detailed information on drugs in development, approved medications, and experimental therapeutics, designed to support research and development efforts in the pharmaceutical and biotechnology industries.

InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.

The two-letter codes and full country names used follow the ISO 3166 standard. Country synonyms are added manually, based on the synonyms used in different sources.
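Manually curated synonyms let country names from different sources resolve to one ISO 3166-1 alpha-2 code. The sketch below uses a case-insensitive lookup table; the four countries and their synonyms are an illustrative excerpt, not DISQOVER's actual synonym list.

```python
# Illustrative excerpt: ISO 3166-1 alpha-2 code -> synonyms seen in different sources.
COUNTRY_SYNONYMS = {
    "BE": ["Belgium", "Belgique", "België"],
    "DE": ["Germany", "Deutschland"],
    "GB": ["United Kingdom", "UK", "Great Britain"],
    "US": ["United States", "USA", "United States of America"],
}


def build_lookup(synonyms):
    """Map every code and synonym (case-folded) to its alpha-2 code."""
    lookup = {}
    for code, names in synonyms.items():
        lookup[code.casefold()] = code
        for name in names:
            lookup[name.casefold()] = code
    return lookup


LOOKUP = build_lookup(COUNTRY_SYNONYMS)


def to_iso_alpha2(name):
    """Return the alpha-2 code for a country name or synonym, or None if unknown."""
    return LOOKUP.get(name.strip().casefold())


if __name__ == "__main__":
    for raw in ["UK", "Deutschland", "be", "Atlantis"]:
        print(raw, "->", to_iso_alpha2(raw))
```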

The IUPHAR Compendium details the molecular, biophysical and pharmacological properties of identified mammalian sodium, calcium and potassium channels, as well as the related cyclic nucleotide-modulated ion channels and the recently described transient receptor potential channels. It includes information on nomenclature systems, and on inter and intra-species molecular structure variation.

NCBI Journals lists the journals referenced in PubMed and the NCBI molecular biology databases.

MarkerDB is a curated online database that provides comprehensive information on biomarkers, integrating data from various sources to support research in disease diagnosis, prognosis, and therapy selection.

MeSH (Medical Subject Headings) is the National Library of Medicine's controlled vocabulary thesaurus. It consists of sets of terms naming descriptors in a hierarchical structure that permits searching at various levels of specificity. This thesaurus is used by NLM for indexing articles from biomedical journals, cataloguing of books, documents, etc.
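MeSH descriptors carry dot-separated tree numbers that encode their position in the hierarchy, so searching at a broader or narrower level of specificity can be sketched as a prefix match on those numbers (PubMed calls including the narrower terms "exploding" a term). The descriptor names and tree numbers below are hypothetical placeholders.

```python
# Hypothetical descriptors and tree numbers, for illustration only.
TREE = {
    "Descriptor A": ["C01"],
    "Descriptor A1": ["C01.100"],
    "Descriptor A1a": ["C01.100.200"],
    "Descriptor B": ["C02"],
}


def explode(prefix, tree=TREE):
    """Return the descriptors at or below `prefix` in the hierarchy, sorted by name."""
    return sorted(
        name
        for name, numbers in tree.items()
        if any(n == prefix or n.startswith(prefix + ".") for n in numbers)
    )


if __name__ == "__main__":
    print(explode("C01"))      # the broad term plus everything beneath it
    print(explode("C01.100"))  # a narrower starting point
```

Matching on `prefix + "."` rather than the bare prefix keeps, for example, `C01` from also matching an unrelated `C010` branch.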

Mondo is a semi-automatically constructed ontology that merges multiple disease resources into a single coherent ontology.

The National Drug Code (NDC) is a unique, three-segment number used by the Food and Drug Administration (FDA) to identify drug products for commercial use. This is required by the Drug Listing Act of 1972. The FDA publishes and updates the listed NDC numbers daily.
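The three NDC segments identify the labeler, the product and the package. FDA-assigned NDCs have 10 digits in a 4-4-2, 5-3-2 or 5-4-1 layout, and a common convention (used, for example, in billing systems) left-pads them to an 11-digit 5-4-2 form. The sketch below assumes hyphenated input; the function name and the sample numbers are made up for illustration and do not refer to real products.

```python
import re

# Hyphenated NDC: labeler-product-package segments.
NDC_PATTERN = re.compile(r"(\d{4,5})-(\d{3,4})-(\d{1,2})")


def ndc_to_11_digit(ndc: str) -> str:
    """Left-pad each segment of a 10-digit NDC to the 11-digit 5-4-2 layout."""
    match = NDC_PATTERN.fullmatch(ndc.strip())
    if match is None:
        raise ValueError(f"not a hyphenated NDC: {ndc!r}")
    labeler, product, package = match.groups()
    # Only the 4-4-2, 5-3-2 and 5-4-1 layouts add up to 10 digits.
    if len(labeler) + len(product) + len(package) != 10:
        raise ValueError(f"not a 10-digit NDC: {ndc!r}")
    return f"{labeler.zfill(5)}-{product.zfill(4)}-{package.zfill(2)}"


if __name__ == "__main__":
    # Made-up numbers, one per 10-digit layout.
    for ndc in ["1234-5678-90", "12345-678-90", "12345-6789-0"]:
        print(ndc, "->", ndc_to_11_digit(ndc))
```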

The NCBI Database for Single Nucleotide Polymorphisms (dbSNP) is a comprehensive repository that provides information on genetic variations, specifically single nucleotide polymorphisms (SNPs), to support research in genetics, disease association studies, and personalized medicine. 


Entrez Gene is the NCBI's database for gene-specific information, focusing on completely sequenced genomes, those with an active research community to contribute gene-specific information, or those that are scheduled for intense sequence analysis.

The NCBI Taxonomy Database contains the relationships between all living forms for which nucleic acid or protein sequences have been determined.

OncoMX is a knowledge base that consolidates and integrates genomic and biomarker data on cancer, aiming to facilitate the exploration and analysis of molecular signatures, tumor types, and their implications in oncology research. 


The NUTS classification (Nomenclature of territorial units for statistics) is a hierarchical system for dividing up the economic territory of the EU and the UK.
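The hierarchy is visible in the codes themselves: a NUTS code starts with the two-letter country code and adds one character per level, NUTS 1 to NUTS 3. A minimal sketch, assuming that code layout, recovers every enclosing region from a code by taking its prefixes; `BE211` is used only as an example input.

```python
def nuts_lineage(code: str) -> list[str]:
    """Return the chain of NUTS codes from the country down to `code`.

    Layout assumed: 2-letter country code, then one character per level
    (NUTS 1 = 3 characters, NUTS 2 = 4, NUTS 3 = 5).
    """
    code = code.strip().upper()
    if not (2 <= len(code) <= 5) or not code[:2].isalpha():
        raise ValueError(f"not a NUTS code: {code!r}")
    return [code[:length] for length in range(2, len(code) + 1)]


def nuts_level(code: str) -> int:
    """Return 0 for the country itself and 1-3 for the regional levels."""
    return len(nuts_lineage(code)) - 1


if __name__ == "__main__":
    print(nuts_lineage("BE211"), nuts_level("BE211"))
```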

The Open Targets Platform is a comprehensive and robust data integration platform that provides access to potential drug targets associated with disease. It brings together multiple data types and aims to help users identify and prioritise targets for further investigation.

Geocoding data based on address lookup.

ORCID (Open Researcher and Contributor ID) is an open, non-profit, community-based effort to create and maintain a registry of unique identifiers for individual researchers. ORCID records hold non-sensitive information such as name, email, organization name, and research activities.
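Each ORCID iD is 16 characters long, written in four hyphen-separated blocks, and its last character is a check digit computed with the ISO 7064 MOD 11-2 algorithm, as documented by ORCID. The sketch below validates an iD against that check digit; `0000-0002-1825-0097` is ORCID's published sample iD, while the function names are our own.

```python
def orcid_check_digit(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check the length, digits and check character of a hyphenated ORCID iD."""
    compact = orcid.strip().replace("-", "")
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_digit(compact[:15]) == compact[15].upper()


if __name__ == "__main__":
    print(is_valid_orcid("0000-0002-1825-0097"))  # ORCID's sample iD
    print(is_valid_orcid("0000-0002-1825-0098"))  # same iD with a wrong check digit
```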

The Orphanet Rare Disease Ontology (ORDO) is a structured vocabulary for rare diseases that captures relationships between diseases, genes and other relevant features, forming a useful resource for the computational analysis of rare diseases.

The Pathway Ontology provides a structured framework for categorizing and relating biological pathways, enhancing the understanding and study of complex biochemical processes and their roles in health and disease.

PheWAS Resources offer tools and datasets for Phenome-Wide Association Studies, enabling researchers to explore and understand the genetic underpinnings of a wide array of phenotypes and their associations with diseases.

PROVEAN (Protein Variation Effect Analyzer) was developed to predict whether a protein sequence variation affects protein function.

PubChem provides information on the biological activities of small molecules. It is a component of NIH's Molecular Libraries Roadmap Initiative. We use only a very limited subset of PubChem, solely to create mappings; this subset does not provide any other information.

PubMed is a service of the U.S. National Library of Medicine that includes citations from MEDLINE and other life science journals for biomedical articles back to the 1950s.

PubTator Central (PTC) is a Web-based system providing automatic annotations of biomedical concepts such as genes and mutations in PubMed abstracts and PMC full-text articles.

The Reactome project is a collaboration to develop a curated resource of core pathways and reactions in human biology.

The RxIMAGE API is a freely accessible Application Programming Interface that software developers can use to create apps for text-based search and retrieval from the RxIMAGE database. The RxIMAGE database is the Nation’s only portfolio of curated, freely available, increasingly comprehensive, high-quality digital images of prescription pills and associated data.

RxNorm provides normalized names for clinical drugs and links its names to many of the drug vocabularies commonly used in pharmacy management and drug interaction software, including those of First Databank, Micromedex, MediSpan, Gold Standard Drug Database, and Multum. By providing links between these vocabularies, RxNorm can mediate messages between systems not using the same software and vocabulary.

The SCImago Journal & Country Rank is a publicly available portal that includes the journals and country scientific indicators developed from the information contained in the Scopus® database (Elsevier B.V.). These indicators can be used to assess and analyze scientific domains.

The SemanticScience Integrated Ontology (SIO) provides a simple, integrated ontology of types and relations for rich description of objects, processes and their attributes.

SNOMED CT (Systematized Nomenclature of Medicine -- Clinical Terms) is a systematically organized, computer-processable collection of medical terminology covering most areas of clinical information, such as diseases, findings, procedures, microorganisms and pharmaceuticals.

The SPOR IDMP database is part of the EU's Substance, Product, Organisation and Referentials (SPOR) data management service, designed to implement and manage the Identification of Medicinal Products (IDMP) standards in Europe, ensuring consistent, reliable medicinal product information for regulatory purposes.

STRING is a database that provides comprehensive information on protein-protein interactions and pathways, facilitating the exploration of functional relationships and networks within cells to support research in molecular biology and genetics.

SureChEMBL provides free access to chemical data extracted from the patent literature.

SwissVar is a portal to search variants in Swiss-Prot entries of the UniProt Knowledgebase (UniProtKB), and gives direct access to the Swiss-Prot Variant pages. The Swiss-Prot Variant pages summarize all the information related to a particular variant and contain:
- manual annotation of the genotype-phenotype relationship of each specific variant, based on the literature;
- pre-computed information (such as conservation scores and, when available, a list of structural features) to help assess the effect of the variant.

Uberon is an integrated cross-species anatomy ontology representing a variety of entities classified according to traditional anatomical criteria such as structure, function and developmental lineage. The ontology includes comprehensive relationships to taxon-specific anatomical ontologies, allowing integration of functional, phenotype and expression data.

The UCSC Genome Browser is a versatile platform that allows users to explore gene sequences and annotations within the context of complete genomes, offering a detailed view of genetic and genomic data to support research and education in genomics.

The purpose of the joint FDA/USP Substance Registration System (SRS) is to support health information technology initiatives by generating unique ingredient identifiers (UNIIs) for substances in drugs, biologics, foods, and devices. The UNII is a non-proprietary, free, unique, unambiguous, non-semantic, alphanumeric identifier based on a substance’s molecular structure and/or descriptive information.

The UniProt Knowledgebase (UniProtKB) is the central hub for the collection of functional information on proteins, with accurate, consistent and rich annotation. In addition to the core data mandatory for each UniProtKB entry (mainly the amino acid sequence, protein name or description, taxonomic data and citation information), as much annotation information as possible is added. Reference: The UniProt Consortium. UniProt: the universal protein knowledgebase. Nucleic Acids Res. 2018 Mar 16;46(5):2699. doi: 10.1093/nar/gky092. PMID: 29425356; PMCID: PMC5861450.

The mission of the WHO International Clinical Trials Registry Platform is to ensure that a complete view of research is accessible to all those involved in health care decision making.

Wikidata acts as central storage for the structured data of its Wikimedia sister projects including Wikipedia, Wikivoyage, Wikisource, etc.

WikiPathways is a resource providing an open and public collection of pathway maps created and curated by the community in a wiki-like style.

Wikipedia is a multilingual, web-based, free-content encyclopedia project based on an openly editable model. It is written collaboratively by largely anonymous Internet volunteers who write without pay.

Data sources per solution

| # | Data source | RDS package | DISQOVER for R&D | DISQOVER for Clinical | DISQOVER for Regulatory | DISQOVER for Cross-Functional Intelligence |
|---|---|---|---|---|---|---|
| 1 | (ToBeDetermined) | FDA/EMA Approval Documents | | | X | X |
| 2 | Anatomical Therapeutic Chemical | Chemical | X | | | |
| 3 | Anatomical Therapeutic Chemical | Drug | X | X | X | X |
| 4 | Antibody Registry | Analytical Antibody | X | | | |
| 5 | BRCA Exchange | Variant | X | | | X |
| 6 | Cancer Genome Interpreter | Biomarker | X | X | | X |
| 7 | Cell Ontology | Anatomy | X | X | | |
| 8 | Cellosaurus | Cell Line | X | | | |
| 9 | ChEBI | Chemical | X | | | |
| 10 | ChEBI | Drug | X | X | X | X |
| 11 | ChEMBL | BioAssay | X | | | |
| 12 | ChEMBL | Chemical | X | | | |
| 13 | ChEMBL | Drug | X | X | X | X |
| 14 | ChEMBL | Protein | X | | | X |
| 15 | ClinicalTrials.gov | Clinical Study | | X | | X |
| 16 | ClinVar Record | Biomarker | X | X | | X |
| 17 | ClinVar Record | Variant | X | | | X |
| 18 | Cooperative Patent Classification | Patent | | | | X |
| 19 | DailyMed | Medicine | | | X | |
| 20 | DailyMed | Medicine | | | X | |
| 21 | dbVar | Variant | X | | | X |
| 22 | DrugCentral | Chemical | X | | | |
| 23 | DrugCentral | Drug | X | X | X | X |
| 24 | DrugCentral | Target | X | X | | X |
| 25 | Ensembl Gene | Gene | X | | | X |
| 26 | Ensembl Protein | Protein | X | | | X |
| 27 | ENZYME | Protein | X | | | X |
| 28 | EPO | Patent | | | | X |
| 29 | EudraCT | Clinical Study | | X | | X |
| 30 | Experimental Factor Ontology (EFO) | Disease | X | X | X | X |
| 31 | FDA Adverse Event Reporting System (FAERS) Drug Events | Adverse Event | | | X | X |
| 32 | Gene Ontology | Pathway | X | X | | X |
| 33 | Gene Ontology | Protein Interaction | X | | | |
| 34 | GeneRIF | Gene | X | | | X |
| 35 | gnomAD | Variant | X | | | X |
| 36 | GRID | Organization | | X | X | X |
| 37 | GUDID | Medical Device | | X | | X |
| 38 | GWAS Atlas | Target | X | X | | X |
| 39 | GWAS Catalog | Target | X | X | | X |
| 40 | HUGO Gene Nomenclature Committee (HGNC) | Gene | X | | | X |
| 41 | Human Disease Ontology | Disease | X | X | X | X |
| 42 | Human Phenotype Ontology (HPO) | Disease | X | X | X | X |
| 43 | Human Protein Atlas (HPA) | Protein | X | | | X |
| 44 | ICD-10-CM | Disease | X | X | X | X |
| 45 | ICD-9-CM | Disease | X | X | X | X |
| 46 | IDMP-Ontology | IDMP | | | X | |
| 47 | InterPro | Protein | X | | | X |
| 48 | Inxight Drugs | Chemical | X | | | |
| 49 | Inxight Drugs | Drug | X | X | X | X |
| 50 | IUPHAR Compendium | BioAssay | X | | | |
| 51 | IUPHAR Compendium | Chemical | X | | | |
| 52 | IUPHAR Compendium | Drug | X | X | X | X |
| 53 | IUPHAR Compendium | Target | X | X | | X |
| 54 | MarkerDB | Biomarker | X | X | | X |
| 55 | MeSH | Chemical | X | | | |
| 56 | MeSH | Disease | X | X | X | X |
| 57 | MeSH | Drug | X | X | X | X |
| 58 | MeSH | Literature | X | X | | X |
| 59 | Mondo Disease Ontology | Disease | X | X | X | X |
| 60 | National Drug Code | Medicine | | | X | |
| 61 | National Drug Code (NDC) | Medicine | | | X | |
| 62 | NCBI Database for Single Nucleotide Polymorphisms (dbSNP) | Variant | X | | | X |
| 63 | NCBI Gene | Gene | X | | | X |
| 64 | NCBI HomoloGene | Gene | X | | | X |
| 65 | NCBI Journals | Literature | X | X | | X |
| 66 | NCBI Protein | Protein | X | | | X |
| 67 | NCBI Taxonomy | Organism | X | | | |
| 68 | OncoMX | Biomarker | X | X | | X |
| 69 | Open Targets | Target | X | X | | X |
| 70 | ORCID (Open Researcher and Contributor ID) | Literature | X | X | | X |
| 71 | Orphanet Rare Disease Ontology (ORDO) | Disease | X | X | X | X |
| 72 | Pathway Ontology | Pathway | X | X | | X |
| 73 | PheWAS Resources | Target | X | X | | X |
| 74 | PubChem | Chemical | X | | | |
| 75 | PubChem | Drug | X | X | X | X |
| 76 | PubMed | Literature | X | X | | X |
| 77 | PubTator | Literature | X | X | | X |
| 78 | Reactome | Pathway | X | X | | X |
| 79 | Reactome | Protein Interaction | X | | | |
| 80 | RxIMAGE | Medicine | | | X | |
| 81 | RxNorm | Medicine | | | X | |
| 82 | SCImago Journal and Country Rank | Literature | X | X | | X |
| 83 | SNOMED CT (US edition) | Disease | X | X | X | X |
| 84 | SPOR | IDMP | | | X | |
| 85 | STRING | Pathway | X | X | | X |
| 86 | STRING | Protein Interaction | X | | | |
| 87 | SureChEMBL | Chemical | X | | | |
| 88 | SureChEMBL | Drug | X | X | X | X |
| 89 | Uberon | Anatomy | X | X | | |
| 90 | UCSC Genome Browser | Gene | X | | | X |
| 91 | UNII | Chemical | X | | | |
| 92 | UNII | Drug | X | X | X | X |
| 93 | UniProt Knowledgebase / Swiss-Prot | Protein | X | | | X |
| 94 | WHO: International Clinical Trials Registry Platform (ICTRP) | Clinical Study | | X | | X |
| 95 | Wikidata (Life Sciences Organizations) | Organization | | X | X | X |
| 96 | WikiPathways | Pathway | X | X | | X |
| 97 | Wikipedia (Life Sciences Organizations) | Organization | | X | X | X |

 

Access these public data sources in the DISQOVER Community edition
