Article · Wikipedia archive · Last revised May 31, 2026

DisGeNET

DISGENET is an AI-ready translational knowledge infrastructure that integrates comprehensive information on human genotype–phenotype relationships, including gene–disease, variant–disease, and disease–disease associations. Developed since 2010 and commercialised by MedBioInformatics Solutions, a Barcelona-based techbio company, the platform is widely used in biomedical research, drug discovery, and clinical genomics.

Content
Description: Genotype-Phenotype-Therapeutic map
Data types captured: Associations between genes, variants, diseases, chemicals and drugs
Organisms: Homo sapiens
Contact
Research center: Medbioinformatics Solutions
Authors: Medbioinformatics Solutions
Primary citation: https://doi.org/10.64898/2026.01.05.697749
Release date: 2010
Access
Website: DISGENET
Miscellaneous
License: Proprietary
Versioning: DISGENET v26.1
Data release frequency: Quarterly

The platform is designed as API-native, provenance-aware infrastructure aligned with FAIR data principles (Findable, Accessible, Interoperable, Reusable). Its structured data model and integration of multiple evidence sources have enabled its use in computational workflows, including applications in artificial intelligence (AI) and machine learning (ML) in the life sciences.

As of March 2026 (version 26.1), DISGENET has been cited in 8,500 scientific publications.

History and Development

DISGENET was first released in 2010 as a Cytoscape plugin designed to visualize, integrate, search, and analyze gene–disease networks. The initiative was led by researchers from GRIB - Universitat Pompeu Fabra - IMIM (Barcelona, Spain), with the aim of consolidating fragmented information about the genetic basis of human diseases into a single, unified resource. Since 2020, the platform has been developed and maintained by MedBioInformatics Solutions SL, a Barcelona-based techbio company established that year.

Over successive releases, the platform expanded its data coverage, analytical tools, and ontological integration:

  • 2010 (v1.0): First release as a Cytoscape plugin; introduced the gene–disease association (GDA) concept with provenance tracking.
  • 2015 (v2.1): Launched as a comprehensive discovery platform with web interface, REST API, R package, and NLP-based literature mining; contained over 380,000 GDAs across 16,000+ genes and 13,000+ diseases.
  • 2017 (v4.0): Major expansion integrating GWAS catalogues and animal models; introduced variant–disease associations (VDAs) alongside GDAs.
  • 2018: Selected as an ELIXIR Recommended Interoperability Resource by the European life sciences infrastructure ELIXIR, recognising the platform's adherence to FAIR data standards and its interoperability within the Linked Open Data cloud.
  • 2019/2020 (v6.0/v7.0): Further expansion to over 1.1 million GDAs and 210,000+ VDAs; introduction of the disease–disease association (DDA) dataset; new data sources including PheWAS and biobank data.
  • 2020: Medbioinformatics launched; the platform began the transition to a freemium sustainability model, intended to ensure long-term viability through a dual-tier structure: free licenses for academic and non-profit users, and commercial subscriptions for pharmaceutical and biotechnology companies.
  • 2021–2023: Quarterly update cycle established; commercial activity established; publication of the DisGeNET Cytoscape App paper; expanded programmatic access tools.
  • 2024–2026 (v24.x–v26.x): Freemium model established; substantial expansion including integration of biobank-scale GWAS data (UK Biobank, FinnGen), chemical and drug associations for more than 12,000 compounds, PheWAS data, and GenCC gene curation data; introduction of the AIDA AI assistant for natural language queries. The current release is version 26.1 (March 2026).

Data Content and Model

DISGENET is built around three primary association types that together form a comprehensive genotype–phenotype–therapy map:

  • Gene–Disease Associations (GDAs): Structured links between genes and human diseases or phenotypes, integrating evidence from experimental models, clinical research, GWAS studies, and the scientific literature.
  • Variant–Disease Associations (VDAs): Links between specific genomic variants (coding and non-coding) and diseases, sourced from clinical databases, GWAS catalogues, PheWAS studies, and large population biobanks.
  • Disease–Disease Associations (DDAs): Relationships computationally inferred from shared molecular mechanisms, annotated with semantic similarity metrics over the Unified Medical Language System (UMLS) graph and with semantic relations from the UMLS Metathesaurus.

Chemical and drug annotations provide an additional therapeutic layer, linking gene and variant associations to more than 12,000 drugs and chemicals, enabling pharmacogenomic and toxicogenomic analysis.
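
As a rough illustration of the data model described above, a gene–disease association can be thought of as a record that carries its own provenance. The sketch below is only a sketch: the class names, field names, and identifiers are hypothetical and are not taken from the DISGENET schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Evidence:
    """One supporting item for an association (hypothetical fields)."""
    source: str     # e.g. a curated database name or "literature"
    reference: str  # e.g. a database accession or publication identifier


@dataclass
class GeneDiseaseAssociation:
    """Illustrative GDA record; field names are assumptions, not the real schema."""
    gene_symbol: str
    disease_id: str
    score: float  # evidence score in the range 0 to 1
    evidence: List[Evidence] = field(default_factory=list)

    def sources(self) -> List[str]:
        """Return the distinct sources backing this association, sorted."""
        return sorted({e.source for e in self.evidence})


# Placeholder values only; no real gene, disease, or reference is implied.
gda = GeneDiseaseAssociation(
    gene_symbol="GENE_A",
    disease_id="DISEASE_1",
    score=0.7,
    evidence=[
        Evidence("ClinVar", "accession-1"),
        Evidence("literature", "pmid-1"),
        Evidence("literature", "pmid-2"),
    ],
)
print(gda.sources())  # ['ClinVar', 'literature']
```

Keeping the evidence items attached to each association is what makes provenance-aware filtering possible, for example restricting an analysis to associations backed by at least one curated source.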

Statistics (version 26.1, March 2026)

As of March 2026 (version 26.1), the database contains:

Data type                              Count
Gene–Disease Associations (GDAs)       2.1 million
Variant–Disease Associations (VDAs)    4.8 million
Disease–Disease Associations (DDAs)    24.4 million
Genes                                  30,000+
Genomic variants                       1.7 million
Diseases & phenotypes                  50,000+
Drugs & chemicals                      12,000+

For the most current statistics, see the DISGENET release information page.[1]

Data Sources

DISGENET aggregates evidence from two complementary streams:

  1. Specialized databases: Including UniProt, ClinVar, Orphanet, ClinGen, GenCC, PsyGeNET, Mouse Genome Database (MGD), Rat Genome Database (RGD), the NHGRI-EBI GWAS Catalog, PheWAS Catalog, FinnGen, and UK Biobank.
  2. Scientific literature and clinical trials: A proprietary natural language processing tool is used to mine evidence supporting the associations in DISGENET from scientific articles and clinical trial records, accounting for approximately 70% of the database content and enabling capture of emerging findings before they appear in curated repositories.

Metrics and Scoring System

DISGENET provides several original metrics to support evidence-based prioritization of associations:

  • DISGENET Score: A value from 0 to 1 quantifying the strength of evidence supporting an association, weighting curated and literature sources according to their reliability.
  • Disease Specificity Index (DSI): Measures the specificity of a gene or variant to a given disease, distinguishing disease-specific from pleiotropic associations.
  • Disease Pleiotropy Index (DPI): Indicates how broadly a gene or variant is associated across disease classes.
  • Evidence Index: Identifies associations where contradictory findings exist in the literature, signalling uncertainty.
  • Jaccard Index: Measures the overlap in genes or variants shared between two diseases, supporting DDA interpretation.
  • My Score: A user-configurable scoring option allowing custom weighting of evidence types.
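
The Jaccard Index in the list above is the standard set-overlap measure: the number of genes (or variants) shared by two diseases divided by the number associated with either. A minimal sketch follows; the function name and gene labels are placeholders, and the exact gene sets DISGENET uses when computing it are not specified here.

```python
def jaccard_index(genes_a, genes_b):
    """Return |A ∩ B| / |A ∪ B| for two collections of gene identifiers."""
    a, b = set(genes_a), set(genes_b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets share nothing.
        return 0.0
    return len(a & b) / len(union)


# Placeholder gene sets for two hypothetical diseases.
disease_1 = {"GENE_A", "GENE_B", "GENE_C"}
disease_2 = {"GENE_B", "GENE_C", "GENE_D", "GENE_E"}

# 2 shared genes out of 5 distinct genes overall.
print(jaccard_index(disease_1, disease_2))  # 0.4
```

A value of 0 means the two diseases share no associated genes, and a value of 1 means their gene sets are identical.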

Annotations also include ACMG variant classifications, GenCC gene validity levels, loss-of-function (LoF) and gain-of-function (GoF) tags, variant consequence types, deleteriousness scores, ancestry information, and pharmacogenomic context (drug effects on the association).

Suite of Access Tools

DISGENET provides a multi-modal suite of tools designed for users across computational and clinical backgrounds:

  • Web Interface (disgenet.com): Interactive search, filtering, and visualisation, including heatmaps, networks, biological pathway views, and Disease Enrichment Analysis.
  • REST API (api.disgenet.com): Programmatic access enabling integration into bioinformatics pipelines and AI/ML workflows at scale.
  • disgenet2r R package (GitLab): An open-source R package for analysis and visualisation of DISGENET data within the R statistical environment.
  • Cytoscape App (Cytoscape App Store): A plugin for Cytoscape enabling network-based exploration and visualisation of genotype–phenotype relationships, with API automation for Python and R workflows.
  • AIDA (AI-Driven Assistant) (disgenet.com/Assistant): An agentic AI tool that interprets natural language queries and translates them into structured REST API calls, significantly lowering the barrier for non-computational users and enabling autonomous AI workflows.
  • Full database download: Available to subscribers, enabling bulk data access for large-scale computational analyses.
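
To illustrate what programmatic access through a REST API typically looks like, the sketch below builds (but does not send) an authenticated GET request using only the Python standard library. Only the host name comes from the list above; the endpoint path, the query parameter names, and the header format are placeholders, and the real values should be taken from the DISGENET API documentation.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.disgenet.com"  # host named in the article


def build_request(path, params, api_key):
    """Build a GET request object without sending it.

    `path`, the parameter names, and the Authorization header format
    are hypothetical placeholders, not the documented API.
    """
    # Sort parameters so the resulting URL is deterministic.
    query = urlencode(sorted(params.items()))
    url = f"{BASE_URL}{path}?{query}"
    headers = {"Authorization": api_key, "Accept": "application/json"}
    return Request(url, headers=headers)


req = build_request(
    "/example/gda",                           # hypothetical endpoint
    {"gene": "GENE_A", "min_score": 0.5},     # hypothetical parameters
    api_key="YOUR_API_KEY",
)
print(req.full_url)
# https://api.disgenet.com/example/gda?gene=GENE_A&min_score=0.5
```

Passing the request object to `urllib.request.urlopen` would perform the call; it is left out here so the sketch runs without network access or a license key.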

Role in Artificial Intelligence and Machine Learning

DISGENET has been used in computational research in the life sciences, including applications in artificial intelligence (AI) and machine learning (ML). Its structured data model, which incorporates ontology-based integration, provenance tracking, and evidence scoring, enables its use in data-driven analyses in areas such as drug discovery and disease research.

Foundation Models and Knowledge Graphs

The structured associations in DISGENET have been used in machine learning approaches for gene–disease association analysis and prediction, including the construction of gene–disease datasets and knowledge graph–based models. In these studies, DISGENET data are integrated with other biomedical resources to support computational frameworks that combine heterogeneous data sources for disease-related analyses.[2][3][4][5][6][7]

Agentic AI Integration

DISGENET has been integrated into external computational environments to support automated retrieval of gene–disease associations, variant annotations, and related biomedical data within multi-step analytical workflows.

Such integrations reflect broader efforts to combine structured biomedical knowledge bases with computational methods, including systems that incorporate large language models and automated reasoning for hypothesis generation and prioritisation in biomedical research.[8][9][10][11]

Applications

DISGENET has been used in different areas of biomedical research and clinical genomics.

Drug Research and Development

Applications described in the literature include:[12][13][14][15][16][17]

  • Identification and prioritisation of therapeutic targets
  • Analysis of potential safety implications associated with gene targets
  • Identification of shared genetic mechanisms for drug repurposing
  • Biomarker discovery using disease specificity and pleiotropy measures

Clinical Genomics

DISGENET integrates multi-source genomic evidence from authoritative resources and enriches it with literature-derived knowledge, delivering a structured evidence-ranked genomic knowledge layer for clinical genomics and systems medicine. Key clinical applications include:[18][19]

  • Interpretation of Whole Exome Sequencing (WES) and Whole Genome Sequencing (WGS) data, including variants in non-coding regions underrepresented in clinical databases such as ClinVar.[20]
  • Development of gene panels for precision medicine.[21]
  • Support for rare disease diagnosis: In a study conducted at A.O.R.N. Cardarelli Hospital (Naples, Italy), the use of DISGENET in exome sequencing analysis was associated with an increase in diagnostic yield in an undiagnosed disease cohort, from 16 to 32 diagnosed cases.[22]
  • Comorbidity analysis using the DDA dataset to understand shared molecular mechanisms across co-occurring conditions.[23][24][25]

Machine Learning, AI, and Precision Medicine

DISGENET data have been used to support knowledge graph development, model training and evaluation, and companion diagnostic design. Its uniform data model and provenance metadata satisfy requirements for explainable AI in regulated environments.[26][27][28][29][30][31]

Sustainability Model

In 2020, DISGENET adopted a freemium access model:

  • Users from not-for-profit research and academic institutions can apply for a free subscription license, granting access to the database and open-source tools for non-commercial research and teaching.
  • Users from for-profit organisations can subscribe to commercial licenses.

Revenue from commercial subscriptions funds ongoing platform development and infrastructure maintenance.

Key Publications

The following peer-reviewed publications describe successive releases of the DISGENET platform:

1. Bauer-Mehren A, Rautschka M, Sanz F, Furlong LI. DisGeNET: a Cytoscape plugin to visualize, integrate, search and analyze gene–disease networks. Bioinformatics. 2010;26(22):2924–6.

2. Bauer-Mehren A, Bundschus M, Rautschka M, Mayer MA, Sanz F, Furlong LI. Gene-disease network analysis reveals functional modules in mendelian, complex and environmental diseases. PLoS One. 2011;6(6):e20284.

3. Piñero J, et al. DisGeNET: a discovery platform for the dynamical exploration of human diseases and their genes. Database. 2015;bav028.

4. Piñero J, et al. DisGeNET: a comprehensive platform integrating information on human disease-associated genes and variants. Nucleic Acids Research. 2017;45(D1):D833–9.

5. Piñero J, et al. The DisGeNET knowledge platform for disease genomics: 2019 update. Nucleic Acids Research. 2020;48(D1):D845–D855.

6. Piñero J, Saüch J, Sanz F, Furlong LI. The DisGeNET Cytoscape app: Exploring and visualizing disease genomics data. Computational and Structural Biotechnology Journal. 2021;19:2960–7.

7. Piñero J, Furlong LI, et al. DISGENET: Accelerating Data-Driven Discovery in Disease Genomics and Therapeutic Development. bioRxiv. 2026.

European projects

References

  1. "DISGENET Documentation (v26.1)". DISGENET. Retrieved 2026-05-03.
  2. Bonner, S., Barrett, I. P., Ye, C., Swiers, R., Engkvist, O., Bender, A., ... & Hamilton, W. L. (2022). A review of biomedical datasets relating to drug discovery: a knowledge graph perspective. Briefings in Bioinformatics, 23(6), bbac404. https://doi.org/10.1093/bib/bbac404
  3. Perdomo-Quinteiro, P., & Belmonte-Hernández, A. (2024). Knowledge Graphs for drug repurposing: a review of databases and methods. Briefings in Bioinformatics, 25(6), bbae461. https://academic.oup.com/bib/article/25/6/bbae461/7774899
  4. Santos, A., Colaço, A. R., Nielsen, A. B., Niu, L., Strauss, M., Geyer, P. E., ... & Mann, M. (2022). A knowledge graph to interpret clinical proteomics data. Nature biotechnology, 40(5), 692-702. https://www.nature.com/articles/s41587-021-01145-6
  5. Chandak, P., Huang, K., & Zitnik, M. (2023). Building a knowledge graph to enable precision medicine. Scientific Data, 10(1), 67. https://www.nature.com/articles/s41597-023-01960-3
  6. Gualdi, F., Oliva, B., & Piñero, J. (2024). Predicting gene disease associations with knowledge graph embeddings for diseases with curtailed information. NAR Genomics and Bioinformatics, 6(2), lqae049. https://academic.oup.com/nargab/article/6/2/lqae049/7671327
  7. Keyan Ding, Zhihui Zhu, Yuqi Tang, et al. Bridging Data and Discovery: A Survey on Knowledge Graphs in AI for Science. TechRxiv. November 21, 2025. DOI: 10.36227/techrxiv.176369442.22009541/v1
  8. Queen, O., Huang, Y., Calef, R., Giunchiglia, V., Chen, T., Dasoulas, G., ... & Zitnik, M. (2025). Procyon: A multimodal foundation model for protein phenotypes. BioRxiv, 2024-12. https://doi.org/10.1101/2024.12.10.627665
  9. Noori, A., Polonuer, J., Meyer, K., Budnik, B., Morton, S., Wang, X., ... & Zitnik, M. (2025). Graph AI generates neurological hypotheses validated in molecular, organoid, and clinical systems. arXiv preprint arXiv:2512.13724.
  10. Huang, K., Zhang, S., Wang, H., Qu, Y., Lu, Y., Roohani, Y., ... & Leskovec, J. (2025). Biomni: A general-purpose biomedical ai agent. biorxiv.
  11. Wang, E., Schmidgall, S., Jaeger, P. F., Zhang, F., Pilgrim, R., Matias, Y., ... & Azizi, S. (2025). Txgemma: Efficient and agentic llms for therapeutics. arXiv preprint arXiv:2504.06196. https://doi.org/10.48550/arXiv.2504.06196
  12. Pun, F.W., Podolskiy, D., Izumchenko, E. et al. Target identification and assessment in the era of AI. Nat Rev Drug Discov (2026). https://doi.org/10.1038/s41573-026-01412-8
  13. Chen, R., Duffy, Á. & Do, R. Genomics of drug target prioritization for complex diseases. Nat Rev Genet 27, 231–245 (2026). https://doi.org/10.1038/s41576-025-00904-4
  14. Carss, K.J., Deaton, A.M., Del Rio-Espinola, A. et al. Using human genetics to improve safety assessment of therapeutics. Nat Rev Drug Discov 22, 145–162 (2023). https://doi.org/10.1038/s41573-022-00561-w
  15. de Thé, F. X. B., Baudier, C., Pereira, R. A., Lefebvre, C., Moingeon, P., & Patrimony Working Group. (2023). Transforming drug discovery with a high-throughput AI-powered platform: A 5-year experience with Patrimony. Drug Discovery Today, 28(11), 103772. https://doi.org/10.1016/j.drudis.2023.103772
  16. Pinheiro‐de‐Sousa, I., Fonseca‐Alaniz, M. H., Giudice, G., Valadão, I. C., Modestia, S. M., Mattioli, S. V., ... & Krieger, J. E. (2023). Integrated systems biology approach identifies gene targets for endothelial dysfunction. Molecular Systems Biology, 19(12), e11462. https://doi.org/10.15252/msb.202211462
  17. Zong, N., Chowdhury, S., Zhou, S. et al. Advancing efficacy prediction for electronic health records based emulated trials in repurposing heart failure therapies. npj Digit. Med. 8, 306 (2025). https://doi.org/10.1038/s41746-025-01705-z
  18. Ferreira, J. C., Alshamali, F., Pereira, L., & Fernandes, V. (2022). Characterization of Arabian Peninsula whole exomes: Contributing to the catalogue of human diversity. Iscience, 25(11). DOI: 10.1016/j.isci.2022.105336
  19. Alsentzer, E., Finlayson, S. G., Li, M. M., Undiagnosed Diseases Network, Kobren, S. N., & Kohane, I. S. (2023). Simulation of undiagnosed patients with novel genetic conditions. Nature Communications, 14(1), 6403. https://www.nature.com/articles/s41467-023-41980-6
  20. Shil, A., Arava, N., Levi, N., Levine, L., Golan, H., Meiri, G., ... & Menashe, I. (2025). An integrative scoring approach for prioritization of rare autism spectrum disorder candidate variants from whole exome sequencing data. Scientific Reports, 15(1), 13024. https://doi.org/10.1038/s41598-025-96063-x
  21. Lazo de la Vega, L., Yu, W., Machini, K. et al. A framework for automated gene selection in genomic applications. Genet Med 23, 1993–1997 (2021). https://doi.org/10.1038/s41436-021-01213-x
  22. Chetta M, Tarsitano M, Rivieccio M, Oro M, Cammarota AL, De Marco M, Marzullo L, Rosati A, Bukvic N. A Copernican revolution of multigenic analysis: A retrospective study on clinical exome sequencing in unclear genetic disorders. Computational and Structural Biotechnology Journal. 2024 Jun 15;23:2615–2622. doi:10.1016/j.csbj.2024.06.011. PMID: 39006921; PMCID: PMC11245952.
  23. Lanzer, J.D., Valdeolivas, A., Pepin, M. et al. A network medicine approach to study comorbidities in heart failure with preserved ejection fraction. BMC Med 21, 267 (2023). https://doi.org/10.1186/s12916-023-02922-7
  24. Sánchez-Valle, J., Tejero, H., Fernández, J.M. et al. Interpreting molecular similarity between patients as a determinant of disease comorbidity relationships. Nat Commun 11, 2854 (2020). https://doi.org/10.1038/s41467-020-16540-x
  25. Rubio-Perez, C., Guney, E., Aguilar, D. et al. Genetic and functional characterization of disease associations explains comorbidity. Sci Rep 7, 6207 (2017). https://doi.org/10.1038/s41598-017-04939-4
  26. Bonner, S., Barrett, I. P., Ye, C., Swiers, R., Engkvist, O., Bender, A., ... & Hamilton, W. L. (2022). A review of biomedical datasets relating to drug discovery: a knowledge graph perspective. Briefings in Bioinformatics, 23(6), bbac404. https://doi.org/10.1093/bib/bbac404
  27. Perdomo-Quinteiro, P., & Belmonte-Hernández, A. (2024). Knowledge Graphs for drug repurposing: a review of databases and methods. Briefings in Bioinformatics, 25(6), bbae461. https://academic.oup.com/bib/article/25/6/bbae461/7774899
  28. Santos, A., Colaço, A. R., Nielsen, A. B., Niu, L., Strauss, M., Geyer, P. E., ... & Mann, M. (2022). A knowledge graph to interpret clinical proteomics data. Nature biotechnology, 40(5), 692-702. https://www.nature.com/articles/s41587-021-01145-6
  29. Chandak, P., Huang, K., & Zitnik, M. (2023). Building a knowledge graph to enable precision medicine. Scientific Data, 10(1), 67. https://www.nature.com/articles/s41597-023-01960-3
  30. Gualdi, F., Oliva, B., & Piñero, J. (2024). Predicting gene disease associations with knowledge graph embeddings for diseases with curtailed information. NAR Genomics and Bioinformatics, 6(2), lqae049. https://academic.oup.com/nargab/article/6/2/lqae049/7671327
  31. Keyan Ding, Zhihui Zhu, Yuqi Tang, et al. Bridging Data and Discovery: A Survey on Knowledge Graphs in AI for Science. TechRxiv. November 21, 2025. DOI: 10.36227/techrxiv.176369442.22009541/v1