Article · Wikipedia archive · Last revised May 28, 2026

Biological database


Figure: Home page of STRING, a biological database that characterises functional links between proteins.[1]

Biological databases are libraries of biological data collected from scientific experiments, published literature, high-throughput experimental technologies, and computational analysis. They contain information from research areas including genomics, proteomics, metabolomics, microarray gene expression, and phylogenetics.[2] Information contained in biological databases includes gene function, structure, and localization (both cellular and chromosomal), clinical effects of mutations, and similarities of biological sequences and structures.

Biological databases can be classified by the kind of data they collect (see below). Broadly, there are molecular databases (for sequences, molecules, etc.), functional databases (for physiology, enzyme activities, phenotypes, ecology, etc.), taxonomic databases (for species and other taxonomic ranks), databases of images and other media, and specimen databases (for museum collections, etc.).

Databases are important tools that help scientists analyze and explain a host of biological phenomena, from the structure of biomolecules and their interactions to the whole metabolism of organisms and the evolution of species. This knowledge supports the fight against diseases and helps in developing medications, predicting certain genetic diseases, and discovering basic relationships among species in the history of life.

Major biological databases

These tables cover a variety of notable biological databases across a wide range of fields, specialties, data types, and use cases. Many of these databases are collated in the ELIXIR Core Data Resource list, which collects European data resources critical to life-science research.[3]

ELIXIR Core Data Resources[4]
| Resource | Category | Host institution | Description[3] |
| --- | --- | --- | --- |
| ArrayExpress[5] | Transcriptomics | EMBL-EBI / European Nucleotide Archive[5] | Functional genomics data from high-throughput experiments |
| BacDive[6] | Microbiology | Leibniz Institute DSMZ | Taxonomy, morphology, physiology, and ecology of bacterial and archaeal strains |
| Bgee[7] | Transcriptomics | Swiss Institute of Bioinformatics / University of Lausanne[7] | Gene expression patterns across multiple animal species for comparative analysis |
| BioImage Archive[8] | Imaging | EMBL-EBI[8] | Repository of biological images, supporting the deposition and reuse of reference imaging data that underpin published research across the life sciences |
| BioStudies[9] | Metadata | EMBL-EBI[9] | Descriptions of biological studies and supplementary data linking to other archives |
| BRENDA[10] | Enzymology | Leibniz Institute DSMZ[10] | Enzyme and enzyme–ligand information curated from primary literature across all taxa |
| CATH[11] | Protein Structure | University College London[11] | Hierarchical domain classification of protein structures from the PDB |
| Cellosaurus[12] | Cells | Swiss Institute of Bioinformatics[12] | Knowledge resource describing cell lines used in biomedical research |
| ChEBI[13] | Biochemistry | EMBL-EBI[13] | Dictionary of small molecular entities of biological interest |
| ChEMBL[14] | Biochemistry | EMBL-EBI[14] | Bioactive drug-like small molecules with 2-D structures, calculated properties, and bioactivities |
| EGA[15] | Genome | EMBL-EBI / CRG[15] | Personally identifiable genetic and phenotypic data from biomedical research |
| EMDB[16] | Biochemical Structure | EMBL-EBI[16] | Cryo-EM maps and tomograms of macromolecular complexes and subcellular structures |
| ENA[17] | Sequence | EMBL-EBI[17] | Nucleotide sequencing data, sequence assemblies, and functional annotation |
| Ensembl[18] | Genome | EMBL-EBI[18] | Genome browser for vertebrate genomes supporting comparative and regulatory genomics |
| Ensembl Genomes[19] | Genome | EMBL-EBI[19] | Comparative analysis and visualisation for non-vertebrate genomes |
| Europe PMC[20] | Literature | EMBL-EBI[20] | Life-sciences articles, books, patents, and clinical guidelines |
| GWAS Catalog[21] | Variation | EMBL-EBI / NHGRI[22] | Curated collection of human genome-wide association studies |
| HGNC[23] | Nomenclature | University of Cambridge / EMBL-EBI[23] | Approved symbols, names, and families for human genes |
| Human Protein Atlas[24] | Proteomics | KTH Royal Institute of Technology / Karolinska Institute / Uppsala Universitet[24] | Human proteome mapped across cells, tissues, and organs via multi-omics and imaging |
| IntAct[25] / MINT[26] (IMEx) | Interactions | EMBL-EBI[25] | Experimentally verified protein–protein and molecular interaction data |
| InterPro[27] | Protein | EMBL-EBI[27] | Protein families, domains, and functional sites integrated from member databases |
| JASPAR[28] | Regulation | University of Oslo[28] | Curated, non-redundant transcription factor binding profiles |
| MGnify[29] | Metagenomics | EMBL-EBI[29] | Assembly, analysis, and archiving of microbiome-derived nucleic-acid sequences |
| LIPID MAPS[30] | Lipidomics | Cardiff University / UCSD / Babraham Institute / Swansea University / University of Edinburgh[30] | Lipid structures, properties, and biological functions |
| LPSN[31] | Nomenclature | Leibniz Institute DSMZ[31] | Authoritative nomenclature of prokaryotes |
| Orphadata Science[32] | Disease | Inserm / French Ministry of Health[32] | Computable dataset of rare diseases and orphan drugs |
| OMA[33] | Orthology | Swiss Institute of Bioinformatics / University of Lausanne[33] | Inferred orthologs among complete genomes |
| OrthoDB[34] | Orthology | Swiss Institute of Bioinformatics / University of Geneva[34] | Orthologous protein-coding genes across a wide range of species |
| PDBe[35] | Structure | EMBL-EBI[35] | Biological macromolecular structures |
| PomBase[36] | Genome | University of Cambridge / University College London / Babraham Institute[36] | Structural and functional annotation for the fission yeast Schizosaccharomyces pombe |
| PRIDE[37] | Proteomics | EMBL-EBI[37] | Mass-spectrometry-based proteomics identifications, quantifications, and spectra |
| Reactome[38] | Pathways | EMBL-EBI / OICR / NYU[38] | Manually curated and peer-reviewed biological pathways |
| Rhea[39] | Chemistry | Swiss Institute of Bioinformatics[39] | Expert-curated chemical and transport reactions of biological interest |
| SILVA[40] | Sequence | Leibniz Institute DSMZ[40] | Comprehensive, quality-checked, and regularly updated datasets of aligned small and large subunit ribosomal RNA sequences for all three domains of life (Bacteria, Archaea, and Eukarya) |
| STRING[41] | Interactions | Swiss Institute of Bioinformatics / Novo Nordisk Foundation Center for Protein Research / EMBL-EBI[41] | Known and predicted protein–protein interactions |
| SWISS-MODEL[42] | Structure | Swiss Institute of Bioinformatics / University of Basel[42] | Automated protein structure homology modelling |
| UniProt[43] | Protein | EMBL-EBI / SIB / PIR[43] | Comprehensive protein sequence and functional annotation |
| VEuPathDB[44] | Pathogens | University of Pennsylvania / University of Georgia / University of Liverpool[44] | Genomic and functional data for eukaryotic pathogens and invertebrate disease vectors |

Access

Most biological databases are available through websites that organise data so that users can browse it online. In addition, the underlying data is usually available for download in a variety of formats. Biological data itself comes in many forms, including text, sequence data, protein structures, and links between records. Each of these can be found in particular sources, for example:

  • Text is provided by PubMed and OMIM.
  • Sequence data is provided by GenBank for DNA and by UniProt for proteins.
  • Protein structures are provided by the PDB, and their classifications by SCOP and CATH.
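Sequence data from resources such as GenBank and UniProt is commonly downloaded as FASTA, a plain-text format in which each record starts with a `>` header line followed by one or more sequence lines. The following is a minimal Python sketch of a FASTA reader, assuming that simple layout; the example records are hypothetical, and in practice an established parser such as Biopython's `SeqIO` would usually be preferred.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted text into a {header: sequence} dictionary.

    Assumes the simple FASTA layout: a '>' header line followed by one or
    more sequence lines. Blank lines are ignored. A repeated header
    overwrites the earlier record in this sketch.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines between records
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].strip()  # drop the '>' marker
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)  # sequences may be wrapped over several lines
    if header is not None:
        records[header] = "".join(chunks)  # flush the final record
    return records


# Hypothetical example records (not real database entries).
example = """>seq1 hypothetical DNA fragment
ATGCGT
ACGTTA
>seq2 hypothetical peptide
MKTAYIAK
"""

for name, seq in parse_fasta(example.splitlines()).items():
    print(name, len(seq))
```

Wrapped sequence lines are joined back into one string per record, which is why the first record's two lines come out as a single 12-residue sequence.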

Problems and challenges

Biological knowledge is distributed among countless databases. This sometimes makes it difficult to ensure the consistency of information, for example when different databases use different names for the same species or store data in different formats. As a consequence, interoperability is a constant challenge for information exchange. For instance, if a DNA sequence database stores a DNA sequence along with the name of its species, a change to that species' name may break links to other databases that use a different name. Integrative bioinformatics is one field attempting to tackle this problem by providing unified access. One solution is for biological databases to cross-reference one another using accession numbers, which link related knowledge together (so that, for example, the accession number stays the same even if a species name changes). Redundancy is another problem, as many databases must store the same information; for example, protein structure databases also contain the sequences of the proteins they cover and their bibliographic information.
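The benefit of linking by accession number rather than by name can be shown in a few lines. In this hypothetical sketch, the `SpeciesRecord` type, the accession `TX0001`, and both species names are invented for illustration; it compares a link that stores a species name with one that stores a stable accession, before and after the species is renamed.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SpeciesRecord:
    accession: str  # stable identifier; assumed never to change
    name: str       # current scientific name; may be revised over time


# Hypothetical taxonomy database keyed by its stable accession number.
taxonomy: Dict[str, SpeciesRecord] = {
    "TX0001": SpeciesRecord("TX0001", "Oldname exampleus"),
}

# Two ways a hypothetical sequence record could point at its species.
sequence_link_by_name = "Oldname exampleus"
sequence_link_by_accession = "TX0001"


def lookup_by_name(name: str) -> Optional[SpeciesRecord]:
    """Resolve a link that stores the species name as free text."""
    for record in taxonomy.values():
        if record.name == name:
            return record
    return None


def lookup_by_accession(accession: str) -> Optional[SpeciesRecord]:
    """Resolve a link that stores the stable accession number."""
    return taxonomy.get(accession)


# The species is renamed in the taxonomy database.
taxonomy["TX0001"].name = "Newname exampleus"

print(lookup_by_name(sequence_link_by_name))            # None: name link broken
print(lookup_by_accession(sequence_link_by_accession))  # still resolves
```

After the rename, the name-based link no longer finds anything, while the accession-based link still reaches the same record, now carrying the updated name.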

Model-organism databases

Species-specific databases are available for some species, mainly those that are often used in research (model organisms). For example, EcoCyc is a database for E. coli. Other popular model organism databases include Mouse Genome Informatics for the laboratory mouse, Mus musculus; the Rat Genome Database for Rattus; ZFIN for Danio rerio (zebrafish); PomBase[45] for the fission yeast Schizosaccharomyces pombe; FlyBase for Drosophila; WormBase for the nematodes Caenorhabditis elegans and Caenorhabditis briggsae; and Xenbase for the frogs Xenopus tropicalis and Xenopus laevis.

Biodiversity and species databases

Figure: Animal groups and their numbers of species, from the Catalogue of Life.[46]

Numerous databases attempt to document the diversity of life on Earth. A prominent example is the Catalogue of Life, first created in 2001 by Species 2000 and the Integrated Taxonomic Information System.[47] It is a collaborative project that aims to document the taxonomic classification of all currently accepted species in the world,[48] providing a consolidated and consistent database for researchers and policymakers to reference. The Catalogue of Life curates up-to-date datasets from other sources such as the Conifer Database, ICTV MSL (for viruses), and LepIndex (for butterflies and moths); as of May 2022 it drew on 165 databases in total.[49] Its operational costs are paid for by the Global Biodiversity Information Facility, the Illinois Natural History Survey, the Naturalis Biodiversity Center, and the Smithsonian Institution.[50]
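Consolidating many source datasets into one checklist is, at its core, a merge with de-duplication that keeps track of where each name came from. The sketch below is a hypothetical illustration, not the Catalogue of Life's actual pipeline: the source names (`ConiferSource`, `LepSource`, `GeneralSource`) and species lists are invented, and it assumes names have already been reconciled, so it skips the synonym resolution and editorial review a real consolidation would likely involve.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Hypothetical source checklists: each source contributes a list of accepted
# species names. Real source datasets are far larger and richer than this.
sources: Dict[str, List[str]] = {
    "ConiferSource": ["Pinus sylvestris", "Abies alba"],
    "LepSource": ["Vanessa atalanta", "Pieris rapae"],
    "GeneralSource": ["Abies alba", "Pieris rapae", "Homo sapiens"],
}


def merge_checklists(source_lists: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Consolidate names from several sources into one checklist.

    Each name appears once in the result; its value is the set of sources
    that listed it, so provenance survives the merge. Names are assumed to
    be already reconciled (no synonym handling is attempted).
    """
    merged: Dict[str, Set[str]] = defaultdict(set)
    for source, names in source_lists.items():
        for name in names:
            merged[name.strip()].add(source)
    return dict(merged)


checklist = merge_checklists(sources)
print(len(checklist))                   # 5 distinct names from 7 entries
print(sorted(checklist["Abies alba"]))  # ['ConiferSource', 'GeneralSource']
```

Keeping the contributing sources alongside each name is a design choice that makes it possible to trace any entry in the consolidated list back to the dataset it came from.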

Some biological databases also document the geographical distribution of species. Shuang Dai et al. created a multi-source database documenting the spatial distribution of 1,371 bird species in China, because existing databases severely lacked spatial distribution data for many species.[51] Sources for the database included books, literature, GPS tracking, and online web data. For each species, the database displays taxonomy, distribution, species information, and data sources. After the database was completed, 61% of known bird species in China were found to be distributed in regions beyond those where they were previously known to occur.[52]
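The range comparison described above can be illustrated as a set difference between where a species has been observed and where it was previously known to occur. This is a minimal sketch under stated assumptions, not Dai et al.'s method: the species, region codes, and occurrence records are hypothetical, and regions are treated as simple labels rather than mapped areas.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical previously known ranges: species -> set of region codes.
known_range: Dict[str, Set[str]] = {
    "species_a": {"R1", "R2"},
    "species_b": {"R3"},
    "species_c": {"R1"},
    "species_d": {"R4", "R5"},
}

# Hypothetical occurrence records pooled from several source types
# (for example literature, GPS tracking, and web data) as (species, region).
records = [
    ("species_a", "R1"), ("species_a", "R6"),
    ("species_b", "R3"),
    ("species_c", "R2"), ("species_c", "R1"),
    ("species_d", "R4"),
]


def range_extensions(
    known: Dict[str, Set[str]], occurrences: Iterable[Tuple[str, str]]
) -> Dict[str, Set[str]]:
    """Return, per species, the observed regions outside its known range."""
    observed: Dict[str, Set[str]] = defaultdict(set)
    for species, region in occurrences:
        observed[species].add(region)
    extensions: Dict[str, Set[str]] = {}
    for species, regions in observed.items():
        beyond = regions - known.get(species, set())
        if beyond:  # keep only species with at least one new region
            extensions[species] = beyond
    return extensions


ext = range_extensions(known_range, records)
share = len(ext) / len(known_range)  # fraction of all known species
print(sorted(ext))     # ['species_a', 'species_c']
print(f"{share:.0%}")  # 50%
```

Here the denominator is every species with a known range, including those without new records, which is one reasonable way to express a figure like the share of species found beyond their previously known regions.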

Medical databases

Figure: Foot wounds from WoundsDB.[53]

Medical databases are a special case of biomedical data resources and range from bibliographies, such as PubMed, to image databases used to develop AI-based diagnostic software. For instance, one such image database was developed to aid the development of wound-monitoring algorithms.[54] Over 188 multi-modal image sets, consisting of photographs, thermal images, and 3D mesh depth maps, were curated from 79 patient visits. Wound outlines were manually drawn and added to the photo datasets.[55] The database was made publicly available as a program called WoundsDB, downloadable from the Chronic Wound Database website.

Publications

Biological databases are commonly described and updated through peer-reviewed publications, which serve both as documentation and as a means of community dissemination.

A major venue for such publications is the annual Nucleic Acids Research (NAR) Database Issue, typically published in January. This special issue presents articles describing new biological databases as well as updates to existing resources, and is accompanied by the NAR online Molecular Biology Database Collection.[56]

Dedicated journals focusing on biological data resources include Database: The Journal of Biological Databases and Curation and GigaScience, which publish articles describing databases, curated resources, and large-scale datasets, often alongside associated computational tools and workflows.[57][58]

In addition, general data-focused journals such as Scientific Data publish descriptions of datasets across a wide range of scientific disciplines, including but not limited to the life sciences.

See also

References

  1. Szklarczyk D; Franceschini A; Kuhn M; et al. (January 2011). "The STRING database in 2011: functional interaction networks of proteins, globally integrated and scored". Nucleic Acids Res. 39 (Database issue): D561–8. doi:10.1093/nar/gkq973. PMC 3013807. PMID 21045058.
  2. Altman RB (March 2004). "Building successful biological databases". Brief. Bioinformatics. 5 (1): 4–5. doi:10.1093/bib/5.1.4. PMID 15153301.
  3. "ELIXIR Core Data Resources | ELIXIR". elixir-europe.org. Retrieved 2026-04-09.
  4. "Core Data Resources". ELIXIR. Retrieved 2026-04-08.
  5. BioStudies. "BioStudies < The European Bioinformatics Institute < EMBL-EBI". www.ebi.ac.uk. Retrieved 2026-04-09.
  6. "BacDive | The Bacterial Diversity Metadatabase". bacdive.dsmz.de. Retrieved 2026-04-09.
  7. "Bgee: gene expression data in animals". Bgee. Retrieved 2026-04-09.
  8. EMBL-EBI. "Home < BioImage Archive < EMBL-EBI". www.ebi.ac.uk. Retrieved 2026-04-09.
  9. EMBL-EBI. "Home < BioImage Archive < EMBL-EBI". www.ebi.ac.uk. Retrieved 2026-04-09.
  10. "BRENDA Enzyme Database". www.brenda-enzymes.org. Retrieved 2026-04-09.
  11. "CATH: Protein Structure Classification Database at UCL". www.cathdb.info. Retrieved 2026-04-09.
  12. "Cellosaurus - Cell line encyclopedia". www.cellosaurus.org. Retrieved 2026-04-09.
  13. "ChEBI - Chemical Entities of Biological Interest". www.ebi.ac.uk. Retrieved 2026-04-09.
  14. "ChEMBL - ChEMBL". www.ebi.ac.uk. Retrieved 2026-04-09.
  15. "EGA European Genome-Phenome Archive - EGA European Genome-Phenome Archive". ega-archive.org. Retrieved 2026-04-09.
  16. EMDB. "Electron Microscopy Data Bank". Electron Microscopy Data Bank. Retrieved 2026-04-09.
  17. "ENA Browser". www.ebi.ac.uk. Retrieved 2026-04-09.
  18. "Ensembl genome browser 115". www.ensembl.org. Retrieved 2026-04-09.
  19. "Ensembl Genomes". ensemblgenomes.org. Retrieved 2026-04-09.
  20. "Europe PMC". europepmc.org. Retrieved 2026-04-09.
  21. "GWAS Catalog". www.ebi.ac.uk. Retrieved 2026-04-09.
  22. "GWAS Catalog". www.ebi.ac.uk. Retrieved 2026-04-09.
  23. "Home | HUGO Gene Nomenclature Committee". www.genenames.org. Retrieved 2026-04-09.
  24. "The Human Protein Atlas". www.proteinatlas.org. Retrieved 2026-04-09.
  25. "IntAct Portal". www.ebi.ac.uk. Retrieved 2026-04-09.
  26. "The Molecular INTeraction Database – An ELIXIR Core Resource". Retrieved 2026-04-09.
  27. "InterPro". www.ebi.ac.uk. Retrieved 2026-04-09.
  28. "InterPro". www.ebi.ac.uk. Retrieved 2026-04-09.
  29. "MGnify - EBI". www.ebi.ac.uk. Retrieved 2026-04-09.
  30. Conroy, Matthew J; Andrews, Robert M; Andrews, Simon; Cockayne, Lauren; Dennis, Edward A; Fahy, Eoin; Gaud, Caroline; Griffiths, William J; Jukes, Geoff; Kolchin, Maksim; Mendivelso, Karla; Lopez-Clavijo, Andrea F; Ready, Caroline; Subramaniam, Shankar; O’Donnell, Valerie B (2024-01-05). "LIPID MAPS: update to databases and tools for the lipidomics community". Nucleic Acids Research. 52 (D1): D1677–D1682. doi:10.1093/nar/gkad896. ISSN 0305-1048. PMC 10767878. PMID 37855672.
  31. "LPSN - List of Prokaryotic names with Standing in Nomenclature". lpsn.dsmz.de. Retrieved 2026-04-09.
  32. "Science Orphadata – Orphanet datasets". sciences.orphadata.com. Retrieved 2026-04-09.
  33. "OMA Orthology database". omabrowser.org. Retrieved 2026-04-09.
  34. "OrthoDB | genes orthologs | Zdobnov lab". www.orthodb.org. Retrieved 2026-04-09.
  35. "Homepage | Protein Data Bank in Europe". www.ebi.ac.uk. Retrieved 2026-04-09.
  36. "PomBase". www.pombase.org. Retrieved 2026-04-09.
  37. "PRIDE - PRoteomics IDEntifications Database". www.ebi.ac.uk. Retrieved 2026-04-09.
  38. "Home - Reactome Pathway Database". reactome.org. Retrieved 2026-04-09.
  39. "Rhea - reaction knowledgebase". www.rhea-db.org. Retrieved 2026-04-09.
  40. "SILVA: Silva". www.arb-silva.de. Retrieved 2026-04-09.
  41. "STRING: functional protein association networks". string-db.org. Retrieved 2026-04-09.
  42. "SWISS-MODEL". swissmodel.expasy.org. Retrieved 2026-04-09.
  43. "UniProt". UniProt. Retrieved 2026-04-09.
  44. "VEuPathDB". veupathdb.org. Retrieved 2026-04-09.
  45. Rutherford KM, Lera-Ramírez M, Wood V (May 2024). "PomBase: a Global Core Biodata Resource—growth, collaboration, and sustainability". Genetics. 227 (1) iyae007. doi:10.1093/genetics/iyae007. PMC 11075564. PMID 38376816.
  46. Catalogue of Life (2001). "Homepage". Search. Species 2000. Archived from the original on 2022-05-05. Retrieved 2022-05-05.
  47. Jones, Andrew C. (2011). "Identifying and Relating Biological Concepts in the Catalogue of Life". Journal of Biomedical Semantics. 2 (1): 7. doi:10.1186/2041-1480-2-7. PMC 3245425. PMID 22004596.
  48. Catalogue of Life (2001). "What is Catalogue of Life?". Our Mission. Species 2000. Archived from the original on 2022-05-05. Retrieved 2022-05-05.
  49. Catalogue of Life (2001). "Source Datasets". Species 2000. Archived from the original on 2022-05-14. Retrieved 2022-05-05.
  50. Catalogue of Life (2001). "Funding". Species 2000. Archived from the original on 2022-05-05. Retrieved 2022-05-05.
  51. Dai, Shuang (2019). "A Spatialized Digital Database for All Bird Species in China". Science China Life Sciences. 62 (5): 661–667. doi:10.1007/s11427-018-9419-2. PMID 30900164. S2CID 84845653.
  52. Dai, Shuang (2019). "A Spatialized Digital Database for All Bird Species in China". Science China Life Sciences. 62 (5): 661–667. doi:10.1007/s11427-018-9419-2. PMID 30900164. S2CID 84845653.
  53. "Chronic Wound Database". WoundsDB. Silesian University of Technology. 2020. Retrieved 2022-05-05.
  54. Kręcichwost, Michał (2021). "Chronic Wounds Multimodal Image Database". Computerized Medical Imaging and Graphics. 88 101844. doi:10.1016/j.compmedimag.2020.101844. PMID 33477091. S2CID 231676950.
  55. "Chronic Wound Database". WoundsDB. Silesian University of Technology. 2020. Retrieved 2022-05-05.
  56. Rigden, Daniel J.; Fernández, Xosé M. (2026). "The 2026 Nucleic Acids Research database issue and the online molecular biology database collection". Nucleic Acids Research. 54 (D1): D1–D9. doi:10.1093/nar/gkaf1427. PMC 12807761. PMID 41431842.
  57. Landsman, David; Gentleman, Robert; Kelso, Janet; Ouellette, B. F. Francis (2009). "DATABASE: A new forum for biological databases and curation". Database (Oxford): bap002. doi:10.1093/database/bap002. PMC 2790300. PMID 20157475.
  58. Kyrpides, Nikos C.; Andrews-Pfannkoch, Catherine; White, Owen (2012). "GigaScience: a new journal for data-intensive science". GigaScience. 1 (1): 7. doi:10.1186/2047-217X-1-7. PMC 3621314. PMID 23587260.
External links