Biomedical data science

Biomedical data science is a multidisciplinary field that leverages large volumes of data to promote biomedical innovation and discovery. It draws on fields including biostatistics, biomedical informatics, and machine learning, with the goal of understanding biological and medical data, and can be viewed as the study and application of data science to solve biomedical problems.¹ Modern biomedical datasets often have specific features that make their analysis difficult, including:

Large numbers of features (sometimes billions), often far exceeding the number of samples (typically tens or hundreds)
Noisy and missing data
Privacy concerns (e.g., electronic health record confidentiality)
Requirement of interpretability from decision makers and regulatory bodies

Many biomedical data science projects apply machine learning to such datasets.²³ These characteristics, while also present in many data science applications more generally, make biomedical data science a specific field. Examples of biomedical data science research include:

Computational genomics
Computational imaging³⁴
Electronic health records data mining
Biomedical network science⁵
Clinical Natural Language Processing (NLP)

Computational Imaging and Deep Learning

A medical image of multiple brain scans. The twenty images show the human brain from a variety of different angles. On each image, the right and left sides show regions highlighted in different colors, including blue, red, yellow and orange.

Computational imaging is a cornerstone of biomedical data science, focusing on the development of algorithms to enhance, analyze, and interpret medical imagery. In recent years, the field has been transformed by the integration of deep learning, particularly through the use of convolutional neural networks. Earlier approaches to image analysis relied on researchers manually defining features such as edge detectors or texture descriptors.⁶ In the modern deep learning approach, models automatically learn a hierarchy of features directly from raw pixel data. This overlap between data science and deep learning is applied across several key tasks:

Classification: Identifying the presence of specific diseases, such as distinguishing between benign and malignant tumors in histopathology slides or detecting pneumonia in chest X-rays.
Segmentation: The precise delineation of anatomical structures or lesions. A notable example is the U-Net architecture,⁷ which is widely used for biomedical image segmentation to help clinicians quantify organ volume or track tumor growth.
Detection: Automating the localization of small objects, such as identifying microcalcifications in mammograms or polyps during colonoscopies.
Registration: The process of aligning multiple images to provide a comprehensive view of the patient's anatomy.
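
The difference between manually defined and learned features can be made concrete with a small sketch. The Python code below applies a fixed Sobel kernel, a classic hand-designed edge detector, to a toy image. A convolutional layer performs the same sliding-window operation, except that the kernel values are learned from data rather than chosen by a researcher. The toy image and the function name `correlate2d` are illustrative assumptions and are not taken from the cited works.

```python
# Sobel kernel that responds to horizontal intensity changes (vertical edges).
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def correlate2d(image, kernel):
    """Slide the kernel over the image (no padding) and sum the products.

    This is the cross-correlation used by convolutional layers. The output
    is smaller than the input by (kernel size - 1) in each dimension.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for r in range(out_rows):
        row = []
        for c in range(out_cols):
            total = 0
            for i in range(k_rows):
                for j in range(k_cols):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Toy 4x6 "image": dark region (0) on the left, bright region (9) on the right.
toy_image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

edge_map = correlate2d(toy_image, SOBEL_X)
for row in edge_map:
    print(row)
```

The response is zero inside the flat regions and large where the window straddles the dark-to-bright boundary, which is the behavior a hand-crafted edge feature is designed to produce.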

Despite these advances, applying deep learning to medical imaging still faces significant challenges. These include the need to build large, annotated datasets and the need for model interpretability in clinical decision-making.

Electronic Health Records

Example of an individual's electronic medical record

Electronic Health Records (EHRs) are a digital alternative to paper patient charts, usually including individual records or population health information.⁸ EHRs can be used in a wide variety of applications, including research and analysis, as they often include demographics, diagnoses, medications, test results, and personal statistics.⁸

History

1960s

The earliest precursor of the EHR is generally considered to be Dr. Lawrence Weed's problem-oriented medical record (POMR), published in 1968, which sorts and groups medical records by diagnoses and symptoms.⁹ The POMR was the first system to organize records by patient information rather than by source (doctors, nurses, attendings, etc.).⁹

In 1969, the Regenstrief Institute developed and published the Regenstrief Medical Record System, which established electronic writing, storage, and retrieval of records and served as the basis for modern EHR systems.¹⁰

2000s

In 2009, the Health Information Technology for Economic and Clinical Health Act (HITECH Act) was passed in the United States¹¹. This act standardized privacy and distribution of EHRs and increased the acceptance and utilization of EHRs within medical and academic settings.

Artificial Intelligence and Machine Learning Applications

Machine Learning and Artificial Intelligence have become central tools in biomedical data science. Recent advances in large language models (LLMs) have expanded their role beyond text, with models trained directly on genomic sequences enabling tasks such as gene function prediction, variant effect analysis, and drug discovery. In clinical settings, Natural Language Processing (NLP) models are applied to electronic health records to extract structured insights from unstructured clinical notes and data, supporting diagnosis and treatment planning.¹²¹³

A large language model trained on genomic sequences can be applied to variant effect prediction, gene function prediction, and drug discovery.

Beyond genomics, AI models have been applied to protein structure prediction. AlphaFold, developed by Google DeepMind, uses deep learning to predict three-dimensional protein structures from amino acid sequences with high accuracy.¹⁴ These predictions have been used to support drug target identification and the study of disease mechanisms.

Knowledge Graphs

Simple knowledge graph depicting the relationships between a gene, a disease and the biological functions they affect

Knowledge graphs (KGs) are widely used in biomedical data science to represent and analyze complex relationships among biological and medical entities. By structuring data as nodes (e.g., genes, diseases, drugs) and edges (relationships), KGs enable computational methods to extract insights and support decision-making.¹⁵ These biomedical relationships can be efficiently modeled and queried using technologies such as Neo4j.¹⁶
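
As a minimal sketch of the node-and-edge structure described above, the Python code below stores a tiny graph as (subject, relation, object) triples and answers a two-hop query. Every entity and relation name here (`GENE_A`, `DISEASE_X`, `DRUG_1`, `associated_with`, `targets`) is a hypothetical placeholder rather than a curated biomedical fact, and the in-memory index only stands in for what a graph database would provide.

```python
from collections import defaultdict

# Each fact is a (subject, relation, object) triple.
# All names are hypothetical placeholders, not curated biomedical facts.
TRIPLES = [
    ("GENE_A", "associated_with", "DISEASE_X"),
    ("GENE_A", "participates_in", "PATHWAY_1"),
    ("GENE_B", "associated_with", "DISEASE_X"),
    ("DRUG_1", "targets", "GENE_A"),
    ("DRUG_2", "targets", "GENE_C"),
]


def build_incoming_index(triples):
    """Map each node to the (relation, source) pairs that point at it."""
    incoming = defaultdict(list)
    for subject, relation, obj in triples:
        incoming[obj].append((relation, subject))
    return incoming


def drugs_for_disease(disease, incoming):
    """Two-hop query: drugs that target a gene associated with the disease."""
    genes = {src for rel, src in incoming.get(disease, []) if rel == "associated_with"}
    drugs = set()
    for gene in genes:
        drugs.update(src for rel, src in incoming.get(gene, []) if rel == "targets")
    return sorted(drugs)


incoming = build_incoming_index(TRIPLES)
print(drugs_for_disease("DISEASE_X", incoming))
```

Chaining edges in this way (drug to gene to disease) is the kind of traversal that underlies applications such as drug repurposing, although production systems typically express it in a graph query language rather than hand-written loops.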

Biomedical Research Applications

KGs provide biomedical researchers with a way to model complex biological systems.¹⁶ They have been used to identify the relationships between diseases and biomolecules, support drug repurposing, and uncover new biological insights.¹⁵ Additional applications include:

Identification of novel antibiotic resistance genes through graph-based link prediction.¹⁶

Finding associations between miRNA and diseases.¹⁵

Prediction of protein-protein interactions.¹⁵
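
Link prediction, which underlies several of the applications above, can be sketched with a simple neighborhood heuristic. The Python code below ranks unconnected node pairs by the Jaccard coefficient of their neighbor sets, so pairs that share many neighbors score highest. This is a generic textbook heuristic chosen for illustration, not the method used in the cited studies, and the node names `P1` to `P5` are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected interaction network.
EDGES = [
    ("P1", "P2"),
    ("P1", "P3"),
    ("P2", "P3"),
    ("P2", "P4"),
    ("P3", "P4"),
    ("P4", "P5"),
]


def neighbor_sets(edges):
    """Build a node -> set of neighbors mapping for an undirected graph."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return neighbors


def jaccard(neighbors, a, b):
    """Shared neighbors divided by all neighbors of either node."""
    union = neighbors[a] | neighbors[b]
    if not union:
        return 0.0
    return len(neighbors[a] & neighbors[b]) / len(union)


def rank_candidate_links(edges):
    """Score every unconnected pair, highest Jaccard coefficient first."""
    neighbors = neighbor_sets(edges)
    scored = []
    for a, b in combinations(sorted(neighbors), 2):
        if b not in neighbors[a]:
            scored.append(((a, b), jaccard(neighbors, a, b)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))


ranking = rank_candidate_links(EDGES)
for pair, score in ranking:
    print(pair, round(score, 3))
```

In this toy network the pair (`P1`, `P4`) ranks first because the two nodes share two of their three combined neighbors. Published approaches usually replace this heuristic with learned graph embeddings.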

Clinical Applications

In clinical settings, KGs can be used to make visual representations of a patient's electronic health records.¹⁵¹⁷ The data obtained from these graphs can assist healthcare providers in improving patient diagnoses and prescribing more effective drugs.¹⁵ Additionally, embeddings derived from resources like the Unified Medical Language System (UMLS) enable natural language processing of clinical text and similarity analysis between medical concepts.¹⁶
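
Similarity analysis between concept embeddings is commonly done with cosine similarity. The Python sketch below shows the calculation on three-dimensional vectors that are made up for illustration. Real concept embeddings have many more dimensions and are learned from data, and nothing here reflects actual UMLS-derived values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Made-up embeddings, for illustration only.
EMBEDDINGS = {
    "myocardial infarction": [0.9, 0.1, 0.3],
    "heart attack": [0.8, 0.2, 0.3],
    "fracture": [0.1, 0.9, 0.2],
}

query = "myocardial infarction"
for concept, vector in EMBEDDINGS.items():
    if concept != query:
        score = cosine_similarity(EMBEDDINGS[query], vector)
        print(f"{query} vs {concept}: {score:.3f}")
```

With these toy vectors the two synonymous terms score close to 1 while the unrelated term scores much lower, which is the property that lets embeddings group related medical concepts.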

Limitations

Despite their advantages, knowledge graphs face several challenges. Some of these include:

High algorithmic complexity and large biological datasets make the process computationally expensive.¹⁷

KG construction can be a time-consuming process that requires careful attention to assign appropriate node types and vocabularies.¹⁶

Using data from a wide range of datasets in one KG requires them to be effectively integrated.¹⁶¹⁷

Privacy

A primary challenge in biomedical data science is maintaining medical privacy. Conducting research requires collecting data on many people for training and testing purposes and storing it in biomedical datasets. This creates a risk of violating patient confidentiality and may dissuade people from participating in studies.

The main sources of health statistics are:¹⁸

surveys
administrative and medical records
health care claims data
vital records
surveillance
disease registries
grey literature and peer-reviewed literature.

Large-scale data collection is a useful tool for researching medical conditions. Researchers use these datasets to identify factors that may make people more susceptible to certain diseases.¹⁹ Patterns in the data can indicate that a person is at higher risk of a condition, or reveal environmental, social, and personal habits that may lead to adverse health outcomes.

Institutions that conduct research using personal medical information have a moral and legal responsibility to protect that information.²⁰ Protecting collected information has become a major concern.²¹ Sophisticated, coordinated attacks on medical systems are occurring more frequently. Medical companies, medical insurers, and private businesses have invested heavily in protecting personal data. Despite this, data breaches continue to be documented. The table below shows the top healthcare breaches in 2025.²²

Top Healthcare Data Breaches
Rank	Company	Affected Users	Type of Breach
1	Yale New Haven (Conn.) Health	5,556,702	Hacking/IT Incident
2	Episource	5,418,866	Ransomware Incident
3	Blue Shield of California	4.7 million	Hacking Incident with Google Analytics
4	DaVita	2,689,826	Ransomware Incident
5	Anne Arundel Dermatology	1,905,000	Unauthorized Access

For these reasons, many people have reservations about giving up their personal data. Aside from the legitimate use of personal data, there have been instances where companies have found ways to profit from brokering medical information.²³ Concerns exist regarding unauthorized use of sensitive information within these data companies. If a person is identified within a dataset, sensitive data can be used to discriminate against them. For example, insurance companies may charge a higher rate if a person is at higher risk for a certain disease. Security breaches and misuse of information continue to discourage many from participating in large-scale studies and clinical trials.

Because of these concerns, many large-scale studies have developed ways to protect anonymity within these datasets. One such method is differential privacy, which is built on the premise that the result of a query on a dataset should not change drastically when a single record is added or deleted.²⁴

$$\Pr[\mathcal{A}(D_{1}) \in S] \leq e^{\varepsilon} \Pr[\mathcal{A}(D_{2}) \in S] + \delta$$

An algorithm $\mathcal{A}$ is $(\varepsilon, \delta)$-differentially private if this inequality holds for every set of outputs $S$ and every pair of datasets $D_{1}$ and $D_{2}$ that differ by one record.
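
One standard way to satisfy this inequality with δ = 0 is the Laplace mechanism, sketched below in Python for a counting query. Adding or removing one record changes a count by at most 1, so adding Laplace noise with scale 1/ε is sufficient. The records, field names, and function names are hypothetical and chosen only for illustration. This is a teaching sketch under those assumptions, not a hardened privacy implementation.

```python
import math
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that satisfy `predicate`.

    A counting query has sensitivity 1, so Laplace noise with scale
    1 / epsilon gives epsilon-differential privacy (delta = 0).
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


def laplace_pdf(x, mean, scale):
    """Density of the Laplace distribution, used to check the inequality."""
    return math.exp(-abs(x - mean) / scale) / (2.0 * scale)


# Hypothetical toy dataset D1 and a neighboring dataset D2 with one record removed.
d1 = [
    {"age": 34, "diabetic": True},
    {"age": 51, "diabetic": False},
    {"age": 67, "diabetic": True},
    {"age": 45, "diabetic": True},
]
d2 = d1[:-1]

epsilon = 0.5
noisy = private_count(d1, lambda r: r["diabetic"], epsilon, rng=random.Random(0))
print("noisy count on D1:", noisy)

# The true counts are 3 on D1 and 2 on D2. For any output x, the output
# densities under the two datasets differ by a factor of at most e^epsilon.
worst_ratio = max(
    laplace_pdf(x, 3, 1.0 / epsilon) / laplace_pdf(x, 2, 1.0 / epsilon)
    for x in [-5.0, 0.0, 2.0, 2.5, 3.0, 10.0]
)
print("worst density ratio:", worst_ratio, "bound:", math.exp(epsilon))
```

A smaller ε forces the two output distributions closer together, which strengthens privacy but adds more noise to the reported count.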

Training in Biomedical Data Science

The National Library of Medicine of the US National Institutes of Health (NIH) identified key biomedical data scientist attributes in an NIH-wide review: general biomedical subject matter knowledge; programming language expertise; predictive analytics, modeling, and machine learning; team science and communication; and responsible data stewardship.²⁵

University Departments and Programs

Johns Hopkins University's Department of Biomedical Engineering offers biomedical data science training at the undergraduate, master's, and PhD levels. It was the first university to offer programs at both undergraduate and graduate levels.
Dartmouth College's Geisel School of Medicine houses the Department of Biomedical Data Science where Quantitative Biomedical Sciences programs are available at the master's and PhD levels.
Imperial College London's Faculty of Medicine and Data Science Institute offer an MRes in Biomedical Research (Data Science).
Mount Sinai's Icahn School of Medicine offers a Master of Science in Biomedical Data Science.
Stanford University's Department of Biomedical Data Science offers multiple biomedical informatics graduate programs (MS, PhD, and MD/PhD).
The University of Exeter's College of Healthcare and Medicine offers an MSc in Health Data Science.
Harvard Medical School: The Department of Biomedical Informatics offers a PhD in Biomedical Informatics with two distinct tracks. These two tracks are Bioinformatics and Integrative Genomics and Artificial Intelligence in Medicine. These programs focus on the use of large-scale health data and AI to transform clinical practice and genomic research.
Duke University: The School of Medicine houses the Center for Health Informatics and offers a Master of Management in Clinical Informatics. This program sits at the intersection of business, healthcare, and data science, focusing on the strategic use of data to improve patient care and healthcare operations.
University of Washington: The Department of Biomedical Informatics and Medical Education offers MS and PhD degrees in Biomedical and Health Informatics, along with a specialized Data Science option for these degrees. These degrees focus on advanced computational methods for biological research.
Ohio State University: The Department of Biomedical Informatics provides comprehensive graduate training and programs. Their research focuses on clinical informatics, computational biology, and AI applications in digital health.
Ohio University: Offers research and coursework in Bioinformatics and Computational Biology through the Russ College of Engineering and Technology and the Department of Biological Sciences.

Biomedical Data Science Research in Academia

Scholarly Journals

The first journal dedicated to biomedical data science, the Annual Review of Biomedical Data Science, appeared in 2018.

Other journals have a more general scope but regularly publish biomedical data science research, such as Health Data Science²⁶ and Nature Machine Intelligence.²⁷ Data science would not exist without curated datasets, and the field has seen the rise of journals dedicated to describing and validating such datasets, including Scientific Data,²⁸ Biomedical Data Journal,²⁹ and Data.³⁰

Research on electronic health records (EHRs) has been published extensively since 2000, including 1079 articles from 2000 to 2009, 582 articles from 2010 to 2019, and 441 articles from 2020 to 2024.³¹

Conferences

Biomedical data science is supported by specialized academic meetings such as the Biomedical Data Science Summer School & Conference (BIOMED-DATA), hosted in Budapest, Hungary, at Semmelweis University. Organized by the Institute of Biostatistics and Network Science, the event is presented as an annual conference and summer school focused on data-intensive biological and medical research, with topics including health data science, machine learning, and biomedical network science.³²

Genomic Data Science

Genomic data science is a subset of biomedical data science that focuses on collecting, processing, and analyzing large amounts of genomic data, including DNA, RNA, and epigenetic information. It uses methods from bioinformatics, statistics, and computer science to study genetic variation, gene expression, and their relationships to disease and biological function. The field relies heavily on large public repositories of both genomic and clinical datasets.³³ Resources like the Database of Genotypes and Phenotypes (dbGaP) and the UK Biobank provide this access, allowing studies to expand to whole populations and increasing their reproducibility. Genomic data science and biomedical data science have led to projects and methods that serve as resources for numerous research efforts, including the Human Genome Project, The Cancer Genome Atlas, and next-generation sequencing.

The Human Genome Project

View of the human genome from the NCBI genome browser source ↗

The Human Genome Project (HGP), which uncovered the DNA sequences that compose human genes, would not have been possible without biomedical data science. Significant computational resources were required to process the data in the HGP, as the human genome contains over 6 billion DNA base pairs.³⁴ Scientists constructed the genome by piecing together small fragments of DNA, and computing overlaps between these sequences alone required over 10,000 CPU hours. At this massive data scale, scientists relied on advanced algorithms to perform data processing steps such as sequence assembly and sequence alignment for quality control.³⁵ Some of these algorithms, such as BLAST, are still used in modern bioinformatics. Scientists in the HGP also had to address complexities often associated with biomedical data, including noisy data, such as DNA read errors, and the privacy rights of research subjects.³⁶ The HGP, declared complete in 2003, has had immense impact both biologically, shedding light on human evolution, and medically, accelerating the growth of bioinformatics and leading to technologies such as genetic screening and gene therapy.
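
The overlap computation at the heart of fragment assembly can be sketched in a few lines of Python. The code below finds the longest suffix of one read that matches a prefix of another and merges the two reads on that overlap. It is a naive, exact-match illustration with made-up reads. Real assemblers tolerate sequencing errors and use indexing to avoid comparing every pair of reads, so this is not the method used in the HGP.

```python
def overlap_length(left, right, min_length=1):
    """Length of the longest suffix of `left` that equals a prefix of `right`.

    Returns 0 when no overlap of at least `min_length` bases exists.
    """
    max_len = min(len(left), len(right))
    for length in range(max_len, min_length - 1, -1):
        if left[-length:] == right[:length]:
            return length
    return 0


def merge_reads(left, right, min_length=1):
    """Join two reads, writing the shared overlap only once."""
    n = overlap_length(left, right, min_length)
    return left + right[n:]


# Made-up reads, for illustration only.
reads = ["ACGTAC", "TACGGA", "GGATTC"]

contig = reads[0]
for read in reads[1:]:
    print(f"overlap with {read}: {overlap_length(contig, read)}")
    contig = merge_reads(contig, read)

print("assembled contig:", contig)
```

Because a naive assembler must test many read pairs in this way, the cost grows roughly with the square of the number of reads, which helps explain why overlap computation consumed so much processor time at genome scale.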

The Cancer Genome Atlas

The Cancer Genome Atlas (TCGA) is a large research network whose main goal was to generate, quality-control, merge, analyze, and interpret the molecular profiles of 33 tumor types at the DNA, RNA, protein, and epigenetic levels.³⁷ TCGA was a collaborative project between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI). Together, they characterized the molecular profiles of thousands of tumor samples across these cancer types. TCGA generated multiple datasets, including genomic, transcriptomic, and epigenomic data, along with clinical information.³⁸ Biomedical data science approaches were essential for analyzing and classifying the tumor samples. The program launched the Pan-Cancer Atlas, which was developed to examine overarching themes in cancer, such as the mechanisms and occurrence of genetic changes in oncogenic signaling pathways. The Pan-Cancer Atlas sheds light on the origins of diverse tumors, supporting the development of more clinical trials and targeted therapies for cancer.³⁷

Next-Generation Sequencing

Next-generation sequencing (NGS) is a powerful technology that is important for genomics research. It allows for the rapid sequencing of millions of DNA fragments in parallel. NGS provides detailed information regarding genome structure, genetic variations, gene activity, and changes in gene behavior. It also lowered costs and increased both speed and accuracy, allowing improved data analysis.³⁹ An NGS workflow involves several key steps: DNA fragmentation, library preparation, massively parallel sequencing, bioinformatics analysis, and the annotation and interpretation of variants and mutations.⁴⁰ This technology has enabled other genome sequencing resources such as the Genome Aggregation Database (gnomAD), a database of exome and genome sequences from about 140,000 humans. It is used as a resource for estimating the allele frequencies of variants implicated in rare diseases, as well as for discovering disease genes and studying the biological effects of variation. Databases like gnomAD have major applications in clinical research and diagnostics, mostly in cancer genetics.⁴¹
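
The allele frequency of a variant is the number of observed copies of the alternate allele divided by the number of chromosomes successfully genotyped at that site. The Python sketch below computes it from per-individual genotype calls, assuming an autosomal site in diploid individuals. The genotype values are made up for illustration, and the function is not part of any database's interface.

```python
def allele_frequency(genotypes):
    """Estimate the alternate-allele frequency at one autosomal site.

    `genotypes` lists, per diploid individual, the number of alternate
    alleles carried (0, 1, or 2), or None when the genotype is missing.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    allele_count = sum(called)        # copies of the alternate allele
    allele_number = 2 * len(called)   # chromosomes with a genotype call
    return allele_count / allele_number


# Made-up genotype calls for eight individuals, one of them missing.
calls = [0, 0, 1, 0, 2, None, 0, 1]
frequency = allele_frequency(calls)
print(f"allele frequency: {frequency:.4f}")
```

Excluding missing calls from the denominator matters in practice, because counting them as reference alleles would bias the estimate downward.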

References

Altman, Russ; Levitt, Michael (2018). "What is Biomedical Data Science and Do We Need an Annual Review of It?". Annual Review of Biomedical Data Science. 1: i–iii. doi:10.1146/annurev-bd-01-041718-100001. S2CID 134950609.
Baldi, Pierre (2018). "Deep learning in biomedical data science". Annual Review of Biomedical Data Science. 1: 181–205. doi:10.1146/annurev-biodatasci-080917-013343. S2CID 67381478.
Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas (2015). "U-net: Convolutional networks for biomedical image segmentation". International Conference on Medical Image Computing and Computer-Assisted Intervention. arXiv:1505.04597.
Duncan, James S; Insana, Michael F; Ayache, Nicholas (2020). "Biomedical imaging and analysis in the age of big data and deep learning [scanning the issue]". Proceedings of the IEEE. 108 (1): 3–10. Bibcode:2020IEEEP.108....3D. doi:10.1109/JPROC.2019.2956422. S2CID 210077608.
Su, Chang; Tong, Jie; Zhu, Yongjun; Cu, Peng; Wang, Fei (2020). "Network embedding in biomedical data science". Briefings in Bioinformatics. 21 (1): 182–197. doi:10.1093/bib/bby117. PMID 30535359.
Litjens, Geert; Kooi, Thijs; Bejnordi, Babak Ehteshami; Setio, Arnaud Arindra Adiyoso; Ciompi, Francesco; Ghafoorian, Mohsen; van der Laak, Jeroen A. W. M.; van Ginneken, Bram; Sánchez, Clarisa I. (2017). "Deep learning in medical image analysis". Medical Image Analysis. 42 (5): 60–88. doi:10.1016/j.media.2017.07.005. PMC 5538966. PMID 28778024. S2CID 5373801.
Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas (2015-05-18), U-Net: Convolutional Networks for Biomedical Image Segmentation, arXiv:1505.04597, retrieved 2026-04-23
Gunter, Tracy D.; Terry, Nicolas P. (2005-03-14). "The Emergence of National Electronic Health Record Architectures in the United States and Australia: Models, Costs, and Questions". Journal of Medical Internet Research. 7 (1): e383. doi:10.2196/jmir.7.1.e3. PMC 1550638. PMID 15829475.
Weed, Lawrence L. (1968-03-21). "Medical Records That Guide and Teach". New England Journal of Medicine. 278 (12): 652–657. doi:10.1056/NEJM196803212781204. ISSN 0028-4793. PMID 5637250.
McDonald, Clement J.; Overhage, J. Marc; Tierney, William M.; Dexter, Paul R.; Martin, Douglas K.; Suico, Jeffrey G.; Zafar, Atif; Schadow, Gunther; Blevins, Lonnie; Glazener, Tull; Meeks-Johnson, Jim; Lemmon, Larry; Warvel, Jill; Porterfield, Brian; Warvel, Jeff (1999-06-01). "The Regenstrief Medical Record System: a quarter century experience". International Journal of Medical Informatics. 54 (3): 225–253. doi:10.1016/S1386-5056(99)00009-X. ISSN 1386-5056. PMID 10405881.
Goldstein, Melissa M.; Thorpe, Jane Hyatt (2010-09-01). "The First Anniversary of the Health Information Technology for Economic and Clinical Health (HITECH) Act: the regulatory outlook for implementation". Perspectives in Health Information Management. 7 (Summer): 1c. ISSN 1559-4122. PMC 2921301. PMID 20808607.
Echendu, Ugochukwu; Udeokechukwu, Chidiebere (2025). "Artificial Intelligence and Machine Learning in Healthcare: Developing Privacy-Preserving Frameworks". doi.org. doi:10.2139/ssrn.5125847.
Al Kuwaiti, Ahmed; Nazer, Khalid; Al-Reedy, Abdullah; Al-Shehri, Shaher; Al-Muhanna, Afnan; Subbarayalu, Arun Vijay; Al Muhanna, Dhoha; Al-Muhanna, Fahad A. (2023-06-05). "A Review of the Role of Artificial Intelligence in Healthcare". Journal of Personalized Medicine. 13 (6): 951. doi:10.3390/jpm13060951. ISSN 2075-4426. PMC 10301994. PMID 37373940.
Jumper, John; Evans, Richard; Pritzel, Alexander; Green, Tim; Figurnov, Michael; Ronneberger, Olaf; Tunyasuvunakool, Kathryn; Bates, Russ; Žídek, Augustin; Potapenko, Anna; Bridgland, Alex; Meyer, Clemens; Kohl, Simon A. A.; Ballard, Andrew J.; Cowie, Andrew (2021-08-26). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. Bibcode:2021Natur.596..583J. doi:10.1038/s41586-021-03819-2. ISSN 0028-0836. PMC 8371605. PMID 34265844.
Nicholson, David N.; Greene, Casey S. (2020-01-01). "Constructing knowledge graphs and their biomedical applications". Computational and Structural Biotechnology Journal. 18: 1414–1428. doi:10.1016/j.csbj.2020.05.017. ISSN 2001-0370. PMC 7327409. PMID 32637040.
Hänsel, Katrin; Dudgeon, Sarah N.; Cheung, Kei-Hoi; Durant, Thomas J. S.; Schulz, Wade L. (2023-05-17). "From Data to Wisdom: Biomedical Knowledge Graphs for Real-World Data Insights". Journal of Medical Systems. 47 (1): 65. doi:10.1007/s10916-023-01951-2. ISSN 1573-689X. PMC 10191934. PMID 37195430.
Callahan, Tiffany J.; Tripodi, Ignacio J.; Pielke-Lombardo, Harrison; Hunter, Lawrence E. (April 7, 2020). "Knowledge-Based Biomedical Data Science". Annual Review of Biomedical Data Science. 3: 23–41. doi:10.1146/annurev-biodatasci-010820-091627. ISSN 2574-3414. PMC 8095730. PMID 33954284.
"Finding and Using Health Statistics". National Library of Medicine. National Institutes of Health. Retrieved 19 April 2026.
"Big Data in Healthcare: Uses, Benefits, Challenges, and Real Examples". MGH Institute of Health Professions. 6 April 2026. Retrieved 19 April 2026.
"Summary of the HIPAA Privacy Rule". U.S. Department of Health and Human Services. Retrieved 19 April 2026.
Tegegne, Masresha Derese; Melaku, Mequannent Sharew; Shimie, Aynadis Worku; Hunegnaw, Degefaw Denekew; Legese, Meseret Gashaw; Ejigu, Tewabe Ambaye; Mengestie, Nebyu Demeke; Zemene, Wondewossen; Zeleke, Tirualem; Chanie, Ashenafi Fentahun (2022-03-14). "Health professionals' knowledge and attitude towards patient confidentiality and associated factors in a resource-limited setting: a cross-sectional study". BMC Medical Ethics. 23 (1): 26. doi:10.1186/s12910-022-00765-0. ISSN 1472-6939. PMC 8922732. PMID 35287659.
Diaz, Naomi. "10 biggest healthcare data breaches of 2025". Becker's Health IT. Retrieved 20 April 2026.
"Your health care data is for sale. Here's how Big Pharma is using it". Straight Arrow News. 18 September 2025. Retrieved 19 April 2026.
Wang, Shuang; Bonomi, Luca; Dai, Wenrui; Chen, Feng; Cheung, Cynthia; Bloss, Cinnamon S.; Cheng, Samuel; Jiang, Xiaoqian (13 September 2013). "Big Data Privacy in Biomedical Research". IEEE Transactions on Big Data. 6 (2): 296–308. doi:10.1109/TBDATA.2016.2608848. ISSN 2332-7790. PMC 7258042. PMID 32478127.
Zaringhalam, Maryam; Federer, Lisa; Huerta, Michael. "Core Skills for Biomedical Data Scientists" (PDF). US National Library of Medicine. US National Institutes of Health. Retrieved 21 February 2022.
"Health Data Science". spj.sciencemag.org. Retrieved 2022-07-05.
"Nature Machine Intelligence". nature.com. Retrieved 2022-07-05.
"Scientific Data". nature.com. Retrieved 2022-07-05.
"Biomedical Data Journal". biomed-data.eu. Retrieved 2022-07-05.
"Data". mdpi.com. Retrieved 2022-07-05.
Shen, Yun; Yu, Jiamin; Zhou, Jian; Hu, Gang (2025-01-09). "Twenty-Five Years of Evolution and Hurdles in Electronic Health Records and Interoperability in Medical Research: Comprehensive Review". Journal of Medical Internet Research. 27 (1) e59024. doi:10.2196/59024. PMC 11757985. PMID 39787599.
"BIOMED-DATA 26". www.biomed-data.semmelweis.hu. Retrieved 2026-04-21.
"Genomic Data Science Fact Sheet". www.genome.gov. Retrieved 2026-04-21.
Piovesan, Allison; Pelleri, Maria C; Antonaros, Francesca; Strippoli, Pierluigi; Vitale, Lorenza (2019). "On the length, weight and GC content of the human genome". BMC Research Notes. 12 (1): 106. Bibcode:2019BMCRN..12..106P. doi:10.1186/s13104-019-4137-z. PMC 6391780. PMID 30813969.
Altschul, Stephen F; Gish, Warren; Miller, Webb; Myers, Eugene W; Lipman, David J (1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–410. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.
Venter, J. Craig; et al. (2001). "The sequence of the human genome". Science. 291 (5507): 1304–1351. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.
Hu, Hao-Liang; Zeng, Dan-Dan; Zang, Jing-Lei; Chen, Zhe (July 2020). "The Pan-Cancer Atlas: a New Chapter in Cancer Molecular Targeting Therapy". Pathology Oncology Research: POR. 26 (3): 1997–1999. doi:10.1007/s12253-019-00709-x. ISSN 1532-2807. PMID 31468361.
Weinstein, John N.; Collisson, Eric A.; Mills, Gordon B.; Shaw, Kenna R. Mills; Ozenberger, Brad A.; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M. (October 2013). "The Cancer Genome Atlas Pan-Cancer analysis project". Nature Genetics. 45 (10): 1113–1120. doi:10.1038/ng.2764. ISSN 1546-1718. PMC 3919969. PMID 24071849.
Satam, Heena; Joshi, Kandarp; Mangrolia, Upasana; Waghoo, Sanober; Zaidi, Gulnaz; Rawool, Shravani; Thakare, Ritesh P.; Banday, Shahid; Mishra, Alok K.; Das, Gautam; Malonia, Sunil K. (2023-07-13). "Next-Generation Sequencing Technology: Current Trends and Advancements". Biology. 12 (7): 997. Bibcode:2023Biol...12..997S. doi:10.3390/biology12070997. ISSN 2079-7737. PMC 10376292. PMID 37508427.
Qin, Dahui (February 2019). "Next-generation sequencing and its clinical application". Cancer Biology & Medicine. 16 (1): 4–10. doi:10.20892/j.issn.2095-3941.2018.0055. ISSN 2095-3941. PMC 6528456. PMID 31119042.
Satam, Heena; Joshi, Kandarp; Mangrolia, Upasana; Waghoo, Sanober; Zaidi, Gulnaz; Rawool, Shravani; Thakare, Ritesh P.; Banday, Shahid; Mishra, Alok K.; Das, Gautam; Malonia, Sunil K. (2023-07-13). "Next-Generation Sequencing Technology: Current Trends and Advancements". Biology. 12 (7): 997. Bibcode:2023Biol...12..997S. doi:10.3390/biology12070997. ISSN 2079-7737. PMC 10376292. PMID 37508427.

[what-is-1] Altman, Russ; Levitt, Michael (2018). "What is Biomedical Data Science and Do We Need an Annual Review of It?". Annual Review of Biomedical Data Science. 1: i–iii. doi:10.1146/annurev-bd-01-041718-100001. S2CID 134950609.

[deep-learning-2] Baldi, Pierre (2018). "Deep learning in biomedical data science". Annual Review of Biomedical Data Science. 1: 181–205. doi:10.1146/annurev-biodatasci-080917-013343. S2CID 67381478.

[u-net-3] Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas (2015). "U-net: Convolutional networks for biomedical image segmentation". International Conference on Medical Image Computing and Computer-Assisted Intervention. arXiv:1505.04597.

[4] Duncan, James S; Insana, Michael F; Ayache, Nicholas (2020). "Biomedical imaging and analysis in the age of big data and deep learning [scanning the issue]". Proceedings of the IEEE. 108 (1): 3–10. Bibcode:2020IEEEP.108....3D. doi:10.1109/JPROC.2019.2956422. S2CID 210077608.

[5] Su, Chang; Tong, Jie; Zhu, Yongjun; Cu, Peng; Wang, Fei (2020). "Network embedding in biomedical data science". Briefings in Bioinformatics. 21 (1): 182–197. doi:10.1093/bib/bby117. PMID 30535359.

[litjens2017-6] Litjens, Geert; Kooi, Thijs; Bejnordi, Babak Ehteshami; Setio, Arnaud Arindra Adiyoso; Ciompi, Francesco; Ghafoorian, Mohsen; van der Laak, Jeroen A. W. M.; van Ginneken, Bram; Sánchez, Clarisa I. (2017). "Deep learning in medical image analysis". Medical Image Analysis. 42 (5): 60–88. doi:10.1016/j.media.2017.07.005. PMC 5538966. PMID 28778024. S2CID 5373801.

[7] Ronneberger, Olaf; Fischer, Philipp; Brox, Thomas (2015-05-18), U-Net: Convolutional Networks for Biomedical Image Segmentation, arXiv:1505.04597, retrieved 2026-04-23

[:4-8] Gunter, Tracy D.; Terry, Nicolas P. (2005-03-14). "The Emergence of National Electronic Health Record Architectures in the United States and Australia: Models, Costs, and Questions". Journal of Medical Internet Research. 7 (1): e383. doi:10.2196/jmir.7.1.e3. PMC 1550638. PMID 15829475.

[:5-9] Weed, Lawrence L. (1968-03-21). "Medical Records That Guide and Teach". New England Journal of Medicine. 278 (12): 652–657. doi:10.1056/NEJM196803212781204. ISSN 0028-4793. PMID 5637250.

[10] McDonald, Clement J.; Overhage, J. Marc; Tierney, William M.; Dexter, Paul R.; Martin, Douglas K.; Suico, Jeffrey G.; Zafar, Atif; Schadow, Gunther; Blevins, Lonnie; Glazener, Tull; Meeks-Johnson, Jim; Lemmon, Larry; Warvel, Jill; Porterfield, Brian; Warvel, Jeff (1999-06-01). "The Regenstrief Medical Record System: a quarter century experience". International Journal of Medical Informatics. 54 (3): 225–253. doi:10.1016/S1386-5056(99)00009-X. ISSN 1386-5056. PMID 10405881.

[11] Goldstein, Melissa M.; Thorpe Jane, Hyatt (2010-09-01). "The First Anniversary of the Health Information Technology for Economic and Clinical Health (HITECH) Act: the regulatory outlook for implementation". Perspectives in Health Information Management. 7 (Summer): 1c. ISSN 1559-4122. PMC 2921301. PMID 20808607.

[12] Echendu, Ugochukwu; Udeokechukwu, Chidiebere (2025). "Artificial Intelligence and Machine Learning in Healthcare: Developing Privacy-Preserving Frameworks". doi.org. doi:10.2139/ssrn.5125847.

[13] Al Kuwaiti, Ahmed; Nazer, Khalid; Al-Reedy, Abdullah; Al-Shehri, Shaher; Al-Muhanna, Afnan; Subbarayalu, Arun Vijay; Al Muhanna, Dhoha; Al-Muhanna, Fahad A. (2023-06-05). "A Review of the Role of Artificial Intelligence in Healthcare". Journal of Personalized Medicine. 13 (6): 951. doi:10.3390/jpm13060951. ISSN 2075-4426. PMC 10301994. PMID 37373940.

[14] Jumper, John; Evans, Richard; Pritzel, Alexander; Green, Tim; Figurnov, Michael; Ronneberger, Olaf; Tunyasuvunakool, Kathryn; Bates, Russ; Žídek, Augustin; Potapenko, Anna; Bridgland, Alex; Meyer, Clemens; Kohl, Simon A. A.; Ballard, Andrew J.; Cowie, Andrew (2021-08-26). "Highly accurate protein structure prediction with AlphaFold". Nature. 596 (7873): 583–589. Bibcode:2021Natur.596..583J. doi:10.1038/s41586-021-03819-2. ISSN 0028-0836. PMC 8371605. PMID 34265844.

[15] Nicholson, David N.; Greene, Casey S. (2020-01-01). "Constructing knowledge graphs and their biomedical applications". Computational and Structural Biotechnology Journal. 18: 1414–1428. doi:10.1016/j.csbj.2020.05.017. ISSN 2001-0370. PMC 7327409. PMID 32637040.

[16] Hänsel, Katrin; Dudgeon, Sarah N.; Cheung, Kei-Hoi; Durant, Thomas J. S.; Schulz, Wade L. (2023-05-17). "From Data to Wisdom: Biomedical Knowledge Graphs for Real-World Data Insights". Journal of Medical Systems. 47 (1): 65. doi:10.1007/s10916-023-01951-2. ISSN 1573-689X. PMC 10191934. PMID 37195430.

[17] Callahan, Tiffany J.; Tripodi, Ignacio J.; Pielke-Lombardo, Harrison; Hunter, Lawrence E. (2020-04-07). "Knowledge-Based Biomedical Data Science". Annual Review of Biomedical Data Science. 3: 23–41. doi:10.1146/annurev-biodatasci-010820-091627. ISSN 2574-3414. PMC 8095730. PMID 33954284.

[18] "Finding and Using Health Statistics". National Library of Medicine. National Institutes of Health. Retrieved 19 April 2026.

[19] "Big Data in Healthcare: Uses, Benefits, Challenges, and Real Examples". MGH Institute of Health Professions. 6 April 2026. Retrieved 19 April 2026.

[20] "Summary of the HIPAA Privacy Rule". U.S. Department of Health and Human Services. Retrieved 19 April 2026.

[21] Tegegne, Masresha Derese; Melaku, Mequannent Sharew; Shimie, Aynadis Worku; Hunegnaw, Degefaw Denekew; Legese, Meseret Gashaw; Ejigu, Tewabe Ambaye; Mengestie, Nebyu Demeke; Zemene, Wondewossen; Zeleke, Tirualem; Chanie, Ashenafi Fentahun (2022-03-14). "Health professionals' knowledge and attitude towards patient confidentiality and associated factors in a resource-limited setting: a cross-sectional study". BMC Medical Ethics. 23 (1): 26. doi:10.1186/s12910-022-00765-0. ISSN 1472-6939. PMC 8922732. PMID 35287659.

[22] Diaz, Naomi. "10 biggest healthcare data breaches of 2025". Becker's Health IT. Retrieved 20 April 2026.

[23] "Your health care data is for sale. Here's how Big Pharma is using it". Straight Arrow News. 18 September 2025. Retrieved 19 April 2026.

[24] Wang, Shuang; Bonomi, Luca; Dai, Wenrui; Chen, Feng; Cheung, Cynthia; Bloss, Cinnamon S.; Cheng, Samuel; Jiang, Xiaoqian (13 September 2016). "Big Data Privacy in Biomedical Research". IEEE Transactions on Big Data. 6 (2): 296–308. doi:10.1109/TBDATA.2016.2608848. ISSN 2332-7790. PMC 7258042. PMID 32478127.

[25] Zaringhalam, Maryam; Federer, Lisa; Huerta, Michael. "Core Skills for Biomedical Data Scientists" (PDF). US National Library of Medicine. US National Institutes of Health. Retrieved 21 February 2022.

[26] "Health Data Science". spj.sciencemag.org. Retrieved 2022-07-05.

[27] "Nature Machine Intelligence". nature.com. Retrieved 2022-07-05.

[28] "Scientific Data". nature.com. Retrieved 2022-07-05.

[29] "Biomedical Data Journal". biomed-data.eu. Retrieved 2022-07-05.

[30] "Data". mdpi.com. Retrieved 2022-07-05.

[31] Shen, Yun; Yu, Jiamin; Zhou, Jian; Hu, Gang (2025-01-09). "Twenty-Five Years of Evolution and Hurdles in Electronic Health Records and Interoperability in Medical Research: Comprehensive Review". Journal of Medical Internet Research. 27 (1): e59024. doi:10.2196/59024. PMC 11757985. PMID 39787599.

[32] "BIOMED-DATA 26". www.biomed-data.semmelweis.hu. Retrieved 2026-04-21.

[33] "Genomic Data Science Fact Sheet". www.genome.gov. Retrieved 2026-04-21.

[34] Piovesan, Allison; Pelleri, Maria C; Antonaros, Francesca; Strippoli, Pierluigi; Vitale, Lorenza (2019). "On the length, weight and GC content of the human genome". BMC Research Notes. 12 (1): 106. Bibcode:2019BMCRN..12..106P. doi:10.1186/s13104-019-4137-z. PMC 6391780. PMID 30813969.

[35] Altschul, Stephen F; Gish, Warren; Miller, Webb; Myers, Eugene W; Lipman, David J (1990). "Basic local alignment search tool". Journal of Molecular Biology. 215 (3): 403–410. doi:10.1016/S0022-2836(05)80360-2. PMID 2231712. S2CID 14441902.

[36] Venter, J. Craig; et al. (2001). "The sequence of the human genome". Science. 291 (5507): 1304–1351. Bibcode:2001Sci...291.1304V. doi:10.1126/science.1058040. PMID 11181995.

[37] Hu, Hao-Liang; Zeng, Dan-Dan; Zang, Jing-Lei; Chen, Zhe (July 2020). "The Pan-Cancer Atlas: a New Chapter in Cancer Molecular Targeting Therapy". Pathology Oncology Research: POR. 26 (3): 1997–1999. doi:10.1007/s12253-019-00709-x. ISSN 1532-2807. PMID 31468361.

[38] Weinstein, John N.; Collisson, Eric A.; Mills, Gordon B.; Shaw, Kenna R. Mills; Ozenberger, Brad A.; Ellrott, Kyle; Shmulevich, Ilya; Sander, Chris; Stuart, Joshua M. (October 2013). "The Cancer Genome Atlas Pan-Cancer analysis project". Nature Genetics. 45 (10): 1113–1120. doi:10.1038/ng.2764. ISSN 1546-1718. PMC 3919969. PMID 24071849.

[39] Satam, Heena; Joshi, Kandarp; Mangrolia, Upasana; Waghoo, Sanober; Zaidi, Gulnaz; Rawool, Shravani; Thakare, Ritesh P.; Banday, Shahid; Mishra, Alok K.; Das, Gautam; Malonia, Sunil K. (2023-07-13). "Next-Generation Sequencing Technology: Current Trends and Advancements". Biology. 12 (7): 997. Bibcode:2023Biol...12..997S. doi:10.3390/biology12070997. ISSN 2079-7737. PMC 10376292. PMID 37508427.

[40] Qin, Dahui (February 2019). "Next-generation sequencing and its clinical application". Cancer Biology & Medicine. 16 (1): 4–10. doi:10.20892/j.issn.2095-3941.2018.0055. ISSN 2095-3941. PMC 6528456. PMID 31119042.

[41] Satam, Heena; Joshi, Kandarp; Mangrolia, Upasana; Waghoo, Sanober; Zaidi, Gulnaz; Rawool, Shravani; Thakare, Ritesh P.; Banday, Shahid; Mishra, Alok K.; Das, Gautam; Malonia, Sunil K. (2023-07-13). "Next-Generation Sequencing Technology: Current Trends and Advancements". Biology. 12 (7): 997. Bibcode:2023Biol...12..997S. doi:10.3390/biology12070997. ISSN 2079-7737. PMC 10376292. PMID 37508427.
