Article · Wikipedia archive · Last revised May 28, 2026

Machine-learned interatomic potential

Machine-learned interatomic potentials (MLIPs), or simply machine learning potentials (MLPs), are interatomic potentials constructed using machine learning. Since the 1990s, researchers have used machine learning programs to construct interatomic potentials by mapping atomic structures to their potential energies.

Such machine learning potentials promised to fill the gap between density functional theory, a highly accurate but computationally intensive modelling method, and empirically derived or intuitively approximated potentials, which were far lighter computationally but substantially less accurate. Improvements in artificial intelligence technology heightened the accuracy of MLPs while lowering their computational cost, increasing the role of machine learning in fitting potentials.[1][2]

The first machine learning potentials used neural networks to model low-dimensional systems. While promising, these models could not systematically account for interatomic energy interactions. They could be applied to small molecules in a vacuum or to molecules interacting with frozen surfaces, but to little else, and even in these applications they often relied on force fields or potentials derived empirically or from simulations.[1] These models therefore remained confined to academia.

Modern neural networks construct highly accurate and computationally light potentials because theoretical understanding of materials science has increasingly been built into their architectures and preprocessing. Almost all are local: they account for all interactions between an atom and its neighbors within some cutoff radius. Some nonlocal models exist, but these have remained experimental for almost a decade. For most systems, a reasonable cutoff radius yields highly accurate results.[1][3]
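
The locality assumption can be sketched in a few lines: each atom only "sees" neighbors closer than the cutoff radius, and a smooth cutoff function makes their contributions fade to zero at the boundary, so the energy stays continuous as atoms move in and out of range. The cosine form below is the one commonly used with Behler–Parrinello symmetry functions; the function names are illustrative, and the sketch ignores periodic boundary conditions, which real materials codes must handle.

```python
import math
from itertools import combinations


def cosine_cutoff(r, r_cut):
    """Smooth cutoff weight: 1 at r = 0, decaying to 0 (with zero slope) at r_cut."""
    if r >= r_cut:
        return 0.0
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def neighbor_list(positions, r_cut):
    """Map each atom index to a list of (neighbor index, distance) pairs within r_cut.

    Brute-force O(N^2) search over a non-periodic structure; production codes
    use cell lists and periodic images instead.
    """
    neighbors = {i: [] for i in range(len(positions))}
    for i, j in combinations(range(len(positions)), 2):
        r = math.dist(positions[i], positions[j])
        if r < r_cut:
            neighbors[i].append((j, r))
            neighbors[j].append((i, r))
    return neighbors


# Four atoms; the last one is far from the others and has no neighbors within 2.0.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.5, 0.0), (5.0, 0.0, 0.0)]
for atom, nbrs in neighbor_list(positions, r_cut=2.0).items():
    print(atom, [(j, round(r, 3), round(cosine_cutoff(r, 2.0), 3)) for j, r in nbrs])
```

Because every per-atom quantity is built only from such neighbor lists, the cost of evaluating a local model grows linearly with the number of atoms.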

Almost all neural network potentials take atomic coordinates as input and output potential energies. In some, the coordinates are first converted into atom-centered symmetry functions. From these descriptors, a separate atomic neural network is trained for each element; the network for an element is evaluated for every atom of that element in the structure, and the resulting atomic energies are summed to give the total energy. This process, and in particular the atom-centered symmetry functions, which encode translational, rotational, and permutational invariance, has greatly improved machine learning potentials by significantly constraining the neural network search space. Other models follow a similar process but emphasize bonds over atoms, using pair symmetry functions and training one network per atom pair.[1][4]
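
The descriptor-then-sum scheme can be illustrated with a toy Behler–Parrinello-style model. Each atom i is described by radial symmetry functions G_i = sum over j of exp(-eta * (r_ij - r_s)^2) * f_c(r_ij), which depend only on distances and therefore inherit the invariances described above; each element has its own small network, and the total energy is the sum of the atomic outputs. The weights below are arbitrary, untrained placeholders, and the sketch omits the angular symmetry functions and per-neighbor-element resolution that practical models use.

```python
import math


def cosine_cutoff(r, r_cut):
    """Smooth cutoff weight that reaches zero at r_cut."""
    return 0.5 * (math.cos(math.pi * r / r_cut) + 1.0) if r < r_cut else 0.0


def radial_symmetry_functions(i, positions, etas, r_s=0.0, r_cut=6.0):
    """Radial (G2-type) symmetry functions of atom i, one value per eta.

    Only interatomic distances enter, so the values do not change under
    translation, rotation, or reordering of the neighbors.
    """
    features = []
    for eta in etas:
        g = 0.0
        for j, pos in enumerate(positions):
            if j != i:
                r = math.dist(positions[i], pos)
                g += math.exp(-eta * (r - r_s) ** 2) * cosine_cutoff(r, r_cut)
        features.append(g)
    return features


def make_atomic_network(w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer tanh network mapping a descriptor vector to an atomic energy."""
    def network(x):
        hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
                  for row, b in zip(w_hidden, b_hidden)]
        return sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return network


# Hypothetical, untrained weights: one network per element, shared by all its atoms.
ETAS = [0.5, 2.0]
ATOMIC_NETWORKS = {
    "H": make_atomic_network([[0.3, -0.2], [0.1, 0.4]], [0.0, 0.1], [-0.5, 0.2], -0.4),
    "O": make_atomic_network([[0.6, 0.1], [-0.3, 0.2]], [0.05, 0.0], [-1.0, 0.3], -2.0),
}


def total_energy(species, positions):
    """Total energy as the sum of per-atom energies from each element's network."""
    return sum(
        ATOMIC_NETWORKS[element](radial_symmetry_functions(i, positions, ETAS))
        for i, element in enumerate(species)
    )


water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
print(total_energy(["O", "H", "H"], water))
```

Swapping the two hydrogen coordinates, rotating the molecule, or translating it leaves the printed energy unchanged up to floating-point rounding, which is exactly the constraint the symmetry functions impose on the search space.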

Other models learn their own descriptors rather than using predetermined symmetry functions. These models, called message-passing neural networks (MPNNs), are graph neural networks. Treating molecules as three-dimensional graphs (where atoms are nodes and bonds are edges), an MPNN takes feature vectors describing the atoms as input and iteratively updates them as information about neighboring atoms is passed through message functions and convolutions. The final feature vectors are then used to predict the potential energy. The flexibility of this approach often yields stronger, more generalizable models. In 2017, the first MPNN model, a deep tensor neural network, was used to calculate the properties of small organic molecules.
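
A message-passing model can be sketched as three pieces: a message function that weights each neighbor's features by a function of distance, an update function that folds the summed messages into the atom's own features, and a readout that maps the final features to per-atom energies. Everything concrete here is an assumption for illustration: the 3-component element embeddings, the fixed exponential distance filter (a real MPNN learns its filters), and the linear readout.

```python
import math

# Hypothetical element embeddings and readout weights (learned in a real model).
EMBEDDINGS = {"H": [0.1, 0.5, -0.2], "C": [0.7, -0.1, 0.3], "O": [-0.4, 0.2, 0.6]}
READOUT = [0.8, -0.3, 0.5]


def distance_filter(r, r_cut):
    """Edge weight that decays with distance and vanishes smoothly at r_cut."""
    if r >= r_cut:
        return 0.0
    return math.exp(-r) * 0.5 * (math.cos(math.pi * r / r_cut) + 1.0)


def mpnn_energy(species, positions, n_steps=3, r_cut=4.0):
    n = len(species)
    h = [list(EMBEDDINGS[z]) for z in species]  # initial node feature vectors
    for _ in range(n_steps):
        new_h = []
        for i in range(n):
            # Message function + aggregation: distance-weighted sum of neighbor features.
            m = [0.0] * len(h[i])
            for j in range(n):
                if j == i:
                    continue
                w = distance_filter(math.dist(positions[i], positions[j]), r_cut)
                for k, x in enumerate(h[j]):
                    m[k] += w * x
            # Update function: combine the atom's previous features with its message.
            new_h.append([math.tanh(a + b) for a, b in zip(h[i], m)])
        h = new_h  # synchronous update: every atom used the previous round's features
    # Readout: a per-atom energy from the final features, summed over atoms.
    return sum(sum(w * x for w, x in zip(READOUT, features)) for features in h)


water = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
print(mpnn_energy(["O", "H", "H"], water))
```

After three rounds, each atom's features depend on atoms up to three hops away, even though every individual message only crosses one edge within the cutoff.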

Gaussian Approximation Potential (GAP)

One popular class of machine-learned interatomic potentials is the Gaussian Approximation Potential (GAP),[5][6][7] which combines compact descriptors of local atomic environments[8] with Gaussian process regression[9] to learn the potential energy surface of a given system. The GAP framework has been used to develop MLIPs for a range of systems, including elemental systems such as carbon,[10][11] silicon,[12] phosphorus,[13] and tungsten,[14] as well as multicomponent systems such as Ge₂Sb₂Te₅[15] and the austenitic stainless steel alloy Fe₇Cr₂Ni.[16]
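
The regression step can be illustrated with plain Gaussian process regression on a one-dimensional toy problem. The sketch assumes a single scalar descriptor (the bond length of a dimer), a squared-exponential kernel, and a Morse curve standing in for reference DFT energies. GAP itself uses many-body descriptors of atomic environments and sparse Gaussian process regression, so treat this as an illustration of the regression idea rather than of GAP. The posterior mean is k*^T (K + s_n^2 I)^-1 y and the posterior variance is k(x*, x*) - k*^T (K + s_n^2 I)^-1 k*, both computed here through a Cholesky factorization.

```python
import math


def se_kernel(a, b, signal_var=1.0, length=0.3):
    """Squared-exponential covariance between two scalar descriptors."""
    return signal_var * math.exp(-((a - b) ** 2) / (2.0 * length ** 2))


def cholesky(matrix):
    """Lower-triangular L with L L^T = matrix (matrix must be symmetric positive definite)."""
    n = len(matrix)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                L[i][j] = (matrix[i][j] - s) / L[j][j]
    return L


def forward_solve(L, b):
    """Solve L z = b for lower-triangular L."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def backward_solve(L, z):
    """Solve L^T x = z for lower-triangular L."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def fit_gp(xs, ys, noise_var=1e-6):
    """Return predict(x) -> (posterior mean, posterior variance) for a zero-mean GP."""
    K = [[se_kernel(a, b) + (noise_var if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    alpha = backward_solve(L, forward_solve(L, ys))  # (K + noise I)^-1 y

    def predict(x):
        k_star = [se_kernel(x, a) for a in xs]
        mean = sum(k * w for k, w in zip(k_star, alpha))
        v = forward_solve(L, k_star)
        return mean, se_kernel(x, x) - sum(vi * vi for vi in v)

    return predict


def morse(r, depth=1.0, a=2.0, r0=1.0):
    """Stand-in 'reference' dimer energy (an assumption, not DFT data)."""
    return depth * (1.0 - math.exp(-a * (r - r0))) ** 2 - depth


train_r = [0.8, 1.0, 1.2, 1.4, 1.7, 2.0, 2.5, 3.0]
predict = fit_gp(train_r, [morse(r) for r in train_r])
mean, var = predict(1.1)
print(f"GP: {mean:.3f} +/- {math.sqrt(max(var, 0.0)):.3f}, reference: {morse(1.1):.3f}")
```

The posterior variance is small near the training data and returns to the prior variance far from it, which is one reason Gaussian-process potentials can flag configurations they were not trained on.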

Equivariant graph neural networks

A significant limitation of early MPNNs was that their internal features were invariant scalars built from interatomic distances (and sometimes angles), so directional information about each atom's environment was discarded. Beginning around 2021, a new class of models addressed this by making the message-passing layers equivariant: intermediate features are geometric tensors, expressed using spherical harmonics and irreducible representations, that rotate together with the atomic structure, while the predicted energy stays invariant and the predicted forces rotate consistently with the structure. Notable examples include NequIP[17] (2021), MACE[18] (2022), and GemNet-OC[19] (2022). These equivariant architectures proved substantially more data-efficient and accurate than their predecessors and became the dominant paradigm for high-accuracy MLIPs.
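
The symmetry that equivariant architectures enforce can be checked numerically for any model that returns an energy and per-atom forces: rotating the input should leave the energy unchanged and rotate the forces by the same matrix, F(Rx) = R F(x). In the sketch below a Lennard-Jones pair potential stands in for a trained MLIP (an assumption; it is not an equivariant network, only a model whose symmetry is easy to verify), and the rotation is built from an axis and an angle with Rodrigues' formula.

```python
import math


def lj_energy_forces(positions, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy and per-atom forces (a stand-in for a trained MLIP)."""
    n = len(positions)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            de_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                f = -de_dr * d[k] / r  # force on i; j receives the opposite force
                forces[i][k] += f
                forces[j][k] -= f
    return energy, forces


def rotation_matrix(axis, angle):
    """Rotation by `angle` radians about `axis` (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    ux, uy, uz = (a / norm for a in axis)
    c, s, t = math.cos(angle), math.sin(angle), 1.0 - math.cos(angle)
    return [
        [c + ux * ux * t, ux * uy * t - uz * s, ux * uz * t + uy * s],
        [uy * ux * t + uz * s, c + uy * uy * t, uy * uz * t - ux * s],
        [uz * ux * t - uy * s, uz * uy * t + ux * s, c + uz * uz * t],
    ]


def rotate(R, v):
    return [sum(R[a][b] * v[b] for b in range(3)) for a in range(3)]


def symmetry_errors(model, positions, R):
    """Return (energy invariance error, max force equivariance error) for rotation R."""
    e0, f0 = model(positions)
    e1, f1 = model([rotate(R, p) for p in positions])
    force_error = max(
        abs(x - y) for fa, fb in zip(f0, f1) for x, y in zip(rotate(R, fa), fb)
    )
    return abs(e1 - e0), force_error


cluster = [(0.0, 0.0, 0.0), (1.1, 0.1, 0.0), (0.4, 1.0, 0.3), (0.9, 0.6, 1.1)]
print(symmetry_errors(lj_energy_forces, cluster, rotation_matrix((1.0, 2.0, 3.0), 0.7)))
```

Both errors should be at the level of floating-point rounding. A model that is not rotationally consistent would show errors far above that level, so the same check is a useful regression test for any MLIP.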

Universal MLIPs and large-scale datasets

Early MLIPs were system-specific, typically trained on a few thousand structures of a single material. A major shift came with the creation of large, chemically diverse datasets, which enabled models that generalize across many elements, bonding environments, and application domains: so-called universal MLIPs.

A key driver was the Open Catalyst Project[20] (OC20[21], OC22[22]), a collaboration between Meta AI (FAIR) and Carnegie Mellon University launched in 2020. OC20 comprises approximately 1.3 million DFT relaxations across 82 elements and was designed to accelerate the discovery of catalysts for renewable energy applications. It was among the first datasets large enough to train GNNs that generalize across diverse chemical systems, and it established a widely used benchmark for the field. A subsequent effort, Open Direct Air Capture (OpenDAC 2023[23] and OpenDAC 2025[24]), applied the same approach to carbon capture. Developed in collaboration with Georgia Tech, it provides a large computational database of metal-organic frameworks and other sorbent candidates evaluated for CO₂ capture, generated using nearly 400 million CPU hours of quantum chemistry calculations.

These datasets revealed a new challenge: the GNN architectures most effective for atomic simulations were memory-intensive because they model higher-order interactions among triplets or quadruplets of atoms, which made it difficult to scale model size. Graph parallelism, introduced by Sriram et al. (ICLR 2022),[25] addressed this by distributing a single input graph across multiple GPUs. This differs from data parallelism, which distributes training examples, and from model parallelism, which distributes layers. It made it possible, for the first time, to train GNNs with hundreds of millions to billions of parameters.
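
The idea behind graph parallelism can be simulated in a single process. Under the simplifying assumptions that scalar node features are replicated on every device and only the edge list is split, each simulated device computes messages for its share of the edges, and an all-reduce (here, a plain sum) combines the partial per-node aggregates. The result matches the single-device computation up to floating-point rounding. The function names and round-robin partitioning are illustrative, not the scheme of Sriram et al. In the architectures described above, the split would also cover triplets, whose number grows roughly with the square of each atom's neighbor count, which is what makes memory the bottleneck.

```python
def aggregate_messages(edges, features):
    """Sum weighted messages src -> dst over the given edges (one device's share)."""
    partial = {}
    for src, dst, weight in edges:
        partial[dst] = partial.get(dst, 0.0) + weight * features[src]
    return partial


def single_device(edges, features):
    totals = aggregate_messages(edges, features)
    return [totals.get(i, 0.0) for i in range(len(features))]


def graph_parallel(edges, features, n_devices):
    # Split one graph's edges across devices (round-robin); features are replicated.
    shards = [edges[d::n_devices] for d in range(n_devices)]
    # Each device works independently; on real hardware these run concurrently.
    partials = [aggregate_messages(shard, features) for shard in shards]
    # All-reduce: sum every device's partial aggregate for each node.
    return [sum(p.get(i, 0.0) for p in partials) for i in range(len(features))]


# A small deterministic graph: 12 nodes, 40 weighted directed edges.
features = [((i * 37) % 11) / 10 - 0.5 for i in range(12)]
edges = [(i % 12, (i * 7 + 3) % 12, 0.1 + (i % 5) * 0.2) for i in range(40)]
print(single_device(edges, features))
print(graph_parallel(edges, features, n_devices=4))
```

Each device stores only its share of the edges and their messages, so per-device memory for the edge-level work falls roughly in proportion to the number of devices, at the cost of one reduction per message-passing step.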

Building on these foundations, Meta FAIR released the Universal Model for Atoms (UMA)[26] in 2025. It was trained on approximately 500 million unique 3D atomic structures spanning molecules, materials, and catalysts, in what its developers describe as the largest MLIP training run to date. UMA introduced a Mixture of Linear Experts (MoLE) architecture, which lets a single model learn from datasets generated with different DFT codes and settings without significant inference overhead. It matches or surpasses specialized models on catalysis, materials, and molecular benchmarks without task-specific fine-tuning, and has been described as marking a "pre/post-UMA" divide in the field.
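
Why a mixture of linear experts can add little inference cost can be shown with a few lines of linear algebra. Because each expert is a linear map, a weighted mixture of their outputs equals the output of one merged matrix, W = sum over k of alpha_k * W_k. If the mixing coefficients alpha_k depend only on global properties of the system (assumed here to be a small descriptor vector, for example one encoding which dataset or DFT setting a structure comes from), the experts can be merged once per system, and every atom then passes through a single matrix. The expert matrices, gate, and descriptor below are hypothetical; this is a sketch of the merging argument, not UMA's implementation.

```python
import math


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(logits):
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def routing_coefficients(gate, system_descriptor):
    """Mixing weights computed once per system from a global descriptor."""
    return softmax(matvec(gate, system_descriptor))


def mixture_per_atom(experts, coeffs, atom_features):
    """Reference: run every expert on every atom, then mix (K times the work)."""
    out = []
    for x in atom_features:
        ys = [matvec(W, x) for W in experts]
        out.append([sum(a * y[r] for a, y in zip(coeffs, ys)) for r in range(len(ys[0]))])
    return out


def merged_layer(experts, coeffs, atom_features):
    """Merge the experts into one matrix first, then apply it to every atom."""
    rows, cols = len(experts[0]), len(experts[0][0])
    W = [[sum(a * E[r][c] for a, E in zip(coeffs, experts)) for c in range(cols)]
         for r in range(rows)]
    return [matvec(W, x) for x in atom_features]


# Hypothetical setup: 3 experts, each mapping 2 input features to 3 outputs.
experts = [
    [[1.0, 0.5], [0.0, -1.0], [2.0, 0.3]],
    [[-0.2, 0.4], [1.5, 0.0], [0.1, 0.1]],
    [[0.0, 1.0], [0.3, 0.3], [-1.0, 2.0]],
]
gate = [[0.5, -1.0], [1.2, 0.3], [-0.7, 0.8]]  # maps a 2-component descriptor to 3 logits
coeffs = routing_coefficients(gate, [1.0, 0.0])
atoms = [[0.2, -0.4], [1.0, 0.5], [-0.3, 0.9]]
print(merged_layer(experts, coeffs, atoms))
```

The two functions give the same outputs, but the merged version does the per-atom work of a single expert, which is the sense in which the extra experts cost little at inference time.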

Applications

Catalyst discovery: MLIPs have significantly accelerated the computational screening of heterogeneous catalysts by replacing expensive DFT relaxations with fast neural network surrogates. The Open Catalyst Project explicitly targets this application, aiming to identify new catalysts for green hydrogen production and other renewable energy reactions.
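
In a screening workflow, the expensive step being replaced is typically a geometry relaxation: atoms are moved along the predicted forces until the largest force falls below a threshold. The sketch below uses plain steepest descent with a capped step, and a Lennard-Jones pair potential stands in for a trained MLIP (an assumption; any function returning an energy and per-atom forces could be passed as `model`). Production workflows usually use more efficient optimizers such as FIRE or BFGS, and catalyst screening would relax an adsorbate on a periodic surface slab rather than a small cluster.

```python
import math


def lj_energy_forces(positions, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy and forces, standing in for an MLIP's predictions."""
    n = len(positions)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            de_dr = 4.0 * epsilon * (-12.0 * sr6 * sr6 + 6.0 * sr6) / r
            for k in range(3):
                f = -de_dr * d[k] / r  # force on i; j receives the opposite force
                forces[i][k] += f
                forces[j][k] -= f
    return energy, forces


def relax(positions, model, fmax=1e-5, step=0.005, max_disp=0.05, max_steps=10_000):
    """Steepest-descent relaxation until the largest per-atom force is below fmax."""
    pos = [list(p) for p in positions]
    for n_steps in range(max_steps):
        energy, forces = model(pos)
        largest = max(math.sqrt(sum(f * f for f in atom_force)) for atom_force in forces)
        if largest < fmax:
            return pos, energy, n_steps
        # Move along the forces, capping the largest single-atom displacement at max_disp.
        scale = min(step, max_disp / largest)
        for p, f in zip(pos, forces):
            for k in range(3):
                p[k] += scale * f[k]
    raise RuntimeError("relaxation did not converge")


start = [(0.0, 0.0, 0.0), (1.2, 0.0, 0.0), (0.5, 1.0, 0.0)]
final, energy, n_steps = relax(start, lj_energy_forces)
print(f"E = {energy:.6f} after {n_steps} steps")
```

For this three-atom Lennard-Jones cluster the relaxed structure is an equilateral triangle with side 2^(1/6) sigma and energy -3 epsilon, which gives an easy correctness check before swapping in a real model.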

Carbon capture: The OpenDAC project applies universal MLIPs to screening sorbent materials for direct air capture of CO₂, a key technology for climate change mitigation. AI-accelerated screening allows evaluation of orders of magnitude more candidate materials than traditional DFT workflows.

Drug discovery and molecular design: MLIPs are increasingly used in pharmaceutical research to model molecular conformations and binding energies. The Open Molecules 2025 (OMol25)[27] dataset, released by Meta FAIR in 2025, provides more than 100 million density functional theory calculations on molecular systems, computed at the ωB97M-V/def2-TZVPD level of theory, to support this use case.

Materials discovery: Universal MLIPs enable high-throughput screening of novel inorganic materials, including battery electrolytes, semiconductors, and superconductors, by rapidly estimating stability and properties across large chemical spaces.

References

  1. Kocer, Emir; Ko, Tsz Wai; Behler, Jörg (2022). "Neural Network Potentials: A Concise Overview of Methods". Annual Review of Physical Chemistry. 73: 163–86. arXiv:2107.03727. Bibcode:2022ARPC...73..163K. doi:10.1146/annurev-physchem-082720-034254. PMID 34982580.
  2. Blank, TB; Brown, SD; Calhoun, AW; Doren, DJ (1995). "Neural network models of potential energy surfaces". Journal of Chemical Physics. 103 (10): 4129–37. Bibcode:1995JChPh.103.4129B. doi:10.1063/1.469597.
  3. Ghasemi, SA; Hofstetter, A; Saha, S; Goedecker, S (2015). "Interatomic potentials for ionic systems with density functional accuracy based on charge densities obtained by a neural network". Physical Review B. 92 (4) 045131. arXiv:1501.07344. Bibcode:2015PhRvB..92d5131G. doi:10.1103/PhysRevB.92.045131.
  4. Behler, J; Parrinello, M (2007). "Generalized neural-network representation of high-dimensional potential-energy surfaces". Physical Review Letters. 98 (14) 146401. Bibcode:2007PhRvL..98n6401B. doi:10.1103/PhysRevLett.98.146401. PMID 17501293.
  5. Bartók, Albert P.; Payne, Mike C.; Kondor, Risi; Csányi, Gábor (2010-04-01). "Gaussian Approximation Potentials: The Accuracy of Quantum Mechanics, without the Electrons". Physical Review Letters. 104 (13) 136403. arXiv:0910.1019. Bibcode:2010PhRvL.104m6403B. doi:10.1103/PhysRevLett.104.136403. PMID 20481899.
  6. Bartók, Albert P.; De, Sandip; Poelking, Carl; Bernstein, Noam; Kermode, James R.; Csányi, Gábor; Ceriotti, Michele (December 2017). "Machine learning unifies the modeling of materials and molecules". Science Advances. 3 (12) e1701816. arXiv:1706.00179. Bibcode:2017SciA....3E1816B. doi:10.1126/sciadv.1701816. ISSN 2375-2548. PMC 5729016. PMID 29242828.
  7. "Gaussian approximation potential – Machine learning atomistic simulation of materials and molecules". Retrieved 2024-04-04.
  8. Bartók, Albert P.; Kondor, Risi; Csányi, Gábor (2013-05-28). "On representing chemical environments". Physical Review B. 87 (18) 184115. arXiv:1209.3140. Bibcode:2013PhRvB..87r4115B. doi:10.1103/PhysRevB.87.184115.
  9. Rasmussen, Carl Edward; Williams, Christopher K. I. (2008). Gaussian processes for machine learning. Adaptive computation and machine learning (3rd printing). Cambridge, Mass.: MIT Press. ISBN 978-0-262-18253-9.
  10. Rowe, Patrick; Deringer, Volker L.; Gasparotto, Piero; Csányi, Gábor; Michaelides, Angelos (2020-07-21). "An accurate and transferable machine learning potential for carbon". The Journal of Chemical Physics. 153 (3) 034702. arXiv:2006.13655. Bibcode:2020JChPh.153c4702R. doi:10.1063/5.0005084. ISSN 0021-9606. PMID 32716159.
  11. Deringer, Volker L.; Csányi, Gábor (2017-03-03). "Machine learning based interatomic potential for amorphous carbon". Physical Review B. 95 (9) 094203. arXiv:1611.03277. Bibcode:2017PhRvB..95i4203D. doi:10.1103/PhysRevB.95.094203.
  12. Bartók, Albert P.; Kermode, James; Bernstein, Noam; Csányi, Gábor (2018-12-14). "Machine Learning a General-Purpose Interatomic Potential for Silicon". Physical Review X. 8 (4) 041048. arXiv:1805.01568. Bibcode:2018PhRvX...8d1048B. doi:10.1103/PhysRevX.8.041048.
  13. Deringer, Volker L.; Caro, Miguel A.; Csányi, Gábor (2020-10-29). "A general-purpose machine-learning force field for bulk and nanostructured phosphorus". Nature Communications. 11 (1): 5461. Bibcode:2020NatCo..11.5461D. doi:10.1038/s41467-020-19168-z. ISSN 2041-1723. PMC 7596484. PMID 33122630.
  14. Szlachta, Wojciech J.; Bartók, Albert P.; Csányi, Gábor (2014-09-24). "Accuracy and transferability of Gaussian approximation potential models for tungsten". Physical Review B. 90 (10) 104108. Bibcode:2014PhRvB..90j4108S. doi:10.1103/PhysRevB.90.104108.
  15. Mocanu, Felix C.; Konstantinou, Konstantinos; Lee, Tae Hoon; Bernstein, Noam; Deringer, Volker L.; Csányi, Gábor; Elliott, Stephen R. (2018-09-27). "Modeling the Phase-Change Memory Material, Ge₂Sb₂Te₅, with a Machine-Learned Interatomic Potential". The Journal of Physical Chemistry B. 122 (38): 8998–9006. Bibcode:2018JPCB..122.8998M. doi:10.1021/acs.jpcb.8b06476. ISSN 1520-6106. PMID 30173522.
  16. Shenoy, Lakshmi; Woodgate, Christopher D.; Staunton, Julie B.; Bartók, Albert P.; Becquart, Charlotte S.; Domain, Christophe; Kermode, James R. (2024-03-22). "Collinear-spin machine learned interatomic potential for Fe₇Cr₂Ni alloy". Physical Review Materials. 8 (3) 033804. arXiv:2309.08689. doi:10.1103/PhysRevMaterials.8.033804.
  17. Batzner, Simon; Musaelian, Albert; Sun, Lixin; Geiger, Mario; Mailoa, Jonathan P.; Kornbluth, Mordechai; Molinari, Nicola; Smidt, Tess E.; Kozinsky, Boris (2022-05-04). "E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials". Nature Communications. 13 (1): 2453. arXiv:2101.03164. Bibcode:2022NatCo..13.2453B. doi:10.1038/s41467-022-29939-5. ISSN 2041-1723. PMC 9068614. PMID 35508450.
  18. Batatia, Ilyes; Kovács, Dávid Péter; Simm, Gregor N. C.; Ortner, Christoph; Csányi, Gábor (2023-01-26), MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields, arXiv:2206.07697, retrieved 2026-04-28
  19. Gasteiger, Johannes; Shuaibi, Muhammed; Sriram, Anuroop; Günnemann, Stephan; Ulissi, Zachary; Zitnick, C. Lawrence; Das, Abhishek (2022-09-30), GemNet-OC: Developing Graph Neural Networks for Large and Diverse Molecular Simulation Datasets, arXiv:2204.02782, retrieved 2026-04-28
  20. "Open Catalyst Project". opencatalystproject.org. Retrieved 2026-04-28.
  21. Chanussot, Lowik; Das, Abhishek; Goyal, Siddharth; et al. (2021). "Open Catalyst 2020 (OC20) Dataset and Community Challenges". ACS Catalysis. 11 (10): 6059–6072. arXiv:2010.09990. doi:10.1021/acscatal.0c04525. S2CID 233564583.
  22. Tran, Richard; Lan, Janice; Shuaibi, Muhammed; et al. (2023). "The Open Catalyst 2022 (OC22) Dataset and Challenges for Oxide Electrocatalysis". ACS Catalysis. 13 (5): 3066–3084. doi:10.1021/acscatal.2c05426.
  23. Barroso-Luque, Luis; Sriram, Anuroop; Fu, Xiang; et al. (2024). "The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture". ACS Central Science. 10 (5): 923–941. doi:10.1021/acscentsci.3c01629. PMC 11117325. PMID 38799660.
  24. Sriram, Anuroop; Brabson, Logan M.; Yu, Xiaohan; Choi, Sihoon; Abdelmaqsoud, Kareem; Moubarak, Elias; Haan, Pim de; Löwe, Sindy; Brehmer, Johann (2025-09-23), The Open DAC 2025 Dataset for Sorbent Discovery in Direct Air Capture, arXiv:2508.03162, retrieved 2026-04-28
  25. Sriram, Anuroop; Das, Abhishek; Wood, Brandon M.; Goyal, Siddharth; Zitnick, C. Lawrence (2022-03-18), Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations, arXiv:2203.09697, retrieved 2026-04-28
  26. Wood, Brandon M.; Dzamba, Misko; Fu, Xiang; Gao, Meng; Shuaibi, Muhammed; Barroso-Luque, Luis; Abdelmaqsoud, Kareem; Gharakhanyan, Vahe; Kitchin, John R. (2026-03-04), UMA: A Family of Universal Models for Atoms, arXiv:2506.23971, retrieved 2026-04-28
  27. Levine, Daniel S.; Shuaibi, Muhammed; Spotte-Smith, Evan Walter Clark; Taylor, Michael G.; Hasyim, Muhammad R.; Michel, Kyle; Batatia, Ilyes; Csányi, Gábor; Dzamba, Misko (2026-03-04), The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models, arXiv:2505.08762, retrieved 2026-04-28