Get trending papers in your email inbox once a day!
Get trending papers in your email inbox!
SubscribeUnderstanding and Mitigating Distribution Shifts For Machine Learning Force Fields
Machine Learning Force Fields (MLFFs) are a promising alternative to expensive ab initio quantum mechanical molecular simulations. Given the diversity of chemical spaces that are of interest and the cost of generating new data, it is important to understand how MLFFs generalize beyond their training distributions. In order to characterize and better understand distribution shifts in MLFFs, we conduct diagnostic experiments on chemical datasets, revealing common shifts that pose significant challenges, even for large foundation models trained on extensive data. Based on these observations, we hypothesize that current supervised training methods inadequately regularize MLFFs, resulting in overfitting and learning poor representations of out-of-distribution systems. We then propose two new methods as initial steps for mitigating distribution shifts for MLFFs. Our methods focus on test-time refinement strategies that incur minimal computational cost and do not use expensive ab initio reference labels. The first strategy, based on spectral graph theory, modifies the edges of test graphs to align with graph structures seen during training. Our second strategy improves representations for out-of-distribution systems at test-time by taking gradient steps using an auxiliary objective, such as a cheap physical prior. Our test-time refinement strategies significantly reduce errors on out-of-distribution systems, suggesting that MLFFs are capable of and can move towards modeling diverse chemical spaces, but are not being effectively trained to do so. Our experiments establish clear benchmarks for evaluating the generalization capabilities of the next generation of MLFFs. Our code is available at https://tkreiman.github.io/projects/mlff_distribution_shifts/.
Machine Learning Force Fields with Data Cost Aware Training
Machine learning force fields (MLFF) have been proposed to accelerate molecular dynamics (MD) simulation, which finds widespread applications in chemistry and biomedical research. Even for the most data-efficient MLFFs, reaching chemical accuracy can require hundreds of frames of force and energy labels generated by expensive quantum mechanical algorithms, which may scale as O(n^3) to O(n^7), with n proportional to the number of basis functions. To address this issue, we propose a multi-stage computational framework -- ASTEROID, which lowers the data cost of MLFFs by leveraging a combination of cheap inaccurate data and expensive accurate data. The motivation behind ASTEROID is that inaccurate data, though incurring large bias, can help capture the sophisticated structures of the underlying force field. Therefore, we first train a MLFF model on a large amount of inaccurate training data, employing a bias-aware loss function to prevent the model from overfitting tahe potential bias of this data. We then fine-tune the obtained model using a small amount of accurate training data, which preserves the knowledge learned from the inaccurate training data while significantly improving the model's accuracy. Moreover, we propose a variant of ASTEROID based on score matching for the setting where the inaccurate training data are unlabeled. Extensive experiments on MD datasets and downstream tasks validate the efficacy of ASTEROID. Our code and data are available at https://github.com/abukharin3/asteroid.
SimPoly: Simulation of Polymers with Machine Learning Force Fields Derived from First Principles
Polymers are a versatile class of materials with widespread industrial applications. Advanced computational tools could revolutionize their design, but their complex, multi-scale nature poses significant modeling challenges. Conventional force fields often lack the accuracy and transferability required to capture the intricate interactions governing polymer behavior. Conversely, quantum-chemical methods are computationally prohibitive for the large systems and long timescales required to simulate relevant polymer phenomena. Here, we overcome these limitations with a machine learning force field (MLFF) approach. We demonstrate that macroscopic properties for a broad range of polymers can be predicted ab initio, without fitting to experimental data. Specifically, we develop a fast and scalable MLFF to accurately predict polymer densities, outperforming established classical force fields. Our MLFF also captures second-order phase transitions, enabling the prediction of glass transition temperatures. To accelerate progress in this domain, we introduce a benchmark of experimental bulk properties for 130 polymers and an accompanying quantum-chemical dataset. This work lays the foundation for a fully in silico design pipeline for next-generation polymeric materials.
Symmetry-invariant quantum machine learning force fields
Machine learning techniques are essential tools to compute efficient, yet accurate, force fields for atomistic simulations. This approach has recently been extended to incorporate quantum computational methods, making use of variational quantum learning models to predict potential energy surfaces and atomic forces from ab initio training data. However, the trainability and scalability of such models are still limited, due to both theoretical and practical barriers. Inspired by recent developments in geometric classical and quantum machine learning, here we design quantum neural networks that explicitly incorporate, as a data-inspired prior, an extensive set of physically relevant symmetries. We find that our invariant quantum learning models outperform their more generic counterparts on individual molecules of growing complexity. Furthermore, we study a water dimer as a minimal example of a system with multiple components, showcasing the versatility of our proposed approach and opening the way towards larger simulations. Our results suggest that molecular force fields generation can significantly profit from leveraging the framework of geometric quantum machine learning, and that chemical systems represent, in fact, an interesting and rich playground for the development and application of advanced quantum machine learning tools.
BAMBOO: a predictive and transferable machine learning force field framework for liquid electrolyte development
Despite the widespread applications of machine learning force field (MLFF) on solids and small molecules, there is a notable gap in applying MLFF to complex liquid electrolytes. In this work, we introduce BAMBOO (ByteDance AI Molecular Simulation Booster), a novel framework for molecular dynamics (MD) simulations, with a demonstration of its capabilities in the context of liquid electrolytes for lithium batteries. We design a physics-inspired graph equivariant transformer architecture as the backbone of BAMBOO to learn from quantum mechanical simulations. Additionally, we pioneer an ensemble knowledge distillation approach and apply it on MLFFs to improve the stability of MD simulations. Finally, we propose the density alignment algorithm to align BAMBOO with experimental measurements. BAMBOO demonstrates state-of-the-art accuracy in predicting key electrolyte properties such as density, viscosity, and ionic conductivity across various solvents and salt combinations. Our current model, trained on more than 15 chemical species, achieves the average density error of 0.01 g/cm^3 on various compositions compared with experimental data. Moreover, our model demonstrates transferability to molecules not included in the quantum mechanical dataset. We envision this work as paving the way to a "universal MLFF" capable of simulating properties of common organic liquids.
ADAPT: Lightweight, Long-Range Machine Learning Force Fields Without Graphs
Point defects play a central role in driving the properties of materials. First-principles methods are widely used to compute defect energetics and structures, including at scale for high-throughput defect databases. However, these methods are computationally expensive, making machine-learning force fields (MLFFs) an attractive alternative for accelerating structural relaxations. Most existing MLFFs are based on graph neural networks (GNNs), which can suffer from oversmoothing and poor representation of long-range interactions. Both of these issues are especially of concern when modeling point defects. To address these challenges, we introduce the Accelerated Deep Atomic Potential Transformer (ADAPT), an MLFF that replaces graph representations with a direct coordinates-in-space formulation and explicitly considers all pairwise atomic interactions. Atoms are treated as tokens, with a Transformer encoder modeling their interactions. Applied to a dataset of silicon point defects, ADAPT achieves a roughly 33 percent reduction in both force and energy prediction errors relative to a state-of-the-art GNN-based model, while requiring only a fraction of the computational cost.
BoostMD: Accelerating molecular sampling by leveraging ML force field features from previous time-steps
Simulating atomic-scale processes, such as protein dynamics and catalytic reactions, is crucial for advancements in biology, chemistry, and materials science. Machine learning force fields (MLFFs) have emerged as powerful tools that achieve near quantum mechanical accuracy, with promising generalization capabilities. However, their practical use is often limited by long inference times compared to classical force fields, especially when running extensive molecular dynamics (MD) simulations required for many biological applications. In this study, we introduce BoostMD, a surrogate model architecture designed to accelerate MD simulations. BoostMD leverages node features computed at previous time steps to predict energies and forces based on positional changes. This approach reduces the complexity of the learning task, allowing BoostMD to be both smaller and significantly faster than conventional MLFFs. During simulations, the computationally intensive reference MLFF is evaluated only every N steps, while the lightweight BoostMD model handles the intermediate steps at a fraction of the computational cost. Our experiments demonstrate that BoostMD achieves an eight-fold speedup compared to the reference model and generalizes to unseen dipeptides. Furthermore, we find that BoostMD accurately samples the ground-truth Boltzmann distribution when running molecular dynamics. By combining efficient feature reuse with a streamlined architecture, BoostMD offers a robust solution for conducting large-scale, long-timescale molecular simulations, making high-accuracy ML-driven modeling more accessible and practical.
Bridging Quantum Mechanics to Organic Liquid Properties via a Universal Force Field
Molecular dynamics (MD) simulations are essential tools for unraveling atomistic insights into the structure and dynamics of condensed-phase systems. However, the universal and accurate prediction of macroscopic properties from ab initio calculations remains a significant challenge, often hindered by the trade-off between computational cost and simulation accuracy. Here, we present ByteFF-Pol, a graph neural network (GNN)-parameterized polarizable force field, trained exclusively on high-level quantum mechanics (QM) data. Leveraging physically-motivated force field forms and training strategies, ByteFF-Pol exhibits exceptional performance in predicting thermodynamic and transport properties for a wide range of small-molecule liquids and electrolytes, outperforming state-of-the-art (SOTA) classical and machine learning force fields. The zero-shot prediction capability of ByteFF-Pol bridges the gap between microscopic QM calculations and macroscopic liquid properties, enabling the exploration of previously intractable chemical spaces. This advancement holds transformative potential for applications such as electrolyte design and custom-tailored solvent, representing a pivotal step toward data-driven materials discovery.
Learning Inter-Atomic Potentials without Explicit Equivariance
Accurate and scalable machine-learned inter-atomic potentials (MLIPs) are essential for molecular simulations ranging from drug discovery to new material design. Current state-of-the-art models enforce roto-translational symmetries through equivariant neural network architectures, a hard-wired inductive bias that can often lead to reduced flexibility, computational efficiency, and scalability. In this work, we introduce TransIP: Transformer-based Inter-Atomic Potentials, a novel training paradigm for interatomic potentials achieving symmetry compliance without explicit architectural constraints. Our approach guides a generic non-equivariant Transformer-based model to learn SO(3)-equivariance by optimizing its representations in the embedding space. Trained on the recent Open Molecules (OMol25) collection, a large and diverse molecular dataset built specifically for MLIPs and covering different types of molecules (including small organics, biomolecular fragments, and electrolyte-like species), TransIP attains comparable performance in machine-learning force fields versus state-of-the-art equivariant baselines. Further, compared to a data augmentation baseline, TransIP achieves 40% to 60% improvement in performance across varying OMol25 dataset sizes. More broadly, our work shows that learned equivariance can be a powerful and efficient alternative to equivariant or augmentation-based MLIP models.
Cross Learning between Electronic Structure Theories for Unifying Molecular, Surface, and Inorganic Crystal Foundation Force Fields
Creating a single unified interatomic potential capable of attaining ab initio accuracy across all chemistry remains a long-standing challenge in computational chemistry and materials science. This work introduces a training protocol for foundation machine-learning interatomic potentials (MLIPs) that bridge molecular, surface, and materials chemistry through cross-domain learning. First, we introduce enhancements to the MACE architecture that improve its performance on chemically diverse databases by increasing weight sharing across chemical elements and introducing non-linear factors into the tensor decomposition of the product basis. Second, we develop a multi-head replay post-training methodology that enables efficient knowledge transfer across diverse chemical domains. By fine-tuning on datasets at different levels of electronic structure theory, including inorganic crystals, molecular systems, surface chemistry, and reactive organic chemistry, we demonstrate that a single unified model achieves state-of-the-art performance across several chemical domains. Comprehensive benchmarking reveals superior cross-domain transferability compared with existing specialised and multi-task models, with notable improvements in molecular and surface properties while maintaining state-of-the-art performance in materials-property prediction.
AceFF: A State-of-the-Art Machine Learning Potential for Small Molecules
We introduce AceFF, a pre-trained machine learning interatomic potential (MLIP) optimized for small molecule drug discovery. While MLIPs have emerged as efficient alternatives to Density Functional Theory (DFT), generalizability across diverse chemical spaces remains difficult. AceFF addresses this via a refined TensorNet2 architecture trained on a comprehensive dataset of drug-like compounds. This approach yields a force field that balances high-throughput inference speed with DFT-level accuracy. AceFF fully supports the essential medicinal chemistry elements (H, B, C, N, O, F, Si, P, S, Cl, Br, I) and is explicitly trained to handle charged states. Validation against rigorous benchmarks, including complex torsional energy scans, molecular dynamics trajectories, batched minimizations, and forces and anergy accuracy demonstrates that AceFF establishes a new state-of-the-art for organic molecules. The AceFF-2 model weights and inference code are available at https://huggingface.co/Acellera/AceFF-2.0.
Machine Learning Interatomic Potentials: library for efficient training, model development and simulation of molecular systems
Machine Learning Interatomic Potentials (MLIP) are a novel in silico approach for molecular property prediction, creating an alternative to disrupt the accuracy/speed trade-off of empirical force fields and density functional theory (DFT). In this white paper, we present our MLIP library which was created with two core aims: (1) provide to industry experts without machine learning background a user-friendly and computationally efficient set of tools to experiment with MLIP models, (2) provide machine learning developers a framework to develop novel approaches fully integrated with molecular dynamics tools. The library includes in this release three model architectures (MACE, NequIP, and ViSNet), and two molecular dynamics (MD) wrappers (ASE, and JAX-MD), along with a set of pre-trained organics models. The seamless integration with JAX-MD, in particular, facilitates highly efficient MD simulations, bringing MLIP models significantly closer to industrial application. The library is available on GitHub and on PyPI under the Apache license 2.0.
MLIP Arena: Advancing Fairness and Transparency in Machine Learning Interatomic Potentials via an Open, Accessible Benchmark Platform
Machine learning interatomic potentials (MLIPs) have revolutionized molecular and materials modeling, but existing benchmarks suffer from data leakage, limited transferability, and an over-reliance on error-based metrics tied to specific density functional theory (DFT) references. We introduce MLIP Arena, a benchmark platform that evaluates force field performance based on physics awareness, chemical reactivity, stability under extreme conditions, and predictive capabilities for thermodynamic properties and physical phenomena. By moving beyond static DFT references and revealing the important failure modes of current foundation MLIPs in real-world settings, MLIP Arena provides a reproducible framework to guide the next-generation MLIP development toward improved predictive accuracy and runtime efficiency while maintaining physical consistency. The Python package and online leaderboard are available at https://github.com/atomind-ai/mlip-arena.
AutoMat: Enabling Automated Crystal Structure Reconstruction from Microscopy via Agentic Tool Use
Machine learning-based interatomic potentials and force fields depend critically on accurate atomic structures, yet such data are scarce due to the limited availability of experimentally resolved crystals. Although atomic-resolution electron microscopy offers a potential source of structural data, converting these images into simulation-ready formats remains labor-intensive and error-prone, creating a bottleneck for model training and validation. We introduce AutoMat, an end-to-end, agent-assisted pipeline that automatically transforms scanning transmission electron microscopy (STEM) images into atomic crystal structures and predicts their physical properties. AutoMat combines pattern-adaptive denoising, physics-guided template retrieval, symmetry-aware atomic reconstruction, fast relaxation and property prediction via MatterSim, and coordinated orchestration across all stages. We propose the first dedicated STEM2Mat-Bench for this task and evaluate performance using lattice RMSD, formation energy MAE, and structure-matching success rate. By orchestrating external tool calls, AutoMat enables a text-only LLM to outperform vision-language models in this domain, achieving closed-loop reasoning throughout the pipeline. In large-scale experiments over 450 structure samples, AutoMat substantially outperforms existing multimodal large language models and tools. These results validate both AutoMat and STEM2Mat-Bench, marking a key step toward bridging microscopy and atomistic simulation in materials science.The code and dataset are publicly available at https://github.com/yyt-2378/AutoMat and https://huggingface.co/datasets/yaotianvector/STEM2Mat.
CHGNet: Pretrained universal neural network potential for charge-informed atomistic modeling
The simulation of large-scale systems with complex electron interactions remains one of the greatest challenges for the atomistic modeling of materials. Although classical force fields often fail to describe the coupling between electronic states and ionic rearrangements, the more accurate ab-initio molecular dynamics suffers from computational complexity that prevents long-time and large-scale simulations, which are essential to study many technologically relevant phenomena, such as reactions, ion migrations, phase transformations, and degradation. In this work, we present the Crystal Hamiltonian Graph neural Network (CHGNet) as a novel machine-learning interatomic potential (MLIP), using a graph-neural-network-based force field to model a universal potential energy surface. CHGNet is pretrained on the energies, forces, stresses, and magnetic moments from the Materials Project Trajectory Dataset, which consists of over 10 years of density functional theory static and relaxation trajectories of sim 1.5 million inorganic structures. The explicit inclusion of magnetic moments enables CHGNet to learn and accurately represent the orbital occupancy of electrons, enhancing its capability to describe both atomic and electronic degrees of freedom. We demonstrate several applications of CHGNet in solid-state materials, including charge-informed molecular dynamics in Li_xMnO_2, the finite temperature phase diagram for Li_xFePO_4 and Li diffusion in garnet conductors. We critically analyze the significance of including charge information for capturing appropriate chemistry, and we provide new insights into ionic systems with additional electronic degrees of freedom that can not be observed by previous MLIPs.
Position: Machine Learning Conferences Should Establish a "Refutations and Critiques" Track
Science progresses by iteratively advancing and correcting humanity's understanding of the world. In machine learning (ML) research, rapid advancements have led to an explosion of publications, but have also led to misleading, incorrect, flawed or perhaps even fraudulent studies being accepted and sometimes highlighted at ML conferences due to the fallibility of peer review. While such mistakes are understandable, ML conferences do not offer robust processes to help the field systematically correct when such errors are made. This position paper argues that ML conferences should establish a dedicated "Refutations and Critiques" (R&C) Track. This R&C Track would provide a high-profile, reputable platform to support vital research that critically challenges prior research, thereby fostering a dynamic self-correcting research ecosystem. We discuss key considerations including track design, review principles, potential pitfalls, and provide an illustrative example submission concerning a recent ICLR 2025 Oral. We conclude that ML conferences should create official, reputable mechanisms to help ML research self-correct.
Zyxin is all you need: machine learning adherent cell mechanics
Cellular form and function emerge from complex mechanochemical systems within the cytoplasm. No systematic strategy currently exists to infer large-scale physical properties of a cell from its many molecular components. This is a significant obstacle to understanding biophysical processes such as cell adhesion and migration. Here, we develop a data-driven biophysical modeling approach to learn the mechanical behavior of adherent cells. We first train neural networks to predict forces generated by adherent cells from images of cytoskeletal proteins. Strikingly, experimental images of a single focal adhesion protein, such as zyxin, are sufficient to predict forces and generalize to unseen biological regimes. This protein field alone contains enough information to yield accurate predictions even if forces themselves are generated by many interacting proteins. We next develop two approaches - one explicitly constrained by physics, the other more agnostic - that help construct data-driven continuum models of cellular forces using this single focal adhesion field. Both strategies consistently reveal that cellular forces are encoded by two different length scales in adhesion protein distributions. Beyond adherent cell mechanics, our work serves as a case study for how to integrate neural networks in the construction of predictive phenomenological models in cell biology, even when little knowledge of the underlying microscopic mechanisms exist.
Knowledge Graph in Astronomical Research with Large Language Models: Quantifying Driving Forces in Interdisciplinary Scientific Discovery
Identifying and predicting the factors that contribute to the success of interdisciplinary research is crucial for advancing scientific discovery. However, there is a lack of methods to quantify the integration of new ideas and technological advancements in astronomical research and how these new technologies drive further scientific breakthroughs. Large language models, with their ability to extract key concepts from vast literature beyond keyword searches, provide a new tool to quantify such processes. In this study, we extracted concepts in astronomical research from 297,807 publications between 1993 and 2024 using large language models, resulting in a set of 24,939 concepts. These concepts were then used to form a knowledge graph, where the link strength between any two concepts was determined by their relevance through the citation-reference relationships. By calculating this relevance across different time periods, we quantified the impact of numerical simulations and machine learning on astronomical research. The knowledge graph demonstrates two phases of development: a phase where the technology was integrated and another where the technology was explored in scientific discovery. The knowledge graph reveals that despite machine learning has made much inroad in astronomy, there is currently a lack of new concept development at the intersection of AI and Astronomy, which may be the current bottleneck preventing machine learning from further transforming the field of astronomy.
A foundation model for atomistic materials chemistry
Machine-learned force fields have transformed the atomistic modelling of materials by enabling simulations of ab initio quality on unprecedented time and length scales. However, they are currently limited by: (i) the significant computational and human effort that must go into development and validation of potentials for each particular system of interest; and (ii) a general lack of transferability from one chemical system to the next. Here, using the state-of-the-art MACE architecture we introduce a single general-purpose ML model, trained on a public database of 150k inorganic crystals, that is capable of running stable molecular dynamics on molecules and materials. We demonstrate the power of the MACE-MP-0 model -- and its qualitative and at times quantitative accuracy -- on a diverse set problems in the physical sciences, including the properties of solids, liquids, gases, and chemical reactions. The model can be applied out of the box and as a starting or "foundation model" for any atomistic system of interest and is thus a step towards democratising the revolution of ML force fields by lowering the barriers to entry.
Machine-learned molecular mechanics force field for the simulation of protein-ligand systems and beyond
The development of reliable and extensible molecular mechanics (MM) force fields -- fast, empirical models characterizing the potential energy surface of molecular systems -- is indispensable for biomolecular simulation and computer-aided drug design. Here, we introduce a generalized and extensible machine-learned MM force field, espaloma-0.3, and an end-to-end differentiable framework using graph neural networks to overcome the limitations of traditional rule-based methods. Trained in a single GPU-day to fit a large and diverse quantum chemical dataset of over 1.1M energy and force calculations, espaloma-0.3 reproduces quantum chemical energetic properties of chemical domains highly relevant to drug discovery, including small molecules, peptides, and nucleic acids. Moreover, this force field maintains the quantum chemical energy-minimized geometries of small molecules and preserves the condensed phase properties of peptides, self-consistently parametrizing proteins and ligands to produce stable simulations leading to highly accurate predictions of binding free energies. This methodology demonstrates significant promise as a path forward for systematically building more accurate force fields that are easily extensible to new chemical domains of interest.
MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields
Creating fast and accurate force fields is a long-standing challenge in computational chemistry and materials science. Recently, several equivariant message passing neural networks (MPNNs) have been shown to outperform models built using other approaches in terms of accuracy. However, most MPNNs suffer from high computational cost and poor scalability. We propose that these limitations arise because MPNNs only pass two-body messages leading to a direct relationship between the number of layers and the expressivity of the network. In this work, we introduce MACE, a new equivariant MPNN model that uses higher body order messages. In particular, we show that using four-body messages reduces the required number of message passing iterations to just two, resulting in a fast and highly parallelizable model, reaching or exceeding state-of-the-art accuracy on the rMD17, 3BPA, and AcAc benchmark tasks. We also demonstrate that using higher order messages leads to an improved steepness of the learning curves.
TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations
Achieving a balance between computational speed, prediction accuracy, and universal applicability in molecular simulations has been a persistent challenge. This paper presents substantial advancements in the TorchMD-Net software, a pivotal step forward in the shift from conventional force fields to neural network-based potentials. The evolution of TorchMD-Net into a more comprehensive and versatile framework is highlighted, incorporating cutting-edge architectures such as TensorNet. This transformation is achieved through a modular design approach, encouraging customized applications within the scientific community. The most notable enhancement is a significant improvement in computational efficiency, achieving a very remarkable acceleration in the computation of energy and forces for TensorNet models, with performance gains ranging from 2-fold to 10-fold over previous iterations. Other enhancements include highly optimized neighbor search algorithms that support periodic boundary conditions and the smooth integration with existing molecular dynamics frameworks. Additionally, the updated version introduces the capability to integrate physical priors, further enriching its application spectrum and utility in research. The software is available at https://github.com/torchmd/torchmd-net.
PoseBusters: AI-based docking methods fail to generate physically valid poses or generalise to novel sequences
The last few years have seen the development of numerous deep learning-based protein-ligand docking methods. They offer huge promise in terms of speed and accuracy. However, despite claims of state-of-the-art performance in terms of crystallographic root-mean-square deviation (RMSD), upon closer inspection, it has become apparent that they often produce physically implausible molecular structures. It is therefore not sufficient to evaluate these methods solely by RMSD to a native binding mode. It is vital, particularly for deep learning-based methods, that they are also evaluated on steric and energetic criteria. We present PoseBusters, a Python package that performs a series of standard quality checks using the well-established cheminformatics toolkit RDKit. Only methods that both pass these checks and predict native-like binding modes should be classed as having "state-of-the-art" performance. We use PoseBusters to compare five deep learning-based docking methods (DeepDock, DiffDock, EquiBind, TankBind, and Uni-Mol) and two well-established standard docking methods (AutoDock Vina and CCDC Gold) with and without an additional post-prediction energy minimisation step using a molecular mechanics force field. We show that both in terms of physical plausibility and the ability to generalise to examples that are distinct from the training data, no deep learning-based method yet outperforms classical docking tools. In addition, we find that molecular mechanics force fields contain docking-relevant physics missing from deep-learning methods. PoseBusters allows practitioners to assess docking and molecular generation methods and may inspire new inductive biases still required to improve deep learning-based methods, which will help drive the development of more accurate and more realistic predictions.
Latent Field Discovery In Interacting Dynamical Systems With Neural Fields
Systems of interacting objects often evolve under the influence of field effects that govern their dynamics, yet previous works have abstracted away from such effects, and assume that systems evolve in a vacuum. In this work, we focus on discovering these fields, and infer them from the observed dynamics alone, without directly observing them. We theorize the presence of latent force fields, and propose neural fields to learn them. Since the observed dynamics constitute the net effect of local object interactions and global field effects, recently popularized equivariant networks are inapplicable, as they fail to capture global information. To address this, we propose to disentangle local object interactions -- which are SE(n) equivariant and depend on relative states -- from external global field effects -- which depend on absolute states. We model interactions with equivariant graph networks, and combine them with neural fields in a novel graph network that integrates field forces. Our experiments show that we can accurately discover the underlying fields in charged particles settings, traffic scenes, and gravitational n-body problems, and effectively use them to learn the system and forecast future trajectories.
MACE4IR: A foundation model for molecular infrared spectroscopy
Machine-learned interatomic potentials (MLIPs) have shown significant promise in predicting infrared spectra with high fidelity. However, the absence of general-purpose MLIPs capable of handling a wide range of elements and their combinations has limited their broader applicability. In this work, we introduce MACE4IR, a machine learning foundation model built on the MACE architecture and trained on 10 million geometries and corresponding density-functional theory (DFT) energies, forces and dipole moments from the QCML dataset. The training data encompasses approximately 80 elements and a diverse set of molecules, including organic compounds, inorganic species, and metal complexes. MACE4IR accurately predicts energies, forces, dipole moments, and infrared spectra at significantly reduced computational cost compared to DFT. By combining generality, accuracy, and efficiency, MACE4IR opens the door to rapid and reliable infrared spectra prediction for complex systems across chemistry, biology, and materials science.
Metatensor and metatomic: foundational libraries for interoperable atomistic machine learning
Incorporation of machine learning (ML) techniques into atomic-scale modeling has proven to be an extremely effective strategy to improve the accuracy and reduce the computational cost of simulations. It also entails conceptual and practical challenges, as it involves combining very different mathematical foundations, as well as software ecosystems that are very well developed in their own merit, but do not share many commonalities. To address these issues and facilitate the adoption of ML in atomistic simulations, we introduce two dedicated software libraries. The first one, metatensor, provides multi-platform and multi-language storage and manipulation of arrays with many potentially sparse indices, designed from the ground up for atomistic ML applications. By combining the actual values with metadata that describes their nature and that facilitates the handling of geometric information and gradients with respect to the atomic positions, metatensor provides a common framework to enable data sharing between ML software -- typically written in Python -- and established atomistic modeling tools -- typically written in Fortran, C or C++. The second library, metatomic, provides an interface to store an atomistic ML model and metadata about this model in a portable way, facilitating the implementation, training and distribution of models, and their use across different simulation packages. We showcase a growing ecosystem of tools, from low-level libraries, training utilities, to interfaces with existing software packages that demonstrate the effectiveness of metatensor and metatomic in bridging the gap between traditional simulation software and modern ML frameworks.
The Open Molecules 2025 (OMol25) Dataset, Evaluations, and Models
Machine learning (ML) models hold the promise of transforming atomic simulations by delivering quantum chemical accuracy at a fraction of the computational cost. Realization of this potential would enable high-throughout, high-accuracy molecular screening campaigns to explore vast regions of chemical space and facilitate ab initio simulations at sizes and time scales that were previously inaccessible. However, a fundamental challenge to creating ML models that perform well across molecular chemistry is the lack of comprehensive data for training. Despite substantial efforts in data generation, no large-scale molecular dataset exists that combines broad chemical diversity with a high level of accuracy. To address this gap, Meta FAIR introduces Open Molecules 2025 (OMol25), a large-scale dataset composed of more than 100 million density functional theory (DFT) calculations at the omegaB97M-V/def2-TZVPD level of theory, representing billions of CPU core-hours of compute. OMol25 uniquely blends elemental, chemical, and structural diversity including: 83 elements, a wide-range of intra- and intermolecular interactions, explicit solvation, variable charge/spin, conformers, and reactive structures. There are ~83M unique molecular systems in OMol25 covering small molecules, biomolecules, metal complexes, and electrolytes, including structures obtained from existing datasets. OMol25 also greatly expands on the size of systems typically included in DFT datasets, with systems of up to 350 atoms. In addition to the public release of the data, we provide baseline models and a comprehensive set of model evaluations to encourage community engagement in developing the next-generation ML models for molecular chemistry.
Scalable Bayesian Uncertainty Quantification for Neural Network Potentials: Promise and Pitfalls
Neural network (NN) potentials promise highly accurate molecular dynamics (MD) simulations within the computational complexity of classical MD force fields. However, when applied outside their training domain, NN potential predictions can be inaccurate, increasing the need for Uncertainty Quantification (UQ). Bayesian modeling provides the mathematical framework for UQ, but classical Bayesian methods based on Markov chain Monte Carlo (MCMC) are computationally intractable for NN potentials. By training graph NN potentials for coarse-grained systems of liquid water and alanine dipeptide, we demonstrate here that scalable Bayesian UQ via stochastic gradient MCMC (SG-MCMC) yields reliable uncertainty estimates for MD observables. We show that cold posteriors can reduce the required training data size and that for reliable UQ, multiple Markov chains are needed. Additionally, we find that SG-MCMC and the Deep Ensemble method achieve comparable results, despite shorter training and less hyperparameter tuning of the latter. We show that both methods can capture aleatoric and epistemic uncertainty reliably, but not systematic uncertainty, which needs to be minimized by adequate modeling to obtain accurate credible intervals for MD observables. Our results represent a step towards accurate UQ that is of vital importance for trustworthy NN potential-based MD simulations required for decision-making in practice.
QuantumBind-RBFE: Accurate Relative Binding Free Energy Calculations Using Neural Network Potentials
Accurate prediction of protein-ligand binding affinities is crucial in drug discovery, particularly during hit-to-lead and lead optimization phases, however, limitations in ligand force fields continue to impact prediction accuracy. In this work, we validate relative binding free energy (RBFE) accuracy using neural network potentials (NNPs) for the ligands. We utilize a novel NNP model, AceForce 1.0, based on the TensorNet architecture for small molecules that broadens the applicability to diverse drug-like compounds, including all important chemical elements and supporting charged molecules. Using established benchmarks, we show overall improved accuracy and correlation in binding affinity predictions compared with GAFF2 for molecular mechanics and ANI2-x for NNPs. Slightly less accuracy but comparable correlations with OPLS4. We also show that we can run the NNP simulations at 2 fs timestep, at least two times larger than previous NNP models, providing significant speed gains. The results show promise for further evolutions of free energy calculations using NNPs while demonstrating its practical use already with the current generation. The code and NNP model are publicly available for research use.
Facet: highly efficient E(3)-equivariant networks for interatomic potentials
Computational materials discovery is limited by the high cost of first-principles calculations. Machine learning (ML) potentials that predict energies from crystal structures are promising, but existing methods face computational bottlenecks. Steerable graph neural networks (GNNs) encode geometry with spherical harmonics, respecting atomic symmetries -- permutation, rotation, and translation -- for physically realistic predictions. Yet maintaining equivariance is difficult: activation functions must be modified, and each layer must handle multiple data types for different harmonic orders. We present Facet, a GNN architecture for efficient ML potentials, developed through systematic analysis of steerable GNNs. Our innovations include replacing expensive multi-layer perceptrons (MLPs) for interatomic distances with splines, which match performance while cutting computational and memory demands. We also introduce a general-purpose equivariant layer that mixes node information via spherical grid projection followed by standard MLPs -- faster than tensor products and more expressive than linear or gate layers. On the MPTrj dataset, Facet matches leading models with far fewer parameters and under 10% of their training compute. On a crystal relaxation task, it runs twice as fast as MACE models. We further show SevenNet-0's parameters can be reduced by over 25% with no accuracy loss. These techniques enable more than 10x faster training of large-scale foundation models for ML potentials, potentially reshaping computational materials discovery.
Lifelong Machine Learning Potentials
Machine learning potentials (MLPs) trained on accurate quantum chemical data can retain the high accuracy, while inflicting little computational demands. On the downside, they need to be trained for each individual system. In recent years, a vast number of MLPs has been trained from scratch because learning additional data typically requires to train again on all data to not forget previously acquired knowledge. Additionally, most common structural descriptors of MLPs cannot represent efficiently a large number of different chemical elements. In this work, we tackle these problems by introducing element-embracing atom-centered symmetry functions (eeACSFs) which combine structural properties and element information from the periodic table. These eeACSFs are a key for our development of a lifelong machine learning potential (lMLP). Uncertainty quantification can be exploited to transgress a fixed, pre-trained MLP to arrive at a continuously adapting lMLP, because a predefined level of accuracy can be ensured. To extend the applicability of an lMLP to new systems, we apply continual learning strategies to enable autonomous and on-the-fly training on a continuous stream of new data. For the training of deep neural networks, we propose the continual resilient (CoRe) optimizer and incremental learning strategies relying on rehearsal of data, regularization of parameters, and the architecture of the model.
Exploring Model Transferability through the Lens of Potential Energy
Transfer learning has become crucial in computer vision tasks due to the vast availability of pre-trained deep learning models. However, selecting the optimal pre-trained model from a diverse pool for a specific downstream task remains a challenge. Existing methods for measuring the transferability of pre-trained models rely on statistical correlations between encoded static features and task labels, but they overlook the impact of underlying representation dynamics during fine-tuning, leading to unreliable results, especially for self-supervised models. In this paper, we present an insightful physics-inspired approach named PED to address these challenges. We reframe the challenge of model selection through the lens of potential energy and directly model the interaction forces that influence fine-tuning dynamics. By capturing the motion of dynamic representations to decline the potential energy within a force-driven physical model, we can acquire an enhanced and more stable observation for estimating transferability. The experimental results on 10 downstream tasks and 12 self-supervised models demonstrate that our approach can seamlessly integrate into existing ranking techniques and enhance their performances, revealing its effectiveness for the model selection task and its potential for understanding the mechanism in transfer learning. Code will be available at https://github.com/lixiaotong97/PED.
AdsorbML: Accelerating Adsorption Energy Calculations with Machine Learning
Computational catalysis is playing an increasingly significant role in the design of catalysts across a wide range of applications. A common task for many computational methods is the need to accurately compute the minimum binding energy - the adsorption energy - for an adsorbate and a catalyst surface of interest. Traditionally, the identification of low energy adsorbate-surface configurations relies on heuristic methods and researcher intuition. As the desire to perform high-throughput screening increases, it becomes challenging to use heuristics and intuition alone. In this paper, we demonstrate machine learning potentials can be leveraged to identify low energy adsorbate-surface configurations more accurately and efficiently. Our algorithm provides a spectrum of trade-offs between accuracy and efficiency, with one balanced option finding the lowest energy configuration, within a 0.1 eV threshold, 86.33% of the time, while achieving a 1331x speedup in computation. To standardize benchmarking, we introduce the Open Catalyst Dense dataset containing nearly 1,000 diverse surfaces and 85,658 unique configurations.
Spherical Channels for Modeling Atomic Interactions
Modeling the energy and forces of atomic systems is a fundamental problem in computational chemistry with the potential to help address many of the world's most pressing problems, including those related to energy scarcity and climate change. These calculations are traditionally performed using Density Functional Theory, which is computationally very expensive. Machine learning has the potential to dramatically improve the efficiency of these calculations from days or hours to seconds. We propose the Spherical Channel Network (SCN) to model atomic energies and forces. The SCN is a graph neural network where nodes represent atoms and edges their neighboring atoms. The atom embeddings are a set of spherical functions, called spherical channels, represented using spherical harmonics. We demonstrate, that by rotating the embeddings based on the 3D edge orientation, more information may be utilized while maintaining the rotational equivariance of the messages. While equivariance is a desirable property, we find that by relaxing this constraint in both message passing and aggregation, improved accuracy may be achieved. We demonstrate state-of-the-art results on the large-scale Open Catalyst dataset in both energy and force prediction for numerous tasks and metrics.
Learning Smooth and Expressive Interatomic Potentials for Physical Property Prediction
Machine learning interatomic potentials (MLIPs) have become increasingly effective at approximating quantum mechanical calculations at a fraction of the computational cost. However, lower errors on held out test sets do not always translate to improved results on downstream physical property prediction tasks. In this paper, we propose testing MLIPs on their practical ability to conserve energy during molecular dynamic simulations. If passed, improved correlations are found between test errors and their performance on physical property prediction tasks. We identify choices which may lead to models failing this test, and use these observations to improve upon highly-expressive models. The resulting model, eSEN, provides state-of-the-art results on a range of physical property prediction tasks, including materials stability prediction, thermal conductivity prediction, and phonon calculations.
QH9: A Quantum Hamiltonian Prediction Benchmark for QM9 Molecules
Supervised machine learning approaches have been increasingly used in accelerating electronic structure prediction as surrogates of first-principle computational methods, such as density functional theory (DFT). While numerous quantum chemistry datasets focus on chemical properties and atomic forces, the ability to achieve accurate and efficient prediction of the Hamiltonian matrix is highly desired, as it is the most important and fundamental physical quantity that determines the quantum states of physical systems and chemical properties. In this work, we generate a new Quantum Hamiltonian dataset, named as QH9, to provide precise Hamiltonian matrices for 999 or 2998 molecular dynamics trajectories and 130,831 stable molecular geometries, based on the QM9 dataset. By designing benchmark tasks with various molecules, we show that current machine learning models have the capacity to predict Hamiltonian matrices for arbitrary molecules. Both the QH9 dataset and the baseline models are provided to the community through an open-source benchmark, which can be highly valuable for developing machine learning methods and accelerating molecular and materials design for scientific and technological applications. Our benchmark is publicly available at https://github.com/divelab/AIRS/tree/main/OpenDFT/QHBench.
A Benchmark for Quantum Chemistry Relaxations via Machine Learning Interatomic Potentials
Computational quantum chemistry plays a critical role in drug discovery, chemical synthesis, and materials science. While first-principles methods, such as density functional theory (DFT), provide high accuracy in modeling electronic structures and predicting molecular properties, they are computationally expensive. Machine learning interatomic potentials (MLIPs) have emerged as promising surrogate models that aim to achieve DFT-level accuracy while enabling efficient large-scale atomistic simulations. The development of accurate and transferable MLIPs requires large-scale, high-quality datasets with both energy and force labels. Critically, MLIPs must generalize not only to stable geometries but also to intermediate, non-equilibrium conformations encountered during atomistic simulations. In this work, we introduce PubChemQCR, a large-scale dataset of molecular relaxation trajectories curated from the raw geometry optimization outputs of the PubChemQC project. PubChemQCR is the largest publicly available dataset of DFT-based relaxation trajectories for small organic molecules, comprising approximately 3.5 million trajectories and over 300 million molecular conformations computed at various levels of theory. Each conformation is labeled with both total energy and atomic forces, making the dataset suitable for training and evaluating MLIPs. To provide baselines for future developments, we benchmark nine representative MLIP models on the dataset. Our resources are publicly available at https://huggingface.co/divelab
Transformers Discover Molecular Structure Without Graph Priors
Graph Neural Networks (GNNs) are the dominant architecture for molecular machine learning, particularly for molecular property prediction and machine learning interatomic potentials (MLIPs). GNNs perform message passing on predefined graphs often induced by a fixed radius cutoff or k-nearest neighbor scheme. While this design aligns with the locality present in many molecular tasks, a hard-coded graph can limit expressivity due to the fixed receptive field and slows down inference with sparse graph operations. In this work, we investigate whether pure, unmodified Transformers trained directly on Cartesian coordinatesx2013without predefined graphs or physical priorsx2013can approximate molecular energies and forces. As a starting point for our analysis, we demonstrate how to train a Transformer to competitive energy and force mean absolute errors under a matched training compute budget, relative to a state-of-the-art equivariant GNN on the OMol25 dataset. We discover that the Transformer learns physically consistent patternsx2013such as attention weights that decay inversely with interatomic distancex2013and flexibly adapts them across different molecular environments due to the absence of hard-coded biases. The use of a standard Transformer also unlocks predictable improvements with respect to scaling training resources, consistent with empirical scaling laws observed in other domains. Our results demonstrate that many favorable properties of GNNs can emerge adaptively in Transformers, challenging the necessity of hard-coded graph inductive biases and pointing toward standardized, scalable architectures for molecular modeling.
Sliced Denoising: A Physics-Informed Molecular Pre-Training Method
While molecular pre-training has shown great potential in enhancing drug discovery, the lack of a solid physical interpretation in current methods raises concerns about whether the learned representation truly captures the underlying explanatory factors in observed data, ultimately resulting in limited generalization and robustness. Although denoising methods offer a physical interpretation, their accuracy is often compromised by ad-hoc noise design, leading to inaccurate learned force fields. To address this limitation, this paper proposes a new method for molecular pre-training, called sliced denoising (SliDe), which is based on the classical mechanical intramolecular potential theory. SliDe utilizes a novel noise strategy that perturbs bond lengths, angles, and torsion angles to achieve better sampling over conformations. Additionally, it introduces a random slicing approach that circumvents the computationally expensive calculation of the Jacobian matrix, which is otherwise essential for estimating the force field. By aligning with physical principles, SliDe shows a 42\% improvement in the accuracy of estimated force fields compared to current state-of-the-art denoising methods, and thus outperforms traditional baselines on various molecular property prediction tasks.
Str2Str: A Score-based Framework for Zero-shot Protein Conformation Sampling
The dynamic nature of proteins is crucial for determining their biological functions and properties, for which Monte Carlo (MC) and molecular dynamics (MD) simulations stand as predominant tools to study such phenomena. By utilizing empirically derived force fields, MC or MD simulations explore the conformational space through numerically evolving the system via Markov chain or Newtonian mechanics. However, the high-energy barrier of the force fields can hamper the exploration of both methods by the rare event, resulting in inadequately sampled ensemble without exhaustive running. Existing learning-based approaches perform direct sampling yet heavily rely on target-specific simulation data for training, which suffers from high data acquisition cost and poor generalizability. Inspired by simulated annealing, we propose Str2Str, a novel structure-to-structure translation framework capable of zero-shot conformation sampling with roto-translation equivariant property. Our method leverages an amortized denoising score matching objective trained on general crystal structures and has no reliance on simulation data during both training and inference. Experimental results across several benchmarking protein systems demonstrate that Str2Str outperforms previous state-of-the-art generative structure prediction models and can be orders of magnitude faster compared to long MD simulations. Our open-source implementation is available at https://github.com/lujiarui/Str2Str
Ewald-based Long-Range Message Passing for Molecular Graphs
Neural architectures that learn potential energy surfaces from molecular data have undergone fast improvement in recent years. A key driver of this success is the Message Passing Neural Network (MPNN) paradigm. Its favorable scaling with system size partly relies upon a spatial distance limit on messages. While this focus on locality is a useful inductive bias, it also impedes the learning of long-range interactions such as electrostatics and van der Waals forces. To address this drawback, we propose Ewald message passing: a nonlocal Fourier space scheme which limits interactions via a cutoff on frequency instead of distance, and is theoretically well-founded in the Ewald summation method. It can serve as an augmentation on top of existing MPNN architectures as it is computationally inexpensive and agnostic to architectural details. We test the approach with four baseline models and two datasets containing diverse periodic (OC20) and aperiodic structures (OE62). We observe robust improvements in energy mean absolute errors across all models and datasets, averaging 10% on OC20 and 16% on OE62. Our analysis shows an outsize impact of these improvements on structures with high long-range contributions to the ground truth energy.
A smile is all you need: Predicting limiting activity coefficients from SMILES with natural language processing
Knowledge of mixtures' phase equilibria is crucial in nature and technical chemistry. Phase equilibria calculations of mixtures require activity coefficients. However, experimental data on activity coefficients is often limited due to high cost of experiments. For an accurate and efficient prediction of activity coefficients, machine learning approaches have been recently developed. However, current machine learning approaches still extrapolate poorly for activity coefficients of unknown molecules. In this work, we introduce the SMILES-to-Properties-Transformer (SPT), a natural language processing network to predict binary limiting activity coefficients from SMILES codes. To overcome the limitations of available experimental data, we initially train our network on a large dataset of synthetic data sampled from COSMO-RS (10 Million data points) and then fine-tune the model on experimental data (20 870 data points). This training strategy enables SPT to accurately predict limiting activity coefficients even for unknown molecules, cutting the mean prediction error in half compared to state-of-the-art models for activity coefficient predictions such as COSMO-RS, UNIFAC, and improving on recent machine learning approaches.
Strain Problems got you in a Twist? Try StrainRelief: A Quantum-Accurate Tool for Ligand Strain Calculations
Ligand strain energy, the energy difference between the bound and unbound conformations of a ligand, is an important component of structure-based small molecule drug design. A large majority of observed ligands in protein-small molecule co-crystal structures bind in low-strain conformations, making strain energy a useful filter for structure-based drug design. In this work we present a tool for calculating ligand strain with a high accuracy. StrainRelief uses a MACE Neural Network Potential (NNP), trained on a large database of Density Functional Theory (DFT) calculations to estimate ligand strain of neutral molecules with quantum accuracy. We show that this tool estimates strain energy differences relative to DFT to within 1.4 kcal/mol, more accurately than alternative NNPs. These results highlight the utility of NNPs in drug discovery, and provide a useful tool for drug discovery teams.
nabla^2DFT: A Universal Quantum Chemistry Dataset of Drug-Like Molecules and a Benchmark for Neural Network Potentials
Methods of computational quantum chemistry provide accurate approximations of molecular properties crucial for computer-aided drug discovery and other areas of chemical science. However, high computational complexity limits the scalability of their applications. Neural network potentials (NNPs) are a promising alternative to quantum chemistry methods, but they require large and diverse datasets for training. This work presents a new dataset and benchmark called nabla^2DFT that is based on the nablaDFT. It contains twice as much molecular structures, three times more conformations, new data types and tasks, and state-of-the-art models. The dataset includes energies, forces, 17 molecular properties, Hamiltonian and overlap matrices, and a wavefunction object. All calculations were performed at the DFT level (omegaB97X-D/def2-SVP) for each conformation. Moreover, nabla^2DFT is the first dataset that contains relaxation trajectories for a substantial number of drug-like molecules. We also introduce a novel benchmark for evaluating NNPs in molecular property prediction, Hamiltonian prediction, and conformational optimization tasks. Finally, we propose an extendable framework for training NNPs and implement 10 models within it.
TensorNet: Cartesian Tensor Representations for Efficient Learning of Molecular Potentials
The development of efficient machine learning models for molecular systems representation is becoming crucial in scientific research. We introduce TensorNet, an innovative O(3)-equivariant message-passing neural network architecture that leverages Cartesian tensor representations. By using Cartesian tensor atomic embeddings, feature mixing is simplified through matrix product operations. Furthermore, the cost-effective decomposition of these tensors into rotation group irreducible representations allows for the separate processing of scalars, vectors, and tensors when necessary. Compared to higher-rank spherical tensor models, TensorNet demonstrates state-of-the-art performance with significantly fewer parameters. For small molecule potential energies, this can be achieved even with a single interaction layer. As a result of all these properties, the model's computational cost is substantially decreased. Moreover, the accurate prediction of vector and tensor molecular quantities on top of potential energies and forces is possible. In summary, TensorNet's framework opens up a new space for the design of state-of-the-art equivariant models.
The Open Catalyst 2025 (OC25) Dataset and Models for Solid-Liquid Interfaces
Catalysis at solid-liquid interfaces plays a central role in the advancement of energy storage and sustainable chemical production technologies. By enabling accurate, long-time scale simulations, machine learning (ML) models have the potential to accelerate the discovery of (electro)catalysts. While prior Open Catalyst datasets (OC20 and OC22) have advanced the field by providing large-scale density functional theory (DFT) data of adsorbates on surfaces at solid-gas interfaces, they do not capture the critical role of solvent and electrolyte effects at solid-liquid interfaces. To bridge this gap, we introduce the Open Catalyst 2025 (OC25) dataset, consisting of 7,801,261 calculations across 1,511,270 unique explicit solvent environments. OC25 constitutes the largest and most diverse solid-liquid interface dataset that is currently available and provides configurational and elemental diversity: spanning 88 elements, commonly used solvents/ions, varying solvent layers, and off-equilibrium sampling. State-of-the-art models trained on the OC25 dataset exhibit energy, force, and solvation energy errors as low as 0.1 eV, 0.015 eV/A, and 0.04 eV, respectively; significantly lower than than the recently released Universal Models for Atoms (UMA-OC20). Additionally, we discuss the impact of the quality of DFT-calculated forces on model training and performance. The dataset and accompanying baseline models are made openly available for the community. We anticipate the dataset to facilitate large length-scale and long-timescale simulations of catalytic transformations at solid-liquid interfaces, advancing molecular-level insights into functional interfaces and enabling the discovery of next-generation energy storage and conversion technologies.
Maximizing Efficiency of Dataset Compression for Machine Learning Potentials With Information Theory
Machine learning interatomic potentials (MLIPs) balance high accuracy and lower costs compared to density functional theory calculations, but their performance often depends on the size and diversity of training datasets. Large datasets improve model accuracy and generalization but are computationally expensive to produce and train on, while smaller datasets risk discarding rare but important atomic environments and compromising MLIP accuracy/reliability. Here, we develop an information-theoretical framework to quantify the efficiency of dataset compression methods and propose an algorithm that maximizes this efficiency. By framing atomistic dataset compression as an instance of the minimum set cover (MSC) problem over atom-centered environments, our method identifies the smallest subset of structures that contains as much information as possible from the original dataset while pruning redundant information. The approach is extensively demonstrated on the GAP-20 and TM23 datasets, and validated on 64 varied datasets from the ColabFit repository. Across all cases, MSC consistently retains outliers, preserves dataset diversity, and reproduces the long-tail distributions of forces even at high compression rates, outperforming other subsampling methods. Furthermore, MLIPs trained on MSC-compressed datasets exhibit reduced error for out-of-distribution data even in low-data regimes. We explain these results using an outlier analysis and show that such quantitative conclusions could not be achieved with conventional dimensionality reduction methods. The algorithm is implemented in the open-source QUESTS package and can be used for several tasks in atomistic modeling, from data subsampling, outlier detection, and training improved MLIPs at a lower cost.
ToxBench: A Binding Affinity Prediction Benchmark with AB-FEP-Calculated Labels for Human Estrogen Receptor Alpha
Protein-ligand binding affinity prediction is essential for drug discovery and toxicity assessment. While machine learning (ML) promises fast and accurate predictions, its progress is constrained by the availability of reliable data. In contrast, physics-based methods such as absolute binding free energy perturbation (AB-FEP) deliver high accuracy but are computationally prohibitive for high-throughput applications. To bridge this gap, we introduce ToxBench, the first large-scale AB-FEP dataset designed for ML development and focused on a single pharmaceutically critical target, Human Estrogen Receptor Alpha (ERalpha). ToxBench contains 8,770 ERalpha-ligand complex structures with binding free energies computed via AB-FEP with a subset validated against experimental affinities at 1.75 kcal/mol RMSE, along with non-overlapping ligand splits to assess model generalizability. Using ToxBench, we further benchmark state-of-the-art ML methods, and notably, our proposed DualBind model, which employs a dual-loss framework to effectively learn the binding energy function. The benchmark results demonstrate the superior performance of DualBind and the potential of ML to approximate AB-FEP at a fraction of the computational cost.
Scalable Reactive Atomistic Dynamics with GAIA
The groundbreaking advance in materials and chemical research has been driven by the development of atomistic simulations. However, the broader applicability of the atomistic simulations remains restricted, as they inherently depend on energy models that are either inaccurate or computationally prohibitive. Machine learning interatomic potentials (MLIPs) have recently emerged as a promising class of energy models, but their deployment remains challenging due to the lack of systematic protocols for generating diverse training data. Here we automate the construction of training datasets to enable the development of general-purpose MLIPs, by introducing GAIA, an end-to-end framework to build a wide range of atomic arrangements. By employing systematic evaluation of metadynamics for effective structural exploration, GAIA overcomes the heuristic nature of conventional dataset generation. Using GAIA, we constructed Titan25, a benchmark-scale dataset, and trained MLIPs that closely match both static and dynamic density functional theory results. The models further reproduce experimental observations across reactive regimes, including detonation, coalescence, and catalytic activity. GAIA narrows the gap between experiment and simulation, and paves the way for the development of universal MLIPs that can reliably describe a wide spectrum of materials and chemical processes.
MoleculeNet: A Benchmark for Molecular Machine Learning
Molecular machine learning has been maturing rapidly over the last few years. Improved methods and the presence of larger datasets have enabled machine learning algorithms to make increasingly accurate predictions about molecular properties. However, algorithmic progress has been limited due to the lack of a standard benchmark to compare the efficacy of proposed methods; most new algorithms are benchmarked on different datasets making it challenging to gauge the quality of proposed methods. This work introduces MoleculeNet, a large scale benchmark for molecular machine learning. MoleculeNet curates multiple public datasets, establishes metrics for evaluation, and offers high quality open-source implementations of multiple previously proposed molecular featurization and learning algorithms (released as part of the DeepChem open source library). MoleculeNet benchmarks demonstrate that learnable representations are powerful tools for molecular machine learning and broadly offer the best performance. However, this result comes with caveats. Learnable representations still struggle to deal with complex tasks under data scarcity and highly imbalanced classification. For quantum mechanical and biophysical datasets, the use of physics-aware featurizations can be more important than choice of particular learning algorithm.
An Ensemble of Bayesian Neural Networks for Exoplanetary Atmospheric Retrieval
Machine learning is now used in many areas of astrophysics, from detecting exoplanets in Kepler transit signals to removing telescope systematics. Recent work demonstrated the potential of using machine learning algorithms for atmospheric retrieval by implementing a random forest to perform retrievals in seconds that are consistent with the traditional, computationally-expensive nested-sampling retrieval method. We expand upon their approach by presenting a new machine learning model, plan-net, based on an ensemble of Bayesian neural networks that yields more accurate inferences than the random forest for the same data set of synthetic transmission spectra. We demonstrate that an ensemble provides greater accuracy and more robust uncertainties than a single model. In addition to being the first to use Bayesian neural networks for atmospheric retrieval, we also introduce a new loss function for Bayesian neural networks that learns correlations between the model outputs. Importantly, we show that designing machine learning models to explicitly incorporate domain-specific knowledge both improves performance and provides additional insight by inferring the covariance of the retrieved atmospheric parameters. We apply plan-net to the Hubble Space Telescope Wide Field Camera 3 transmission spectrum for WASP-12b and retrieve an isothermal temperature and water abundance consistent with the literature. We highlight that our method is flexible and can be expanded to higher-resolution spectra and a larger number of atmospheric parameters.
Towards Foundational Models for Molecular Learning on Large-Scale Multi-Task Datasets
Recently, pre-trained foundation models have enabled significant advancements in multiple fields. In molecular machine learning, however, where datasets are often hand-curated, and hence typically small, the lack of datasets with labeled features, and codebases to manage those datasets, has hindered the development of foundation models. In this work, we present seven novel datasets categorized by size into three distinct categories: ToyMix, LargeMix and UltraLarge. These datasets push the boundaries in both the scale and the diversity of supervised labels for molecular learning. They cover nearly 100 million molecules and over 3000 sparsely defined tasks, totaling more than 13 billion individual labels of both quantum and biological nature. In comparison, our datasets contain 300 times more data points than the widely used OGB-LSC PCQM4Mv2 dataset, and 13 times more than the quantum-only QM1B dataset. In addition, to support the development of foundational models based on our proposed datasets, we present the Graphium graph machine learning library which simplifies the process of building and training molecular machine learning models for multi-task and multi-level molecular datasets. Finally, we present a range of baseline results as a starting point of multi-task and multi-level training on these datasets. Empirically, we observe that performance on low-resource biological datasets show improvement by also training on large amounts of quantum data. This indicates that there may be potential in multi-task and multi-level training of a foundation model and fine-tuning it to resource-constrained downstream tasks.
Foundation Models for Discovery and Exploration in Chemical Space
Accurate prediction of atomistic, thermodynamic, and kinetic properties from molecular structures underpins materials innovation. Existing computational and experimental approaches lack the scalability required to efficiently navigate chemical space. Scientific foundation models trained on large unlabeled datasets offer a path toward exploring chemical space across diverse application domains. Here we develop MIST, a family of molecular foundation models with up to an order of magnitude more parameters and data than prior works. Trained using a novel tokenization scheme that comprehensively captures nuclear, electronic, and geometric information, MIST learns from a diverse range of molecules. MIST models have been fine-tuned to predict more than 400 structure -- property relationships and match or exceed state-of-the-art performance across benchmarks spanning physiology, electrochemistry, and quantum chemistry. We demonstrate the ability of these models to solve real-world problems across chemical space, including multiobjective electrolyte solvent screening, olfactory perception mapping, isotope half-life prediction, stereochemical reasoning for chiral organometallic compounds, and binary and multi-component mixture property prediction. Probing MIST models using mechanistic interpretability methods reveals identifiable patterns and trends not explicitly present in the training data, suggesting that the models learn generalizable scientific concepts. We formulate hyperparameter-penalized Bayesian neural scaling laws and use them to reduce the computational cost of model development by an order of magnitude. The methods and findings presented here represent a significant step toward accelerating materials discovery, design, and optimization using foundation models and provide valuable guidance for training compute-optimal scientific foundation models.
Probabilistic Digital Twin for Misspecified Structural Dynamical Systems via Latent Force Modeling and Bayesian Neural Networks
This work presents a probabilistic digital twin framework for response prediction in dynamical systems governed by misspecified physics. The approach integrates Gaussian Process Latent Force Models (GPLFM) and Bayesian Neural Networks (BNNs) to enable end-to-end uncertainty-aware inference and prediction. In the diagnosis phase, model-form errors (MFEs) are treated as latent input forces to a nominal linear dynamical system and jointly estimated with system states using GPLFM from sensor measurements. A BNN is then trained on posterior samples to learn a probabilistic nonlinear mapping from system states to MFEs, while capturing diagnostic uncertainty. For prognosis, this mapping is used to generate pseudo-measurements, enabling state prediction via Kalman filtering. The framework allows for systematic propagation of uncertainty from diagnosis to prediction, a key capability for trustworthy digital twins. The framework is demonstrated using four nonlinear examples: a single degree of freedom (DOF) oscillator, a multi-DOF system, and two established benchmarks -- the Bouc-Wen hysteretic system and the Silverbox experimental dataset -- highlighting its predictive accuracy and robustness to model misspecification.
Thermally Averaged Magnetic Anisotropy Tensors via Machine Learning Based on Gaussian Moments
We propose a machine learning method to model molecular tensorial quantities, namely the magnetic anisotropy tensor, based on the Gaussian-moment neural-network approach. We demonstrate that the proposed methodology can achieve an accuracy of 0.3--0.4 cm^{-1} and has excellent generalization capability for out-of-sample configurations. Moreover, in combination with machine-learned interatomic potential energies based on Gaussian moments, our approach can be applied to study the dynamic behavior of magnetic anisotropy tensors and provide a unique insight into spin-phonon relaxation.
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning
Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retro-synthesis. However, chemical datasets are often inaccessible to the machine learning community as they tend to require cleaning, thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset for yield prediction, providing the first-ever transient flow dataset for machine learning benchmarking, covering over 1200 process conditions. While previous datasets focus on discrete parameters, our experimental set-up allow us to sample a large number of continuous process conditions, generating new challenges for machine learning models. We focus on solvent selection, a task that is particularly difficult to model theoretically and therefore ripe for machine learning applications. We showcase benchmarking for regression algorithms, transfer-learning approaches, feature engineering, and active learning, with important applications towards solvent replacement and sustainable manufacturing.
Gradual Optimization Learning for Conformational Energy Minimization
Molecular conformation optimization is crucial to computer-aided drug discovery and materials design. Traditional energy minimization techniques rely on iterative optimization methods that use molecular forces calculated by a physical simulator (oracle) as anti-gradients. However, this is a computationally expensive approach that requires many interactions with a physical simulator. One way to accelerate this procedure is to replace the physical simulator with a neural network. Despite recent progress in neural networks for molecular conformation energy prediction, such models are prone to distribution shift, leading to inaccurate energy minimization. We find that the quality of energy minimization with neural networks can be improved by providing optimization trajectories as additional training data. Still, it takes around 5 times 10^5 additional conformations to match the physical simulator's optimization quality. In this work, we present the Gradual Optimization Learning Framework (GOLF) for energy minimization with neural networks that significantly reduces the required additional data. The framework consists of an efficient data-collecting scheme and an external optimizer. The external optimizer utilizes gradients from the energy prediction model to generate optimization trajectories, and the data-collecting scheme selects additional training data to be processed by the physical simulator. Our results demonstrate that the neural network trained with GOLF performs on par with the oracle on a benchmark of diverse drug-like molecules using 50x less additional data.
MODNet -- accurate and interpretable property predictions for limited materials datasets by feature selection and joint-learning
In order to make accurate predictions of material properties, current machine-learning approaches generally require large amounts of data, which are often not available in practice. In this work, an all-round framework is presented which relies on a feedforward neural network, the selection of physically-meaningful features and, when applicable, joint-learning. Next to being faster in terms of training time, this approach is shown to outperform current graph-network models on small datasets. In particular, the vibrational entropy at 305 K of crystals is predicted with a mean absolute test error of 0.009 meV/K/atom (four times lower than previous studies). Furthermore, joint-learning reduces the test error compared to single-target learning and enables the prediction of multiple properties at once, such as temperature functions. Finally, the selection algorithm highlights the most important features and thus helps understanding the underlying physics.
Dense Hebbian neural networks: a replica symmetric picture of unsupervised learning
We consider dense, associative neural-networks trained with no supervision and we investigate their computational capabilities analytically, via a statistical-mechanics approach, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of the control parameters such as the quality and quantity of the training dataset and the network storage, valid in the limit of large network size and structureless datasets. Moreover, we establish a bridge between macroscopic observables standardly used in statistical mechanics and loss functions typically used in the machine learning. As technical remarks, from the analytic side, we implement large deviations and stability analysis within Guerra's interpolation to tackle the not-Gaussian distributions involved in the post-synaptic potentials while, from the computational counterpart, we insert Plefka approximation in the Monte Carlo scheme, to speed up the evaluation of the synaptic tensors, overall obtaining a novel and broad approach to investigate neural networks in general.
Kolmogorov--Arnold networks in molecular dynamics
We explore the integration of Kolmogorov Networks (KANs) into molecular dynamics (MD) simulations to improve interatomic potentials. We propose that widely used potentials, such as the Lennard-Jones (LJ) potential, the embedded atom model (EAM), and artificial neural network (ANN) potentials, can be interpreted within the KAN framework. Specifically, we demonstrate that the descriptors for ANN potentials, typically constructed using polynomials, can be redefined using KAN's non-linear functions. By employing linear or cubic spline interpolations for these KAN functions, we show that the computational cost of evaluating ANN potentials and their derivatives is reduced.
Applications of Machine Learning to Lattice Quantum Field Theory
There is great potential to apply machine learning in the area of numerical lattice quantum field theory, but full exploitation of that potential will require new strategies. In this white paper for the Snowmass community planning process, we discuss the unique requirements of machine learning for lattice quantum field theory research and outline what is needed to enable exploration and deployment of this approach in the future.
The Impact of Large Language Models on Scientific Discovery: a Preliminary Study using GPT-4
In recent years, groundbreaking advancements in natural language processing have culminated in the emergence of powerful large language models (LLMs), which have showcased remarkable capabilities across a vast array of domains, including the understanding, generation, and translation of natural language, and even tasks that extend beyond language processing. In this report, we delve into the performance of LLMs within the context of scientific discovery, focusing on GPT-4, the state-of-the-art language model. Our investigation spans a diverse range of scientific areas encompassing drug discovery, biology, computational chemistry (density functional theory (DFT) and molecular dynamics (MD)), materials design, and partial differential equations (PDE). Evaluating GPT-4 on scientific tasks is crucial for uncovering its potential across various research domains, validating its domain-specific expertise, accelerating scientific progress, optimizing resource allocation, guiding future model development, and fostering interdisciplinary research. Our exploration methodology primarily consists of expert-driven case assessments, which offer qualitative insights into the model's comprehension of intricate scientific concepts and relationships, and occasionally benchmark testing, which quantitatively evaluates the model's capacity to solve well-defined domain-specific problems. Our preliminary exploration indicates that GPT-4 exhibits promising potential for a variety of scientific applications, demonstrating its aptitude for handling complex problem-solving and knowledge integration tasks. Broadly speaking, we evaluate GPT-4's knowledge base, scientific understanding, scientific numerical calculation abilities, and various scientific prediction capabilities.
Interpretable Meta-Learning of Physical Systems
Machine learning methods can be a valuable aid in the scientific process, but they need to face challenging settings where data come from inhomogeneous experimental conditions. Recent meta-learning methods have made significant progress in multi-task learning, but they rely on black-box neural networks, resulting in high computational costs and limited interpretability. Leveraging the structure of the learning problem, we argue that multi-environment generalization can be achieved using a simpler learning model, with an affine structure with respect to the learning task. Crucially, we prove that this architecture can identify the physical parameters of the system, enabling interpreable learning. We demonstrate the competitive generalization performance and the low computational cost of our method by comparing it to state-of-the-art algorithms on physical systems, ranging from toy models to complex, non-analytical systems. The interpretability of our method is illustrated with original applications to physical-parameter-induced adaptation and to adaptive control.
Understanding Biology in the Age of Artificial Intelligence
Modern life sciences research is increasingly relying on artificial intelligence approaches to model biological systems, primarily centered around the use of machine learning (ML) models. Although ML is undeniably useful for identifying patterns in large, complex data sets, its widespread application in biological sciences represents a significant deviation from traditional methods of scientific inquiry. As such, the interplay between these models and scientific understanding in biology is a topic with important implications for the future of scientific research, yet it is a subject that has received little attention. Here, we draw from an epistemological toolkit to contextualize recent applications of ML in biological sciences under modern philosophical theories of understanding, identifying general principles that can guide the design and application of ML systems to model biological phenomena and advance scientific knowledge. We propose that conceptions of scientific understanding as information compression, qualitative intelligibility, and dependency relation modelling provide a useful framework for interpreting ML-mediated understanding of biological systems. Through a detailed analysis of two key application areas of ML in modern biological research - protein structure prediction and single cell RNA-sequencing - we explore how these features have thus far enabled ML systems to advance scientific understanding of their target phenomena, how they may guide the development of future ML models, and the key obstacles that remain in preventing ML from achieving its potential as a tool for biological discovery. Consideration of the epistemological features of ML applications in biology will improve the prospects of these methods to solve important problems and advance scientific understanding of living systems.
A Foundational Potential Energy Surface Dataset for Materials
Accurate potential energy surface (PES) descriptions are essential for atomistic simulations of materials. Universal machine learning interatomic potentials (UMLIPs)^{1-3} offer a computationally efficient alternative to density functional theory (DFT)^4 for PES modeling across the periodic table. However, their accuracy today is fundamentally constrained due to a reliance on DFT relaxation data.^{5,6} Here, we introduce MatPES, a foundational PES dataset comprising sim 400,000 structures carefully sampled from 281 million molecular dynamics snapshots that span 16 billion atomic environments. We demonstrate that UMLIPs trained on the modestly sized MatPES dataset can rival, or even outperform, prior models trained on much larger datasets across a broad range of equilibrium, near-equilibrium, and molecular dynamics property benchmarks. We also introduce the first high-fidelity PES dataset based on the revised regularized strongly constrained and appropriately normed (r^2SCAN) functional^7 with greatly improved descriptions of interatomic bonding. The open source MatPES initiative emphasizes the importance of data quality over quantity in materials science and enables broad community-driven advancements toward more reliable, generalizable, and efficient UMLIPs for large-scale materials discovery and design.
An Empirical Analysis of Feature Engineering for Predictive Modeling
Machine learning models, such as neural networks, decision trees, random forests, and gradient boosting machines, accept a feature vector, and provide a prediction. These models learn in a supervised fashion where we provide feature vectors mapped to the expected output. It is common practice to engineer new features from the provided feature set. Such engineered features will either augment or replace portions of the existing feature vector. These engineered features are essentially calculated fields based on the values of the other features. Engineering such features is primarily a manual, time-consuming task. Additionally, each type of model will respond differently to different kinds of engineered features. This paper reports empirical research to demonstrate what kinds of engineered features are best suited to various machine learning model types. We provide this recommendation by generating several datasets that we designed to benefit from a particular type of engineered feature. The experiment demonstrates to what degree the machine learning model can synthesize the needed feature on its own. If a model can synthesize a planned feature, it is not necessary to provide that feature. The research demonstrated that the studied models do indeed perform differently with various types of engineered features.
Advancing Molecular Machine (Learned) Representations with Stereoelectronics-Infused Molecular Graphs
Molecular representation is a foundational element in our understanding of the physical world. Its importance ranges from the fundamentals of chemical reactions to the design of new therapies and materials. Previous molecular machine learning models have employed strings, fingerprints, global features, and simple molecular graphs that are inherently information-sparse representations. However, as the complexity of prediction tasks increases, the molecular representation needs to encode higher fidelity information. This work introduces a novel approach to infusing quantum-chemical-rich information into molecular graphs via stereoelectronic effects. We show that the explicit addition of stereoelectronic interactions significantly improves the performance of molecular machine learning models. Furthermore, stereoelectronics-infused representations can be learned and deployed with a tailored double graph neural network workflow, enabling its application to any downstream molecular machine learning task. Finally, we show that the learned representations allow for facile stereoelectronic evaluation of previously intractable systems, such as entire proteins, opening new avenues of molecular design.
Knowledge-informed Molecular Learning: A Survey on Paradigm Transfer
Machine learning, notably deep learning, has significantly propelled molecular investigations within the biochemical sphere. Traditionally, modeling for such research has centered around a handful of paradigms. For instance, the prediction paradigm is frequently deployed for tasks such as molecular property prediction. To enhance the generation and decipherability of purely data-driven models, scholars have integrated biochemical domain knowledge into these molecular study models. This integration has sparked a surge in paradigm transfer, which is solving one molecular learning task by reformulating it as another one. With the emergence of Large Language Models, these paradigms have demonstrated an escalating trend towards harmonized unification. In this work, we delineate a literature survey focused on knowledge-informed molecular learning from the perspective of paradigm transfer. We classify the paradigms, scrutinize their methodologies, and dissect the contribution of domain knowledge. Moreover, we encapsulate prevailing trends and identify intriguing avenues for future exploration in molecular learning.
An Embedding-Dynamic Approach to Self-supervised Learning
A number of recent self-supervised learning methods have shown impressive performance on image classification and other tasks. A somewhat bewildering variety of techniques have been used, not always with a clear understanding of the reasons for their benefits, especially when used in combination. Here we treat the embeddings of images as point particles and consider model optimization as a dynamic process on this system of particles. Our dynamic model combines an attractive force for similar images, a locally dispersive force to avoid local collapse, and a global dispersive force to achieve a globally-homogeneous distribution of particles. The dynamic perspective highlights the advantage of using a delayed-parameter image embedding (a la BYOL) together with multiple views of the same image. It also uses a purely-dynamic local dispersive force (Brownian motion) that shows improved performance over other methods and does not require knowledge of other particle coordinates. The method is called MSBReg which stands for (i) a Multiview centroid loss, which applies an attractive force to pull different image view embeddings toward their centroid, (ii) a Singular value loss, which pushes the particle system toward spatially homogeneous density, (iii) a Brownian diffusive loss. We evaluate downstream classification performance of MSBReg on ImageNet as well as transfer learning tasks including fine-grained classification, multi-class object classification, object detection, and instance segmentation. In addition, we also show that applying our regularization term to other methods further improves their performance and stabilize the training by preventing a mode collapse.
Solvation Free Energies from Neural Thermodynamic Integration
We present a method for computing free-energy differences using thermodynamic integration with a neural network potential that interpolates between two target Hamiltonians. The interpolation is defined at the sample distribution level, and the neural network potential is optimized to match the corresponding equilibrium potential at every intermediate time-step. Once the interpolating potentials and samples are well-aligned, the free-energy difference can be estimated using (neural) thermodynamic integration. To target molecular systems, we simultaneously couple Lennard-Jones and electrostatic interactions and model the rigid-body rotation of molecules. We report accurate results for several benchmark systems: a Lennard-Jones particle in a Lennard-Jones fluid, as well as the insertion of both water and methane solutes in a water solvent at atomistic resolution using a simple three-body neural-network potential.
Towards a Physics Foundation Model
Foundation models have revolutionized natural language processing through a ``train once, deploy anywhere'' paradigm, where a single pre-trained model adapts to countless downstream tasks without retraining. Access to a Physics Foundation Model (PFM) would be transformative -- democratizing access to high-fidelity simulations, accelerating scientific discovery, and eliminating the need for specialized solver development. Yet current physics-aware machine learning approaches remain fundamentally limited to single, narrow domains and require retraining for each new system. We present the General Physics Transformer (GPhyT), trained on 1.8 TB of diverse simulation data, that demonstrates foundation model capabilities are achievable for physics. Our key insight is that transformers can learn to infer governing dynamics from context, enabling a single model to simulate fluid-solid interactions, shock waves, thermal convection, and multi-phase dynamics without being told the underlying equations. GPhyT achieves three critical breakthroughs: (1) superior performance across multiple physics domains, outperforming specialized architectures by up to 29x, (2) zero-shot generalization to entirely unseen physical systems through in-context learning, and (3) stable long-term predictions through 50-timestep rollouts. By establishing that a single model can learn generalizable physical principles from data alone, this work opens the path toward a universal PFM that could transform computational science and engineering.
70 years of machine learning in geoscience in review
This review gives an overview of the development of machine learning in geoscience. A thorough analysis of the co-developments of machine learning applications throughout the last 70 years relates the recent enthusiasm for machine learning to developments in geoscience. I explore the shift of kriging towards a mainstream machine learning method and the historic application of neural networks in geoscience, following the general trend of machine learning enthusiasm through the decades. Furthermore, this chapter explores the shift from mathematical fundamentals and knowledge in software development towards skills in model validation, applied statistics, and integrated subject matter expertise. The review is interspersed with code examples to complement the theoretical foundations and illustrate model validation and machine learning explainability for science. The scope of this review includes various shallow machine learning methods, e.g. Decision Trees, Random Forests, Support-Vector Machines, and Gaussian Processes, as well as, deep neural networks, including feed-forward neural networks, convolutional neural networks, recurrent neural networks and generative adversarial networks. Regarding geoscience, the review has a bias towards geophysics but aims to strike a balance with geochemistry, geostatistics, and geology, however excludes remote sensing, as this would exceed the scope. In general, I aim to provide context for the recent enthusiasm surrounding deep learning with respect to research, hardware, and software developments that enable successful application of shallow and deep machine learning in all disciplines of Earth science.
Neural Message Passing for Quantum Chemistry
Supervised learning on molecules has incredible potential to be useful in chemistry, drug discovery, and materials science. Luckily, several promising and closely related neural network models invariant to molecular symmetries have already been described in the literature. These models learn a message passing algorithm and aggregation procedure to compute a function of their entire input graph. At this point, the next step is to find a particularly effective variant of this general approach and apply it to chemical prediction benchmarks until we either solve them or reach the limits of the approach. In this paper, we reformulate existing models into a single common framework we call Message Passing Neural Networks (MPNNs) and explore additional novel variations within this framework. Using MPNNs we demonstrate state of the art results on an important molecular property prediction benchmark; these results are strong enough that we believe future work should focus on datasets with larger molecules or more accurate ground truth labels.
AQCat25: Unlocking spin-aware, high-fidelity machine learning potentials for heterogeneous catalysis
Large-scale datasets have enabled highly accurate machine learning interatomic potentials (MLIPs) for general-purpose heterogeneous catalysis modeling. There are, however, some limitations in what can be treated with these potentials because of gaps in the underlying training data. To extend these capabilities, we introduce AQCat25, a complementary dataset of 13.5 million density functional theory (DFT) single point calculations designed to improve the treatment of systems where spin polarization and/or higher fidelity are critical. We also investigate methodologies for integrating new datasets, such as AQCat25, with the broader Open Catalyst 2020 (OC20) dataset to create spin-aware models without sacrificing generalizability. We find that directly tuning a general model on AQCat25 leads to catastrophic forgetting of the original dataset's knowledge. Conversely, joint training strategies prove effective for improving accuracy on the new data without sacrificing general performance. This joint approach introduces a challenge, as the model must learn from a dataset containing both mixed-fidelity calculations and mixed-physics (spin-polarized vs. unpolarized). We show that explicitly conditioning the model on this system-specific metadata, for example by using Feature-wise Linear Modulation (FiLM), successfully addresses this challenge and further enhances model accuracy. Ultimately, our work establishes an effective protocol for bridging DFT fidelity domains to advance the predictive power of foundational models in catalysis.
Large-Scale Chemical Language Representations Capture Molecular Structure and Properties
Models based on machine learning can enable accurate and fast molecular property predictions, which is of interest in drug discovery and material design. Various supervised machine learning models have demonstrated promising performance, but the vast chemical space and the limited availability of property labels make supervised learning challenging. Recently, unsupervised transformer-based language models pretrained on a large unlabelled corpus have produced state-of-the-art results in many downstream natural language processing tasks. Inspired by this development, we present molecular embeddings obtained by training an efficient transformer encoder model, MoLFormer, which uses rotary positional embeddings. This model employs a linear attention mechanism, coupled with highly distributed training, on SMILES sequences of 1.1 billion unlabelled molecules from the PubChem and ZINC datasets. We show that the learned molecular representation outperforms existing baselines, including supervised and self-supervised graph neural networks and language models, on several downstream tasks from ten benchmark datasets. They perform competitively on two others. Further analyses, specifically through the lens of attention, demonstrate that MoLFormer trained on chemical SMILES indeed learns the spatial relationships between atoms within a molecule. These results provide encouraging evidence that large-scale molecular language models can capture sufficient chemical and structural information to predict various distinct molecular properties, including quantum-chemical properties.
Matbench Discovery -- An evaluation framework for machine learning crystal stability prediction
Matbench Discovery simulates the deployment of machine learning (ML) energy models in a high-throughput search for stable inorganic crystals. We address the disconnect between (i) thermodynamic stability and formation energy and (ii) in-domain vs out-of-distribution performance. Alongside this paper, we publish a Python package to aid with future model submissions and a growing online leaderboard with further insights into trade-offs between various performance metrics. To answer the question which ML methodology performs best at materials discovery, our initial release explores a variety of models including random forests, graph neural networks (GNN), one-shot predictors, iterative Bayesian optimizers and universal interatomic potentials (UIP). Ranked best-to-worst by their test set F1 score on thermodynamic stability prediction, we find CHGNet > M3GNet > MACE > ALIGNN > MEGNet > CGCNN > CGCNN+P > Wrenformer > BOWSR > Voronoi tessellation fingerprints with random forest. The top 3 models are UIPs, the winning methodology for ML-guided materials discovery, achieving F1 scores of ~0.6 for crystal stability classification and discovery acceleration factors (DAF) of up to 5x on the first 10k most stable predictions compared to dummy selection from our test set. We also highlight a sharp disconnect between commonly used global regression metrics and more task-relevant classification metrics. Accurate regressors are susceptible to unexpectedly high false-positive rates if those accurate predictions lie close to the decision boundary at 0 eV/atom above the convex hull where most materials are. Our results highlight the need to focus on classification metrics that actually correlate with improved stability hit rate.
Dense Hebbian neural networks: a replica symmetric picture of supervised learning
We consider dense, associative neural-networks trained by a teacher (i.e., with supervision) and we investigate their computational capabilities analytically, via statistical-mechanics of spin glasses, and numerically, via Monte Carlo simulations. In particular, we obtain a phase diagram summarizing their performance as a function of the control parameters such as quality and quantity of the training dataset, network storage and noise, that is valid in the limit of large network size and structureless datasets: these networks may work in a ultra-storage regime (where they can handle a huge amount of patterns, if compared with shallow neural networks) or in a ultra-detection regime (where they can perform pattern recognition at prohibitive signal-to-noise ratios, if compared with shallow neural networks). Guided by the random theory as a reference framework, we also test numerically learning, storing and retrieval capabilities shown by these networks on structured datasets as MNist and Fashion MNist. As technical remarks, from the analytic side, we implement large deviations and stability analysis within Guerra's interpolation to tackle the not-Gaussian distributions involved in the post-synaptic potentials while, from the computational counterpart, we insert Plefka approximation in the Monte Carlo scheme, to speed up the evaluation of the synaptic tensors, overall obtaining a novel and broad approach to investigate supervised learning in neural networks, beyond the shallow limit, in general.
Exploring the cloud of feature interaction scores in a Rashomon set
Interactions among features are central to understanding the behavior of machine learning models. Recent research has made significant strides in detecting and quantifying feature interactions in single predictive models. However, we argue that the feature interactions extracted from a single pre-specified model may not be trustworthy since: a well-trained predictive model may not preserve the true feature interactions and there exist multiple well-performing predictive models that differ in feature interaction strengths. Thus, we recommend exploring feature interaction strengths in a model class of approximately equally accurate predictive models. In this work, we introduce the feature interaction score (FIS) in the context of a Rashomon set, representing a collection of models that achieve similar accuracy on a given task. We propose a general and practical algorithm to calculate the FIS in the model class. We demonstrate the properties of the FIS via synthetic data and draw connections to other areas of statistics. Additionally, we introduce a Halo plot for visualizing the feature interaction variance in high-dimensional space and a swarm plot for analyzing FIS in a Rashomon set. Experiments with recidivism prediction and image classification illustrate how feature interactions can vary dramatically in importance for similarly accurate predictive models. Our results suggest that the proposed FIS can provide valuable insights into the nature of feature interactions in machine learning models.
Equivariant Scalar Fields for Molecular Docking with Fast Fourier Transforms
Molecular docking is critical to structure-based virtual screening, yet the throughput of such workflows is limited by the expensive optimization of scoring functions involved in most docking algorithms. We explore how machine learning can accelerate this process by learning a scoring function with a functional form that allows for more rapid optimization. Specifically, we define the scoring function to be the cross-correlation of multi-channel ligand and protein scalar fields parameterized by equivariant graph neural networks, enabling rapid optimization over rigid-body degrees of freedom with fast Fourier transforms. The runtime of our approach can be amortized at several levels of abstraction, and is particularly favorable for virtual screening settings with a common binding pocket. We benchmark our scoring functions on two simplified docking-related tasks: decoy pose scoring and rigid conformer docking. Our method attains similar but faster performance on crystal structures compared to the widely-used Vina and Gnina scoring functions, and is more robust on computationally predicted structures. Code is available at https://github.com/bjing2016/scalar-fields.
LLM and Simulation as Bilevel Optimizers: A New Paradigm to Advance Physical Scientific Discovery
Large Language Models have recently gained significant attention in scientific discovery for their extensive knowledge and advanced reasoning capabilities. However, they encounter challenges in effectively simulating observational feedback and grounding it with language to propel advancements in physical scientific discovery. Conversely, human scientists undertake scientific discovery by formulating hypotheses, conducting experiments, and revising theories through observational analysis. Inspired by this, we propose to enhance the knowledge-driven, abstract reasoning abilities of LLMs with the computational strength of simulations. We introduce Scientific Generative Agent (SGA), a bilevel optimization framework: LLMs act as knowledgeable and versatile thinkers, proposing scientific hypotheses and reason about discrete components, such as physics equations or molecule structures; meanwhile, simulations function as experimental platforms, providing observational feedback and optimizing via differentiability for continuous parts, such as physical parameters. We conduct extensive experiments to demonstrate our framework's efficacy in constitutive law discovery and molecular design, unveiling novel solutions that differ from conventional human expectations yet remain coherent upon analysis.
Cross-functional transferability in universal machine learning interatomic potentials
The rapid development of universal machine learning interatomic potentials (uMLIPs) has demonstrated the possibility for generalizable learning of the universal potential energy surface. In principle, the accuracy of uMLIPs can be further improved by bridging the model from lower-fidelity datasets to high-fidelity ones. In this work, we analyze the challenge of this transfer learning problem within the CHGNet framework. We show that significant energy scale shifts and poor correlations between GGA and r^2SCAN pose challenges to cross-functional data transferability in uMLIPs. By benchmarking different transfer learning approaches on the MP-r^2SCAN dataset of 0.24 million structures, we demonstrate the importance of elemental energy referencing in the transfer learning of uMLIPs. By comparing the scaling law with and without the pre-training on a low-fidelity dataset, we show that significant data efficiency can still be achieved through transfer learning, even with a target dataset of sub-million structures. We highlight the importance of proper transfer learning and multi-fidelity learning in creating next-generation uMLIPs on high-fidelity data.
Grad DFT: a software library for machine learning enhanced density functional theory
Density functional theory (DFT) stands as a cornerstone method in computational quantum chemistry and materials science due to its remarkable versatility and scalability. Yet, it suffers from limitations in accuracy, particularly when dealing with strongly correlated systems. To address these shortcomings, recent work has begun to explore how machine learning can expand the capabilities of DFT; an endeavor with many open questions and technical challenges. In this work, we present Grad DFT: a fully differentiable JAX-based DFT library, enabling quick prototyping and experimentation with machine learning-enhanced exchange-correlation energy functionals. Grad DFT employs a pioneering parametrization of exchange-correlation functionals constructed using a weighted sum of energy densities, where the weights are determined using neural networks. Moreover, Grad DFT encompasses a comprehensive suite of auxiliary functions, notably featuring a just-in-time compilable and fully differentiable self-consistent iterative procedure. To support training and benchmarking efforts, we additionally compile a curated dataset of experimental dissociation energies of dimers, half of which contain transition metal atoms characterized by strong electronic correlations. The software library is tested against experimental results to study the generalization capabilities of a neural functional across potential energy surfaces and atomic species, as well as the effect of training data noise on the resulting model accuracy.
Meta-Learning with Fewer Tasks through Task Interpolation
Meta-learning enables algorithms to quickly learn a newly encountered task with just a few labeled examples by transferring previously learned knowledge. However, the bottleneck of current meta-learning algorithms is the requirement of a large number of meta-training tasks, which may not be accessible in real-world scenarios. To address the challenge that available tasks may not densely sample the space of tasks, we propose to augment the task set through interpolation. By meta-learning with task interpolation (MLTI), our approach effectively generates additional tasks by randomly sampling a pair of tasks and interpolating the corresponding features and labels. Under both gradient-based and metric-based meta-learning settings, our theoretical analysis shows MLTI corresponds to a data-adaptive meta-regularization and further improves the generalization. Empirically, in our experiments on eight datasets from diverse domains including image recognition, pose prediction, molecule property prediction, and medical image classification, we find that the proposed general MLTI framework is compatible with representative meta-learning algorithms and consistently outperforms other state-of-the-art strategies.
Generalizable Neural Fields as Partially Observed Neural Processes
Neural fields, which represent signals as a function parameterized by a neural network, are a promising alternative to traditional discrete vector or grid-based representations. Compared to discrete representations, neural representations both scale well with increasing resolution, are continuous, and can be many-times differentiable. However, given a dataset of signals that we would like to represent, having to optimize a separate neural field for each signal is inefficient, and cannot capitalize on shared information or structures among signals. Existing generalization methods view this as a meta-learning problem and employ gradient-based meta-learning to learn an initialization which is then fine-tuned with test-time optimization, or learn hypernetworks to produce the weights of a neural field. We instead propose a new paradigm that views the large-scale training of neural representations as a part of a partially-observed neural process framework, and leverage neural process algorithms to solve this task. We demonstrate that this approach outperforms both state-of-the-art gradient-based meta-learning approaches and hypernetwork approaches.
Protein-ligand binding representation learning from fine-grained interactions
The binding between proteins and ligands plays a crucial role in the realm of drug discovery. Previous deep learning approaches have shown promising results over traditional computationally intensive methods, but resulting in poor generalization due to limited supervised data. In this paper, we propose to learn protein-ligand binding representation in a self-supervised learning manner. Different from existing pre-training approaches which treat proteins and ligands individually, we emphasize to discern the intricate binding patterns from fine-grained interactions. Specifically, this self-supervised learning problem is formulated as a prediction of the conclusive binding complex structure given a pocket and ligand with a Transformer based interaction module, which naturally emulates the binding process. To ensure the representation of rich binding information, we introduce two pre-training tasks, i.e.~atomic pairwise distance map prediction and mask ligand reconstruction, which comprehensively model the fine-grained interactions from both structure and feature space. Extensive experiments have demonstrated the superiority of our method across various binding tasks, including protein-ligand affinity prediction, virtual screening and protein-ligand docking.
Chemically Transferable Generative Backmapping of Coarse-Grained Proteins
Coarse-graining (CG) accelerates molecular simulations of protein dynamics by simulating sets of atoms as singular beads. Backmapping is the opposite operation of bringing lost atomistic details back from the CG representation. While machine learning (ML) has produced accurate and efficient CG simulations of proteins, fast and reliable backmapping remains a challenge. Rule-based methods produce poor all-atom geometries, needing computationally costly refinement through additional simulations. Recently proposed ML approaches outperform traditional baselines but are not transferable between proteins and sometimes generate unphysical atom placements with steric clashes and implausible torsion angles. This work addresses both issues to build a fast, transferable, and reliable generative backmapping tool for CG protein representations. We achieve generalization and reliability through a combined set of innovations: representation based on internal coordinates; an equivariant encoder/prior; a custom loss function that helps ensure local structure, global structure, and physical constraints; and expert curation of high-quality out-of-equilibrium protein data for training. Our results pave the way for out-of-the-box backmapping of coarse-grained simulations for arbitrary proteins.
MDAgent2: Large Language Model for Code Generation and Knowledge Q&A in Molecular Dynamics
Molecular dynamics (MD) simulations are essential for understanding atomic-scale behaviors in materials science, yet writing LAMMPS scripts remains highly specialized and time-consuming tasks. Although LLMs show promise in code generation and domain-specific question answering, their performance in MD scenarios is limited by scarce domain data, the high deployment cost of state-of-the-art LLMs, and low code executability. Building upon our prior MDAgent, we present MDAgent2, the first end-to-end framework capable of performing both knowledge Q&A and code generation within the MD domain. We construct a domain-specific data-construction pipeline that yields three high-quality datasets spanning MD knowledge, question answering, and code generation. Based on these datasets, we adopt a three stage post-training strategy--continued pre-training (CPT), supervised fine-tuning (SFT), and reinforcement learning (RL)--to train two domain-adapted models, MD-Instruct and MD-Code. Furthermore, we introduce MD-GRPO, a closed-loop RL method that leverages simulation outcomes as reward signals and recycles low-reward trajectories for continual refinement. We further build MDAgent2-RUNTIME, a deployable multi-agent system that integrates code generation, execution, evaluation, and self-correction. Together with MD-EvalBench proposed in this work, the first benchmark for LAMMPS code generation and question answering, our models and system achieve performance surpassing several strong baselines.This work systematically demonstrates the adaptability and generalization capability of large language models in industrial simulation tasks, laying a methodological foundation for automatic code generation in AI for Science and industrial-scale simulations. URL: https://github.com/FredericVAN/PKU_MDAgent2
Generating Molecular Conformer Fields
In this paper we tackle the problem of generating conformers of a molecule in 3D space given its molecular graph. We parameterize these conformers as continuous functions that map elements from the molecular graph to points in 3D space. We then formulate the problem of learning to generate conformers as learning a distribution over these functions using a diffusion generative model, called Molecular Conformer Fields (MCF). Our approach is simple and scalable, and achieves state-of-the-art performance on challenging molecular conformer generation benchmarks while making no assumptions about the explicit structure of molecules (e.g. modeling torsional angles). MCF represents an advance in extending diffusion models to handle complex scientific problems in a conceptually simple, scalable and effective manner.
M^{3}-20M: A Large-Scale Multi-Modal Molecule Dataset for AI-driven Drug Design and Discovery
This paper introduces M^{3}-20M, a large-scale Multi-Modal Molecular dataset that contains over 20 million molecules. Designed to support AI-driven drug design and discovery, M^{3}-20M is 71 times more in the number of molecules than the largest existing dataset, providing an unprecedented scale that can highly benefit training or fine-tuning large (language) models with superior performance for drug design and discovery. This dataset integrates one-dimensional SMILES, two-dimensional molecular graphs, three-dimensional molecular structures, physicochemical properties, and textual descriptions collected through web crawling and generated by using GPT-3.5, offering a comprehensive view of each molecule. To demonstrate the power of M^{3}-20M in drug design and discovery, we conduct extensive experiments on two key tasks: molecule generation and molecular property prediction, using large language models including GLM4, GPT-3.5, and GPT-4. Our experimental results show that M^{3}-20M can significantly boost model performance in both tasks. Specifically, it enables the models to generate more diverse and valid molecular structures and achieve higher property prediction accuracy than the existing single-modal datasets, which validates the value and potential of M^{3}-20M in supporting AI-driven drug design and discovery. The dataset is available at https://github.com/bz99bz/M-3.
AIDE: AI-Driven Exploration in the Space of Code
Machine learning, the foundation of modern artificial intelligence, has driven innovations that have fundamentally transformed the world. Yet, behind advancements lies a complex and often tedious process requiring labor and compute intensive iteration and experimentation. Engineers and scientists developing machine learning models spend much of their time on trial-and-error tasks instead of conceptualizing innovative solutions or research hypotheses. To address this challenge, we introduce AI-Driven Exploration (AIDE), a machine learning engineering agent powered by large language models (LLMs). AIDE frames machine learning engineering as a code optimization problem, and formulates trial-and-error as a tree search in the space of potential solutions. By strategically reusing and refining promising solutions, AIDE effectively trades computational resources for enhanced performance, achieving state-of-the-art results on multiple machine learning engineering benchmarks, including our Kaggle evaluations, OpenAI MLE-Bench and METRs RE-Bench.
Benchmarking Pretrained Molecular Embedding Models For Molecular Representation Learning
Pretrained neural networks have attracted significant interest in chemistry and small molecule drug design. Embeddings from these models are widely used for molecular property prediction, virtual screening, and small data learning in molecular chemistry. This study presents the most extensive comparison of such models to date, evaluating 25 models across 25 datasets. Under a fair comparison framework, we assess models spanning various modalities, architectures, and pretraining strategies. Using a dedicated hierarchical Bayesian statistical testing model, we arrive at a surprising result: nearly all neural models show negligible or no improvement over the baseline ECFP molecular fingerprint. Only the CLAMP model, which is also based on molecular fingerprints, performs statistically significantly better than the alternatives. These findings raise concerns about the evaluation rigor in existing studies. We discuss potential causes, propose solutions, and offer practical recommendations.
Towards Trustworthy Machine Learning in Production: An Overview of the Robustness in MLOps Approach
Artificial intelligence (AI), and especially its sub-field of Machine Learning (ML), are impacting the daily lives of everyone with their ubiquitous applications. In recent years, AI researchers and practitioners have introduced principles and guidelines to build systems that make reliable and trustworthy decisions. From a practical perspective, conventional ML systems process historical data to extract the features that are consequently used to train ML models that perform the desired task. However, in practice, a fundamental challenge arises when the system needs to be operationalized and deployed to evolve and operate in real-life environments continuously. To address this challenge, Machine Learning Operations (MLOps) have emerged as a potential recipe for standardizing ML solutions in deployment. Although MLOps demonstrated great success in streamlining ML processes, thoroughly defining the specifications of robust MLOps approaches remains of great interest to researchers and practitioners. In this paper, we provide a comprehensive overview of the trustworthiness property of MLOps systems. Specifically, we highlight technical practices to achieve robust MLOps systems. In addition, we survey the existing research approaches that address the robustness aspects of ML systems in production. We also review the tools and software available to build MLOps systems and summarize their support to handle the robustness aspects. Finally, we present the open challenges and propose possible future directions and opportunities within this emerging field. The aim of this paper is to provide researchers and practitioners working on practical AI applications with a comprehensive view to adopt robust ML solutions in production environments.
The Physics-Informed Neural Network Gravity Model: Generation III
Scientific machine learning and the advent of the Physics-Informed Neural Network (PINN) show considerable potential in their capacity to identify solutions to complex differential equations. Over the past two years, much work has gone into the development of PINNs capable of solving the gravity field modeling problem -- i.e.\ learning a differentiable form of the gravitational potential from position and acceleration estimates. While the past PINN gravity models (PINN-GMs) have demonstrated advantages in model compactness, robustness to noise, and sample efficiency; there remain key modeling challenges which this paper aims to address. Specifically, this paper introduces the third generation of the Physics-Informed Neural Network Gravity Model (PINN-GM-III) which solves the problems of extrapolation error, bias towards low-altitude samples, numerical instability at high-altitudes, and compliant boundary conditions through numerous modifications to the model's design. The PINN-GM-III is tested by modeling a known heterogeneous density asteroid, and its performance is evaluated using seven core metrics which showcases its strengths against its predecessors and other analytic and numerical gravity models.
Unifying Molecular and Textual Representations via Multi-task Language Modelling
The recent advances in neural language models have also been successfully applied to the field of chemistry, offering generative solutions for classical problems in molecular design and synthesis planning. These new methods have the potential to optimize laboratory operations and fuel a new era of data-driven automation in scientific discovery. However, specialized models are still typically required for each task, leading to the need for problem-specific fine-tuning and neglecting task interrelations. The main obstacle in this field is the lack of a unified representation between natural language and chemical representations, complicating and limiting human-machine interaction. Here, we propose a multi-domain, multi-task language model to solve a wide range of tasks in both the chemical and natural language domains. By leveraging multi-task learning, our model can handle chemical and natural language concurrently, without requiring expensive pre-training on single domains or task-specific models. Interestingly, sharing weights across domains remarkably improves our model when benchmarked against state-of-the-art baselines on single-domain and cross-domain tasks. In particular, sharing information across domains and tasks gives rise to large improvements in cross-domain tasks, the magnitude of which increase with scale, as measured by more than a dozen of relevant metrics. Our work suggests that such models can robustly and efficiently accelerate discovery in physical sciences by superseding problem-specific fine-tuning and enhancing human-model interactions.
An Experience Report on Machine Learning Reproducibility: Guidance for Practitioners and TensorFlow Model Garden Contributors
Machine learning techniques are becoming a fundamental tool for scientific and engineering progress. These techniques are applied in contexts as diverse as astronomy and spam filtering. However, correctly applying these techniques requires careful engineering. Much attention has been paid to the technical potential; relatively little attention has been paid to the software engineering process required to bring research-based machine learning techniques into practical utility. Technology companies have supported the engineering community through machine learning frameworks such as TensorFLow and PyTorch, but the details of how to engineer complex machine learning models in these frameworks have remained hidden. To promote best practices within the engineering community, academic institutions and Google have partnered to launch a Special Interest Group on Machine Learning Models (SIGMODELS) whose goal is to develop exemplary implementations of prominent machine learning models in community locations such as the TensorFlow Model Garden (TFMG). The purpose of this report is to define a process for reproducing a state-of-the-art machine learning model at a level of quality suitable for inclusion in the TFMG. We define the engineering process and elaborate on each step, from paper analysis to model release. We report on our experiences implementing the YOLO model family with a team of 26 student researchers, share the tools we developed, and describe the lessons we learned along the way.
Retention Time Prediction for Chromatographic Enantioseparation by Quantile Geometry-enhanced Graph Neural Network
A new research framework is proposed to incorporate machine learning techniques into the field of experimental chemistry to facilitate chromatographic enantioseparation. A documentary dataset of chiral molecular retention times (CMRT dataset) in high-performance liquid chromatography is established to handle the challenge of data acquisition. Based on the CMRT dataset, a quantile geometry-enhanced graph neural network is proposed to learn the molecular structure-retention time relationship, which shows a satisfactory predictive ability for enantiomers. The domain knowledge of chromatography is incorporated into the machine learning model to achieve multi-column prediction, which paves the way for chromatographic enantioseparation prediction by calculating the separation probability. Experiments confirm that the proposed research framework works well in retention time prediction and chromatographic enantioseparation facilitation, which sheds light on the application of machine learning techniques to the experimental scene and improves the efficiency of experimenters to speed up scientific discovery.
A Data Source for Reasoning Embodied Agents
Recent progress in using machine learning models for reasoning tasks has been driven by novel model architectures, large-scale pre-training protocols, and dedicated reasoning datasets for fine-tuning. In this work, to further pursue these advances, we introduce a new data generator for machine reasoning that integrates with an embodied agent. The generated data consists of templated text queries and answers, matched with world-states encoded into a database. The world-states are a result of both world dynamics and the actions of the agent. We show the results of several baseline models on instantiations of train sets. These include pre-trained language models fine-tuned on a text-formatted representation of the database, and graph-structured Transformers operating on a knowledge-graph representation of the database. We find that these models can answer some questions about the world-state, but struggle with others. These results hint at new research directions in designing neural reasoning models and database representations. Code to generate the data will be released at github.com/facebookresearch/neuralmemory
Scaling Physical Reasoning with the PHYSICS Dataset
Large Language Models (LLMs) have achieved remarkable progress on advanced reasoning tasks such as mathematics and coding competitions. Meanwhile, physics, despite being both reasoning-intensive and essential to real-world understanding, received limited academic and industrial attention. This paper introduces PHYSICS, a dataset containing 16,568 high-quality physics problems spanning subjects and difficulty levels, to facilitate this issue. Specifically, PHYSICS is curated with exercises from over 100 textbooks through a carefully designed pipeline for quality control. It covers five major physics domains: Mechanics, Electromagnetism, Thermodynamics, Optics, and Modern Physics. It also spans a wide range of difficulty levels, from high school to graduate-level physics courses. To utilize the data for improving and evaluating the model's physical reasoning capabilities, we split the dataset into training and test sets, and provide reasoning paths generated by powerful reasoning models for the training data to facilitate model training. In addition, for the evaluation part, we find that existing evaluation frameworks exhibit biases in aspects such as units, simplification, and precision in physics domain. To balance efficiency and accuracy, we introduce a Rule+Model evaluation framework tailored to physics problems. Our evaluations on current state-of-the-art open-source and proprietary models highlight the limitations of current models in handling physics-related tasks. We hope that our dataset and evaluation methodology will jointly advance the development of LLMs in the field of physics.
MeLM, a generative pretrained language modeling framework that solves forward and inverse mechanics problems
We report a flexible multi-modal mechanics language model, MeLM, applied to solve various nonlinear forward and inverse problems, that can deal with a set of instructions, numbers and microstructure data. The framework is applied to various examples including bio-inspired hierarchical honeycomb design, carbon nanotube mechanics, and protein unfolding. In spite of the flexible nature of the model-which allows us to easily incorporate diverse materials, scales, and mechanical features-it performs well across disparate forward and inverse tasks. Based on an autoregressive attention-model, MeLM effectively represents a large multi-particle system consisting of hundreds of millions of neurons, where the interaction potentials are discovered through graph-forming self-attention mechanisms that are then used to identify relationships from emergent structures, while taking advantage of synergies discovered in the training data. We show that the model can solve complex degenerate mechanics design problems and determine novel material architectures across a range of hierarchical levels, providing an avenue for materials discovery and analysis. Looking beyond the demonstrations reported in this paper, we discuss other opportunities in applied mechanics and general considerations about the use of large language models in modeling, design, and analysis that can span a broad spectrum of material properties from mechanical, thermal, optical, to electronic.
SPhyR: Spatial-Physical Reasoning Benchmark on Material Distribution
We introduce a novel dataset designed to benchmark the physical and spatial reasoning capabilities of Large Language Models (LLM) based on topology optimization, a method for computing optimal material distributions within a design space under prescribed loads and supports. In this dataset, LLMs are provided with conditions such as 2D boundary, applied forces and supports, and must reason about the resulting optimal material distribution. The dataset includes a variety of tasks, ranging from filling in masked regions within partial structures to predicting complete material distributions. Solving these tasks requires understanding the flow of forces and the required material distribution under given constraints, without access to simulation tools or explicit physical models, challenging models to reason about structural stability and spatial organization. Our dataset targets the evaluation of spatial and physical reasoning abilities in 2D settings, offering a complementary perspective to traditional language and logic benchmarks.
Unsolved Problems in ML Safety
Machine learning (ML) systems are rapidly increasing in size, are acquiring new capabilities, and are increasingly deployed in high-stakes settings. As with other powerful technologies, safety for ML should be a leading research priority. In response to emerging safety challenges in ML, such as those introduced by recent large-scale models, we provide a new roadmap for ML Safety and refine the technical problems that the field needs to address. We present four problems ready for research, namely withstanding hazards ("Robustness"), identifying hazards ("Monitoring"), reducing inherent model hazards ("Alignment"), and reducing systemic hazards ("Systemic Safety"). Throughout, we clarify each problem's motivation and provide concrete research directions.
Discovering Symbolic Models from Deep Learning with Inductive Biases
We develop a general approach to distill symbolic representations of a learned deep model by introducing strong inductive biases. We focus on Graph Neural Networks (GNNs). The technique works as follows: we first encourage sparse latent representations when we train a GNN in a supervised setting, then we apply symbolic regression to components of the learned model to extract explicit physical relations. We find the correct known equations, including force laws and Hamiltonians, can be extracted from the neural network. We then apply our method to a non-trivial cosmology example-a detailed dark matter simulation-and discover a new analytic formula which can predict the concentration of dark matter from the mass distribution of nearby cosmic structures. The symbolic expressions extracted from the GNN using our technique also generalized to out-of-distribution data better than the GNN itself. Our approach offers alternative directions for interpreting neural networks and discovering novel physical principles from the representations they learn.
FlowBack-Adjoint: Physics-Aware and Energy-Guided Conditional Flow-Matching for All-Atom Protein Backmapping
Coarse-grained (CG) molecular models of proteins can substantially increase the time and length scales accessible to molecular dynamics simulations of proteins, but recovery of accurate all-atom (AA) ensembles from CG simulation trajectories can be essential for exposing molecular mechanisms of folding and docking and for calculation of physical properties requiring atomistic detail. The recently reported deep generative model FlowBack restores AA detail to protein C-alpha traces using a flow-matching architecture and demonstrates state-of-the-art performance in generation of AA structural ensembles. Training, however, is performed exclusively on structural data and the absence of any awareness of interatomic energies or forces within training results in small fractions of incorrect bond lengths, atomic clashes, and otherwise high-energy structures. In this work, we introduce FlowBack-Adjoint as a lightweight enhancement that upgrades the pre-trained FlowBack model through a one-time, physics-aware post-training pass. Auxiliary contributions to the flow introduce physical awareness of bond lengths and Lennard-Jones interactions and gradients of a molecular mechanics force field energy are incorporated via adjoint matching to steer the FlowBack-Adjoint vector field to produce lower-energy configurations. In benchmark tests against FlowBack, FlowBack-Adjoint lowers single-point energies by a median of ~78 kcal/mol.residue, reduces errors in bond lengths by >92%, eliminates >98% of molecular clashes, maintains excellent diversity of the AA configurational ensemble, and produces configurations capable of initializing stable all-atom molecular dynamics simulations without requiring energy relaxation. We propose FlowBack-Adjoint as an accurate and efficient physics-aware deep generative model for AA backmapping from C-alpha traces.
PhysiX: A Foundation Model for Physics Simulations
Foundation models have achieved remarkable success across video, image, and language domains. By scaling up the number of parameters and training datasets, these models acquire generalizable world knowledge and often surpass task-specific approaches. However, such progress has yet to extend to the domain of physics simulation. A primary bottleneck is data scarcity: while millions of images, videos, and textual resources are readily available on the internet, the largest physics simulation datasets contain only tens of thousands of samples. This data limitation hinders the use of large models, as overfitting becomes a major concern. As a result, physics applications typically rely on small models, which struggle with long-range prediction due to limited context understanding. Additionally, unlike images, videos, or text-which typically exhibit fixed granularity-physics datasets often vary drastically in scale, amplifying the challenges of scaling up multitask training. We introduce PhysiX, the first large-scale foundation model for physics simulation. PhysiX is a 4.5B parameter autoregressive generative model. It uses a discrete tokenizer to encode physical processes at different scales into a sequence of discrete tokens, and employs an autoregressive next-token prediction objective to model such processes in the token space. To mitigate the rounding error in the discretization process, PhysiX incorporates a specialized refinement module. Through extensive experiments, we show that PhysiX effectively addresses the data bottleneck, outperforming task-specific baselines under comparable settings as well as the previous absolute state-of-the-art approaches on The Well benchmark. Our results indicate that knowledge learned from natural videos can be successfully transferred to physics simulation, and that joint training across diverse simulation tasks enables synergistic learning.
Robust Model-Based Optimization for Challenging Fitness Landscapes
Protein design, a grand challenge of the day, involves optimization on a fitness landscape, and leading methods adopt a model-based approach where a model is trained on a training set (protein sequences and fitness) and proposes candidates to explore next. These methods are challenged by sparsity of high-fitness samples in the training set, a problem that has been in the literature. A less recognized but equally important problem stems from the distribution of training samples in the design space: leading methods are not designed for scenarios where the desired optimum is in a region that is not only poorly represented in training data, but also relatively far from the highly represented low-fitness regions. We show that this problem of "separation" in the design space is a significant bottleneck in existing model-based optimization tools and propose a new approach that uses a novel VAE as its search model to overcome the problem. We demonstrate its advantage over prior methods in robustly finding improved samples, regardless of the imbalance and separation between low- and high-fitness training samples. Our comprehensive benchmark on real and semi-synthetic protein datasets as well as solution design for physics-informed neural networks, showcases the generality of our approach in discrete and continuous design spaces. Our implementation is available at https://github.com/sabagh1994/PGVAE.
Learning towards Minimum Hyperspherical Energy
Neural networks are a powerful class of nonlinear functions that can be trained end-to-end on various applications. While the over-parametrization nature in many neural networks renders the ability to fit complex functions and the strong representation power to handle challenging tasks, it also leads to highly correlated neurons that can hurt the generalization ability and incur unnecessary computation cost. As a result, how to regularize the network to avoid undesired representation redundancy becomes an important issue. To this end, we draw inspiration from a well-known problem in physics -- Thomson problem, where one seeks to find a state that distributes N electrons on a unit sphere as evenly as possible with minimum potential energy. In light of this intuition, we reduce the redundancy regularization problem to generic energy minimization, and propose a minimum hyperspherical energy (MHE) objective as generic regularization for neural networks. We also propose a few novel variants of MHE, and provide some insights from a theoretical point of view. Finally, we apply neural networks with MHE regularization to several challenging tasks. Extensive experiments demonstrate the effectiveness of our intuition, by showing the superior performance with MHE regularization.
The Open Polymers 2026 (OPoly26) Dataset and Evaluations
Polymers-macromolecular systems composed of repeating chemical units-constitute the molecular foundation of living organisms, while their synthetic counterparts drive transformative advances across medicine, consumer products, and energy technologies. While machine learning (ML) models have been trained on millions of quantum chemical atomistic simulations for materials and/or small molecular structures to enable efficient, accurate, and transferable predictions of chemical properties, polymers have largely not been included in prior datasets due to the computational expense of high quality electronic structure calculations on representative polymeric structures. Here, we address this shortcoming with the creation of the Open Polymers 2026 (OPoly26) dataset, which contains more than 6.57 million density functional theory (DFT) calculations on up to 360 atom clusters derived from polymeric systems, comprising over 1.2 billion total atoms. OPoly26 captures the chemical diversity that makes polymers intrinsically tunable and versatile materials, encompassing variations in monomer composition, degree of polymerization, chain architectures, and solvation environments. We show that augmenting ML model training with the OPoly26 dataset improves model performance for polymer prediction tasks. We also publicly release the OPoly26 dataset to help further the development of ML models for polymers, and more broadly, strive towards universal atomistic models.
RealPDEBench: A Benchmark for Complex Physical Systems with Real-World Data
Predicting the evolution of complex physical systems remains a central problem in science and engineering. Despite rapid progress in scientific Machine Learning (ML) models, a critical bottleneck is the lack of expensive real-world data, resulting in most current models being trained and validated on simulated data. Beyond limiting the development and evaluation of scientific ML, this gap also hinders research into essential tasks such as sim-to-real transfer. We introduce RealPDEBench, the first benchmark for scientific ML that integrates real-world measurements with paired numerical simulations. RealPDEBench consists of five datasets, three tasks, eight metrics, and ten baselines. We first present five real-world measured datasets with paired simulated datasets across different complex physical systems. We further define three tasks, which allow comparisons between real-world and simulated data, and facilitate the development of methods to bridge the two. Moreover, we design eight evaluation metrics, spanning data-oriented and physics-oriented metrics, and finally benchmark ten representative baselines, including state-of-the-art models, pretrained PDE foundation models, and a traditional method. Experiments reveal significant discrepancies between simulated and real-world data, while showing that pretraining with simulated data consistently improves both accuracy and convergence. In this work, we hope to provide insights from real-world data, advancing scientific ML toward bridging the sim-to-real gap and real-world deployment. Our benchmark, datasets, and instructions are available at https://realpdebench.github.io/.
MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge
Solving mechanics problems using numerical methods requires comprehensive intelligent capability of retrieving relevant knowledge and theory, constructing and executing codes, analyzing the results, a task that has thus far mainly been reserved for humans. While emerging AI methods can provide effective approaches to solve end-to-end problems, for instance via the use of deep surrogate models or various data analytics strategies, they often lack physical intuition since knowledge is baked into the parametric complement through training, offering less flexibility when it comes to incorporating mathematical or physical insights. By leveraging diverse capabilities of multiple dynamically interacting large language models (LLMs), we can overcome the limitations of conventional approaches and develop a new class of physics-inspired generative machine learning platform, here referred to as MechAgents. A set of AI agents can solve mechanics tasks, here demonstrated for elasticity problems, via autonomous collaborations. A two-agent team can effectively write, execute and self-correct code, in order to apply finite element methods to solve classical elasticity problems in various flavors (different boundary conditions, domain geometries, meshes, small/finite deformation and linear/hyper-elastic constitutive laws, and others). For more complex tasks, we construct a larger group of agents with enhanced division of labor among planning, formulating, coding, executing and criticizing the process and results. The agents mutually correct each other to improve the overall team-work performance in understanding, formulating and validating the solution. Our framework shows the potential of synergizing the intelligence of language models, the reliability of physics-based modeling, and the dynamic collaborations among diverse agents, opening novel avenues for automation of solving engineering problems.
Neural Operator: Is data all you need to model the world? An insight into the impact of Physics Informed Machine Learning
Numerical approximations of partial differential equations (PDEs) are routinely employed to formulate the solution of physics, engineering and mathematical problems involving functions of several variables, such as the propagation of heat or sound, fluid flow, elasticity, electrostatics, electrodynamics, and more. While this has led to solving many complex phenomena, there are some limitations. Conventional approaches such as Finite Element Methods (FEMs) and Finite Differential Methods (FDMs) require considerable time and are computationally expensive. In contrast, data driven machine learning-based methods such as neural networks provide a faster, fairly accurate alternative, and have certain advantages such as discretization invariance and resolution invariance. This article aims to provide a comprehensive insight into how data-driven approaches can complement conventional techniques to solve engineering and physics problems, while also noting some of the major pitfalls of machine learning-based approaches. Furthermore, we highlight, a novel and fast machine learning-based approach (~1000x) to learning the solution operator of a PDE operator learning. We will note how these new computational approaches can bring immense advantages in tackling many problems in fundamental and applied physics.
Reasoning-Enhanced Large Language Models for Molecular Property Prediction
Molecular property prediction is crucial for drug discovery and materials science, yet existing approaches suffer from limited interpretability, poor cross-task generalization, and lack of chemical reasoning capabilities. Traditional machine learning models struggle with task transferability, while specialized molecular language models provide little insight into their decision-making processes. To address these limitations, we propose MPPReasoner, a multimodal large language model that incorporates chemical reasoning for molecular property prediction. Our approach, built upon Qwen2.5-VL-7B-Instruct, integrates molecular images with SMILES strings to enable comprehensive molecular understanding. We develop a two-stage training strategy: supervised fine-tuning (SFT) using 16,000 high-quality reasoning trajectories generated through expert knowledge and multiple teacher models, followed by Reinforcement Learning from Principle-Guided Rewards (RLPGR). RLPGR employs verifiable, rule-based rewards that systematically evaluate chemical principle application, molecular structure analysis, and logical consistency through computational verification. Extensive experiments across 8 datasets demonstrate significant performance improvements, with MPPReasoner outperforming the best baselines by 7.91\% and 4.53\% on in-distribution and out-of-distribution tasks respectively. MPPReasoner exhibits exceptional cross-task generalization and generates chemically sound reasoning paths that provide valuable insights into molecular property analysis, substantially enhancing both interpretability and practical utility for chemists. Code is available at https://anonymous.4open.science/r/MPPReasoner-12687.
A Graph Neural Network for the Era of Large Atomistic Models
Foundation models, or large atomistic models (LAMs), aim to universally represent the ground-state potential energy surface (PES) of atomistic systems as defined by density functional theory (DFT). The scaling law is pivotal in the development of large models, suggesting that their generalizability in downstream tasks consistently improves with increased model size, expanded training datasets, and larger computational budgets. In this study, we present DPA3, a multi-layer graph neural network founded on line graph series (LiGS), designed explicitly for the era of LAMs. We demonstrate that the generalization error of the DPA3 model adheres to the scaling law. The scalability in the number of model parameters is attained by stacking additional layers within DPA3. Additionally, the model employs a dataset encoding mechanism that decouples the scaling of training data size from the model size within its multi-task training framework. When trained as problem-oriented potential energy models, the DPA3 model exhibits superior accuracy in the majority of benchmark cases, encompassing systems with diverse features, including molecules, bulk materials, surface and cluster catalysts, two-dimensional materials, and battery materials. When trained as a LAM on the OpenLAM-v1 dataset, the DPA-3.1-3M model exhibits state-of-the-art performance in the LAMBench benchmark suite for LAMs, demonstrating lowest overall zero-shot generalization error across 17 downstream tasks from a broad spectrum of research domains. This performance suggests superior accuracy as an out-of-the-box potential model, requiring minimal fine-tuning data for downstream scientific applications.
E(3)-equivariant models cannot learn chirality: Field-based molecular generation
Obtaining the desired effect of drugs is highly dependent on their molecular geometries. Thus, the current prevailing paradigm focuses on 3D point-cloud atom representations, utilizing graph neural network (GNN) parametrizations, with rotational symmetries baked in via E(3) invariant layers. We prove that such models must necessarily disregard chirality, a geometric property of the molecules that cannot be superimposed on their mirror image by rotation and translation. Chirality plays a key role in determining drug safety and potency. To address this glaring issue, we introduce a novel field-based representation, proposing reference rotations that replace rotational symmetry constraints. The proposed model captures all molecular geometries including chirality, while still achieving highly competitive performance with E(3)-based methods across standard benchmarking metrics.
Energy-conserving equivariant GNN for elasticity of lattice architected metamaterials
Lattices are architected metamaterials whose properties strongly depend on their geometrical design. The analogy between lattices and graphs enables the use of graph neural networks (GNNs) as a faster surrogate model compared to traditional methods such as finite element modelling. In this work, we generate a big dataset of structure-property relationships for strut-based lattices. The dataset is made available to the community which can fuel the development of methods anchored in physical principles for the fitting of fourth-order tensors. In addition, we present a higher-order GNN model trained on this dataset. The key features of the model are (i) SE(3) equivariance, and (ii) consistency with the thermodynamic law of conservation of energy. We compare the model to non-equivariant models based on a number of error metrics and demonstrate its benefits in terms of predictive performance and reduced training requirements. Finally, we demonstrate an example application of the model to an architected material design task. The methods which we developed are applicable to fourth-order tensors beyond elasticity such as piezo-optical tensor etc.
Adapting Quantum Machine Learning for Energy Dissociation of Bonds
Accurate prediction of bond dissociation energies (BDEs) underpins mechanistic insight and the rational design of molecules and materials. We present a systematic, reproducible benchmark comparing quantum and classical machine learning models for BDE prediction using a chemically curated feature set encompassing atomic properties (atomic numbers, hybridization), bond characteristics (bond order, type), and local environmental descriptors. Our quantum framework, implemented in Qiskit Aer on six qubits, employs ZZFeatureMap encodings with variational ansatz (RealAmplitudes) across multiple architectures Variational Quantum Regressors (VQR), Quantum Support Vector Regressors (QSVR), Quantum Neural Networks (QNN), Quantum Convolutional Neural Networks (QCNN), and Quantum Random Forests (QRF). These are rigorously benchmarked against strong classical baselines, including Support Vector Regression (SVR), Random Forests (RF), and Multi-Layer Perceptrons (MLP). Comprehensive evaluation spanning absolute and relative error metrics, threshold accuracies, and error distributions shows that top-performing quantum models (QCNN, QRF) match the predictive accuracy and robustness of classical ensembles and deep networks, particularly within the chemically prevalent mid-range BDE regime. These findings establish a transparent baseline for quantum-enhanced molecular property prediction and outline a practical foundation for advancing quantum computational chemistry toward near chemical accuracy.
FABind: Fast and Accurate Protein-Ligand Binding
Modeling the interaction between proteins and ligands and accurately predicting their binding structures is a critical yet challenging task in drug discovery. Recent advancements in deep learning have shown promise in addressing this challenge, with sampling-based and regression-based methods emerging as two prominent approaches. However, these methods have notable limitations. Sampling-based methods often suffer from low efficiency due to the need for generating multiple candidate structures for selection. On the other hand, regression-based methods offer fast predictions but may experience decreased accuracy. Additionally, the variation in protein sizes often requires external modules for selecting suitable binding pockets, further impacting efficiency. In this work, we propose FABind, an end-to-end model that combines pocket prediction and docking to achieve accurate and fast protein-ligand binding. FABind incorporates a unique ligand-informed pocket prediction module, which is also leveraged for docking pose estimation. The model further enhances the docking process by incrementally integrating the predicted pocket to optimize protein-ligand binding, reducing discrepancies between training and inference. Through extensive experiments on benchmark datasets, our proposed FABind demonstrates strong advantages in terms of effectiveness and efficiency compared to existing methods. Our code is available at https://github.com/QizhiPei/FABind
MLCopilot: Unleashing the Power of Large Language Models in Solving Machine Learning Tasks
The field of machine learning (ML) has gained widespread adoption, leading to a significant demand for adapting ML to specific scenarios, which is yet expensive and non-trivial. The predominant approaches towards the automation of solving ML tasks (e.g., AutoML) are often time consuming and hard to understand for human developers. In contrast, though human engineers have the incredible ability to understand tasks and reason about solutions, their experience and knowledge are often sparse and difficult to utilize by quantitative approaches. In this paper, we aim to bridge the gap between machine intelligence and human knowledge by introducing a novel framework MLCopilot, which leverages the state-of-the-art LLMs to develop ML solutions for novel tasks. We showcase the possibility of extending the capability of LLMs to comprehend structured inputs and perform thorough reasoning for solving novel ML tasks. And we find that, after some dedicated design, the LLM can (i) observe from the existing experiences of ML tasks and (ii) reason effectively to deliver promising results for new tasks. The solution generated can be used directly to achieve high levels of competitiveness.
GaussianProperty: Integrating Physical Properties to 3D Gaussians with LMMs
Estimating physical properties for visual data is a crucial task in computer vision, graphics, and robotics, underpinning applications such as augmented reality, physical simulation, and robotic grasping. However, this area remains under-explored due to the inherent ambiguities in physical property estimation. To address these challenges, we introduce GaussianProperty, a training-free framework that assigns physical properties of materials to 3D Gaussians. Specifically, we integrate the segmentation capability of SAM with the recognition capability of GPT-4V(ision) to formulate a global-local physical property reasoning module for 2D images. Then we project the physical properties from multi-view 2D images to 3D Gaussians using a voting strategy. We demonstrate that 3D Gaussians with physical property annotations enable applications in physics-based dynamic simulation and robotic grasping. For physics-based dynamic simulation, we leverage the Material Point Method (MPM) for realistic dynamic simulation. For robot grasping, we develop a grasping force prediction strategy that estimates a safe force range required for object grasping based on the estimated physical properties. Extensive experiments on material segmentation, physics-based dynamic simulation, and robotic grasping validate the effectiveness of our proposed method, highlighting its crucial role in understanding physical properties from visual data. Online demo, code, more cases and annotated datasets are available on https://Gaussian-Property.github.io{this https URL}.
Seeing the Wind from a Falling Leaf
A longstanding goal in computer vision is to model motions from videos, while the representations behind motions, i.e. the invisible physical interactions that cause objects to deform and move, remain largely unexplored. In this paper, we study how to recover the invisible forces from visual observations, e.g., estimating the wind field by observing a leaf falling to the ground. Our key innovation is an end-to-end differentiable inverse graphics framework, which jointly models object geometry, physical properties, and interactions directly from videos. Through backpropagation, our approach enables the recovery of force representations from object motions. We validate our method on both synthetic and real-world scenarios, and the results demonstrate its ability to infer plausible force fields from videos. Furthermore, we show the potential applications of our approach, including physics-based video generation and editing. We hope our approach sheds light on understanding and modeling the physical process behind pixels, bridging the gap between vision and physics. Please check more video results in our https://chaoren2357.github.io/seeingthewind/{project page}.
