AtomGen Documentation

Welcome to the documentation for AtomGen, a toolkit for atomistic machine learning and generative modeling. AtomGen gives researchers and developers tools for applying deep learning to problems in molecular and materials science.

Overview

AtomGen offers a comprehensive framework for handling atomistic datasets, training various models, and experimenting with different pre-training and fine-tuning tasks. It streamlines the process of working with diverse molecular and materials datasets, enabling large-scale pre-training and task-specific fine-tuning on atomistic data.

Key Features

  • Data Handling: Efficient processing and loading of large-scale atomistic datasets.

  • Model Architectures: Implementations of models such as AtomFormer, a transformer encoder designed for molecular representation learning.

  • Pre-training: Support for various pre-training tasks such as Structure to Energy and Forces (S2EF) prediction.

  • Fine-tuning: Easy adaptation of pre-trained models to downstream tasks using ATOM3D benchmarks.

  • Scalability: Designed for distributed training on multiple GPUs.

Datasets

AtomGen supports a variety of datasets, including:

  • S2EF-15M: A large-scale dataset aggregated from multiple sources (OC20, OC22, ODAC23, MPtrj, SPICE) for pre-training.

  • ATOM3D Benchmarks: Task-specific datasets for molecular property prediction, including:

    • SMP (Small Molecule Properties)

    • PPI (Protein-Protein Interfaces)

    • RES (Residue Identity)

    • MSP (Mutation Stability Prediction)

    • LBA (Ligand Binding Affinity)

    • LEP (Ligand Efficacy Prediction)

    • PSR (Protein Structure Ranking)

    • RSR (RNA Structure Ranking)

Models

The model architectures implemented in AtomGen are:

  • AtomFormer: A transformer encoder model adapted for atomistic data, leveraging 3D spatial information.

  • SchNet: A continuous-filter convolutional neural network for modeling quantum interactions.

  • TokenGT: Tokenized graph transformer that treats all nodes and edges as independent tokens.

The pre-trained models are based on AtomFormer and can be fine-tuned on ATOM3D benchmarks for specific molecular property predictions.
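Models of this kind consume 3D geometry rather than only a molecular graph. One common way to turn geometry into features, used in SchNet-style networks, is to expand each interatomic distance into a set of Gaussian radial basis functions. The sketch below illustrates that idea in plain Python. It is not AtomGen's API: the function names, the basis centers, and the `gamma` width are assumptions chosen for illustration.

```python
import math


def pairwise_distances(coords):
    """Return the N x N matrix of Euclidean distances between atoms."""
    return [[math.dist(a, b) for b in coords] for a in coords]


def gaussian_rbf(distance, centers, gamma=10.0):
    """Expand one distance into Gaussian radial basis features.

    Each feature is exp(-gamma * (distance - center)^2), so the feature
    whose center lies closest to the distance has the largest value.
    """
    return [math.exp(-gamma * (distance - mu) ** 2) for mu in centers]


# Toy water-like geometry in angstroms (illustrative values, not from a dataset).
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

# Hypothetical basis: seven centers spaced 0.5 angstroms apart, from 0.0 to 3.0.
centers = [0.5 * k for k in range(7)]

dists = pairwise_distances(coords)

# features[i][j] is the basis expansion of the distance between atoms i and j.
features = [[gaussian_rbf(d, centers) for d in row] for row in dists]
```

Because the features depend only on distances, they are unchanged when the whole structure is rotated or translated, which is one reason distance-based inputs are popular for atomistic models.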

Tasks

AtomGen facilitates various tasks in molecular machine learning:

  • Structure to Energy & Forces (S2EF): Predicting energies and forces for atomistic systems.

  • Masked Atom Modeling (MAM): Self-supervised learning by masking and predicting atom properties.

  • Coordinate Denoising: Improving structural predictions by training the model to denoise perturbed atomic coordinates.

  • Downstream Tasks: Fine-tuning on ATOM3D benchmarks for specific molecular property predictions.
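The two self-supervised tasks above both start by corrupting an input structure and asking the model to undo the corruption. The following is a minimal sketch of that corruption step only, written in plain Python. It does not reflect AtomGen's actual implementation: `MASK_TOKEN`, `mask_atoms`, `perturb_coords`, the 15% masking rate, and the noise scale are all hypothetical choices made for illustration.

```python
import random

MASK_TOKEN = 0  # hypothetical id reserved for masked atoms (assumption)


def mask_atoms(atomic_numbers, mask_prob, rng):
    """Replace a random subset of atoms with MASK_TOKEN.

    Returns the corrupted inputs and per-atom labels. A label of None
    marks a position that was not masked and would be ignored by the loss.
    """
    inputs, labels = [], []
    for z in atomic_numbers:
        if rng.random() < mask_prob:
            inputs.append(MASK_TOKEN)
            labels.append(z)
        else:
            inputs.append(z)
            labels.append(None)
    return inputs, labels


def perturb_coords(coords, sigma, rng):
    """Add zero-mean Gaussian noise to every coordinate.

    Returns the noisy coordinates and the noise that was added, which a
    denoising objective could use as its regression target.
    """
    noise = [tuple(rng.gauss(0.0, sigma) for _ in xyz) for xyz in coords]
    noisy = [
        tuple(c + n for c, n in zip(xyz, eps))
        for xyz, eps in zip(coords, noise)
    ]
    return noisy, noise


rng = random.Random(0)  # fixed seed so the corruption is reproducible

# Toy water-like structure (illustrative values, not from a dataset).
atomic_numbers = [8, 1, 1]
coords = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]

masked, labels = mask_atoms(atomic_numbers, mask_prob=0.15, rng=rng)
noisy, noise = perturb_coords(coords, sigma=0.05, rng=rng)
```

In a training loop, a model would receive `masked` and `noisy` as inputs and be scored on how well it recovers the original atom types at the masked positions and the noise (or, equivalently, the clean coordinates).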

Getting Started

To get started with AtomGen, check out our User Guide for installation instructions, basic usage examples, and more detailed information on training and inference.

For a deep dive into the API, explore the reference documentation for the data and models modules.

AtomGen is designed to be easy to use without giving up capability. Whether you are conducting research, developing new models, or applying machine learning to molecular systems, it provides a versatile toolkit to support your work.