openfold / openfold3

Model Overview

Description:

OpenFold3 is a biomolecular complex structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. It is a PyTorch reimplementation of Google DeepMind's AlphaFold3 that supports both training and inference. See the GitHub repository at https://github.com/aqlaboratory/openfold3.

This model is available for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case.

License/Terms of Use:

GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License. ADDITIONAL INFORMATION: Apache 2.0 License.

You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.

Deployment Geography:

Global

Use Case

The OpenFold3 NIM can be used at academic and pharmaceutical industry research labs. The structure prediction functionality supports computer-aided drug design.

Release Date

Build.Nvidia.com: 10/28/2025 at build.nvidia.com/openfold/openfold3
NGC: 10/28/2025 at catalog.ngc.nvidia.com/orgs/nim/teams/openfold/containers/openfold3

References:

@article{Abramson2024,
  author  = {Abramson, Josh and Adler, Jonas and Dunger, Jack and Evans, Richard and Green, Tim and Pritzel, Alexander and Ronneberger, Olaf and Willmore, Lindsay and Ballard, Andrew J. and Bambrick, Joshua and Bodenstein, Sebastian W. and Evans, David A. and Hung, Chia-Chun and O’Neill, Michael and Reiman, David and Tunyasuvunakool, Kathryn and Wu, Zachary and Žemgulytė, Akvilė and Arvaniti, Eirini and Beattie, Charles and Bertolli, Ottavia and Bridgland, Alex and Cherepanov, Alexey and Congreve, Miles and Cowen-Rivers, Alexander I. and Cowie, Andrew and Figurnov, Michael and Fuchs, Fabian B. and Gladman, Hannah and Jain, Rishub and Khan, Yousuf A. and Low, Caroline M. R. and Perlin, Kuba and Potapenko, Anna and Savy, Pascal and Singh, Sukhdeep and Stecula, Adrian and Thillaisundaram, Ashok and Tong, Catherine and Yakneen, Sergei and Zhong, Ellen D. and Zielinski, Michal and Žídek, Augustin and Bapst, Victor and Kohli, Pushmeet and Jaderberg, Max and Hassabis, Demis and Jumper, John M.},
  journal = {Nature},
  title   = {Accurate structure prediction of biomolecular interactions with AlphaFold 3},
  year    = {2024},
  volume  = {630},
  number  = {8016},
  pages   = {493--500},
  doi     = {10.1038/s41586-024-07487-w}
}

Model Architecture:

Architecture Type: Protein Structure Prediction

Network Architecture: AlphaFold3

This model was developed based on AlphaFold3.

Number of model parameters: 3.68×10⁸
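As a rough illustration of what this parameter count means for storage, the sketch below multiplies it by the bytes needed per parameter. The card does not state the precision in which the weights are stored, so both dtypes shown are assumptions, and the estimate covers weights only (no activations or runtime overhead).

```python
# Rough weight-storage estimate from the parameter count stated in the card.
# Assumption: weights are stored as float32 or bfloat16; the card does not say which.
NUM_PARAMS = 368_000_000  # 3.68 x 10^8

BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2}


def weight_memory_gb(num_params: int, dtype: str) -> float:
    """Return the weight storage size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: {weight_memory_gb(NUM_PARAMS, dtype):.3f} GB")
# float32: 1.472 GB
# bfloat16: 0.736 GB
```

Memory use during inference is typically much higher than this, since it also depends on the size of the input complex.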

Input:

Input Type(s): Protein Sequence; Multiple Sequence Alignments; DNA Sequence; RNA Sequence; Ligand CCD code; Ligand SMILES code

Input Format(s): String (less than or equal to 1000), a3m-format strings, csv-format string, string

Input Parameters: One-Dimensional (1D) for all input types

Other Properties Related to Input: a3m is a standard file format for storing multiple sequence alignment results. Multiple sequence alignments are passed as a3m-format or csv-format strings.
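To make the a3m input format concrete, here is a minimal sketch of reading an a3m-format string. It relies on the usual a3m conventions (FASTA-style `>` headers, `-` for gaps, lowercase letters for insertions relative to the query). The function names `parse_a3m` and `to_aligned` are illustrative and are not part of OpenFold3 or the NIM API.

```python
def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Parse a3m text into (header, raw_sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap over several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def to_aligned(raw_sequence: str) -> str:
    """Drop lowercase insertion columns so every row matches the query length."""
    return "".join(c for c in raw_sequence if not c.islower())


# Toy alignment: hit1 has a gap, hit2 has one insertion ("a") relative to the query.
example = ">query\nMKTAY\n>hit1\nMK-AY\n>hit2\nMKaTAY\n"
rows = [(header, to_aligned(seq)) for header, seq in parse_a3m(example)]
print(rows)
# [('query', 'MKTAY'), ('hit1', 'MK-AY'), ('hit2', 'MKTAY')]
```

After insertions are dropped, every row has the same length as the query, which is a quick sanity check to run on an alignment before submitting it.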

Output:

Output Type(s): Biomolecular Complex Structure(s) in mmCIF format

Output Format: mmCIF (text)

Output Parameters: 1D

Other Properties Related to Output: Pose (number of atoms × 3)
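The relationship between the mmCIF text output and the (number of atoms × 3) pose can be sketched with a small standard-library reader for the `_atom_site` loop. This is a simplified illustration, not a full mmCIF parser: it assumes one atom per line and no whitespace inside quoted values, and the function name `atom_site_coordinates` is hypothetical. For production use, an established mmCIF library is a safer choice.

```python
def atom_site_coordinates(mmcif_text: str) -> list[tuple[float, ...]]:
    """Return one (x, y, z) tuple per row of the _atom_site loop."""
    columns: list[str] = []  # _atom_site column names, in file order
    coords: list[tuple[float, ...]] = []
    xyz_index = None
    for line in mmcif_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("_atom_site."):
            columns.append(stripped.split()[0])
            continue
        if not columns:
            continue  # still before the _atom_site loop
        if not stripped or stripped.startswith(("#", "_", "loop_", "data_")):
            break  # first line after the _atom_site rows
        if xyz_index is None:
            # Locate the Cartesian coordinate columns once.
            xyz_index = [columns.index(f"_atom_site.Cartn_{axis}") for axis in "xyz"]
        fields = stripped.split()
        coords.append(tuple(float(fields[i]) for i in xyz_index))
    return coords


# Tiny hand-written example; real output files carry many more columns.
EXAMPLE_MMCIF = """data_example
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.label_atom_id
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
ATOM 1 N 1.000 2.000 3.000
ATOM 2 CA 1.500 2.500 3.500
#
"""

pose = atom_site_coordinates(EXAMPLE_MMCIF)
print(len(pose), "atoms x", len(pose[0]), "coordinates")
# 2 atoms x 3 coordinates
```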

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA’s hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engine(s):

TRT
PyTorch

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere
NVIDIA Hopper

Supported Operating System(s):

Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):

OpenFold Consortium trained weights
OpenFold3 version 1.0.0

Training & Evaluation:

Training Dataset:

Link: Accurate structure prediction of biomolecular interactions with AlphaFold 3

The OpenFold Consortium trained the model weights following the procedure described in Accurate structure prediction of biomolecular interactions with AlphaFold 3. This data was not collected by NVIDIA.

Data Collection Method by dataset:

Hybrid: Automatic/Sensors, Human
See the description at Accurate structure prediction of biomolecular interactions with AlphaFold 3.

Labeling Method by dataset:

Hybrid: Automatic/Sensors, Human
See the description at Accurate structure prediction of biomolecular interactions with AlphaFold 3.

Properties (Quantity, Dataset Descriptions, Sensor(s)):

During training, each learning example is composed of the input and the target, where the target is a crop (a piece) of the 3D structure of a biomolecular complex. The biomolecular complex structure is either (a) experimentally determined, or (b) model-predicted. The input is composed of the crop-restricted portions of (i) ligand identifiers and protein, DNA, and RNA sequences, (ii) protein and RNA multiple sequence alignments, and (iii) protein structural templates.
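The idea of a crop can be sketched as follows. This is a deliberately simplified illustration that takes a single contiguous window from one chain. The actual cropping strategies are described in the referenced paper, and the function name `contiguous_crop`, the crop size, and the toy data are all assumptions made for this example.

```python
import random


def contiguous_crop(tokens, coords, crop_size, rng):
    """Return a contiguous window of at most crop_size tokens and their coordinates."""
    if len(tokens) != len(coords):
        raise ValueError("tokens and coords must have the same length")
    if len(tokens) <= crop_size:
        return list(tokens), list(coords)  # nothing to crop
    start = rng.randrange(len(tokens) - crop_size + 1)
    end = start + crop_size
    # The same window is applied to the input (tokens) and the target (coords).
    return list(tokens[start:end]), list(coords[start:end])


rng = random.Random(0)  # fixed seed so the sketch is reproducible
tokens = list("MKTAYIAKQR")
coords = [(float(i), 0.0, 0.0) for i in range(len(tokens))]  # toy coordinates

crop_tokens, crop_coords = contiguous_crop(tokens, coords, crop_size=4, rng=rng)
print("".join(crop_tokens), crop_coords[0])
```

Applying the same window to the input features and the target structure is what the description above means by "crop-restricted portions" of the input.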

The experimental complex structures are sourced from the PDB and are organized into the Weighted PDB Dataset (~400k unique structures).

The predicted complex structures are organized into the following datasets:

Protein Monomer Distillation Dataset
- OF2 code, AF2 weights (checkpoint)
Short Monomer Distillation Dataset
- OF2 code, AF2 weights (checkpoint)
Disordered Protein PDB Distillation Dataset
- AF2 Multimer code, AF2 Multimer weights (checkpoint)

In total, these datasets are composed of ~13 million complexes. Throughout the training process, a total of ~20 million sample complex crops are drawn from these datasets, using the probability weights described in Accurate structure prediction of biomolecular interactions with AlphaFold 3, Supplementary Information.
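The weighted draws described above can be sketched with the standard library. The dataset names come from this card, but the weights below are placeholders chosen for illustration only. The real probability weights are given in the Supplementary Information of the referenced paper and are not reproduced here.

```python
import random
from collections import Counter

# Placeholder sampling weights: NOT the published values.
DATASET_WEIGHTS = {
    "Weighted PDB Dataset": 0.5,
    "Protein Monomer Distillation Dataset": 0.3,
    "Short Monomer Distillation Dataset": 0.1,
    "Disordered Protein PDB Distillation Dataset": 0.1,
}


def draw_datasets(num_draws: int, rng: random.Random) -> Counter:
    """Pick a source dataset for each training example according to the weights."""
    names = list(DATASET_WEIGHTS)
    weights = list(DATASET_WEIGHTS.values())
    return Counter(rng.choices(names, weights=weights, k=num_draws))


counts = draw_datasets(10_000, random.Random(0))
for name, n in counts.most_common():
    print(f"{name}: {n}")

# With ~20 million crops drawn from ~13 million complexes, each complex is
# sampled about 1.5 times on average (unevenly, because of the weights).
print(round(20e6 / 13e6, 2))  # 1.54
```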

All of the experimental complex structures in the Weighted PDB Dataset were deposited before 2021-09-30.

The computation of protein and RNA multiple sequence alignments and of protein structure templates follows the methods in Accurate structure prediction of biomolecular interactions with AlphaFold 3, Supplementary Information.

Evaluation Dataset:

Link: See the description at Accurate structure prediction of biomolecular interactions with AlphaFold 3.

Data Collection Method by dataset:

Hybrid: Automatic/Sensors, Human
See the description at Accurate structure prediction of biomolecular interactions with AlphaFold 3.

Labeling Method by dataset:

Hybrid: Automatic/Sensors, Human
See the description at Accurate structure prediction of biomolecular interactions with AlphaFold 3.

Properties (Quantity, Dataset Descriptions, Sensor(s)):

For evaluation (post-validation benchmarks), each example has an input and a target, as in the Training Dataset, but is not restricted to a crop. Every complex in the Evaluation Dataset has a structure determined after 2021-09-30.
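The date-based separation between the Training and Evaluation Datasets can be sketched as a simple filter. The entry identifiers and dates below are made up for illustration, and using a single date per entry is a simplifying assumption (the card speaks of deposition dates for training and determination dates for evaluation).

```python
from datetime import date

CUTOFF = date(2021, 9, 30)  # cutoff stated in this card


def split_by_cutoff(entries: dict[str, date]) -> tuple[list[str], list[str]]:
    """Split entry ids into (training, evaluation) by comparing dates to the cutoff."""
    training = sorted(i for i, d in entries.items() if d < CUTOFF)
    evaluation = sorted(i for i, d in entries.items() if d > CUTOFF)
    # An entry dated exactly on the cutoff matches neither "before" nor "after".
    return training, evaluation


# Hypothetical entries, not real PDB identifiers.
entries = {
    "entry_a": date(2019, 5, 14),
    "entry_b": date(2021, 9, 30),
    "entry_c": date(2022, 1, 3),
}

training_ids, evaluation_ids = split_by_cutoff(entries)
print(training_ids, evaluation_ids)
# ['entry_a'] ['entry_c']
```

Keeping the two sets strictly on opposite sides of the cutoff means no evaluation structure could have been seen during training.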

Inference:

Engine: TRT, PyTorch

Test Hardware:

NVIDIA H200
NVIDIA H100
NVIDIA A100

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have
established policies and practices to enable development for a wide array of AI
applications. When downloaded or used in accordance with our terms of service,
developers should work with their supporting model team to ensure this model
meets requirements for the relevant industry and use case and addresses
unforeseen product misuse.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns
here.