Model Overview
Description:
OpenFold2 is a protein structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. OpenFold2 is a PyTorch re-implementation of Google DeepMind's AlphaFold2, with support for both training and inference. OpenFold2 demonstrates accuracy parity with AlphaFold2 and improved speed. For more information, see the OpenFold repository at https://github.com/aqlaboratory/openfold.
This model is available for commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case.
License/Terms of Use:
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service; and the use of this model is governed by the NVIDIA Community Model License. ADDITIONAL INFORMATION: Apache 2.0 License.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models
complies with all applicable laws.
Deployment Geography:
Global
Use Case
The OpenFold2 NIM can be used in academic and pharmaceutical industry research labs. Its structure prediction functionality supports computer-aided drug design.
Release Date
- Build.Nvidia.com: 3/18/2025 at build.nvidia.com/openfold/openfold2
- NGC: 3/18/2025
References:
@article{Ahdritz2022.11.20.517210,
	author = {Ahdritz, Gustaf and Bouatta, Nazim and Floristean, Christina and Kadyan, Sachin and Xia, Qinghui and Gerecke, William and O{\textquoteright}Donnell, Timothy J and Berenberg, Daniel and Fisk, Ian and Zanichelli, Niccolò and Zhang, Bo and Nowaczynski, Arkadiusz and Wang, Bei and Stepniewska-Dziubinska, Marta M and Zhang, Shang and Ojewole, Adegoke and Guney, Murat Efe and Biderman, Stella and Watkins, Andrew M and Ra, Stephen and Lorenzo, Pablo Ribalta and Nivon, Lucas and Weitzner, Brian and Ban, Yih-En Andrew and Sorger, Peter K and Mostaque, Emad and Zhang, Zhao and Bonneau, Richard and AlQuraishi, Mohammed},
	title = {{O}pen{F}old: {R}etraining {A}lpha{F}old2 yields new insights into its learning mechanisms and capacity for generalization},
	elocation-id = {2022.11.20.517210},
	year = {2022},
	doi = {10.1101/2022.11.20.517210},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/10.1101/2022.11.20.517210},
	eprint = {https://www.biorxiv.org/content/early/2022/11/22/2022.11.20.517210.full.pdf},
	journal = {bioRxiv}
}
@ARTICLE{jumper2021alphafold,
    title    = "Highly accurate protein structure prediction with {AlphaFold}",
    author   = "Jumper, John and Evans, Richard and Pritzel, Alexander and Green,
                Tim and Figurnov, Michael and Ronneberger, Olaf and
                Tunyasuvunakool, Kathryn and Bates, Russ and {\v Z}{\'\i}dek,
                Augustin and Potapenko, Anna and Bridgland, Alex and Meyer,
                Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie,
                Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and
                Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig
                and Reiman, David and Clancy, Ellen and Zielinski, Michal and
                Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas
                and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol
                and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet
                and Hassabis, Demis",
    journal  = "Nature",
    volume   =  596,
    number   =  7873,
    pages    = "583--589",
    month    =  aug,
    year     =  2021,
    language = "en",
    doi = {10.1038/s41586-021-03819-2},
}
Model Architecture:
Architecture Type: Protein Structure Prediction 
Network Architecture: AlphaFold2 
Input:
Input Type(s): Protein Sequence, Multiple Sequence Alignments, Templates 
Input Format(s): String (sequence length less than or equal to 1000), a3m-format strings, hhr-format strings
Input Parameters: One-Dimensional (1D) for all three inputs
Other Properties Related to Input: a3m is a standard file format for storing multiple sequence alignment results. hhr is the file format output by the HHsearch tool. For more information, see HH-suite.
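The input constraints above can be sketched as a small client-side check run before submitting a request. This is a minimal illustration, not part of the OpenFold2 NIM: the function names `validate_sequence` and `a3m_to_aligned` are hypothetical, and restricting sequences to the 20 standard one-letter amino-acid codes is an assumption. In a3m, lowercase letters (and `.`) mark insertions relative to the query, so dropping them recovers rows of query length, while `-` marks a deletion.

```python
from typing import List, Tuple

# Assumption: only the 20 standard amino-acid one-letter codes are accepted.
STANDARD_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")
MAX_LENGTH = 1000  # sequence length limit stated in the input format


def validate_sequence(sequence: str) -> str:
    """Normalize a protein sequence and check it against the length limit."""
    seq = "".join(sequence.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) > MAX_LENGTH:
        raise ValueError(f"sequence has {len(seq)} residues; limit is {MAX_LENGTH}")
    unknown = set(seq) - STANDARD_RESIDUES
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return seq


def a3m_to_aligned(a3m: str) -> List[Tuple[str, str]]:
    """Parse an a3m string into (header, aligned_row) pairs.

    Lowercase letters and '.' are insertions relative to the query, so
    removing them yields rows with the same length as the query.
    """
    records: List[Tuple[str, List[str]]] = []
    for line in a3m.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment/metadata lines
        if line.startswith(">"):
            records.append((line[1:], []))
        elif records:
            records[-1][1].append(line)  # sequences may span several lines
    aligned = []
    for header, chunks in records:
        row = "".join(c for c in "".join(chunks) if not (c.islower() or c == "."))
        aligned.append((header, row))
    lengths = {len(row) for _, row in aligned}
    if len(lengths) > 1:
        raise ValueError(f"inconsistent alignment lengths: {sorted(lengths)}")
    return aligned
```

For example, `a3m_to_aligned(">query\nMKTAYIAK\n>hit1\nMKT-YIaAK\n")` drops the inserted `a` from `hit1` so that both rows are 8 columns wide.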
Output:
Output Type(s): Protein Structure(s) in PDB Format 
Output Format: PDB (text file)
Output Parameters: One-Dimensional (1D)
Other Properties Related to Output: Pose (numatm x 3), i.e. x, y and z coordinates for each of the numatm atoms
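The link between the PDB text output and the (numatm x 3) pose can be illustrated by reading the coordinate columns back out of the file. This sketch shows one way a client might post-process the output and is not part of the NIM; the function name `pdb_to_pose` is hypothetical. It relies on the fixed-column layout of the PDB format, where x, y and z occupy columns 31-38, 39-46 and 47-54 of each `ATOM`/`HETATM` record.

```python
from typing import List, Tuple

Coord = Tuple[float, float, float]


def pdb_to_pose(pdb_text: str) -> List[Coord]:
    """Return one (x, y, z) tuple per atom record, i.e. a numatm x 3 pose."""
    pose: List[Coord] = []
    for line in pdb_text.splitlines():
        # The record name occupies columns 1-6 of each line.
        if line[:6] in ("ATOM  ", "HETATM"):
            # Fixed-width coordinate fields (1-indexed columns 31-38, 39-46, 47-54).
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            pose.append((x, y, z))
    return pose
```

Non-coordinate records such as `TER` and `END` are skipped, so the length of the returned list equals the number of atoms in the predicted structure.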
Software Integration:
Runtime Engine(s):
- PyTorch 
Supported Hardware Microarchitecture Compatibility: 
- NVIDIA Hopper
- NVIDIA Ampere 
- NVIDIA Ada Lovelace
Supported Operating System(s):
- Linux 
Model Version(s):
- AlphaFold weights 2.3.2
- OpenFold 2.1.0 (pl_upgrade)
Training & Evaluation:
Training Dataset:
Link: Highly Accurate ... Data Availability
The model parameter sets were trained by Google DeepMind as part of AlphaFold2 development. A description of the training dataset and relevant download links are available at Highly Accurate ... Data Availability. This data was not collected by NVIDIA.
Data Collection Method by dataset:
- Hybrid: Automatic/Sensors, Human
- See the description at Highly Accurate ... Data Availability. 
Labeling Method by dataset:
- Hybrid: Automatic/Sensors, Human
- See the description at Highly Accurate ... Data Availability. 
Properties (Quantity, Dataset Descriptions, Sensor(s)): Structures were predicted for a Uniclust dataset of 355,993 sequences with the full MSAs. These predictions were then used to train a final model with identical hyperparameters, except that examples were sampled 75% of the time from the Uniclust prediction set, with sub-sampled MSAs, and 25% of the time from the clustered PDB set.
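The 75%/25% mixture described above can be sketched as a per-example weighted choice between the two sources. This illustrates the sampling ratio only and is not DeepMind's training code: the function name `sample_training_example`, the plain-list datasets and the use of Python's `random` module are assumptions, and the MSA sub-sampling applied to Uniclust examples is not shown.

```python
import random
from typing import Sequence, Tuple, TypeVar

T = TypeVar("T")

UNICLUST_FRACTION = 0.75  # share of examples drawn from the Uniclust prediction set


def sample_training_example(
    uniclust_predictions: Sequence[T],
    clustered_pdb: Sequence[T],
    rng: random.Random,
) -> Tuple[str, T]:
    """Pick a source with probability 0.75 / 0.25, then draw one example from it."""
    if rng.random() < UNICLUST_FRACTION:
        return "uniclust", rng.choice(uniclust_predictions)
    return "pdb", rng.choice(clustered_pdb)
```

Over many draws, roughly three quarters of the examples come from the Uniclust prediction set; passing an explicit `random.Random` instance keeps the mixture reproducible for a fixed seed.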
Evaluation Dataset:
Link: See the description at Highly Accurate... Sec10.
Data Collection Method by dataset:
- Hybrid: Automatic/Sensors, Human
- See the description at Highly Accurate ... Data Availability. 
Labeling Method by dataset:
- Hybrid: Automatic/Sensors, Human
- See the description at Highly Accurate ... Data Availability. 
Properties (Quantity, Dataset Descriptions, Sensor(s)): Structures were predicted for a Uniclust dataset of 355,993 sequences with the full MSAs. These predictions were then used to train a final model with identical hyperparameters, except that examples were sampled 75% of the time from the Uniclust prediction set, with sub-sampled MSAs, and 25% of the time from the clustered PDB set.
Inference:
Engine: PyTorch 
Test Hardware: 
- NVIDIA H100  
- NVIDIA A100  
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have
established policies and practices to enable development for a wide array of AI
applications.  When downloaded or used in accordance with our terms of service,
developers should work with their supporting model team to ensure this model
meets requirements for the relevant industry and use case and addresses
unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.

