openfold / openfold2

Model Overview

Description:

OpenFold2 is a protein structure prediction model from the OpenFold Consortium and the AlQuraishi Laboratory. OpenFold2 is a PyTorch re-implementation of Google DeepMind's AlphaFold2, with support for both training and inference. OpenFold2 demonstrates accuracy parity with AlphaFold2 and improved speed. For more information, see the OpenFold repository: https://github.com/aqlaboratory/openfold.

This model is available for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case.

License/Terms of Use:

You are responsible for ensuring that your use of NVIDIA AI Foundation Models
complies with all applicable laws.

Deployment Geography:

Global

Use Case

The OpenFold2 NIM can be used in academic and pharmaceutical industry research labs. Its structure prediction functionality supports computer-aided drug design.

Release Date

References:

@article{Ahdritz2022.11.20.517210,
	author = {Ahdritz, Gustaf and Bouatta, Nazim and Floristean, Christina and Kadyan, Sachin and Xia, Qinghui and Gerecke, William and O{\textquoteright}Donnell, Timothy J and Berenberg, Daniel and Fisk, Ian and Zanichelli, Niccolò and Zhang, Bo and Nowaczynski, Arkadiusz and Wang, Bei and Stepniewska-Dziubinska, Marta M and Zhang, Shang and Ojewole, Adegoke and Guney, Murat Efe and Biderman, Stella and Watkins, Andrew M and Ra, Stephen and Lorenzo, Pablo Ribalta and Nivon, Lucas and Weitzner, Brian and Ban, Yih-En Andrew and Sorger, Peter K and Mostaque, Emad and Zhang, Zhao and Bonneau, Richard and AlQuraishi, Mohammed},
	title = {{O}pen{F}old: {R}etraining {A}lpha{F}old2 yields new insights into its learning mechanisms and capacity for generalization},
	elocation-id = {2022.11.20.517210},
	year = {2022},
	doi = {10.1101/2022.11.20.517210},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/10.1101/2022.11.20.517210},
	eprint = {https://www.biorxiv.org/content/early/2022/11/22/2022.11.20.517210.full.pdf},
	journal = {bioRxiv}
}
@article{jumper2021alphafold,
	author = {Jumper, John and Evans, Richard and Pritzel, Alexander and Green, Tim and Figurnov, Michael and Ronneberger, Olaf and Tunyasuvunakool, Kathryn and Bates, Russ and {\v Z}{\'\i}dek, Augustin and Potapenko, Anna and Bridgland, Alex and Meyer, Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie, Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig and Reiman, David and Clancy, Ellen and Zielinski, Michal and Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet and Hassabis, Demis},
	title = {Highly accurate protein structure prediction with {AlphaFold}},
	journal = {Nature},
	volume = {596},
	number = {7873},
	pages = {583--589},
	month = aug,
	year = {2021},
	language = {en},
	doi = {10.1038/s41586-021-03819-2}
}

Model Architecture:

Architecture Type: Protein Structure Prediction

Network Architecture: AlphaFold2

Input:

Input Type(s): Protein Sequence, Multiple Sequence Alignments, Templates

Input Format(s): String (protein sequence of at most 1,000 residues), a3m-format strings, hhr-format strings

Input Parameters: One-Dimensional (1D), One-Dimensional (1D), One-Dimensional (1D)

Other Properties Related to Input: a3m is a standard file format for storing multiple sequence alignment results. hhr is the file format output by the HHsearch tool. For more information, see HH-suite.
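As a rough illustration of these inputs, the following Python sketch normalizes a query sequence against the 1,000-residue limit and parses an a3m string into aligned rows. It is not part of OpenFold or the NIM API: the helper names and the restriction to the 20 standard one-letter amino-acid codes are assumptions. It relies on the a3m convention that lowercase letters mark insertions relative to the query, so dropping them leaves every row as long as the query.

```python
import re

MAX_SEQUENCE_LENGTH = 1000  # input limit stated in this model card
# Assumption: only the 20 standard one-letter amino-acid codes are accepted.
STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(seq: str) -> str:
    """Strip whitespace, uppercase, and check a query sequence (hypothetical helper)."""
    seq = "".join(seq.split()).upper()
    if not seq:
        raise ValueError("empty sequence")
    if len(seq) > MAX_SEQUENCE_LENGTH:
        raise ValueError(f"length {len(seq)} exceeds {MAX_SEQUENCE_LENGTH} residues")
    unknown = set(seq) - STANDARD_AA
    if unknown:
        raise ValueError(f"non-standard residue codes: {sorted(unknown)}")
    return seq


def parse_a3m(text: str) -> list[tuple[str, str]]:
    """Parse an a3m string into (description, aligned_row) pairs (hypothetical helper).

    Lowercase letters (insertions relative to the query) and '.' are removed,
    so every returned row should be as long as the query (the first record).
    """
    records: list[tuple[str, list[str]]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines such as "#A3M#"
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif records:
            records[-1][1].append(line)  # a sequence may span several lines

    rows = [(desc, re.sub(r"[a-z.]", "", "".join(chunks))) for desc, chunks in records]
    if rows:
        query_len = len(rows[0][1])
        for desc, row in rows:
            if len(row) != query_len:
                raise ValueError(f"{desc!r}: {len(row)} columns, expected {query_len}")
    return rows
```

For example, `parse_a3m(">query\nMKTAYIAK\n>hit1\nMK-AYiiIAK\n")` keeps the gap `-` but drops the two lowercase insertion residues, giving two 8-column rows.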

Output:

Output Type(s): Protein Structure(s) in PDB Format

Output Format: PDB (text file)

Output Parameters: 1D

Other Properties Related to Output: Pose, given as atomic coordinates with shape (number of atoms × 3)
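A minimal sketch, assuming a standard PDB text file with fixed-column ATOM/HETATM records, of how the output pose can be read back as a (number of atoms × 3) coordinate list in plain Python. The function name is hypothetical; dedicated parsers such as Biopython's `Bio.PDB.PDBParser` handle the format more completely.

```python
def parse_pdb_coordinates(pdb_text: str) -> tuple[list[str], list[tuple[float, float, float]]]:
    """Return atom names and (x, y, z) coordinates from ATOM/HETATM records.

    Uses the fixed PDB columns (1-based): atom name 13-16, x 31-38,
    y 39-46, z 47-54, which correspond to the 0-based slices below.
    """
    names: list[str] = []
    coords: list[tuple[float, float, float]] = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            names.append(line[12:16].strip())
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return names, coords
```

`len(coords)` is the number of atoms, and each tuple is one row of the pose matrix.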

Software Integration:

Runtime Engine(s):

  • PyTorch

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Hopper
  • NVIDIA Ampere
  • NVIDIA Ada Lovelace

[Preferred/Supported] Operating System(s):

  • Linux

Model Version(s):

  • AlphaFold weights 2.3.2
  • OpenFold 2.1.0 (pl_upgrade)

Training & Evaluation:

Training Dataset:

Link: Highly Accurate ... Data Availability

The model parameter sets were trained by Google DeepMind as part of AlphaFold2 development. A description of the training dataset and relevant download links are available at Highly Accurate ... Data Availability. This data was not collected by NVIDIA.

Data Collection Method by dataset:

Labeling Method by dataset:

Properties (Quantity, Dataset Descriptions, Sensor(s)): Predicted structures for a Uniclust dataset of 355,993 sequences, generated using the full MSAs (self-distillation). These predictions were then used to train a final model with identical hyperparameters, except that training examples were sampled 75% of the time from the Uniclust prediction set (with sub-sampled MSAs) and 25% of the time from the clustered PDB set.
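The 75%/25% sampling mixture in the training-data description can be sketched as follows. This is an illustrative reading of that sentence, not DeepMind's training code: the function names, the dataset layout (query sequence plus MSA rows with the query first), and the sub-sampling scheme (keep the query row plus a random subset, capped at a hypothetical 128 rows) are all assumptions.

```python
import random
from typing import Sequence

# Hypothetical dataset layout: each example is (query_sequence, msa_rows),
# where msa_rows[0] is the query itself.
Example = tuple[str, Sequence[str]]


def subsample_msa(msa: Sequence[str], max_rows: int, rng: random.Random) -> list[str]:
    """Keep the query row plus a random subset of the other rows (assumed scheme)."""
    if len(msa) <= max_rows:
        return list(msa)
    return [msa[0]] + rng.sample(list(msa[1:]), max_rows - 1)


def sample_training_example(
    distillation_set: Sequence[Example],
    pdb_set: Sequence[Example],
    rng: random.Random,
    p_distillation: float = 0.75,
    max_msa_rows: int = 128,  # hypothetical cap, not a documented value
) -> tuple[str, str, list[str]]:
    """Draw from the Uniclust prediction set with probability p_distillation, else from clustered PDB."""
    if rng.random() < p_distillation:
        query, msa = rng.choice(distillation_set)
        # Only distillation examples get a sub-sampled MSA, per the description.
        return "distillation", query, subsample_msa(msa, max_msa_rows, rng)
    query, msa = rng.choice(pdb_set)
    return "pdb", query, list(msa)
```

Passing an explicit `random.Random` instance keeps the mixture reproducible for a given seed.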

Evaluation Dataset:

Link: See the description at Highly Accurate... Sec10.

Data Collection Method by dataset:

Labeling Method by dataset:

Inference:

Engine: PyTorch

Test Hardware:

  • NVIDIA H100
  • NVIDIA A100

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have
established policies and practices to enable development for a wide array of AI
applications. When downloaded or used in accordance with our terms of service,
developers should work with their supporting model team to ensure this model
meets requirements for the relevant industry and use case and addresses
unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.