Model Overview
Description:
RFdiffusion (RoseTTAFold Diffusion) is a generative model that can be used for
protein scaffolding and protein binder design tasks.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed
and built to a third-party’s requirements for this application and use case; see
link to Non-NVIDIA Model Card.
References:
@ARTICLE{nat2023rfdiffusion,
title = "De novo design of protein structure and function with RFdiffusion",
author = "Watson, Joseph L. and Juergens, David and Bennett, Nathaniel R.
and Trippe, Brian L. and Yim, Jason and Eisenach, Helen E. and Ahern, Woody
and Borst, Andrew J. and Ragotte, Robert J. and Milles, Lukas F. and Wicky,
Basile I. M. and Hanikel, Nikita and Pellock, Samuel J. and Courbet, Alexis
and Sheffler, William and Wang, Jue and Venkatesh, Preetham and Sappington,
Isaac and Torres, Susana Vázquez and Lauko, Anna and De Bortoli, Valentin
and Mathieu, Emile and Ovchinnikov, Sergey and Barzilay, Regina and
Jaakkola, Tommi S. and DiMaio, Frank and Baek, Minkyung and Baker, David",
journal = "Nature",
volume = 620,
number = 7976,
pages = "1089--1100",
month = aug,
year = 2023,
language = "en",
doi = {10.1038/s41586-023-06415-8}
}
Model Architecture:
Architecture Type: Protein Structure Generation
Network Architecture: RFdiffusion
Input:
Input Type(s): Protein in PDB format
Input Format(s): String
Input Parameters: 1D
Other Properties Related to Input:
Output:
Output Type(s): Protein Structure in PDB format
Output Format: PDB (text file)
Output Parameters: 1D
Other Properties Related to Output:
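Because both the input and the output are PDB-format text, downstream code typically needs to read atom records out of a string. The following is a minimal sketch, assuming standard fixed-column PDB `ATOM` records; the function name `ca_coordinates` is hypothetical and is not part of RFdiffusion or any NVIDIA API.

```python
def ca_coordinates(pdb_text):
    """Return (chain_id, residue_number, (x, y, z)) for each C-alpha atom.

    Assumes standard fixed-column PDB ATOM records:
    atom name in columns 13-16, chain ID in column 22,
    residue number in columns 23-26, x/y/z in columns 31-54.
    """
    atoms = []
    for line in pdb_text.splitlines():
        # Only ATOM records are considered; HETATM records (e.g. calcium ions,
        # which also use the name "CA") are skipped.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain = line[21]
            resseq = int(line[22:26])
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            atoms.append((chain, resseq, xyz))
    return atoms
```

Passing the PDB string returned by the model to a helper like this yields one entry per residue, which is a convenient starting point for measuring backbone geometry.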
Software Integration:
Runtime Engine(s):
- Not Applicable (N/A)
Supported Hardware Microarchitecture Compatibility:
- Turing
- Ampere
- Ada Lovelace (L40)
Supported Operating System(s):
- Linux
Model Version(s): RFdiffusion
Training & Evaluation:
Training Dataset:
Link: The Protein Data Bank
Data Collection Method by Dataset:
- Not Applicable
Labeling Method by Dataset:
- Not Applicable
Properties (Quantity, Dataset Descriptions, Sensor(s)): The training dataset
for RFdiffusion, as detailed in the paper, consists of structures sampled from
the Protein Data Bank (PDB). During training, each structure is corrupted by a
noising process of up to 200 steps: the Cα coordinates are perturbed with 3D
Gaussian noise, and the residue orientations undergo Brownian motion on the
manifold of rotation matrices.
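The noising process described above can be illustrated with a short NumPy sketch. This is a simplified discretization written for illustration, not the authors' implementation: the function names, the constant per-step variance `beta`, and the rotation step size `rot_sigma` are all assumptions, and the paper's actual noise schedule and rotation distribution are not reproduced here.

```python
import numpy as np

def rotvec_to_matrix(v):
    """Rodrigues' formula: axis-angle vectors (N, 3) -> rotation matrices (N, 3, 3)."""
    theta = np.linalg.norm(v, axis=-1, keepdims=True)       # rotation angles, (N, 1)
    axis = v / np.where(theta > 1e-12, theta, 1.0)          # unit axes (zero if no rotation)
    K = np.zeros(v.shape[:-1] + (3, 3))                     # skew-symmetric cross-product matrices
    K[..., 0, 1], K[..., 0, 2] = -axis[..., 2], axis[..., 1]
    K[..., 1, 0], K[..., 1, 2] = axis[..., 2], -axis[..., 0]
    K[..., 2, 0], K[..., 2, 1] = -axis[..., 1], axis[..., 0]
    t = theta[..., None]                                    # (N, 1, 1) for broadcasting
    return np.eye(3) + np.sin(t) * K + (1.0 - np.cos(t)) * (K @ K)

def noise_backbone(ca_xyz, rotations, n_steps=200, beta=0.01, rot_sigma=0.1, seed=0):
    """Apply n_steps of noise to C-alpha coordinates (N, 3) and residue frames (N, 3, 3).

    beta and rot_sigma are hypothetical constants chosen for illustration only.
    """
    rng = np.random.default_rng(seed)
    x = np.array(ca_xyz, dtype=float)
    R = np.array(rotations, dtype=float)
    for _ in range(n_steps):
        # Translations: one variance-preserving step of 3D Gaussian noise.
        x = np.sqrt(1.0 - beta) * x + np.sqrt(beta) * rng.standard_normal(x.shape)
        # Rotations: compose with a small random rotation, a discrete
        # approximation of Brownian motion on the rotation manifold.
        R = R @ rotvec_to_matrix(rot_sigma * rng.standard_normal((len(R), 3)))
    return x, R

# Usage: five residues starting at the origin with identity orientations.
ca = np.zeros((5, 3))
frames = np.tile(np.eye(3), (5, 1, 1))
noisy_ca, noisy_frames = noise_backbone(ca, frames, n_steps=200)
```

Composing each frame with a rotation matrix, rather than adding noise to its entries, keeps every orientation a valid rotation at every step, which is the point of diffusing on the rotation manifold.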
Dataset License(s): CC0 1.0.
Evaluation Dataset:
The model was trained on noised PDB structures (as described under Training
Dataset) and evaluated on its ability to denoise them, as well as on design
tasks with auxiliary conditioning information.
Inference:
Engine: Triton
Test Hardware:
- Other (Not Listed)
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have
established policies and practices to enable development for a wide array of AI
applications. When downloaded or used in accordance with our terms of service,
developers should work with their supporting model team to ensure this model
meets requirements for the relevant industry and use case and addresses
unforeseen product misuse. For more detailed information on ethical
considerations for this model, please see the Model Card++ Explainability, Bias,
Safety & Security, and Privacy Subcards.
Please report security vulnerabilities or NVIDIA AI Concerns here.