Model Overview
Description:
ESMFold is a protein structure prediction deep learning model developed by Facebook AI Research (FAIR) {cite:p}`lin2023esmfold`. The model was inspired by AlphaFold but does not require a multiple sequence alignment (MSA) as input. As a result, it predicts protein structures significantly faster than alignment-based methods while remaining nearly as accurate.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see the link to the Non-NVIDIA Model Card.
References:
@ARTICLE{lin2023esmfold,
title = "Evolutionary-scale prediction of atomic-level protein structure
with a language model",
author = "Lin, Zeming and Akin, Halil and Rao, Roshan and Hie, Brian and
Zhu, Zhongkai and Lu, Wenting and Smetanin, Nikita and Verkuil,
Robert and Kabeli, Ori and Shmueli, Yaniv and Dos Santos Costa,
Allan and Fazel-Zarandi, Maryam and Sercu, Tom and Candido,
Salvatore and Rives, Alexander",
journal = "Science",
volume = 379,
number = 6637,
pages = "1123--1130",
month = mar,
year = 2023,
language = "en",
doi = "10.1126/science.ade2574"
}
Model Architecture:
Architecture Type: Pose Estimation
Network Architecture: ESMFold
Input:
Input Type(s): Protein Sequence
Input Format(s): String
Input Parameters: 1D
Other Properties Related to Input: Protein sequence matching the regular expression ^[ARNDCQEGHILKMFPSTWYVXBOU]*$, up to 1024 characters long.
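As an illustration of the input constraint, the sketch below checks a sequence against the card's regular expression and its 1024-character limit before submission. It is a minimal sketch, not part of ESMFold: the helper name `validate_sequence`, the whitespace stripping and uppercasing, and the rejection of empty strings (which the pattern itself would accept) are all assumptions.

```python
import re

# Pattern copied from the model card: the 20 standard amino acids plus
# X (unknown), B (Asx), O (pyrrolysine) and U (selenocysteine).
ALLOWED = "ARNDCQEGHILKMFPSTWYVXBOU"
SEQUENCE_PATTERN = re.compile(r"^[ARNDCQEGHILKMFPSTWYVXBOU]*$")
MAX_LENGTH = 1024  # maximum sequence length stated in the card


def validate_sequence(seq: str) -> str:
    """Return a cleaned sequence or raise ValueError (hypothetical helper).

    Assumptions: whitespace is stripped, letters are uppercased, and an
    empty sequence is rejected even though the regex would accept it.
    """
    cleaned = "".join(seq.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    if len(cleaned) > MAX_LENGTH:
        raise ValueError(f"sequence has {len(cleaned)} residues; limit is {MAX_LENGTH}")
    if not SEQUENCE_PATTERN.fullmatch(cleaned):
        bad = sorted(set(cleaned) - set(ALLOWED))
        raise ValueError(f"invalid characters: {''.join(bad)}")
    return cleaned
```

For example, `validate_sequence("mkt ayiakq")` returns `"MKTAYIAKQ"`, while a sequence containing `Z` raises `ValueError` because `Z` is not in the card's allowed alphabet.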
Output:
Output Type(s): Protein Structure Pose(s)
Output Format: PDB (text file)
Output Parameters: 1D
Other Properties Related to Output: Pose
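Because the output is a plain-text PDB file, downstream code can read atom records using the fixed column positions of the PDB format. The sketch below is a minimal parser under stated assumptions: the names `Atom`, `parse_atoms`, and `mean_ca_b_factor` are hypothetical, only full-width ATOM/HETATM records are handled, and reading the B-factor column as per-residue confidence (pLDDT) reflects how ESMFold outputs are commonly written, which should be verified against your own files.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    b_factor: float  # assumed to hold per-residue pLDDT in ESMFold output


def parse_atoms(pdb_text: str) -> list[Atom]:
    """Parse ATOM/HETATM records using fixed PDB column positions.

    Assumes full-width records (at least 66 characters per atom line).
    """
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, TER, END, and other record types
        atoms.append(
            Atom(
                name=line[12:16].strip(),      # columns 13-16
                res_name=line[17:20].strip(),  # columns 18-20
                chain=line[21],                # column 22
                res_seq=int(line[22:26]),      # columns 23-26
                x=float(line[30:38]),          # columns 31-38
                y=float(line[38:46]),          # columns 39-46
                z=float(line[46:54]),          # columns 47-54
                b_factor=float(line[60:66]),   # columns 61-66
            )
        )
    return atoms


def mean_ca_b_factor(atoms: list[Atom]) -> float:
    """Average B-factor over alpha-carbon atoms (NaN if there are none)."""
    values = [a.b_factor for a in atoms if a.name == "CA"]
    return sum(values) / len(values) if values else float("nan")
```

For production use, an established parser such as Biopython's `Bio.PDB.PDBParser` handles many more edge cases (alternate locations, insertion codes, truncated lines).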
Software Integration:
Runtime Engine(s):
- [Not Applicable (N/A)]
Supported Hardware Microarchitecture Compatibility:
- [Ampere]
- [Ada Lovelace (L40)]
Supported Operating System(s):
- [Linux]
Model Version(s): ESMFold
Training & Evaluation:
Training Dataset:
Link:
UniRef50
Data Collection Method by Dataset:
- [Not Applicable]
Labeling Method by Dataset:
- [Not Applicable]
Properties (Quantity, Dataset Descriptions, Sensor(s)): UniRef50, September 2021 version, is used for the training of ESM models. The training dataset was partitioned by randomly selecting 0.5% (≈ 250,000) sequences to form the validation set. The training set has sequences removed via the procedure described
Dataset License(s): CC BY 4.0.
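The random 0.5% hold-out described above can be sketched as follows. This is an illustration only: `holdout_split` is a hypothetical helper, the seed is arbitrary, and the further removal of sequences from the training set mentioned in the card is not modeled. As a consistency check, 250,000 / 0.005 = 50,000,000, so the ≈ 250,000 figure implies a release of roughly 50 million sequences.

```python
import random


def holdout_split(ids, fraction=0.005, seed=0):
    """Split a list of unique sequence ids into (train_ids, val_ids).

    Hypothetical helper: a `fraction` of the ids (0.5% by default, as in
    the card) is drawn uniformly at random, without replacement, for
    validation; everything else stays in the training partition.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    n_val = round(len(ids) * fraction)
    held_out = set(rng.sample(ids, n_val))
    # Preserve the original ordering within both partitions.
    train_ids = [i for i in ids if i not in held_out]
    val_ids = [i for i in ids if i in held_out]
    return train_ids, val_ids
```

With 1,000 ids the default fraction holds out 5 of them; at UniRef50 scale one would likely stream ids from disk instead of holding them in memory, which this version trades away for clarity.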
Evaluation Dataset:
UniRef50
Data Collection Method by Dataset:
- [Not Applicable]
Labeling Method by Dataset:
- [Not Applicable]
Properties (Quantity, Dataset Descriptions, Sensor(s)): UniRef50, September 2021 version. The evaluation (validation) set consists of the 0.5% (≈ 250,000) of sequences randomly selected from this release and held out from the training set.
Dataset License(s): CC BY 4.0.
Inference:
Engine: Triton
Test Hardware:
- [Other (Not Listed)]
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards [Insert Link to Model Card++ here]. Please report security vulnerabilities or NVIDIA AI Concerns here.