Model Overview
Description:
The MSA Search NIM is powered by GPU MMseqs2, a GPU-accelerated toolkit for protein database search and Multiple Sequence Alignment (MSA). Although MMseqs2 is not a deep learning model, it requires large protein databases for sequence similarity search.
This NIM is ready for commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case. ColabFold was developed by the authors of Mirdita et al. 2022. GPU MMseqs2 was developed by the authors of Kallenborn et al. 2025.
License / Terms of Use
API Catalog
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.
Deployment Geography
Global
Use Case
The MSA Search NIM enables researchers and commercial entities in the Drug Discovery, Life Sciences, and Digital Biology fields to rapidly generate multiple sequence alignments (MSA). The output MSA can be used in downstream protein structure prediction and evolutionary analysis applications.
Release Date
build.nvidia.com: March 16, 2025, via build.nvidia.com/colabfold/msa-search
NGC: March 16, 2025
References:
@ARTICLE{jumper2021alphafold,
title = "Highly accurate protein structure prediction with {AlphaFold}",
author = "Jumper, John and Evans, Richard and Pritzel, Alexander and Green,
Tim and Figurnov, Michael and Ronneberger, Olaf and
Tunyasuvunakool, Kathryn and Bates, Russ and {\v Z}{\'\i}dek,
Augustin and Potapenko, Anna and Bridgland, Alex and Meyer,
Clemens and Kohl, Simon A A and Ballard, Andrew J and Cowie,
Andrew and Romera-Paredes, Bernardino and Nikolov, Stanislav and
Jain, Rishub and Adler, Jonas and Back, Trevor and Petersen, Stig
and Reiman, David and Clancy, Ellen and Zielinski, Michal and
Steinegger, Martin and Pacholska, Michalina and Berghammer, Tamas
and Bodenstein, Sebastian and Silver, David and Vinyals, Oriol
and Senior, Andrew W and Kavukcuoglu, Koray and Kohli, Pushmeet
and Hassabis, Demis",
journal = "Nature",
volume = 596,
number = 7873,
pages = "583--589",
month = aug,
year = 2021,
language = "en",
doi = {10.1038/s41586-021-03819-2},
}
@ARTICLE{mirdita2022colabfold,
title = "ColabFold: making protein folding accessible to all",
author = "Mirdita, Milot and Sch{\"u}tze, Konstantin and Moriwaki, Yoshitaka and Heo, Lim and Ovchinnikov, Sergey and Steinegger, Martin",
journal = "Nature Methods",
volume = 19,
number = 6,
pages = "679--682",
month = jun,
year = 2022,
language = "en",
doi = {10.1038/s41592-022-01488-1},
}
@ARTICLE{kallenborn2025gpu,
title = "GPU-accelerated homology search with MMseqs2",
author = "Kallenborn, Felix and Chacon, Alejandro and Hundt, Christian and Sirelkhatim, Hassan and Didi, Kieran and Cha, Sooyoung and Dallago, Christian and Mirdita, Milot and Schmidt, Bertil and Steinegger, Martin",
journal = "bioRxiv",
year = 2025,
month = jan,
day = 20,
language = "en",
doi = {10.1101/2024.11.13.623350},
}
Model Architecture:
Architecture Type: Not Applicable
Network Architecture: Not Applicable
Input Type(s): Protein Sequence, Databases
Input Format(s): String (less than or equal to 4096 characters), Constrained List of Strings (one or more valid database names)
Input Parameters: String: 1D; Constrained List of Strings: 1D
Other Properties Related to Input: N/A
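The input constraints above can be checked on the client side before a request is sent. The following is a minimal sketch, not the NIM's own validation logic: the function name `validate_msa_request`, the accepted amino-acid alphabet, and the set of database names (taken from the identifiers listed under Model Version(s)) are all assumptions made for illustration. Only the 4096-character limit and the "one or more database names" rule come from this card.

```python
# Hypothetical client-side check of the documented input constraints.
MAX_SEQUENCE_LENGTH = 4096  # documented limit: at most 4096 characters

# Assumption: database names mirror the identifiers under "Model Version(s)".
ASSUMED_DATABASES = frozenset(
    {"Uniref30_2302", "colabfold_envdb_202108", "PDB70_220313"}
)

# Assumption: standard one-letter amino-acid codes plus common ambiguity codes.
ASSUMED_ALPHABET = frozenset("ACDEFGHIKLMNPQRSTVWYXBZUO")


def validate_msa_request(sequence, databases):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []

    if not isinstance(sequence, str) or not sequence:
        errors.append("sequence must be a non-empty string")
    else:
        if len(sequence) > MAX_SEQUENCE_LENGTH:
            errors.append(
                f"sequence has {len(sequence)} characters; "
                f"limit is {MAX_SEQUENCE_LENGTH}"
            )
        unexpected = sorted(set(sequence.upper()) - ASSUMED_ALPHABET)
        if unexpected:
            errors.append(f"unexpected characters in sequence: {unexpected}")

    if not isinstance(databases, (list, tuple)) or len(databases) == 0:
        errors.append("databases must be a list with at least one name")
    else:
        unknown = [name for name in databases if name not in ASSUMED_DATABASES]
        if unknown:
            errors.append(f"unknown database names: {unknown}")

    return errors


if __name__ == "__main__":
    problems = validate_msa_request("MKTAYIAKQR", ["Uniref30_2302"])
    print("valid" if not problems else problems)
```

Returning a list of problems rather than raising on the first one lets a caller report every issue with a request at once.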
Output:
Output Type(s): Multiple Sequence Alignment in A3M or FASTA format
Output Format: A3M or FASTA (text file)
Output Parameters: 1D
Other Properties Related to Output: N/A
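Because the output is plain A3M or FASTA text, it can be read with a few lines of code. The sketch below assumes the alignment text is already available as a string (how it is retrieved from the service is not covered here), and the helper names `parse_a3m` and `strip_insertions` are hypothetical. It relies on the general A3M convention that lowercase letters mark insertions relative to the query, so removing them yields rows of equal length.

```python
def parse_a3m(text):
    """Parse A3M/FASTA text into a list of (header, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence data may be wrapped across several lines.
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def strip_insertions(sequence):
    """Drop lowercase insertion columns so the row aligns to the query."""
    return "".join(ch for ch in sequence if not ch.islower())


if __name__ == "__main__":
    # Tiny made-up alignment used only to exercise the parser.
    example = ">query\nMKV\n>hit1\nMkK-\n"
    for name, seq in parse_a3m(example):
        print(name, strip_insertions(seq))
```

After insertions are stripped, every row has the same length as the query, which is the shape most downstream structure-prediction and evolutionary-analysis tools expect.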
Software Integration:
Runtime Engine(s):
- Python, C++, CUDA
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere, NVIDIA Hopper, NVIDIA Ada Lovelace
Supported Operating System(s):
- Linux
Model Version(s):
MMSeqs2 GPU 17-b804f
Uniref30_2302
colabfold_envdb_202108
PDB70_220313
Training & Evaluation:
Not Applicable.
Training Dataset:
Link: Not Applicable.
Data Collection Method by dataset:
- Not Applicable
Labeling Method by dataset:
- Not Applicable
Properties:
Not Applicable.
Evaluation Dataset:
Link: Not Applicable.
Data Collection Method by dataset:
- Not Applicable
Labeling Method by dataset:
- Not Applicable
Properties:
Not Applicable
Inference:
Engine: Python, C++, CUDA
Test Hardware:
- NVIDIA A6000
- NVIDIA A100
- NVIDIA L40
- NVIDIA H100
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.