mit / boltz2

Model Overview

Description:

Boltz-2 NIM is a next-generation structural biology foundation model that shows strong performance for both structure and affinity prediction. Boltz-2 is the first deep learning model to approach the accuracy of free energy perturbation (FEP) methods in predicting binding affinities of small molecules to proteins—achieving strong correlations on benchmarks while being roughly 1000x more computationally efficient.

Key Features:

Trunk optimizations: Mixed-precision (bfloat16) computation and trifast triangle attention cut runtime and memory use, enabling training with 768-token crops (as in AlphaFold3).

Physical quality: Integrates Boltz-steering at inference (Boltz-2x) to reduce steric clashes and stereochemistry errors without losing accuracy.

Controllability:

  • Method conditioning: Steers predictions to resemble X-ray, NMR, or MD-style structures.
  • Template conditioning + steering: Uses single or multimeric templates; supports strict template enforcement or soft guidance.
  • Contact/pocket conditioning: Accepts distance constraints from experiments or expert priors.

Affinity module: A PairFormer refines protein–ligand and intra-ligand interactions; it predicts both a binding likelihood and a continuous affinity on a log µM scale (trained on mixed Ki, Kd, and IC50 data). The output is an IC50-like measure suitable for ranking.
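Since the continuous affinity is reported on a log µM scale, downstream tools usually convert it to more familiar quantities. The sketch below shows one way to do that, assuming the value is log10 of an IC50-like quantity in µM; treating IC50 as a Kd surrogate for a ΔG estimate is a rough approximation used here only for intuition, not part of the model's documented output.

```python
import math

def affinity_to_measures(log10_um: float, temperature_k: float = 298.15):
    """Convert a log10(µM) affinity value into IC50 (µM), pIC50, and an
    approximate binding free energy (kcal/mol). The ΔG step treats the
    IC50-like value as a Kd surrogate, which is a rough approximation."""
    ic50_um = 10.0 ** log10_um
    ic50_m = ic50_um * 1e-6                         # convert µM to molar
    pic50 = -math.log10(ic50_m)                     # pIC50 = -log10(IC50 in M)
    r_kcal = 1.987204e-3                            # gas constant, kcal/(mol·K)
    dg = r_kcal * temperature_k * math.log(ic50_m)  # ΔG ≈ RT ln(Kd)
    return ic50_um, pic50, dg

# A predicted value of 0.0 corresponds to 1 µM (pIC50 = 6).
ic50, pic50, dg = affinity_to_measures(0.0)
```

Because the scale is logarithmic, a difference of 1.0 between two candidates corresponds to a tenfold difference in predicted potency, which is why the raw value is already well suited to ranking.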

Key advances vs Boltz-1/1x: Faster/more memory-efficient trunk, improved physical plausibility via integrated steering, markedly enhanced controllability, and added affinity prediction head.

This NIM is ready for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case.

License / Terms of Use

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Community Model License. Additional Information: MIT.

You are responsible for ensuring that your use of NVIDIA provided models complies with all applicable laws.

Deployment Geography

Global

Use Case

Boltz-2 NIM enables researchers and commercial entities in the Drug Discovery, Life Sciences, and Digital Biology fields to predict the three-dimensional structure of biomolecular complexes and predict small-molecule binding affinities. Trained on millions of curated experimental datapoints with a novel training strategy tailored for noisy biochemical assay data, Boltz-2 demonstrates robust performance across hit-discovery, hit-to-lead, and lead optimization.

Release Date

build.nvidia.com: January 30, 2026 via build.nvidia.com/mit/boltz2

NGC: January 30, 2026 via catalog.ngc.nvidia.com

References:

@article{passaro2025boltz2,
  author = {Passaro, Saro and Corso, Gabriele and Wohlwend, Jeremy and Reveiz, Mateo and Thaler, Stephan and Somnath, Vignesh Ram and Getz, Noah and Portnoi, Tally and Roy, Julien and Stark, Hannes and Kwabi-Addo, David and Beaini, Dominique and Jaakkola, Tommi and Barzilay, Regina},
  title = {Boltz-2: Towards Accurate and Efficient Binding Affinity Prediction},
  year = {2025},
  doi = {10.1101/2025.06.14.659707},
  journal = {bioRxiv},
  language = {en}
}

Model Architecture:

Architecture Type: Four components — trunk, denoising module (with steering), confidence module, and a new affinity module

Network Architecture: PairFormer

Input:

Input Type(s): Biomolecular sequences (protein, DNA, RNA), ligand SMILES or CCD strings, molecular modifications, structural constraints, conditioning parameters, optional booleans

Input Format(s): Dictionary containing sequence strings, modification records, and constraint parameters

Input Parameters: Sequences (strings), predict_affinity (boolean), modifications (list of residue-specific changes), constraints (dictionary of structural parameters)

Other Properties Related to Input: Maximum sequence length of 4096 residues per chain. Maximum of 12 input polymers. Maximum of 20 input ligands. Passing boolean options such as predict_affinity will increase the runtime of the request.
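The input dictionary described above can be sketched as follows. Field names here (polymers, ligands, constraints, etc.) are illustrative of the documented input types and limits, not the authoritative NIM request schema, which may differ.

```python
# Illustrative request payload for a structure + affinity prediction.
# Field names are assumptions based on the input types listed above;
# consult the NIM API reference for the exact schema.
payload = {
    "polymers": [
        {
            "id": "A",
            "molecule_type": "protein",
            "sequence": "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ",
        }
    ],
    "ligands": [
        {"id": "L", "smiles": "CC(=O)Oc1ccccc1C(=O)O"}  # aspirin
    ],
    "predict_affinity": True,  # enables the affinity head; adds runtime
    "constraints": [],         # e.g. pocket/contact distance restraints
}

# Client-side validation mirroring the documented limits.
assert len(payload["polymers"]) <= 12, "at most 12 polymer chains"
assert len(payload["ligands"]) <= 20, "at most 20 ligands"
assert all(len(p["sequence"]) <= 4096 for p in payload["polymers"]), \
    "at most 4096 residues per chain"
```

Validating the chain, ligand, and length limits before submission avoids a round trip that the service would reject anyway.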

Model Parameters:
Tables 1 and 2 record hyperparameters of Boltz-2's architecture, training, and inference procedures that differ from Boltz-1's and were not previously described in the manuscript.

Table 1: Extra model architecture and training hyperparameters that differ from Boltz-1 and were not previously mentioned in the manuscript.

Parameter | Value
Max number of MSA sequences during training | 8192
Template pairwise dimension | 64
Number of template blocks | 2
Training diffusion multiplicity | 32
b-factor loss weight | 1 × 10⁻³

Table 2: Diffusion process hyperparameters that differ from Boltz-1. With the exception of sigma_min, we adopted AlphaFold3's default hyperparameters; see Abramson et al. (2024) for more details.

Parameter | Value
sigma_min | 0.0001
rho | 7
gamma_0 | 0.8
gamma_min | 1.0
noise_scale | 1.003
step_scale | 1.5
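The sigma_min and rho values in Table 2 parameterize an EDM/Karras-style noise schedule of the kind used by AlphaFold3-like diffusion samplers. A minimal sketch of that schedule is below; sigma_max is an illustrative assumption, not a documented Boltz-2 value.

```python
def karras_sigmas(n_steps: int, sigma_min: float = 1e-4,
                  sigma_max: float = 160.0, rho: float = 7.0):
    """EDM/Karras-style noise schedule. sigma_min and rho match Table 2;
    sigma_max is an assumed value for illustration only."""
    sigmas = []
    for i in range(n_steps):
        t = i / (n_steps - 1)
        # Interpolate in sigma^(1/rho) space, then raise back to rho.
        s = (sigma_max ** (1 / rho)
             + t * (sigma_min ** (1 / rho) - sigma_max ** (1 / rho))) ** rho
        sigmas.append(s)
    return sigmas

# Monotonically decreasing noise levels from sigma_max down to sigma_min.
steps = karras_sigmas(10)
```

Interpolating in sigma^(1/rho) space (rho = 7) concentrates sampling steps at low noise levels, where fine structural detail is resolved.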

Output:

Output Type(s): Structure prediction in mmCIF format; scores in numeric arrays; runtime metrics as a dictionary

Output Format: mmCIF (text file); numeric arrays; scalar numeric values

Output Parameters: 3D atomic coordinates, predicted scores, and metadata

Other Properties Related to Output: All Boltz-2 scores are returned by default. Runtime metrics are optional.
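A common use of the returned scores is ranking a set of candidate ligands. The sketch below assumes per-prediction records carrying a binding-likelihood score and the continuous log-µM affinity; the key names ("affinity_pred_value", "affinity_probability_binary") are illustrative, not a guaranteed response schema.

```python
# Hypothetical post-processing of Boltz-2 affinity scores for ranking.
# Key names are assumptions for illustration; check the actual response.
results = [
    {"ligand": "cand_1", "affinity_pred_value": -0.7,
     "affinity_probability_binary": 0.91},
    {"ligand": "cand_2", "affinity_pred_value": 0.4,
     "affinity_probability_binary": 0.35},
    {"ligand": "cand_3", "affinity_pred_value": -1.2,
     "affinity_probability_binary": 0.88},
]

# Lower log-µM affinity means tighter predicted binding; filter first on
# the binding-likelihood score, then sort by the continuous value.
likely = [r for r in results if r["affinity_probability_binary"] >= 0.5]
ranked = sorted(likely, key=lambda r: r["affinity_pred_value"])
names = [r["ligand"] for r in ranked]  # tightest predicted binders first
```

Filtering on the likelihood score before sorting avoids ranking compounds the model considers unlikely to bind at all.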

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engine(s):

  • PyTorch, TensorRT

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Ampere, NVIDIA Hopper, NVIDIA Ada Lovelace, NVIDIA Blackwell

Supported Operating System(s):

  • Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):

Boltz2 v2.2.1

Training & Evaluation:

Training Dataset:

Data Modality:

  • Text

Link: Protein Data Bank as used by AlphaFold3

Data Collection Method by dataset:

  • Human

Labeling Method by dataset:

  • Human

Properties:
All Protein Data Bank structures released before 2021-09-30 with a reported resolution of 9 Å or better, preprocessed to match each structure to its sequence. Ligands were processed similarly. All data was cleaned as described in AlphaFold3.

Evaluation Dataset:

Link: Boltz Evaluation Performed on 744 Structures from the Protein Data Bank

Data Collection Method by dataset:

  • Human

Labeling Method by dataset:

  • Hybrid: Human and Automated

Properties:
The test and validation datasets were generated by extensive filtering of PDB sequences deposited between 2021-09-30 and 2023-01-13. In total, 593 structures passed the filters and were used for validation.

Inference:

Acceleration Engine: PyTorch, TensorRT

Test Hardware:

  • NVIDIA B200
  • NVIDIA H100
  • NVIDIA L40
  • NVIDIA A100
  • NVIDIA RTX6000/NVIDIA RTX6000-Ada
  • NVIDIA GB200
  • NVIDIA GB10 (DGX Spark)

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

You are responsible for ensuring the physical properties of model-generated molecules are appropriately evaluated, and comply with applicable safety regulations and ethical standards.

Get Help

Enterprise Support

Get access to knowledge base articles and support cases or submit a ticket.
