MiniMax-M2.5 Overview
Description:
MiniMax-M2.5 is a text generation model trained to perform complex agentic tasks, including software engineering, tool use, search, and office-work style workflows. Extensively trained with reinforcement learning in hundreds of thousands of complex real-world environments, M2.5 achieves state-of-the-art results in coding, agentic tool use and search, office work, and a range of other economically valuable tasks, scoring 80.2% on SWE-Bench Verified, 51.3% on Multi-SWE-Bench, and 76.3% on BrowseComp (with context management). Trained to reason efficiently and decompose tasks optimally, M2.5 completes complicated agentic tasks quickly, finishing the SWE-Bench Verified evaluation 37% faster than M2.1 and matching the speed of Claude Opus 4.6.
MiniMax-M2.5 was developed by MiniMaxAI.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA MiniMax-M2.5 Model Card for additional details.
License and Terms of Use:
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service, and use of this model is governed by the NVIDIA Open Model License. ADDITIONAL INFORMATION: Modified MIT License (MiniMax M2.5).
Deployment Geography:
Global
Use Case:
Enterprises and developers building AI agents, chatbots, and tool-using applications across coding, office work, and information-retrieval tasks. The model is suited for NLP workloads that require advanced reasoning, long-context handling, and agentic tool use, including:
- Coding and software engineering assistance (e.g., SWE-bench style tasks)
- Search and tool calling workflows
- Office productivity tasks (e.g., document/spreadsheet oriented workflows)
- General conversational assistant use
Release Date:
HuggingFace 02/12/2026 via MiniMaxAI/MiniMax-M2.5
Build.NVIDIA.com 02/26/2026 via link
NGC 02/26/2026 via MiniMax-M2.5 on NGC
Reference(s):
Model Architecture:
Architecture Type: Transformer
Network Architecture: Mixture of Experts (MoE) with Lightning Attention, 8 experts per token (MiniMaxM2ForCausalLM)
Total Parameters: Undisclosed
Active Parameters: Undisclosed
Vocabulary Size: Undisclosed
Base Model: MiniMax M2-series (e.g., MiniMax-M2.1)
Input:
Input Types: Text
Input Formats: String
Input Parameters: One Dimensional (1D)
Other Input Properties: Context length up to 204,800 tokens.
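Long agentic sessions can exceed the 204,800-token context window, so request builders typically trim older turns before sending. A minimal sketch of such trimming, assuming a rough 4-characters-per-token heuristic (the model's actual tokenizer would be authoritative):

```python
# Sketch: keep only the most recent messages that fit the context window.
# The 4-chars-per-token estimate is a crude heuristic, not the model tokenizer.
CONTEXT_LIMIT = 204_800


def estimate_tokens(text: str) -> int:
    """Crude token estimate: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def trim_messages(messages: list[dict], budget: int = CONTEXT_LIMIT) -> list[dict]:
    """Walk the history from newest to oldest, keeping messages until the
    estimated token budget is exhausted, then restore chronological order."""
    kept: list[dict] = []
    used = 0
    for msg in reversed(messages):
        cost = estimate_tokens(msg["content"])
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))
```

In production, the same loop would use the model's real tokenizer counts, and would usually pin the system prompt rather than letting it age out.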
Output:
Output Types: Text
Output Format: String
Output Parameters: One Dimensional (1D)
Other Output Properties: Autoregressive text generation (may include tool-calling structured outputs depending on serving stack).
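When served behind an OpenAI-compatible endpoint (as vLLM and SGLang expose), tool-calling outputs are typically elicited by attaching JSON-schema tool definitions to the chat request. A hedged sketch of such a request body; the "get_weather" tool and its parameters are illustrative assumptions, not an API shipped with the model:

```python
import json


def build_tool_call_request(model: str, user_prompt: str) -> dict:
    """Build an OpenAI-compatible chat request with one example tool attached.
    The get_weather tool is a made-up illustration."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_prompt}],
        "tools": [
            {
                "type": "function",
                "function": {
                    "name": "get_weather",
                    "description": "Look up current weather for a city.",
                    "parameters": {
                        "type": "object",
                        "properties": {"city": {"type": "string"}},
                        "required": ["city"],
                    },
                },
            }
        ],
        "tool_choice": "auto",  # let the model decide whether to call the tool
    }


payload = build_tool_call_request("MiniMaxAI/MiniMax-M2.5", "Weather in Paris?")
print(json.dumps(payload, indent=2))
```

With "tool_choice": "auto", the serving stack returns either plain text or a structured tool call that the client is expected to execute and feed back.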
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engines:
- SGLang: via NVIDIA NIM
Supported Hardware:
- NVIDIA Blackwell: B200
- NVIDIA Hopper: H100, H200, H20, H20-3e
Operating Systems: Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s)
MiniMaxAI/MiniMax-M2.5
Training, Testing, and Evaluation Datasets:
Training Dataset
Data Modality: Text
Text Training Data Size: Undisclosed
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties (Quantity, Dataset Descriptions, Sensor(s)): The model is described as trained across 10+ programming languages and 200,000+ real-world environments, with extensive reinforcement learning over complex environments.
Testing Dataset
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties (Quantity, Dataset Descriptions, Sensor(s)): Undisclosed
Evaluation Dataset
Benchmark Score: SWE-Bench Verified (80.2%), Multi-SWE-Bench (51.3%), BrowseComp (76.3%)
Data Collection Method by dataset: Automated
Labeling Method by dataset: Automated
Properties (Quantity, Dataset Descriptions, Sensor(s)): Evaluated on a mix of coding, tool-use, web-browsing, and multi-step reasoning benchmarks such as SWE-Bench, Terminal Bench 2, VIBE-Pro, BrowseComp, Wide Search, RISE, GDPval-MM, MEWC, and Finance Modeling, as well as standard academic benchmarks (AIME25, GPQA-D, HLE w/o tools, SciCode, IFBench, AA-LCR).
| Benchmark | MiniMax-M2.5 |
|---|---|
| AIME25 | 86.3 |
| GPQA-D | 85.2 |
| HLE w/o tools | 19.4 |
| SciCode | 44.4 |
| IFBench | 70.0 |
| AA-LCR | 69.5 |

| Benchmark | Description |
|---|---|
| SWE-bench Verified | Coding agent benchmark |
| SWE-bench Multilingual | Multilingual coding benchmark |
| SWE-bench-pro | Professional coding benchmark |
| Multi-SWE-bench | Combined coding benchmark |
| Terminal Bench 2 | Terminal tool-use benchmark |
| VIBE-Pro | Visual-interactive benchmark |
| BrowseComp | Web-browsing benchmark |
| Wide Search | Search benchmark |
| RISE | Multi-step information-retrieval benchmark |
| GDPval-MM | Multi-modal evaluation benchmark |
| MEWC | Excel-world-championship benchmark |
| Finance Modeling | Financial modeling benchmark |
Inference
Acceleration Engine: SGLang
Test Hardware:
- NVIDIA B200
- NVIDIA H100
- NVIDIA H200
- NVIDIA H20
- NVIDIA H20-3e
Additional Details
The model can be integrated via multiple runtimes: Transformers (loading from Hugging Face with trust_remote_code=True), vLLM, SGLang, KTransformers, and other supported engines.
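As one concrete sketch, a model served with vLLM or SGLang can be queried through the OpenAI-compatible HTTP API those engines expose. The base URL and port below are assumptions about a typical local deployment, not values from this card:

```python
import json
import urllib.request

# Hypothetical local endpoint for a vLLM/SGLang OpenAI-compatible server;
# adjust host and port to match the actual deployment.
BASE_URL = "http://localhost:8000/v1/chat/completions"


def chat_request(prompt: str, model: str = "MiniMaxAI/MiniMax-M2.5") -> urllib.request.Request:
    """Build (but do not send) a chat-completion HTTP request."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 256,
    }).encode("utf-8")
    return urllib.request.Request(
        BASE_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = chat_request("Write a haiku about GPUs.")
# Sending requires a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

The same request shape works against any of the listed runtimes that expose the OpenAI-compatible chat endpoint; only the base URL changes.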
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
