MiniMax-M2.1
Description
MiniMax-M2.1 is a large language model optimized for agentic capabilities, including coding, tool use, instruction following, and long-horizon planning. It is released openly, challenging the notion that high-performance agents must remain behind closed doors, and enables developers to build autonomous applications for multilingual software development and complex multi-step workflows.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see the link to the Non-NVIDIA MiniMax-M2.1 Model Card.
License and Terms of Use:
GOVERNING TERMS: Your use of the service is governed by the NVIDIA API Catalog Terms of Service. Your use of the model is governed by the NVIDIA Open Model License Agreement. ADDITIONAL INFORMATION: Modified MIT License.
Deployment Geography:
Global
Use Case:
Developers and enterprises building autonomous AI agents for software engineering tasks, multilingual code development, automated workflows, tool calling, and long-horizon planning applications.
Release Date:
Build.NVIDIA.com: 01/2026 via link
Huggingface: 12/20/2025 via link
Reference(s):
- MiniMax-M2.1 on Hugging Face
- MiniMax Open Platform API
- MiniMax Agent
- arXiv Paper: WebExplorer
- VIBE Benchmark
Model Architecture:
Architecture Type: Transformer
Network Architecture: Mixture-of-Experts Transformer
Total Parameters: 230B
Input:
Input Types: Text
Input Formats: String
Input Parameters: One Dimensional (1D)
Other Input Properties: Input text is tokenized using the model's native tokenizer. Recommended inference parameters: temperature=1.0, top_p=0.95, top_k=40.
Output:
Output Types: Text
Output Format: String
Output Parameters: One Dimensional (1D)
Other Output Properties: Generated text responses with support for tool calling and structured outputs.
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engines:
- SGLang: Recommended for serving MiniMax-M2.1
- vLLM: Recommended for serving MiniMax-M2.1 (a minimal offline-inference sketch follows this list)
- Transformers: Supported for local deployment
- Other: KTransformers
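For the SGLang and vLLM engines listed above, the snippet below sketches offline batch inference with vLLM's Python API. The Hugging Face repository id (`MiniMaxAI/MiniMax-M2.1`), the tensor-parallel degree, and the dtype are assumptions for illustration; check the Hugging Face model card for exact values and hardware requirements before running.

```python
# Minimal vLLM offline-inference sketch. Assumptions: repo id and tp=4.
# A 230B-parameter MoE model needs several high-memory GPUs; adjust
# tensor_parallel_size to the hardware actually available.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2.1",   # assumed Hugging Face repo id
    tensor_parallel_size=4,           # e.g. 4x H100, matching the test hardware below
    trust_remote_code=True,
)

# Sampling settings from the "Recommended Inference Parameters" section.
params = SamplingParams(temperature=1.0, top_p=0.95, top_k=40, max_tokens=512)

# Raw prompts bypass the chat template; for chat-style use, apply the model's
# chat template or serve an OpenAI-compatible endpoint (see Additional Details).
prompts = ["Write a Python function that checks whether a string is a palindrome."]
for output in llm.generate(prompts, params):
    print(output.outputs[0].text)
```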
Supported Hardware:
- NVIDIA Ampere: A100, A6000, A40
- NVIDIA Blackwell: B200, B100, GB200
- NVIDIA Hopper: H100, H200
- NVIDIA Lovelace: L40S, L40, RTX 6000 Ada Generation
Preferred/Supported Operating Systems: Linux
The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
Model Version(s)
MiniMax-M2.1 v2.1
Training, Testing, and Evaluation Datasets:
Training Dataset
Data Modality: Text
Training Data Collection: Undisclosed
Training Labeling: Undisclosed
Training Properties: Undisclosed
Testing Dataset
Testing Data Collection: Undisclosed
Testing Labeling: Undisclosed
Testing Properties: Undisclosed
Evaluation Dataset
Evaluation Benchmark Score: MiniMax-M2.1 achieves 74.0% on SWE-bench Verified, 49.4% on Multi-SWE-bench, 72.5% on SWE-bench Multilingual, and 47.9% on Terminal-bench 2.0. The model demonstrates strong performance across coding, tool use, and full-stack development benchmarks.
Detailed Benchmark Comparison Table
| Benchmark | MiniMax-M2.1 | MiniMax-M2 | Claude Sonnet 4.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (thinking) | DeepSeek V3.2 |
|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 74.0 | 69.4 | 77.2 | 80.9 | 78.0 | 80.0 | 73.1 |
| Multi-SWE-bench | 49.4 | 36.2 | 44.3 | 50.0 | 42.7 | x | 37.4 |
| SWE-bench Multilingual | 72.5 | 56.5 | 68 | 77.5 | 65.0 | 72.0 | 70.2 |
| Terminal-bench 2.0 | 47.9 | 30.0 | 50.0 | 57.8 | 54.2 | 54.0 | 46.4 |

| Benchmark | MiniMax-M2.1 | MiniMax-M2 | Claude Sonnet 4.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (thinking) | DeepSeek V3.2 |
|---|---|---|---|---|---|---|---|
| SWE-bench Verified (Droid) | 71.3 | 68.1 | 72.3 | 75.2 | x | x | 67.0 |
| SWE-bench Verified (mini-swe-agent) | 67.0 | 61.0 | 70.6 | 74.4 | 71.8 | 74.2 | 60.0 |
| SWT-bench | 69.3 | 32.8 | 69.5 | 80.2 | 79.7 | 80.7 | 62.0 |
| SWE-Perf | 3.1 | 1.4 | 3.0 | 4.7 | 6.5 | 3.6 | 0.9 |
| SWE-Review | 8.9 | 3.4 | 10.5 | 16.2 | x | x | 6.4 |
| OctoCodingbench | 26.1 | 13.3 | 22.8 | 36.2 | 22.9 | x | 26.0 |

| Benchmark | MiniMax-M2.1 | MiniMax-M2 | Claude Sonnet 4.5 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|---|
| VIBE (Average) | 88.6 | 67.5 | 85.2 | 90.7 | 82.4 |
| VIBE-Web | 91.5 | 80.4 | 87.3 | 89.1 | 89.5 |
| VIBE-Simulation | 87.1 | 77.0 | 79.1 | 84.0 | 89.2 |
| VIBE-Android | 89.7 | 69.2 | 87.5 | 92.2 | 78.7 |
| VIBE-iOS | 88.0 | 39.5 | 81.2 | 90.0 | 75.8 |
| VIBE-Backend | 86.7 | 67.8 | 90.8 | 98.0 | 78.7 |

| Benchmark | MiniMax-M2.1 | MiniMax-M2 | Claude Sonnet 4.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (thinking) | DeepSeek V3.2 |
|---|---|---|---|---|---|---|---|
| Toolathlon | 43.5 | 16.7 | 38.9 | 43.5 | 36.4 | 41.7 | 35.2 |
| BrowseComp | 47.4 | 44.0 | 19.6 | 37.0 | 37.8 | 65.8 | 51.4 |
| BrowseComp (context management) | 62.0 | 56.9 | 26.1 | 57.8 | 59.2 | 70.0 | 67.6 |
| AIME25 | 83.0 | 78.0 | 88.0 | 91.0 | 96.0 | 98.0 | 92.0 |
| MMLU-Pro | 88.0 | 82.0 | 88.0 | 90.0 | 90.0 | 87.0 | 86.0 |
| GPQA-D | 83.0 | 78.0 | 83.0 | 87.0 | 91.0 | 90.0 | 84.0 |
| HLE w/o tools | 22.2 | 12.5 | 17.3 | 28.4 | 37.2 | 31.4 | 22.2 |
| LCB | 81.0 | 83.0 | 71.0 | 87.0 | 92.0 | 89.0 | 86.0 |
| SciCode | 41.0 | 36.0 | 45.0 | 50.0 | 56.0 | 52.0 | 39.0 |
| IFBench | 70.0 | 72.0 | 57.0 | 58.0 | 70.0 | 75.0 | 61.0 |
| AA-LCR | 62.0 | 61.0 | 66.0 | 74.0 | 71.0 | 73.0 | 65.0 |
| τ²-Bench Telecom | 87.0 | 87.0 | 78.0 | 90.0 | 87.0 | 85.0 | 91.0 |
Evaluation Methodology Notes:
- SWE-bench Verified: Tested on internal infrastructure using Claude Code, Droid, or mini-swe-agent as scaffolding. Default system prompt was overridden. Results represent the average of 4 runs.
- Multi-SWE-bench & SWE-bench Multilingual & SWT-bench & SWE-Perf: Tested using Claude Code as scaffolding, with default system prompt overridden. Results represent the average of 4 runs.
- Terminal-bench 2.0: Tested using Claude Code. Full dataset verified and environment issues fixed. Timeout limits removed; other configurations consistent with official settings. Average of 4 runs.
- SWE-Review: Internal benchmark for code defect review covering diverse languages and scenarios. Evaluates both defect recall and hallucination rates. A case counts as "correct" only if the model accurately identifies the target defect with no hallucinations. Average of 4 runs.
- OctoCodingbench: Internal benchmark for long-horizon instruction following in complex development scenarios. Uses a "single-violation-failure" scoring mechanism. Average of 4 runs.
- VIBE: Uses Claude Code as scaffolding to automatically verify interactive logic and visual effects. Unified pipeline with containerized deployment and dynamic interaction environments. Average of 3 runs.
- Toolathlon: Evaluation protocol consistent with original paper.
- BrowseComp: Same agent framework as WebExplorer, with minor tuning of the tool descriptions. Uses the 103-sample GAIA text-only validation subset.
- BrowseComp (context management): When token usage exceeds 30% of the maximum context window, the framework retains the first AI response, the last five AI responses, and the tool outputs (a short Python sketch of this policy follows these notes).
- AIME25 through τ²-Bench Telecom: Based on the evaluation datasets and methodology of the Artificial Analysis Intelligence Index.
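As referenced in the BrowseComp (context management) note above, the truncation rule amounts to a simple history-pruning policy. The sketch below illustrates it; the 30% threshold and the "first plus last five AI responses plus tool outputs" rule come from the note, while the message schema, the `count_tokens` helper, and the decision to always keep system/user turns are assumptions.

```python
# Illustrative sketch of the context-management policy described above.
def trim_history(messages, count_tokens, max_context_tokens):
    used = sum(count_tokens(m["content"]) for m in messages)
    if used <= 0.3 * max_context_tokens:
        return messages  # below the 30% threshold: keep the full history

    ai_turns = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    keep = set(ai_turns[:1]) | set(ai_turns[-5:])                        # first + last five AI responses
    keep |= {i for i, m in enumerate(messages) if m["role"] == "tool"}   # keep all tool outputs
    keep |= {i for i, m in enumerate(messages) if m["role"] in ("system", "user")}  # assumption
    return [m for i, m in enumerate(messages) if i in keep]
```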
Evaluation Data Collection: Hybrid: Automated, Human
Evaluation Labeling: Hybrid: Automated, Human
Evaluation Properties: See Evaluation Methodology Notes above for detailed testing conditions per benchmark.
Inference
Acceleration Engine: SGLang
Test Hardware: 4x NVIDIA H100
Additional Details
Recommended Inference Parameters
- Temperature: 1.0
- Top-p: 0.95
- Top-k: 40
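These settings can be passed per request when the model is served behind an OpenAI-compatible endpoint (for example via SGLang or vLLM). The sketch below assumes a local endpoint at `http://localhost:8000/v1` and a served model name of `MiniMaxAI/MiniMax-M2.1`; both are illustrative. Note that `top_k` is not a standard OpenAI field and is forwarded via `extra_body`.

```python
# Applying the recommended sampling parameters through an OpenAI-compatible
# endpoint. base_url, api_key, and served model name are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.1",   # assumed served model name
    messages=[{"role": "user", "content": "Summarize what a mixture-of-experts layer does."}],
    temperature=1.0,
    top_p=0.95,
    extra_body={"top_k": 40},         # forwarded to the serving engine
)
print(response.choices[0].message.content)
```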
Default System Prompt
You are a helpful assistant. Your name is MiniMax-M2.1 and is built by MiniMax.
Tool Calling
MiniMax-M2.1 supports tool calling capabilities. Refer to the Tool Calling Guide for implementation details.
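As a hedged illustration of the OpenAI-style tool-calling flow (the authoritative schema and parser setup are in the Tool Calling Guide), the sketch below registers a single hypothetical `get_weather` function and inspects the tool calls the model returns; the endpoint, served model name, and the tool itself are assumptions.

```python
# Hedged tool-calling sketch against an OpenAI-compatible endpoint.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",                                  # hypothetical tool
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.1",
    messages=[{"role": "user", "content": "What is the weather in Berlin right now?"}],
    tools=tools,
    temperature=1.0,
    top_p=0.95,
)

# If the model decides to call a tool, the call name and JSON arguments arrive
# here; the application runs the tool and replies with a "tool" message.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```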
Deployment Options
- API Access: Available via MiniMax Open Platform
- MiniMax Agent: Production deployment available at agent.minimax.io
- Local Deployment: Supported via SGLang, vLLM, or Transformers (see the Transformers sketch below)
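For the Transformers path listed above, a minimal local-generation sketch follows. The repo id, dtype, and generation settings are assumptions, and a checkpoint of this size requires multi-GPU sharding (here via `device_map="auto"`).

```python
# Minimal Transformers sketch for local generation. Repo id and
# trust_remote_code are assumptions; adjust to the published checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M2.1"  # assumed Hugging Face repo id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a unit test for a stack implementation."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Sampling values follow the Recommended Inference Parameters section.
outputs = model.generate(
    inputs, max_new_tokens=512, do_sample=True,
    temperature=1.0, top_p=0.95, top_k=40,
)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```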
Known Capabilities
- Multilingual software development
- Complex multi-step office workflows
- Long-horizon planning
- Tool use and function calling
- Code generation and review
- Test case generation
- Code performance optimization
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
