
MiniMax-M2.1

Description

MiniMax-M2.1 is a large language model optimized for agentic capabilities, including coding, tool use, instruction following, and long-horizon planning. It is intended to show that high-performance agentic models need not remain closed-source, enabling developers to build autonomous applications for multilingual software development and complex multi-step workflows.

This model is ready for commercial/non-commercial use.

Third-Party Community Consideration:

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA MiniMax-M2.1 Model Card

License and Terms of Use:

GOVERNING TERMS: Your use of the service is governed by the NVIDIA API Catalog Terms of Service. Your use of the model is governed by the NVIDIA Open Model License Agreement. ADDITIONAL INFORMATION: Modified MIT License.

Deployment Geography:

Global

Use Case:

Developers and enterprises building autonomous AI agents for software engineering tasks, multilingual code development, automated workflows, tool calling, and long-horizon planning applications.

Release Date:

Build.NVIDIA.com: 01/2026 via link
Hugging Face: 12/20/2025 via link

Reference(s):

Model Architecture:

Architecture Type: Transformer
Network Architecture: Mixture-of-Experts Transformer
Total Parameters: 230B

Input:

Input Types: Text
Input Formats: String
Input Parameters: One Dimensional (1D)
Other Input Properties: Input text is tokenized using the model's native tokenizer. Recommended inference parameters: temperature=1.0, top_p=0.95, top_k=40.
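
As an illustration, the sampling parameters above can be passed through an OpenAI-compatible endpoint as sketched below; the base_url, api_key, and model identifier are assumptions for a self-hosted deployment.

```python
# Hypothetical request against an OpenAI-compatible server (e.g., SGLang or
# vLLM); base_url, api_key, and the model id are deployment-specific assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.1",
    messages=[{"role": "user", "content": "Summarize the SOLID principles."}],
    temperature=1.0,           # recommended values from this card
    top_p=0.95,
    extra_body={"top_k": 40},  # top_k is passed via extra_body in the OpenAI client
)
print(response.choices[0].message.content)
```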

Output:

Output Types: Text
Output Format: String
Output Parameters: One Dimensional (1D)
Other Output Properties: Generated text responses with support for tool calling and structured outputs.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engines:

  • SGLang: Recommended for serving MiniMax-M2.1
  • vLLM: Recommended for serving MiniMax-M2.1 (see the offline-inference sketch after this list)
  • Transformers: Supported for local deployment
  • Other: KTransformers
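
As a minimal sketch of the vLLM path, the offline-inference snippet below loads the model and applies the recommended sampling parameters; the repository id, tensor-parallel degree, and memory assumptions are illustrative (sized to match the H100x4 test hardware listed under Inference).

```python
# Hypothetical vLLM offline-inference sketch; the repo id and tensor-parallel
# degree are assumptions -- adjust to your checkpoint and hardware.
from vllm import LLM, SamplingParams

llm = LLM(
    model="MiniMaxAI/MiniMax-M2.1",  # assumed Hugging Face repository id
    tensor_parallel_size=4,          # mirrors the H100x4 test hardware below
    trust_remote_code=True,
)
params = SamplingParams(temperature=1.0, top_p=0.95, top_k=40, max_tokens=512)

outputs = llm.generate(["Write a Python function that reverses a string."], params)
print(outputs[0].outputs[0].text)
```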

Supported Hardware:

  • NVIDIA Ampere: A100, A6000, A40
  • NVIDIA Blackwell: B200, B100, GB200
  • NVIDIA Hopper: H100, H200
  • NVIDIA Lovelace: L40S, L40, RTX 6000 Ada Generation

Preferred/Supported Operating Systems: Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s)

MiniMax-M2.1 v2.1

Training, Testing, and Evaluation Datasets:

Training Dataset

Data Modality: Text
Training Data Collection: Undisclosed
Training Labeling: Undisclosed
Training Properties: Undisclosed

Testing Dataset

Testing Data Collection: Undisclosed
Testing Labeling: Undisclosed
Testing Properties: Undisclosed

Evaluation Dataset

Evaluation Benchmark Score: MiniMax-M2.1 achieves 74.0% on SWE-bench Verified, 49.4% on Multi-SWE-bench, 72.5% on SWE-bench Multilingual, and 47.9% on Terminal-bench 2.0. The model demonstrates strong performance across coding, tool use, and full-stack development benchmarks.

Detailed Benchmark Comparison Tables

("x" indicates no reported result.)

| Benchmark | MiniMax-M2.1 | MiniMax-M2 | Claude Sonnet 4.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (thinking) | DeepSeek V3.2 |
|---|---|---|---|---|---|---|---|
| SWE-bench Verified | 74.0 | 69.4 | 77.2 | 80.9 | 78.0 | 80.0 | 73.1 |
| Multi-SWE-bench | 49.4 | 36.2 | 44.3 | 50.0 | 42.7 | x | 37.4 |
| SWE-bench Multilingual | 72.5 | 56.5 | 68 | 77.5 | 65.0 | 72.0 | 70.2 |
| Terminal-bench 2.0 | 47.9 | 30.0 | 50.0 | 57.8 | 54.2 | 54.0 | 46.4 |

| Benchmark | MiniMax-M2.1 | MiniMax-M2 | Claude Sonnet 4.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (thinking) | DeepSeek V3.2 |
|---|---|---|---|---|---|---|---|
| SWE-bench Verified (Droid) | 71.3 | 68.1 | 72.3 | 75.2 | x | x | 67.0 |
| SWE-bench Verified (mini-swe-agent) | 67.0 | 61.0 | 70.6 | 74.4 | 71.8 | 74.2 | 60.0 |
| SWT-bench | 69.3 | 32.8 | 69.5 | 80.2 | 79.7 | 80.7 | 62.0 |
| SWE-Perf | 3.1 | 1.4 | 3.0 | 4.7 | 6.5 | 3.6 | 0.9 |
| SWE-Review | 8.9 | 3.4 | 10.5 | 16.2 | x | x | 6.4 |
| OctoCodingbench | 26.1 | 13.3 | 22.8 | 36.2 | 22.9 | x | 26.0 |

| Benchmark | MiniMax-M2.1 | MiniMax-M2 | Claude Sonnet 4.5 | Claude Opus 4.5 | Gemini 3 Pro |
|---|---|---|---|---|---|
| VIBE (Average) | 88.6 | 67.5 | 85.2 | 90.7 | 82.4 |
| VIBE-Web | 91.5 | 80.4 | 87.3 | 89.1 | 89.5 |
| VIBE-Simulation | 87.1 | 77.0 | 79.1 | 84.0 | 89.2 |
| VIBE-Android | 89.7 | 69.2 | 87.5 | 92.2 | 78.7 |
| VIBE-iOS | 88.0 | 39.5 | 81.2 | 90.0 | 75.8 |
| VIBE-Backend | 86.7 | 67.8 | 90.8 | 98.0 | 78.7 |

| Benchmark | MiniMax-M2.1 | MiniMax-M2 | Claude Sonnet 4.5 | Claude Opus 4.5 | Gemini 3 Pro | GPT-5.2 (thinking) | DeepSeek V3.2 |
|---|---|---|---|---|---|---|---|
| Toolathlon | 43.5 | 16.7 | 38.9 | 43.5 | 36.4 | 41.7 | 35.2 |
| BrowseComp | 47.4 | 44.0 | 19.6 | 37.0 | 37.8 | 65.8 | 51.4 |
| BrowseComp (context management) | 62.0 | 56.9 | 26.1 | 57.8 | 59.2 | 70.0 | 67.6 |
| AIME25 | 83.0 | 78.0 | 88.0 | 91.0 | 96.0 | 98.0 | 92.0 |
| MMLU-Pro | 88.0 | 82.0 | 88.0 | 90.0 | 90.0 | 87.0 | 86.0 |
| GPQA-D | 83.0 | 78.0 | 83.0 | 87.0 | 91.0 | 90.0 | 84.0 |
| HLE w/o tools | 22.2 | 12.5 | 17.3 | 28.4 | 37.2 | 31.4 | 22.2 |
| LCB | 81.0 | 83.0 | 71.0 | 87.0 | 92.0 | 89.0 | 86.0 |
| SciCode | 41.0 | 36.0 | 45.0 | 50.0 | 56.0 | 52.0 | 39.0 |
| IFBench | 70.0 | 72.0 | 57.0 | 58.0 | 70.0 | 75.0 | 61.0 |
| AA-LCR | 62.0 | 61.0 | 66.0 | 74.0 | 71.0 | 73.0 | 65.0 |
| τ²-Bench Telecom | 87.0 | 87.0 | 78.0 | 90.0 | 87.0 | 85.0 | 91.0 |

Evaluation Methodology Notes:

  • SWE-bench Verified: Tested on internal infrastructure using Claude Code, Droid, or mini-swe-agent as scaffolding. Default system prompt was overridden. Results represent the average of 4 runs.
  • Multi-SWE-Bench & SWE-bench Multilingual & SWT-bench & SWE-Perf: Tested using Claude Code as scaffolding, with default system prompt overridden. Results represent the average of 4 runs.
  • Terminal-bench 2.0: Tested using Claude Code. Full dataset verified and environmental issues fixed. Timeout limits removed, other configurations consistent with official settings. Average of 4 runs.
  • SWE-Review: Internal benchmark for code defect review covering diverse languages and scenarios. Evaluates both defect recall and hallucination rates. "Correct" only if model accurately identifies target defect with no hallucinations. Average of 4 runs.
  • OctoCodingbench: Internal benchmark for long-horizon instruction following in complex development scenarios. Uses "single-violation-failure" scoring mechanism. Average of 4 runs.
  • VIBE: Uses Claude Code as scaffolding to automatically verify interactive logic and visual effects. Unified pipeline with containerized deployment and dynamic interaction environments. Average of 3 runs.
  • Toolathlon: Evaluation protocol consistent with original paper.
  • BrowseComp: Same agent framework as WebExplorer with minor tool description fine-tuning. Uses 103-sample GAIA text-only validation subset.
  • BrowseComp (context management): When token usage exceeds 30% of the max context window, the harness retains the first AI response, the last five AI responses, and tool outputs (sketched after this list).
  • AIME25 through τ²-Bench Telecom: Based on evaluation datasets and methodology from the Artificial Analysis Intelligence Index.
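
The context-management rule above can be sketched as a simple truncation policy. The helper below is a hypothetical illustration, not the benchmark harness: the message schema and the count_tokens helper are assumptions.

```python
# Hypothetical sketch of the BrowseComp context-management rule; the message
# schema ({"role", "content"}) and count_tokens helper are assumptions.
def truncate_history(messages, count_tokens, max_context, threshold=0.30, keep_last=5):
    """Prune middle assistant turns once usage exceeds `threshold` of the window."""
    if sum(count_tokens(m) for m in messages) <= threshold * max_context:
        return messages  # under budget: keep the full history

    assistant_idx = [i for i, m in enumerate(messages) if m["role"] == "assistant"]
    keep = set(assistant_idx[:1])            # the first AI response
    keep.update(assistant_idx[-keep_last:])  # the last five AI responses
    # Tool outputs (and other non-assistant messages) are retained as-is.
    return [m for i, m in enumerate(messages) if m["role"] != "assistant" or i in keep]
```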

Evaluation Data Collection: Hybrid: Automated, Human
Evaluation Labeling: Hybrid: Automated, Human
Evaluation Properties: See Evaluation Methodology Notes above for detailed testing conditions per benchmark.

Inference

Acceleration Engine: SGLang
Test Hardware: H100x4

Additional Details

Recommended Inference Parameters

  • Temperature: 1.0
  • Top-p: 0.95
  • Top-k: 40

Default System Prompt

You are a helpful assistant. Your name is MiniMax-M2.1 and you are built by MiniMax.
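
For local deployment with Transformers, the default system prompt and recommended sampling parameters can be combined as in the sketch below; the repository id is an assumption, and a 230B-parameter model realistically requires a multi-GPU node.

```python
# Hypothetical local-inference sketch with Hugging Face Transformers; the repo
# id is an assumption, and device_map="auto" presumes a multi-GPU node.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M2.1"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)

messages = [
    {"role": "system", "content": "You are a helpful assistant. Your name is MiniMax-M2.1 and you are built by MiniMax."},
    {"role": "user", "content": "Outline the steps to add a CLI flag to an existing tool."},
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, do_sample=True, temperature=1.0, top_p=0.95,
                     top_k=40, max_new_tokens=512)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```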

Tool Calling

MiniMax-M2.1 supports tool calling capabilities. Refer to the Tool Calling Guide for implementation details.
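
A minimal tool-calling sketch through an OpenAI-compatible server follows; the get_weather tool, endpoint, and model id are hypothetical, and exact tool-call parsing behavior depends on the serving engine's configuration.

```python
# Hypothetical tool-calling request via an OpenAI-compatible endpoint; the
# get_weather tool, base_url, and model id are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.1",
    messages=[{"role": "user", "content": "What's the weather in Berlin?"}],
    tools=tools,
)

# If the model chose to call the tool, the parsed call(s) appear here.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```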

Known Capabilities

  • Multilingual software development
  • Complex multi-step office workflows
  • Long-horizon planning
  • Tool use and function calling
  • Code generation and review
  • Test case generation
  • Code performance optimization

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
