MiniMax-M2 Overview
Description
MiniMax-M2 is a compact, fast, and cost-effective Mixture-of-Experts (MoE) model with 230 billion total parameters, of which 10 billion are active per token. It is built for elite performance in coding and agentic tasks while maintaining powerful general intelligence. Despite activating just 10 billion parameters, MiniMax-M2 delivers the sophisticated end-to-end tool-use performance expected of today's leading models in a streamlined form factor that makes deployment and scaling easier. The model excels at multi-file edits, coding-run-fix loops, test-validated repairs, and complex long-horizon toolchains across shell, browser, retrieval, and code runners.
This model is ready for commercial and non-commercial use.
Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA MiniMax-M2 Model Card.
License and Terms of Use:
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: MIT License.
Deployment Geography:
Global
Use Case:
End-to-end developer workflows, multi-file code editing, coding-run-fix loops, test-validated repairs, agentic tool use, complex long-horizon toolchains, terminal and IDE operations, web browsing and retrieval tasks, research and commercial applications.
Release Date:
build.nvidia.com: 10/31/2025 via link
Huggingface: 10/27/2025 via link
Reference(s):
MiniMax Official Website
MiniMax Open Platform
MiniMax-M2 Technical Report
Model Architecture:
Architecture Type: Mixture-of-Experts (MoE) Transformer
Network Architecture: Transformer-based MoE with interleaved thinking capabilities
Total Parameters: 230B
Active Parameters: 10B
Vocabulary Size: Undisclosed
Input:
Input Types: Text
Input Parameters: [One-Dimensional (1D)]
Other Input Properties: Supports tool calling, function calling, and interleaved thinking with <think>...</think> tags
Input Context Length (ISL): 128,000 tokens
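As a rough illustration of the tool- and function-calling support noted above, the sketch below declares a single tool through an OpenAI-compatible endpoint (such as one served by vLLM or SGLang). The base URL, model name, and the run_tests tool are illustrative placeholders, not part of this card.

```python
# Illustrative sketch only: exercising tool / function calling against an
# OpenAI-compatible endpoint (e.g., one served by vLLM or SGLang).
# The base URL, model name, and the run_tests tool are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, not part of the model card
        "description": "Run the project's test suite and return the results.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Test file or directory."}
            },
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # placeholder model name for your deployment
    messages=[{"role": "user", "content": "Run the tests under tests/ and report failures."}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```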
Output:
Output Type: Text
Output Parameters: [One-Dimensional (1D)]
Other Output Properties: Includes interleaved thinking content wrapped in <think>...</think> tags that must be preserved in conversation history for optimal performance
Output Context Length (OSL): Up to 128,000 tokens (shared with input)
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engines: PyTorch, transformers, vLLM, SGLang, MLX
Supported Hardware:
- NVIDIA Ada Lovelace
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
Operating Systems: Linux
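For orientation, here is a minimal sketch of loading the model with the transformers runtime listed above. The Hugging Face repository ID MiniMaxAI/MiniMax-M2 is assumed from the Hugging Face release; adjust paths, dtype, and device mapping for your environment.

```python
# Minimal sketch, assuming the Hugging Face repo ID "MiniMaxAI/MiniMax-M2".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M2"  # assumed repo ID from the Hugging Face release
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native dtype
    device_map="auto",       # shard the MoE weights across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```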
Model Version(s)
MiniMax-M2
Training, Testing, and Evaluation Datasets:
Training Dataset
Training Data Collection: Undisclosed
Training Labeling: Undisclosed
Data Modality: Text, Code
Text Training Data Size: Undisclosed
Training Properties: Trained with emphasis on coding, agentic workflows, and tool use capabilities
Testing Dataset
Testing Data Collection: Human and Automated
Testing Labeling: Human and Automated
Testing Properties: Comprehensive evaluation across coding benchmarks (SWE-bench Verified, Multi-SWE-Bench, Terminal-Bench), agentic benchmarks (BrowseComp, GAIA, AgentCompany), and intelligence benchmarks (MMLU-Pro, GPQA-Diamond, LiveCodeBench)
Evaluation Dataset
Evaluation Benchmark Scores:
- SWE-bench Verified: 69.4
- Multi-SWE-Bench: 36.2
- Terminal-Bench: 46.3
- ArtifactsBench: 66.8
- BrowseComp: 44.0
- GAIA (text only): 75.7
- MMLU-Pro: 82
- GPQA-Diamond: 78
- LiveCodeBench: 83
- AA Intelligence Composite Score: 61 (Rank #1 among open-source models globally per Artificial Analysis)
Evaluation Data Collection: Automated
Evaluation Labeling: Automated, Human
Evaluation Properties: Multi-domain evaluation including real-world end-to-end coding, terminal operations, web browsing, agentic tool use, mathematics, science, and instruction following
Inference
Acceleration Engine: vLLM, SGLang, TensorRT-LLM, MLX
Test Hardware: NVIDIA GPU clusters, Apple Silicon (M3 Ultra or later)
Recommended Parameters: temperature=1.0, top_p=0.95, top_k=40
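A minimal sketch of applying these recommended parameters through an OpenAI-compatible server (for example, one launched with vLLM or SGLang); the base URL and model name are deployment-specific placeholders. Note that the OpenAI Python client has no native top_k field, so it is forwarded via extra_body, which vLLM's server accepts.

```python
# Minimal sketch, assuming an OpenAI-compatible server at a placeholder URL.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize this repository's layout."}],
    temperature=1.0,           # recommended defaults from this card
    top_p=0.95,
    extra_body={"top_k": 40},  # server-specific extension parameter
)
print(response.choices[0].message.content)
```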
Additional Information
MiniMax-M2 is an interleaved thinking model that wraps its reasoning content in <think>...</think> tags. When using the model, historical conversation context must retain these thinking tags in their original format; removing them negatively impacts performance. The model's efficient 10B-activation design enables faster feedback cycles in compile-run-test and browse-retrieve-cite chains, more concurrent runs on the same budget, and simpler capacity planning thanks to smaller per-request memory and steadier tail latency.
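The sketch below illustrates this rule under stated assumptions: a plain Python message list destined for a chat-style API, with a hypothetical assistant reply. The one requirement the card imposes is that the <think>...</think> block travels with the message history unmodified.

```python
# Minimal sketch of the rule above: assistant turns appended to the history
# must keep their <think>...</think> blocks verbatim. The reply text below
# is a hypothetical example, not real model output.
history = [{"role": "user", "content": "Fix the failing unit test in utils.py."}]

assistant_reply = (
    "<think>The test expects a trimmed string, but normalize() never calls "
    ".strip(); adding it before the return should fix the failure.</think>"
    "Add `.strip()` to the return value of `normalize()` in utils.py."
)

# Correct: append the reply unmodified, thinking tags included.
history.append({"role": "assistant", "content": assistant_reply})

# Incorrect -- stripping the thinking block before re-sending degrades
# multi-turn performance:
# import re
# cleaned = re.sub(r"<think>.*?</think>", "", assistant_reply, flags=re.S)

history.append({"role": "user", "content": "Apply that change and rerun the tests."})
# `history` is now ready to send back to the model for the next turn.
```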
MiniMax Agent, built on MiniMax-M2, is publicly available and free for a limited time at agent.minimax.io. The MiniMax-M2 API is available on the MiniMax Open Platform and is free for a limited time.
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
