MiniMax-M2 Overview
Description
MiniMax-M2 is a compact, fast, and cost-effective Mixture-of-Experts (MoE) model with 230 billion total parameters, of which 10 billion are active per token. It is built for elite performance in coding and agentic tasks while maintaining powerful general intelligence. Despite activating just 10 billion parameters, MiniMax-M2 delivers the sophisticated end-to-end tool-use performance expected of today's leading models in a streamlined form factor that makes deployment and scaling easier. The model excels at multi-file edits, coding-run-fix loops, test-validated repairs, and complex long-horizon toolchains across shell, browser, retrieval, and code runners.
This model is ready for commercial and non-commercial use.
Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA MiniMax-M2 Model Card.
License and Terms of Use:
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Open Model License Agreement. Additional Information: MIT License.
Deployment Geography:
Global
Use Case:
End-to-end developer workflows, multi-file code editing, coding-run-fix loops, test-validated repairs, agentic tool use, complex long-horizon toolchains, terminal and IDE operations, web browsing and retrieval tasks, research and commercial applications.
Release Date:
build.nvidia.com: 10/31/2025 via link
Huggingface: 10/27/2025 via link
Reference(s):
MiniMax Official Website
MiniMax Open Platform
MiniMax-M2 Technical Report
Model Architecture:
Architecture Type: Mixture-of-Experts (MoE) Transformer
Network Architecture: Transformer-based MoE with interleaved thinking capabilities
Total Parameters: 230B
Active Parameters: 10B
Vocabulary Size: Undisclosed
Input:
Input Types: Text
Input Parameters: [One-Dimensional (1D)]
Other Input Properties: Supports tool calling, function calling, and interleaved thinking with <think>...</think> tags
Input Context Length (ISL): 128,000 tokens
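As a rough illustration of the tool- and function-calling support noted above, the sketch below declares a single tool through an OpenAI-compatible endpoint (such as one served by vLLM or SGLang). The base URL, model name, and the run_tests tool are illustrative placeholders, not part of this card.

```python
# Illustrative sketch only: exercising tool / function calling against an
# OpenAI-compatible endpoint (e.g., one served by vLLM or SGLang).
# The base URL, model name, and the run_tests tool are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

tools = [{
    "type": "function",
    "function": {
        "name": "run_tests",  # hypothetical tool, not part of the model card
        "description": "Run the project's test suite and return the results.",
        "parameters": {
            "type": "object",
            "properties": {
                "path": {"type": "string", "description": "Test file or directory."}
            },
            "required": ["path"],
        },
    },
}]

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # placeholder model name for your deployment
    messages=[{"role": "user", "content": "Run the tests under tests/ and report failures."}],
    tools=tools,
)
print(response.choices[0].message.tool_calls)
```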
Output:
Output Type: Text
Output Parameters: [One-Dimensional (1D)]
Other Output Properties: Includes interleaved thinking content wrapped in <think>...</think> tags that must be preserved in conversation history for optimal performance
Output Context Length (OSL): Up to 128,000 tokens (shared with input)
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Software Integration:
Runtime Engines: PyTorch, transformers, vLLM, SGLang, MLX
Supported Hardware:
- NVIDIA Ada Lovelace
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
Operating Systems: Linux
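For orientation, here is a minimal sketch of loading the model with the transformers runtime listed above. The Hugging Face repository ID MiniMaxAI/MiniMax-M2 is assumed from the Hugging Face release; adjust paths, dtype, and device mapping for your environment.

```python
# Minimal sketch, assuming the Hugging Face repo ID "MiniMaxAI/MiniMax-M2".
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MiniMaxAI/MiniMax-M2"  # assumed repo ID from the Hugging Face release
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # keep the checkpoint's native dtype
    device_map="auto",       # shard the MoE weights across available GPUs
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a Python function that reverses a string."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```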
Model Version(s)
MiniMax-M2
Training, Testing, and Evaluation Datasets:
Training Dataset
Training Data Collection: Undisclosed
Training Labeling: Undisclosed
Data Modality: Text, Code
Text Training Data Size: Undisclosed
Training Properties: Trained with emphasis on coding, agentic workflows, and tool use capabilities
Testing Dataset
Testing Data Collection: Human and Automated
Testing Labeling: Human and Automated
Testing Properties: Comprehensive evaluation across coding benchmarks (SWE-bench Verified, Multi-SWE-Bench, Terminal-Bench), agentic benchmarks (BrowseComp, GAIA, AgentCompany), and intelligence benchmarks (MMLU-Pro, GPQA-Diamond, LiveCodeBench)
Evaluation Dataset
Evaluation Benchmark Scores:
- SWE-bench Verified: 69.4
- Multi-SWE-Bench: 36.2
- Terminal-Bench: 46.3
- ArtifactsBench: 66.8
- BrowseComp: 44.0
- GAIA (text only): 75.7
- MMLU-Pro: 82
- GPQA-Diamond: 78
- LiveCodeBench: 83
- AA Intelligence Composite Score: 61 (Rank #1 among open-source models globally per Artificial Analysis)
Evaluation Data Collection: Automated
Evaluation Labeling: Automated, Human
Evaluation Properties: Multi-domain evaluation including real-world end-to-end coding, terminal operations, web browsing, agentic tool use, mathematics, science, and instruction following
Inference
Acceleration Engine: vLLM, SGLang, TensorRT-LLM, MLX
Test Hardware: NVIDIA GPU clusters, Apple Silicon (M3 Ultra or later)
Recommended Parameters: temperature=1.0, top_p=0.95, top_k=40
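A minimal sketch of applying these recommended parameters through an OpenAI-compatible server (for example, one launched with vLLM or SGLang); the base URL and model name are deployment-specific placeholders. Note that the OpenAI Python client has no native top_k field, so it is forwarded via extra_body, which vLLM's server accepts.

```python
# Minimal sketch, assuming an OpenAI-compatible server at a placeholder URL.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize this repository's layout."}],
    temperature=1.0,           # recommended defaults from this card
    top_p=0.95,
    extra_body={"top_k": 40},  # server-specific extension parameter
)
print(response.choices[0].message.content)
```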
Additional Information
MiniMax-M2 is an interleaved thinking model that wraps its reasoning content in <think>...</think> tags. When using the model, historical conversation context must retain these thinking tags in their original format; removing them negatively impacts performance. The model's efficient 10B-activation design enables faster feedback cycles in compile-run-test and browse-retrieve-cite chains, more concurrent runs on the same budget, and simpler capacity planning thanks to smaller per-request memory and steadier tail latency.
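The sketch below illustrates this rule under stated assumptions: a plain Python message list destined for a chat-style API, with a hypothetical assistant reply. The one requirement the card imposes is that the <think>...</think> block travels with the message history unmodified.

```python
# Minimal sketch of the rule above: assistant turns appended to the history
# must keep their <think>...</think> blocks verbatim. The reply text below
# is a hypothetical example, not real model output.
history = [{"role": "user", "content": "Fix the failing unit test in utils.py."}]

assistant_reply = (
    "<think>The test expects a trimmed string, but normalize() never calls "
    ".strip(); adding it before the return should fix the failure.</think>"
    "Add `.strip()` to the return value of `normalize()` in utils.py."
)

# Correct: append the reply unmodified, thinking tags included.
history.append({"role": "assistant", "content": assistant_reply})

# Incorrect -- stripping the thinking block before re-sending degrades
# multi-turn performance:
# import re
# cleaned = re.sub(r"<think>.*?</think>", "", assistant_reply, flags=re.S)

history.append({"role": "user", "content": "Apply that change and rerun the tests."})
# `history` is now ready to send back to the model for the next turn.
```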
MiniMax Agent, built on MiniMax-M2, is publicly available and free for a limited time at agent.minimax.io. The MiniMax-M2 API is available on the MiniMax Open Platform and is free for a limited time.
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.
