Mistral Medium 3 Overview
Description:
Mistral Medium 3 is a frontier-class dense language model optimized for enterprise use. It delivers state-of-the-art performance at significantly lower cost—up to 8× cheaper than leading alternatives—while maintaining high usability, adaptability, and deployability in enterprise environments. Designed to excel in professional workloads like coding, STEM reasoning, and multimodal understanding, it supports hybrid and self-hosted deployment, full model customization, and seamless integration into enterprise systems.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA Mistral Medium 3 Model Card.
License and Terms of Use
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. To deploy and customize the model in your environment, please contact Mistral.
Deployment Geography:
Global
Use Case:
Enterprise and research users leveraging high-performance LLMs for reasoning, multilingual understanding, and coding tasks.
- Hybrid or on-premises / in-VPC deployment
- Custom post-training
- Integration into enterprise tools and systems
Release Date:
- May 2025
Reference(s):
- https://mistral.ai/news/mistral-medium-3
Model Architecture:
Architecture Type: Transformer-based dense decoder-only autoregressive LLM
This model was developed based on: Proprietary Mistral architecture
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: Two-Dimensional (2D) token sequences (batch × sequence length)
Other Properties Related to Input:
- Context length of up to 128k tokens
- Pre-tokenized using the mistral-tokenizer
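For illustration, here is a minimal tokenization sketch using the open-source mistral-common package. The exact tokenizer version used by Mistral Medium 3 is undisclosed, so the v3 instruct tokenizer below is an assumption:

```python
# Hypothetical sketch: tokenizing a chat request with mistral-common.
# Assumption: the v3 instruct tokenizer; Mistral has not disclosed the
# exact tokenizer version used by Mistral Medium 3.
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()
request = ChatCompletionRequest(messages=[UserMessage(content="Hello, world!")])
tokenized = tokenizer.encode_chat_completion(request)

print(tokenized.tokens[:10])   # token ids for one request (a single 1D row)
print(len(tokenized.tokens))   # must fit within the 128k-token context window
```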
Output:
Output Type(s): Text
Output Format(s): String
Output Parameters: Two-Dimensional (2D) token sequences (batch × sequence length)
Other Properties Related to Output:
- Output is in plain text, produced autoregressively
- Post-processing required to decode tokens to readable text
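Continuing the sketch above, generated token ids can be decoded back to readable text with the same tokenizer (the ids below are placeholders for real model output):

```python
# Placeholder: in practice these ids come from the inference engine,
# not from the prompt encoding reused here for illustration.
generated_ids = tokenized.tokens
print(tokenizer.decode(generated_ids))  # decode token ids back to text
```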
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Hopper (e.g., H100; see Test Hardware below)
[Preferred/Supported] Operating System(s):
- Linux
- Windows
Model Version(s):
Mistral Medium 3 (2505)
Training, Testing, and Evaluation Datasets:
Training Dataset:
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties: Undisclosed
Testing Dataset:
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties: Undisclosed
Evaluation Dataset:
Top-tier performance.
Mistral Medium 3 is designed to be frontier-class, particularly in categories of professional use. In the evaluations below, we use numbers previously reported by other providers wherever available; otherwise we use our own evaluation harness. Mistral Medium 3 particularly stands out in coding and STEM tasks, where it comes close to its very large and much slower competitors.
Mistral Medium Benchmarking
| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| **CODING** | | | | | | |
| HumanEval 0-shot | 92.1% | 85.4% | 91.5% | 92.1% | 82.9% | 93.3% |
| LiveCodeBench (v6) 0-shot | 30.3% | 28.7% | 31.4% | 36.0% | 26.3% | 42.9% |
| MultiPL-E average 0-shot | 81.4% | 76.4% | 79.8% | 83.4% | 73.1% | 84.9% |
| **INSTRUCTION FOLLOWING** | | | | | | |
| ArenaHard 0-shot | 97.1% | 91.8% | 95.4% | 93.2% | 95.1% | 97.3% |
| IFEval 0-shot | 89.4% | 88.9% | 87.2% | 91.8% | 89.7% | 89.1% |
| **MATH** | | | | | | |
| Math500 Instruct 0-shot | 91.0% | 90.0% | 76.4% | 83.0% | 82.0% | 93.8% |
| **KNOWLEDGE** | | | | | | |
| GPQA Diamond 0-shot CoT | 57.1% | 61.1% | 52.5% | 69.7% | 46.5% | 61.1% |
| MMLU Pro 0-shot CoT | 77.2% | 80.4% | 75.8% | 80.0% | 68.9% | 81.1% |
| **LONG CONTEXT** | | | | | | |
| RULER 32K | 96.0% | 94.8% | 96.0% | 95.7% | 95.6% | 95.8% |
| RULER 128K | 90.2% | 86.7% | 88.9% | 93.8% | 91.2% | 91.9% |
| **MULTIMODAL** | | | | | | |
| MMMU 0-shot | 66.1% | 71.8% | 66.1% | 71.3% | n/a | n/a |
| DocVQA 0-shot | 95.3% | 94.1% | 85.9% | 84.3% | n/a | n/a |
| AI2D 0-shot | 93.7% | 84.4% | 93.3% | 78.8% | n/a | n/a |
| ChartQA 0-shot | 82.6% | 90.4% | 86.0% | 76.3% | n/a | n/a |
*Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline. Command-A and DeepSeek 3.1 do not support multimodal inputs (marked n/a).
Human Evals
In addition to academic benchmarks, we report third-party human evaluations that are more representative of real-world use cases. Mistral Medium 3 continues to shine in the coding domain and delivers much better performance, across the board, than some of its much larger competitors.
| Competitor | Mistral Wins (%) | Other Model Wins (%) |
|---|---|---|
| Claude Sonnet 3.7 | 40.00 | 60.00 |
| DeepSeek 3.1 | 37.50 | 62.50 |
| GPT-4o | 50.00 | 50.00 |
| Command-A | 69.23 | 30.77 |
| Llama 4 Maverick | 81.82 | 18.18 |

Win rates by domain against Llama 4 Maverick:

| Domain | Mistral Win Rate (%) | Llama 4 Maverick Win Rate (%) |
|---|---|---|
| Coding | 81.82 | 18.18 |
| Multimodal | 53.85 | 46.15 |
| English | 66.67 | 33.33 |
| French | 71.43 | 28.57 |
| Spanish | 73.33 | 26.67 |
| German | 62.50 | 37.50 |
| Arabic | 64.71 | 35.29 |
Inference:
Engine: Compatible with open-source inference engines such as vLLM (see the sketch after Test Hardware)
Test Hardware:
- NVIDIA H100
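As a rough illustration, here is a minimal sketch of querying a vLLM (or any OpenAI-compatible) deployment. The base URL, API key, and model identifier below are placeholders, not confirmed values for this model:

```python
# Hypothetical sketch: chat completion against an OpenAI-compatible
# endpoint such as one served by vLLM. base_url, api_key, and the model
# id are placeholders; substitute your deployment's actual values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="mistral-medium-3",  # placeholder model id
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    max_tokens=256,
    temperature=0.2,
)
print(response.choices[0].message.content)
```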
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.