Mistral Medium 3 Overview
Description:
Mistral Medium 3 is a frontier-class dense language model optimized for enterprise use. It delivers state-of-the-art performance at significantly lower cost—up to 8× cheaper than leading alternatives—while maintaining high usability, adaptability, and deployability in enterprise environments. Designed to excel in professional workloads like coding, STEM reasoning, and multimodal understanding, it supports hybrid and self-hosted deployment, full model customization, and seamless integration into enterprise systems.
This model is ready for commercial/non-commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA Mistral Medium 3 Model Card.
License and Terms of Use
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. To deploy and customize the model in your environment, please contact Mistral.
Deployment Geography:
Global
Use Case:
Enterprise and research users leveraging high-performance LLMs for reasoning, multilingual understanding, and coding tasks.
- Hybrid or on-premises / in-VPC deployment
- Custom post-training
- Integration into enterprise tools and systems
Release Date:
- May 2025
Reference(s):
- https://mistral.ai/news/mistral-medium-3
Model Architecture:
Architecture Type: Transformer-based dense decoder-only autoregressive LLM
This model was developed based on: Proprietary Mistral architecture
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: Two-Dimensional (2D) token sequences (batch × sequence length)
Other Properties Related to Input:
- Context length of up to 128k tokens
- Pre-tokenized using the mistral-tokenizer
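For illustration, here is a minimal tokenization sketch using the open-source mistral-common package. The exact tokenizer version used by Mistral Medium 3 is undisclosed, so the v3 instruct tokenizer below is an assumption:

```python
# Hypothetical sketch: tokenizing a chat request with mistral-common.
# Assumption: the v3 instruct tokenizer; Mistral has not disclosed the
# exact tokenizer version used by Mistral Medium 3.
from mistral_common.protocol.instruct.messages import UserMessage
from mistral_common.protocol.instruct.request import ChatCompletionRequest
from mistral_common.tokens.tokenizers.mistral import MistralTokenizer

tokenizer = MistralTokenizer.v3()
request = ChatCompletionRequest(messages=[UserMessage(content="Hello, world!")])
tokenized = tokenizer.encode_chat_completion(request)

print(tokenized.tokens[:10])   # token ids for one request (a single 1D row)
print(len(tokenized.tokens))   # must fit within the 128k-token context window
```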
Output:
Output Type(s): Text
Output Format(s): String
Output Parameters: Two-Dimensional (2D) token sequences (batch × sequence length)
Other Properties Related to Output:
- Output is in plain text, produced autoregressively
- Post-processing required to decode tokens to readable text
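Continuing the sketch above, generated token ids can be decoded back to readable text with the same tokenizer (the ids below are placeholders for real model output):

```python
# Placeholder: in practice these ids come from the inference engine,
# not from the prompt encoding reused here for illustration.
generated_ids = tokenized.tokens
print(tokenizer.decode(generated_ids))  # decode token ids back to text
```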
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Hopper (e.g., H100; see Test Hardware below)
[Preferred/Supported] Operating System(s):
- Linux
- Windows
Model Version(s):
Mistral Medium 3 (2505)
Training, Testing, and Evaluation Datasets:
Training Dataset:
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties: Undisclosed
Testing Dataset:
Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties: Undisclosed
Evaluation Dataset:
Top-tier performance.
Mistral Medium 3 is designed to be frontier-class, particularly in categories of professional use. In the evaluations below, we use numbers previously reported by other providers wherever available; otherwise we use our own evaluation harness. Mistral Medium 3 particularly stands out in coding and STEM tasks, where it comes close to its very large and much slower competitors.
Mistral Medium Benchmarking
| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| **CODING** | | | | | | |
| HumanEval 0-shot | 92.1% | 85.4% | 91.5% | 92.1% | 82.9% | 93.3% |
| LiveCodeBench (v6) 0-shot | 30.3% | 28.7% | 31.4% | 36.0% | 26.3% | 42.9% |
| MultiPL-E average 0-shot | 81.4% | 76.4% | 79.8% | 83.4% | 73.1% | 84.9% |
| **INSTRUCTION FOLLOWING** | | | | | | |
| ArenaHard 0-shot | 97.1% | 91.8% | 95.4% | 93.2% | 95.1% | 97.3% |
| IFEval 0-shot | 89.4% | 88.9% | 87.2% | 91.8% | 89.7% | 89.1% |
| **MATH** | | | | | | |
| Math500 Instruct 0-shot | 91.0% | 90.0% | 76.4% | 83.0% | 82.0% | 93.8% |
| **KNOWLEDGE** | | | | | | |
| GPQA Diamond 0-shot CoT | 57.1% | 61.1% | 52.5% | 69.7% | 46.5% | 61.1% |
| MMLU Pro 0-shot CoT | 77.2% | 80.4% | 75.8% | 80.0% | 68.9% | 81.1% |
| **LONG CONTEXT** | | | | | | |
| RULER 32K | 96.0% | 94.8% | 96.0% | 95.7% | 95.6% | 95.8% |
| RULER 128K | 90.2% | 86.7% | 88.9% | 93.8% | 91.2% | 91.9% |
| **MULTIMODAL** | | | | | | |
| MMMU 0-shot | 66.1% | 71.8% | 66.1% | 71.3% | n/a | n/a |
| DocVQA 0-shot | 95.3% | 94.1% | 85.9% | 84.3% | n/a | n/a |
| AI2D 0-shot | 93.7% | 84.4% | 93.3% | 78.8% | n/a | n/a |
| ChartQA 0-shot | 82.6% | 90.4% | 86.0% | 76.3% | n/a | n/a |
*Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline. Command-A and DeepSeek 3.1 do not support multimodal inputs (marked n/a).
Human Evals
In addition to academic benchmarks, we report third-party human evaluations that are more representative of real-world use cases. Mistral Medium 3 continues to shine in the coding domain and delivers much better performance, across the board, than some of its much larger competitors.
| Competitor | Mistral Wins (%) | Other Model Wins (%) |
|---|---|---|
| Claude Sonnet 3.7 | 40.00 | 60.00 |
| DeepSeek 3.1 | 37.50 | 62.50 |
| GPT-4o | 50.00 | 50.00 |
| Command-A | 69.23 | 30.77 |
| Llama 4 Maverick | 81.82 | 18.18 |

Win rates by domain against Llama 4 Maverick:

| Domain | Mistral Win Rate (%) | Llama 4 Maverick Win Rate (%) |
|---|---|---|
| Coding | 81.82 | 18.18 |
| Multimodal | 53.85 | 46.15 |
| English | 66.67 | 33.33 |
| French | 71.43 | 28.57 |
| Spanish | 73.33 | 26.67 |
| German | 62.50 | 37.50 |
| Arabic | 64.71 | 35.29 |
Inference:
Engine: Compatible with open-source inference engines such as vLLM (see the sketch after Test Hardware)
Test Hardware:
- NVIDIA H100
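As a rough illustration, here is a minimal sketch of querying a vLLM (or any OpenAI-compatible) deployment. The base URL, API key, and model identifier below are placeholders, not confirmed values for this model:

```python
# Hypothetical sketch: chat completion against an OpenAI-compatible
# endpoint such as one served by vLLM. base_url, api_key, and the model
# id are placeholders; substitute your deployment's actual values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="mistral-medium-3",  # placeholder model id
    messages=[{"role": "user", "content": "Write a function that reverses a string."}],
    max_tokens=256,
    temperature=0.2,
)
print(response.choices[0].message.content)
```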
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.