mistralai / mistral-medium-3-instruct

Mistral Medium 3 Overview

Description:

Mistral Medium 3 is a frontier-class dense language model optimized for enterprise use. It delivers state-of-the-art performance at significantly lower cost—up to 8× cheaper than leading alternatives—while maintaining high usability, adaptability, and deployability in enterprise environments. Designed to excel in professional workloads like coding, STEM reasoning, and multimodal understanding, it supports hybrid and self-hosted deployment, full model customization, and seamless integration into enterprise systems.

This model is ready for commercial/non-commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA Mistral Medium 3 Model Card.

License and Terms of Use

GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service. To deploy and customize the model in your environment, please contact Mistral.

Deployment Geography:

Global

Use Case:

Enterprise and research users leveraging high-performance LLMs for reasoning, multilingual understanding, and coding tasks.

  • Hybrid or on-premises / in-VPC deployment
  • Custom post-training
  • Integration into enterprise tools and systems

Release Date:

  • May 2025

Reference(s):

  • https://mistral.ai/news/mistral-medium-3

Model Architecture:

Architecture Type: Transformer-based dense decoder-only autoregressive LLM
This model was developed based on: Proprietary Mistral architecture

Input:

Input Type(s): Text
Input Format(s): String
Input Parameters: 2D token sequences
Other Properties Related to Input:

  • Up to 128k tokens context length
  • Pre-tokenized using mistral-tokenizer
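
As a sketch of what the 128k-token context limit means in practice, the check below budgets prompt tokens against the window before a request is sent. The constant and function names are invented for this illustration; real token counts must come from mistral-tokenizer.

```python
# Illustrative sketch only: real token counts come from mistral-tokenizer;
# MAX_CONTEXT_TOKENS and fits_in_context are names invented for this example.
MAX_CONTEXT_TOKENS = 128_000  # model's maximum context length

def fits_in_context(prompt_token_ids, reserved_for_output=1024):
    """Return True if the prompt plus an output budget fits the 128k window."""
    return len(prompt_token_ids) + reserved_for_output <= MAX_CONTEXT_TOKENS

# A short prompt trivially fits; a full 128k-token prompt leaves no room
# for generated output.
print(fits_in_context(list(range(1_000))))    # True
print(fits_in_context(list(range(128_000))))  # False
```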

Output:

Output Type(s): Text
Output Format: String
Output Parameters: 2D token sequences
Other Properties Related to Output:

  • Output is in plain text, produced autoregressively
  • Post-processing required to decode tokens to readable text
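
To illustrate the decode step, the toy sketch below maps generated token IDs back to text and stops at an end-of-sequence marker. The vocabulary and token IDs are invented for the example; a real deployment would decode with mistral-tokenizer.

```python
# Toy sketch of the post-processing step: map generated token IDs to text.
# The vocabulary and EOS ID below are invented for illustration; the model's
# actual decoding uses mistral-tokenizer.
TOY_VOCAB = {0: "Hello", 1: ",", 2: " world", 3: "!", 4: "</s>"}
EOS_ID = 4

def decode(token_ids):
    """Stop at the end-of-sequence token and join the rest into text."""
    pieces = []
    for tid in token_ids:
        if tid == EOS_ID:
            break
        pieces.append(TOY_VOCAB[tid])
    return "".join(pieces)

print(decode([0, 1, 2, 3, 4]))  # -> Hello, world!
```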

Supported Hardware Microarchitecture Compatibility:

[Preferred/Supported] Operating System(s):

  • Linux
  • Windows

Model Version(s):

Mistral Medium 3 (2505)

Training, Testing, and Evaluation Datasets:

Training Dataset:

Data Collection Method by dataset: Undisclosed

Labeling Method by dataset: Undisclosed

Properties: Undisclosed

Testing Dataset:

Data Collection Method by dataset: Undisclosed

Labeling Method by dataset: Undisclosed

Properties: Undisclosed

Evaluation Dataset:

Top-tier performance.

Mistral Medium 3 is designed to be frontier-class, particularly in categories of professional use. In the evaluations below, we use numbers previously reported by other providers where available; otherwise, we use our own evaluation harness. Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline. Mistral Medium 3 stands out in particular on coding and STEM tasks, where it comes close to its much larger and slower competitors.

Mistral Medium Benchmarking

| Benchmark | Mistral Medium 3 | Llama 4 Maverick | GPT-4o | Claude Sonnet 3.7 | Command-A | DeepSeek 3.1 |
|---|---|---|---|---|---|---|
| CODING |  |  |  |  |  |  |
| HumanEval 0-shot | 92.1% | 85.4% | 91.5% | 92.1% | 82.9% | 93.3% |
| LiveCodeBench (v6) 0-shot | 30.3% | 28.7% | 31.4% | 36.0% | 26.3% | 42.9% |
| MultiPL-E average 0-shot | 81.4% | 76.4% | 79.8% | 83.4% | 73.1% | 84.9% |
| INSTRUCTION FOLLOWING |  |  |  |  |  |  |
| ArenaHard 0-shot | 97.1% | 91.8% | 95.4% | 93.2% | 95.1% | 97.3% |
| IFEval 0-shot | 89.4% | 88.9% | 87.2% | 91.8% | 89.7% | 89.1% |
| MATH |  |  |  |  |  |  |
| Math500 Instruct 0-shot | 91.0% | 90.0% | 76.4% | 83.0% | 82.0% | 93.8% |
| KNOWLEDGE |  |  |  |  |  |  |
| GPQA Diamond 0-shot CoT | 57.1% | 61.1% | 52.5% | 69.7% | 46.5% | 61.1% |
| MMLU Pro 0-shot CoT | 77.2% | 80.4% | 75.8% | 80.0% | 68.9% | 81.1% |
| LONG CONTEXT |  |  |  |  |  |  |
| RULER 32K | 96.0% | 94.8% | 96.0% | 95.7% | 95.6% | 95.8% |
| RULER 128K | 90.2% | 86.7% | 88.9% | 93.8% | 91.2% | 91.9% |
| MULTIMODAL |  |  |  |  |  |  |
| MMMU 0-shot | 66.1% | 71.8% | 66.1% | 71.3% | — | — |
| DocVQA 0-shot | 95.3% | 94.1% | 85.9% | 84.3% | — | — |
| AI2D 0-shot | 93.7% | 84.4% | 93.3% | 78.8% | — | — |
| ChartQA 0-shot | 82.6% | 90.4% | 86.0% | 76.3% | — | — |

— Command-A and DeepSeek 3.1 have no multimodal support.

*Performance accuracy on all benchmarks was obtained through the same internal evaluation pipeline.

Human Evals


In addition to academic benchmarks, we report third-party human evaluations that are more representative of real-world use cases. Mistral Medium 3 continues to shine in the coding domain and delivers much better performance, across the board, than some of its much larger competitors.

| Competitor | Mistral Wins (%) | Other Model Wins (%) |
|---|---|---|
| Claude Sonnet 3.7 | 40.00 | 60.00 |
| DeepSeek 3.1 | 37.50 | 62.50 |
| GPT-4o | 50.00 | 50.00 |
| Command-A | 69.23 | 30.77 |
| Llama 4 Maverick | 81.82 | 18.18 |

| Domain | Mistral Win Rate (%) | Llama 4 Maverick Win Rate (%) |
|---|---|---|
| Coding | 81.82 | 18.18 |
| Multimodal | 53.85 | 46.15 |
| English | 66.67 | 33.33 |
| French | 71.43 | 28.57 |
| Spanish | 73.33 | 26.67 |
| German | 62.50 | 37.50 |
| Arabic | 64.71 | 35.29 |

Inference:

Engine: Compatible with open-source inference engines like vLLM
Test Hardware:

  • NVIDIA H100
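
As a minimal sketch of calling the model behind an OpenAI-compatible endpoint (such as one served by vLLM), the snippet below builds a chat-completions request body. The endpoint URL and the helper name are assumptions; adapt them to your deployment.

```python
import json

# Assumed values for illustration: point ENDPOINT at your own deployment
# (e.g. a vLLM server started with `vllm serve`) and adjust the model ID
# to match how it is registered there.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "mistralai/mistral-medium-3-instruct"

def build_chat_request(prompt, max_tokens=256, temperature=0.2):
    """Build an OpenAI-compatible chat-completions request body."""
    return {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

payload = build_chat_request("Write a Python function that reverses a string.")
print(json.dumps(payload, indent=2))
# POST `payload` as JSON to ENDPOINT (adding an Authorization header if your
# deployment requires one) to receive a completion.
```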

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.