Model Overview
Description:
Mistral-NeMo is a 12B-parameter Large Language Model (LLM) trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models of smaller or similar size.
Key features
- Released under the Apache 2.0 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Comes with an FP8 quantized version with no accuracy loss
- Trained on a large proportion of multilingual and code data
- Drop-in replacement for Mistral 7B
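Because the model is a drop-in replacement for Mistral 7B, it can be loaded with the same tooling. The snippet below is a minimal, illustrative sketch using Hugging Face transformers; the repository name, dtype, and generation settings are assumptions rather than part of this card.

```python
# Minimal loading sketch (illustrative). The repo id is an assumption based on
# Mistral AI's published checkpoints; adjust it to the checkpoint you use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed serving dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize Mistral-NeMo in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.3)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```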
Model Architecture
Mistral-NeMo is a transformer model, with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation Function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k
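These dimensions are consistent with the quoted 12B parameter count. The back-of-the-envelope check below assumes a standard Mistral-style decoder block (GQA attention plus a three-projection SwiGLU MLP) and untied input/output embeddings; norm weights are negligible and ignored.

```python
# Rough parameter count from the architecture table above (illustrative;
# assumes a Mistral-style GQA + SwiGLU block and untied embeddings).
layers, dim, head_dim = 40, 5120, 128
n_heads, n_kv_heads = 32, 8
hidden_dim = 14336
vocab = 2 ** 17  # 131,072

attn = dim * n_heads * head_dim          # q_proj
attn += 2 * dim * n_kv_heads * head_dim  # k_proj + v_proj
attn += n_heads * head_dim * dim         # o_proj
mlp = 3 * dim * hidden_dim               # gate, up, down projections (SwiGLU)

total = layers * (attn + mlp) + 2 * vocab * dim  # blocks + embeddings + LM head
print(f"~{total / 1e9:.1f}B parameters")  # ~12.2B
```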
Benchmarks
Main benchmarks
- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- CommonSenseQA (0-shot): 70.4%
- TruthfulQA (0-shot): 50.3%
- MMLU (5-shot): 68.0%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%
Multilingual benchmarks
- MMLU
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Italian: 61.3%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%
Instruct benchmarks
- MT Bench (dev): 7.84
- MixEval Hard: 0.534
- IFEval-v5: 0.629
- Wildbench: 42.57
Terms of use
By using this software or model, you are agreeing to the terms and conditions of the license, acceptable use policy, and Mistral's privacy policy. Mistral-NeMo is released under the Apache 2.0 license.
Reference(s):
Mistral 7B Blogpost
Model Architecture:
Architecture Type: Transformer
Network Architecture: Mistral
Model Version: 0.1
Input
- Input Type: Text
- Input Format: String
- Input Parameters: max_tokens, temperature, top_p, stop, frequency_penalty, presence_penalty, seed (see the example request below)
Output
- Output Type: Text
- Output Format: String
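The sketch below shows how the listed input parameters map onto a single request, assuming an OpenAI-compatible serving endpoint (for example, a local TensorRT-LLM or NIM deployment). The base URL, API key, and model name are placeholders, not values from this card.

```python
# Illustrative request using the listed input parameters against an
# OpenAI-compatible endpoint. Endpoint, key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mistral-nemo-12b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=256,
    temperature=0.3,
    top_p=0.95,
    stop=["</s>"],
    frequency_penalty=0.0,
    presence_penalty=0.0,
    seed=42,
)
print(response.choices[0].message.content)  # output is returned as a string
```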
Software Integration:
- Supported Hardware Platform(s): NVIDIA Hopper
- Preferred Operating System(s): Linux
Inference
Engine: TensorRT-LLM
Test Hardware: H100
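For local experimentation, TensorRT-LLM exposes a high-level Python LLM API. The sketch below follows the general shape of that API; the exact argument names vary between TensorRT-LLM releases and the checkpoint path is a placeholder, so treat this as an outline rather than a verified recipe.

```python
# Rough outline of running the model through TensorRT-LLM's high-level LLM API
# (illustrative; argument names may differ across TensorRT-LLM versions).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-Nemo-Instruct-2407")  # placeholder checkpoint

params = SamplingParams(max_tokens=128, temperature=0.3, top_p=0.95)
outputs = llm.generate(["Explain grouped-query attention in one sentence."], params)

for out in outputs:
    print(out.outputs[0].text)
```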