Model Overview
Description:
Mistral-NeMo is a 12B-parameter Large Language Model (LLM) trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models of smaller or similar size.
Key features
- Released under the Apache 2.0 License
- Pre-trained and instructed versions
- Trained with a 128k context window
- Comes with an FP8 quantized version with no accuracy loss
- Trained on a large proportion of multilingual and code data
- Drop-in replacement for Mistral 7B
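Because the model is a drop-in replacement for Mistral 7B, it can be loaded with the same tooling. The snippet below is a minimal, illustrative sketch using Hugging Face transformers; the repository name, dtype, and generation settings are assumptions rather than part of this card.

```python
# Minimal loading sketch (illustrative). The repo id is an assumption based on
# Mistral AI's published checkpoints; adjust it to the checkpoint you use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mistral-Nemo-Instruct-2407"  # assumed repository name

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed serving dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "Summarize Mistral-NeMo in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=128, do_sample=True, temperature=0.3)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```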
Model Architecture
Mistral-NeMo is a transformer model, with the following architecture choices:
- Layers: 40
- Dim: 5,120
- Head dim: 128
- Hidden dim: 14,336
- Activation Function: SwiGLU
- Number of heads: 32
- Number of kv-heads: 8 (GQA)
- Rotary embeddings (theta = 1M)
- Vocabulary size: 2**17 ~= 128k
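These dimensions are consistent with the quoted 12B parameter count. The back-of-the-envelope check below assumes a standard Mistral-style decoder block (GQA attention plus a three-projection SwiGLU MLP) and untied input/output embeddings; norm weights are negligible and ignored.

```python
# Rough parameter count from the architecture table above (illustrative;
# assumes a Mistral-style GQA + SwiGLU block and untied embeddings).
layers, dim, head_dim = 40, 5120, 128
n_heads, n_kv_heads = 32, 8
hidden_dim = 14336
vocab = 2 ** 17  # 131,072

attn = dim * n_heads * head_dim          # q_proj
attn += 2 * dim * n_kv_heads * head_dim  # k_proj + v_proj
attn += n_heads * head_dim * dim         # o_proj
mlp = 3 * dim * hidden_dim               # gate, up, down projections (SwiGLU)

total = layers * (attn + mlp) + 2 * vocab * dim  # blocks + embeddings + LM head
print(f"~{total / 1e9:.1f}B parameters")  # ~12.2B
```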
Benchmarks
Main benchmarks
- HellaSwag (0-shot): 83.5%
- Winogrande (0-shot): 76.8%
- OpenBookQA (0-shot): 60.6%
- CommonSenseQA (0-shot): 70.4%
- TruthfulQA (0-shot): 50.3%
- MMLU (5-shot): 68.0%
- TriviaQA (5-shot): 73.8%
- NaturalQuestions (5-shot): 31.2%
Multilingual benchmarks
- MMLU
- French: 62.3%
- German: 62.7%
- Spanish: 64.6%
- Italian: 61.3%
- Portuguese: 63.3%
- Russian: 59.2%
- Chinese: 59.0%
- Japanese: 59.0%
Instruct benchmarks
- MT Bench (dev): 7.84
- MixEval Hard: 0.534
- IFEval-v5: 0.629
- Wildbench: 42.57
Terms of use
By using this software or model, you are agreeing to the terms and conditions of the license, acceptable use policy, and Mistral's privacy policy. Mistral-NeMo is released under the Apache 2.0 license.
Reference(s):
Mistral 7B Blogpost
Model Architecture:
Architecture Type: Transformer
Network Architecture: Mistral
Model Version: 0.1
Input
- Input Type: Text
- Input Format: String
- Input Parameters: max_tokens, temperature, top_p, stop, frequency_penalty, presence_penalty, seed (see the example request below)
Output
- Output Type: Text
- Output Format: String
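The sketch below shows how the listed input parameters map onto a single request, assuming an OpenAI-compatible serving endpoint (for example, a local TensorRT-LLM or NIM deployment). The base URL, API key, and model name are placeholders, not values from this card.

```python
# Illustrative request using the listed input parameters against an
# OpenAI-compatible endpoint. Endpoint, key, and model name are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="mistral-nemo-12b-instruct",  # placeholder model name
    messages=[{"role": "user", "content": "Write a haiku about GPUs."}],
    max_tokens=256,
    temperature=0.3,
    top_p=0.95,
    stop=["</s>"],
    frequency_penalty=0.0,
    presence_penalty=0.0,
    seed=42,
)
print(response.choices[0].message.content)  # output is returned as a string
```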
Software Integration:
- Supported Hardware Platform(s): NVIDIA Hopper
- Preferred Operating System(s): Linux
Inference
Engine: TensorRT-LLM
Test Hardware: H100
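For local experimentation, TensorRT-LLM exposes a high-level Python LLM API. The sketch below follows the general shape of that API; the exact argument names vary between TensorRT-LLM releases and the checkpoint path is a placeholder, so treat this as an outline rather than a verified recipe.

```python
# Rough outline of running the model through TensorRT-LLM's high-level LLM API
# (illustrative; argument names may differ across TensorRT-LLM versions).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-Nemo-Instruct-2407")  # placeholder checkpoint

params = SamplingParams(max_tokens=128, temperature=0.3, top_p=0.95)
outputs = llm.generate(["Explain grouped-query attention in one sentence."], params)

for out in outputs:
    print(out.outputs[0].text)
```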