nv-mistralai / mistral-nemo-12b-instruct

Model Overview

Description:

Mistral-NeMo is a Large Language Model (LLM) with 12B parameters, trained jointly by Mistral AI and NVIDIA. It significantly outperforms existing models of smaller or similar size.

Key features

  1. Released under the Apache 2.0 License
  2. Available in pre-trained and instruction-tuned versions
  3. Trained with a 128k context window
  4. Comes with an FP8 quantized version with no accuracy loss
  5. Trained on a large proportion of multilingual and code data
  6. Drop-in replacement for Mistral 7B

Model Architecture

Mistral-NeMo is a transformer model with the following architecture choices (a parameter-count sketch follows the list):

  • Layers: 40
  • Dim: 5,120
  • Head dim: 128
  • Hidden dim: 14,336
  • Activation Function: SwiGLU
  • Number of heads: 32
  • Number of kv-heads: 8 (GQA)
  • Rotary embeddings (theta = 1M)
  • Vocabulary size: 2**17 ~= 128k
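
As a sanity check on the 12B figure, below is a minimal back-of-the-envelope parameter count derived from the hyperparameters above. It counts only the embedding, attention, and SwiGLU MLP weight matrices, and it assumes bias-free linear layers and untied input/output embeddings; these assumptions are not stated in this card.

```python
# Rough parameter count for Mistral-NeMo from the listed hyperparameters.
# Assumptions (not stated above): untied input/output embeddings,
# bias-free linear layers, norm parameters ignored.
layers, dim, head_dim = 40, 5_120, 128
hidden_dim = 14_336
n_heads, n_kv_heads = 32, 8
vocab = 2**17

attn = dim * (n_heads * head_dim)            # query projection
attn += 2 * dim * (n_kv_heads * head_dim)    # key and value projections (GQA)
attn += (n_heads * head_dim) * dim           # output projection

mlp = 3 * dim * hidden_dim                   # SwiGLU: gate, up, and down projections

embeddings = 2 * vocab * dim                 # input + output embeddings (assumed untied)

total = layers * (attn + mlp) + embeddings
print(f"{total / 1e9:.1f}B parameters")      # ~12.2B
```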

Benchmarks

Main benchmarks

  • HellaSwag (0-shot): 83.5%
  • Winogrande (0-shot): 76.8%
  • OpenBookQA (0-shot): 60.6%
  • CommonSenseQA (0-shot): 70.4%
  • TruthfulQA (0-shot): 50.3%
  • MMLU (5-shot): 68.0%
  • TriviaQA (5-shot): 73.8%
  • NaturalQuestions (5-shot): 31.2%

Multilingual benchmarks

  • MMLU
    • French: 62.3%
    • German: 62.7%
    • Spanish: 64.6%
    • Italian: 61.3%
    • Portuguese: 63.3%
    • Russian: 59.2%
    • Chinese: 59.0%
    • Japanese: 59.0%

Instruct benchmarks

  • MT Bench (dev): 7.84
  • MixEval Hard: 0.534
  • IFEval-v5: 0.629
  • Wildbench: 42.57

Terms of use

By using this software or model, you are agreeing to the terms and conditions of the license, acceptable use policy, and Mistral's privacy policy. Mistral-NeMo-12B-Instruct is released under the Apache 2.0 license.

Reference(s):

Mistral NeMo Blogpost

Model Architecture:

Architecture Type: Transformer

Network Architecture: Mistral

Model Version: 0.1

Input

  • Input Type: Text
  • Input Format: String
  • Input Parameters: max_tokens, temperature, top_p, stop, frequency_penalty, presence_penalty, seed (see the request sketch after the Output section)

Output

  • Output Type: Text
  • Output Format: String
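
The input parameters above map onto a standard OpenAI-compatible chat completion request, and the output is returned as a plain text string. The sketch below is illustrative only: the endpoint URL and the NVIDIA_API_KEY environment variable are assumptions, not part of this card; the model name comes from the card title.

```python
# Minimal chat completion request using the parameters listed above.
# The base_url and NVIDIA_API_KEY env var are assumptions -- adjust to
# whichever OpenAI-compatible endpoint actually serves this model.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

completion = client.chat.completions.create(
    model="nv-mistralai/mistral-nemo-12b-instruct",
    messages=[{"role": "user", "content": "Summarize grouped-query attention in one sentence."}],
    max_tokens=256,
    temperature=0.2,
    top_p=0.7,
    seed=42,
)
print(completion.choices[0].message.content)  # output arrives as a text string
```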

Software Integration:

  • Supported Hardware Platform(s): NVIDIA Hopper
  • Preferred Operating System(s): Linux

Inference

Engine: TensorRT-LLM

Test Hardware: H100
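
For local serving with the TensorRT-LLM engine named above, a minimal sketch using TensorRT-LLM's high-level LLM API is shown below. The Hugging Face checkpoint name and sampling values are assumptions rather than part of this card, and the exact API surface may differ across TensorRT-LLM releases.

```python
# Minimal offline-inference sketch with TensorRT-LLM's high-level LLM API.
# Checkpoint name and sampling settings are assumptions, not from this card.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-Nemo-Instruct-2407")  # builds/loads a TRT-LLM engine
params = SamplingParams(max_tokens=128, temperature=0.2, top_p=0.7)

for output in llm.generate(["Explain FP8 quantization briefly."], params):
    print(output.outputs[0].text)
```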