Model Overview
Description:
Mistral-NeMo-Minitron-8B-Instruct is a model for generating responses for various text-generation tasks, including roleplaying, retrieval-augmented generation, and function calling. It is a fine-tuned version of nvidia/Mistral-NeMo-Minitron-8B-Base, which was pruned and distilled from Mistral-NeMo 12B using our LLM compression technique. The model was trained using a multi-stage SFT and preference-based alignment technique with NeMo Aligner. For details on the alignment technique, please refer to the Nemotron-4 340B Technical Report. The model supports a context length of 8,192 tokens.
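As a quick orientation, the following is a minimal sketch of loading the model with Hugging Face Transformers; the repository id and bfloat16 dtype are assumptions that should be verified against the model page.

```python
# Minimal sketch: loading the model with Hugging Face Transformers.
# The repository id and bfloat16 dtype are assumptions; verify both
# against the model page before use.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Mistral-NeMo-Minitron-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```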
License/Terms of Use:
Model Architecture:
Architecture Type: Transformer
Network Architecture: Decoder-only
Input:
Input Type(s): Text (Prompt)
Input Format(s): String
Input Parameters: One Dimensional (1D)
Other Properties Related to Input: The model has a maximum of 8,192 input tokens.
Output:
Output Type(s): Text (Response)
Output Format: String
Output Parameters: 1D
Other Properties Related to Output: Input and output share the model's 8,192-token context window; the maximum output length can be configured independently of the input length (see the sketch below).
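To make the token budget concrete, the sketch below (reusing the `tokenizer` and `model` from the loading snippet above) checks the prompt against the 8,192-token context window and sets the output length independently via `max_new_tokens`.

```python
# Sketch: input and output share the 8,192-token context window,
# but the output length is set independently via max_new_tokens.
# Reuses `tokenizer` and `model` from the loading snippet above.
MAX_CONTEXT = 8192

prompt = "Explain retrieval-augmented generation in two sentences."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)
assert input_ids.shape[1] <= MAX_CONTEXT, "prompt exceeds the context window"

# Cap the output so input + output stay within the shared context.
max_new = min(512, MAX_CONTEXT - input_ids.shape[1])
output_ids = model.generate(input_ids, max_new_tokens=max_new)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))
```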
Prompt Format:
We recommend using the following prompt template, which was used to fine-tune the model. The model may not perform optimally without it.
```
<extra_id_0>System
{system prompt}
<extra_id_1>User
{prompt}
<extra_id_1>Assistant\n
```
- Note that a newline character `\n` should be added at the end of the prompt.
- We recommend using `<extra_id_1>` as a stop token.
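Putting the template and the stop token together, here is a minimal sketch that reuses the `tokenizer` and `model` from the loading snippet above; `build_prompt` is an illustrative helper, not part of the model's API.

```python
# Sketch: applying the recommended prompt template and stopping on
# <extra_id_1>. `build_prompt` is an illustrative helper, not part of
# the model's API; `tokenizer` and `model` come from the snippet above.
def build_prompt(system_prompt: str, user_prompt: str) -> str:
    return (
        f"<extra_id_0>System\n{system_prompt}\n"
        f"<extra_id_1>User\n{user_prompt}\n"
        "<extra_id_1>Assistant\n"  # note the trailing newline
    )

text = build_prompt("You are a helpful assistant.", "Write a haiku about GPUs.")
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
completion = tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:])

# Truncate at the recommended stop token if the model emits it.
print(completion.split("<extra_id_1>")[0].strip())
```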
Evaluation Results
| Category | Benchmark | # Shots | Mistral-NeMo-Minitron-8B-Instruct |
|---|---|---|---|
| General | MMLU | 5 | 70.4 |
| General | MT Bench (GPT4-Turbo) | 0 | 7.86 |
| Math | GSM8K | 0 | 87.1 |
| Reasoning | GPQA | 0 | 31.5 |
| Code | HumanEval | 0 | 71.3 |
| Code | MBPP | 0 | 72.5 |
| Instruction Following | IFEval | 0 | 84.4 |
| Tool Use | BFCL v2 Live | 0 | 67.6 |
Software Integration (Cloud):
Runtime Engine: NeMo Framework 24.09
Supported Hardware Microarchitecture Compatibility:
- NVIDIA Ampere
- NVIDIA Blackwell
- NVIDIA Hopper
- NVIDIA Lovelace
Supported Operating System(s):
- Linux
Model Version(s):
Mistral-NeMo-Minitron 8B Instruct
Training & Evaluation:
Training Dataset:
Data Collection Method by dataset:
- Hybrid: Automated, Human
Labeling Method by dataset:
- Hybrid: Automated, Human
Evaluation Dataset:
Data Collection Method by dataset:
- Hybrid: Automated, Human
Labeling Method by dataset:
- Human
Inference:
Engine: TRT-LLM
Test Hardware:
- A100
- A10G
- H100
- L40S
Supported Hardware Platform(s): L40S, A10G, A100, H100
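Since the inference engine is TRT-LLM, the sketch below uses TensorRT-LLM's high-level `LLM` API, available in recent releases; the exact argument names and output fields should be checked against the installed tensorrt_llm version.

```python
# Sketch: running the model with TensorRT-LLM's high-level LLM API.
# Argument names and output fields should be checked against the
# installed tensorrt_llm release; the stop string mirrors the
# recommended <extra_id_1> stop token.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="nvidia/Mistral-NeMo-Minitron-8B-Instruct")
params = SamplingParams(max_tokens=256, stop=["<extra_id_1>"])

prompt = (
    "<extra_id_0>System\nYou are a helpful assistant.\n"
    "<extra_id_1>User\nWhat is model distillation?\n"
    "<extra_id_1>Assistant\n"
)
for out in llm.generate([prompt], params):
    print(out.outputs[0].text)
```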
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report security vulnerabilities or NVIDIA AI Concerns here.