nvidia / nemotron-mini-4b-instruct

Model Overview

Description:

Nemotron-Mini-4B Instruct is a model for generating responses for roleplaying, retrieval augmented generation, and function calling. It is a small language model (SLM) optimized through distillation, pruning and quantization for speed and on-device deployment. VRAM usage has been minimized to approximately 2 GB, providing significantly faster time to first token compared to LLMs.

This model is ready for commercial use.

License/Terms of Use:

NVIDIA AI Foundation Models Community License Agreement

References

Please refer to the User Guide for instructions on using the model and for suggested prompt guidelines.

Model Architecture:

Architecture Type: Transformer

Network Architecture: Decoder-only

Limitations

The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and it may produce socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive. These issues could be exacerbated without the use of the recommended prompt template.

Input:

Input Type(s): Text (Prompt)

Input Format(s): String

Input Parameters: One Dimensional (1D)

Other Properties Related to Input: The model has a maximum of 4096 input tokens.
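As a rough illustration of the input limit, the sketch below checks a prompt against the 4096-token maximum before it is sent. The helper name `fits_input_limit` and the `count_tokens` callable are hypothetical; the model's actual tokenizer is not described here, so the caller is assumed to supply a function that returns the real token count.

```python
from typing import Callable

# Input limit stated in this model card.
MAX_INPUT_TOKENS = 4096


def fits_input_limit(
    prompt: str,
    count_tokens: Callable[[str], int],
    limit: int = MAX_INPUT_TOKENS,
) -> bool:
    """Return True if the prompt is within the input token limit.

    `count_tokens` is a caller-supplied function (an assumption of this
    sketch) that should wrap the tokenizer actually used for inference.
    """
    return count_tokens(prompt) <= limit


def whitespace_count(text: str) -> int:
    """Stand-in counter for demonstration only; NOT the model's tokenizer."""
    return len(text.split())
```

A whitespace split usually undercounts real tokens, so `whitespace_count` should be replaced with a count from the deployed tokenizer before relying on the result.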

Output:

Output Type(s): Text (Response)

Output Format: String

Output Parameters: 1D

Other Properties Related to Output: The model accepts a maximum of 4096 input tokens. The maximum output length for both versions can be set separately from the input limit.

Prompt Format:

We recommend using the following prompt template, which was used to fine-tune the model. The model may not perform optimally without it.

Single Turn

<extra_id_0>System
{system prompt}

<extra_id_1>User
{prompt}
<extra_id_1>Assistant\n
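The single-turn template above can be filled in programmatically. The following is a minimal sketch, assuming the literal `\n` in the template denotes a trailing newline and that the blank line after the system prompt is part of the format; the function name `build_single_turn_prompt` is hypothetical.

```python
def build_single_turn_prompt(system_prompt: str, user_prompt: str) -> str:
    """Fill the single-turn template with a system prompt and a user prompt."""
    return (
        "<extra_id_0>System\n"
        f"{system_prompt}\n"
        "\n"  # blank line between the system block and the user turn
        "<extra_id_1>User\n"
        f"{user_prompt}\n"
        "<extra_id_1>Assistant\n"  # the model's reply is generated after this line
    )


if __name__ == "__main__":
    print(build_single_turn_prompt("You are a friendly innkeeper.", "Do you have a room?"))
```

The returned string is what would be passed to the inference runtime as the text prompt; the model's response is expected to follow the final `Assistant` line.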

Tool use

<extra_id_0>System
{system prompt}

<tool> ... </tool>
<context> ... </context>

<extra_id_1>User
{prompt}
<extra_id_1>Assistant
<toolcall> ... </toolcall>
<extra_id_1>Tool
{tool response}
<extra_id_1>Assistant\n
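The tool-use template can be assembled in the same way. This sketch assumes a two-step flow: a first request that ends at the `Assistant` turn so the model can emit a `<toolcall>`, and a follow-up request that appends that call and the tool's response. The contents of `<tool>`, `<context>`, and `<toolcall>` are elided in the template, so they are treated here as opaque strings supplied by the caller; the function name `build_tool_use_prompt` and its parameters are hypothetical.

```python
from typing import Optional


def build_tool_use_prompt(
    system_prompt: str,
    tools: str,
    context: str,
    user_prompt: str,
    tool_call: Optional[str] = None,
    tool_response: Optional[str] = None,
) -> str:
    """Fill the tool-use template.

    With no tool_call/tool_response, the prompt ends at the first Assistant
    turn. With both supplied, the earlier tool call and the tool's response
    are appended before the final Assistant turn.
    """
    prompt = (
        "<extra_id_0>System\n"
        f"{system_prompt}\n"
        "\n"
        f"<tool> {tools} </tool>\n"          # tool definitions, format not specified here
        f"<context> {context} </context>\n"  # retrieved or supplied context
        "\n"
        "<extra_id_1>User\n"
        f"{user_prompt}\n"
    )
    if tool_call is not None and tool_response is not None:
        prompt += (
            "<extra_id_1>Assistant\n"
            f"<toolcall> {tool_call} </toolcall>\n"
            "<extra_id_1>Tool\n"
            f"{tool_response}\n"
        )
    prompt += "<extra_id_1>Assistant\n"
    return prompt
```

Keeping the tool definitions and tool call as plain strings avoids guessing at a serialization format that this section does not specify.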

Software Integration: (On-Device)

Runtime(s): AI Inference Manager (NVAIM) Version 1.0.0

Toolkit: NVAIM

See this document for details on how to integrate the model into NVAIM.

Supported Hardware Platform(s): GPU supporting DirectX 11/12 and Vulkan 1.2 or higher

Supported Operating System(s):

Windows

Software Integration: (Cloud)

Toolkit: NVIDIA NIM

See this document for details on how to integrate the model into NIM.

Supported Operating System(s):

Linux

Model Version(s)

Nemotron-4-4B-instruct

Training & Evaluation:

Training Dataset:

Data Collection Method by dataset:

Hybrid: Automated, Human

Labeling Method by dataset:

Hybrid: Automated, Human

Properties (Quantity, Dataset Descriptions, Sensor(s)):

Trained on approximately 10000 Game/Non-Playable Character (NPC) dialog turns from domain chat data.

Evaluation Dataset:

Data Collection Method by dataset:

Hybrid: Automated, Human

Labeling Method by dataset:

Human

Properties (Quantity, Dataset Descriptions, Sensor(s)):

Evaluated on approximately 1000 Game/NPC dialog turns from domain chat data.

Inference:

Engine: TRT-LLM

Test Hardware:

A100
A10G
H100
L40S

Supported Hardware Platform(s): L40S, A10G, A100, H100

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.