Model Overview
Description:
Nemotron-4-Mini-Hindi-4B-Instruct is a chat model for generating responses in chat applications and retrieval-augmented generation in Hindi. It is an aligned version of Nemotron-4-Mini-Hindi-4B-Base. It is a small language model (SLM) optimized through distillation, pruning, and quantization for speed and on-device deployment. VRAM usage has been minimized to approximately 2 GB, providing significantly faster time to first token than larger LLMs.
This model is ready for commercial use.
License/Terms of Use:
References
Please refer to the User Guide for instructions on using the model and for suggested prompt guidelines.
Model Architecture:
Architecture Type: Transformer
Network Architecture: Decoder-only
Limitations
The model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when given toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, and it may produce socially unacceptable or undesirable text even if the prompt itself does not include anything explicitly offensive. The model may answer with "I" statements, exhibiting some anthropomorphizing. These issues could be exacerbated without the use of the recommended prompt template.
Input:
Input Type(s): Text (Prompt)
Input Format(s): String
Input Parameters: One Dimensional (1D)
Other Properties Related to Input: The model has a maximum of 4096 input tokens.
Output:
Output Type(s): Text (Response)
Output Format: String
Output Parameters: 1D
Other Properties Related to Output: The model has a maximum context of 4096 tokens; the maximum output length can be configured independently of the input length.
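As an illustration of these limits, the sketch below checks a prompt against the 4096-token input budget and sets the output length separately. The tokenizer id and helper names are assumptions for illustration, not part of this card.

```python
# Minimal sketch: enforce the 4096-token input budget and choose an output
# budget independently. The tokenizer id below is an assumption; use the
# tokenizer that ships with your deployment of the model.
from transformers import AutoTokenizer

MAX_INPUT_TOKENS = 4096

tokenizer = AutoTokenizer.from_pretrained("nvidia/Nemotron-4-Mini-Hindi-4B-Instruct")

def validate_prompt(prompt: str, max_new_tokens: int = 256) -> dict:
    """Check the prompt against the input limit and return generation settings."""
    n_tokens = len(tokenizer(prompt)["input_ids"])
    if n_tokens > MAX_INPUT_TOKENS:
        raise ValueError(f"Prompt uses {n_tokens} tokens; the limit is {MAX_INPUT_TOKENS}.")
    # The output length is set separately from the input length.
    return {"input_tokens": n_tokens, "max_new_tokens": max_new_tokens}
```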
Prompt Format:
We recommend using the following prompt template, which was used to fine-tune the model. The model may not perform optimally without it.
Single Turn
<extra_id_0>System
{system prompt}
<extra_id_1>User
{prompt}
<extra_id_1>Assistant\n
- Note that a newline character \n should be added at the end of the prompt.
- We recommend using <extra_id_1> as a stop token.
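As a usage illustration, the sketch below assembles the single-turn prompt exactly as shown above and generates a response with Hugging Face Transformers. The model id, example Hindi prompts, and generation settings are illustrative assumptions.

```python
# Minimal sketch of single-turn prompt construction and generation with
# Hugging Face Transformers (illustrative only).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nvidia/Nemotron-4-Mini-Hindi-4B-Instruct"  # assumed checkpoint id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

def build_prompt(system_prompt: str, user_prompt: str) -> str:
    # Follow the recommended single-turn template, ending with a newline.
    return (
        "<extra_id_0>System\n"
        f"{system_prompt}\n"
        "<extra_id_1>User\n"
        f"{user_prompt}\n"
        "<extra_id_1>Assistant\n"
    )

prompt = build_prompt("आप एक सहायक हैं।", "भारत की राजधानी क्या है?")
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
text = tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:])

# Use <extra_id_1> as a stop token: truncate anything generated after it.
response = text.split("<extra_id_1>")[0].strip()
print(response)
```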
Evaluation Results
| Category | Hindi Benchmark | # Shots | Nemotron-4-Mini-Hindi-4B-Instruct |
|---|---|---|---|
| General | MMLU | 0 | 50.5 |
| | ARC-C | 0 | 65.53 |
| | ARC-E | 0 | 79.97 |
| | HellaSwag | 0 | 39.9 |
| | BoolQ | 0 | 67.86 |
| Chat | IndicQuest (GPT4-Turbo) | 0 | 4.15 |
Software Integration: (On-Device)
Runtime(s): AI Inference Manager (NVAIM) Version 1.0.0
Toolkit: NVAIM
See this document for details on how to integrate the model into NVAIM.
Supported Hardware Platform(s): GPU supporting DirectX 11/12 and Vulkan 1.2 or higher
Supported Operating System(s):
- Windows
Software Integration: (Cloud)
Toolkit: NVIDIA NIM
See this document for details on how to integrate the model into NVIDIA NIM; a request sketch is shown below.
Supported Operating System(s):
- Linux
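For reference, the following sketch queries a NIM deployment through its OpenAI-compatible chat completions endpoint. The base URL, port, and served model name are assumptions; adjust them to match your deployment.

```python
# Minimal sketch of calling a locally deployed NIM container via its
# OpenAI-compatible API (endpoint and model name are assumptions).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

completion = client.chat.completions.create(
    model="nvidia/nemotron-4-mini-hindi-4b-instruct",  # assumed served model name
    messages=[
        {"role": "system", "content": "आप एक सहायक हैं।"},
        {"role": "user", "content": "भारत की राजधानी क्या है?"},
    ],
    max_tokens=256,
)
print(completion.choices[0].message.content)
```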
Model Version(s)
Nemotron-4-Mini-Hindi-4B-Instruct
Training & Evaluation Datasets:
Training Dataset:
Data Collection Method by dataset:
- Hybrid: Automated, Human
Labeling Method by dataset:
- Hybrid: Automated, Human
Properties (Quantity, Dataset Descriptions, Sensor(s)):
Trained on general Supervised Fine-Tuning (SFT) data, followed by Direct Preference Optimization (DPO) on general and translated corpora.
Evaluation Dataset:
Data Collection Method by dataset:
- Hybrid: Automated, Human
Labeling Method by dataset:
- Human
Properties (Quantity, Dataset Descriptions, Sensor(s)):
12 benchmark datasets, including IndicXTREME and translated English benchmarks such as MMLU and HellaSwag.
Inference:
Engine: TRT-LLM
Test Hardware:
- A100
- A10G
- H100
- L40S
Supported Hardware Platform(s): A10G, A100, L40S, H100
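As an illustration of offline inference with the TRT-LLM engine, the sketch below uses the TensorRT-LLM high-level LLM API. The checkpoint id and sampling settings are assumptions, and the exact API surface depends on your TensorRT-LLM version.

```python
# Illustrative sketch of offline generation with the TensorRT-LLM LLM API
# (checkpoint id, prompts, and sampling settings are assumptions).
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="nvidia/Nemotron-4-Mini-Hindi-4B-Instruct")  # assumed checkpoint id
params = SamplingParams(max_tokens=256, temperature=0.2)

prompt = (
    "<extra_id_0>System\n"
    "आप एक सहायक हैं।\n"
    "<extra_id_1>User\n"
    "भारत की राजधानी क्या है?\n"
    "<extra_id_1>Assistant\n"
)

for output in llm.generate([prompt], params):
    # Truncate at the recommended stop token if the runtime does not stop there.
    print(output.outputs[0].text.split("<extra_id_1>")[0].strip())
```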
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.