nvidia / llama-3.1-nemoguard-8b-topic-control

Model Overview

Description:

Llama 3.1 NemoGuard 8B TopicControl performs topical and dialogue moderation of user prompts in human-assistant interactions, and is designed for task-oriented dialogue agents and custom policy-based moderation.
Given a system instruction (also called a topical instruction, i.e. one specifying which topics are allowed and disallowed) and a conversation history ending with the last user prompt, the model returns a binary response that flags whether the user message respects the system instruction (i.e. whether the message is on-topic or a distractor/off-topic).
The base large language model (LLM) is the multilingual Llama-3.1-8B-Instruct model from Meta. Llama 3.1 NemoGuard 8B TopicControl is LoRA-tuned on a topic-following dataset generated synthetically with Mixtral-8x7B-Instruct-v0.1.
This model is ready for commercial use.

License/Terms of Use:

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Community License Agreement. Additional Information: Llama 3.1 Community License Agreement. Built with Llama.

Reference(s):

Related paper:

@article{sreedhar2024canttalkaboutthis,
  title={CantTalkAboutThis: Aligning Language Models to Stay on Topic in Dialogues},
  author={Sreedhar, Makesh Narsimhan and Rebedea, Traian and Ghosh, Shaona and Zeng, Jiaqi and Parisien, Christopher},
  journal={arXiv preprint arXiv:2404.03820},
  year={2024}
}

Model Architecture:

Architecture Type: Transformer

Network Architecture: The base model architecture is that of the Llama-3.1-8B-Instruct model from Meta (Model Card).
We perform Parameter-Efficient Fine-Tuning (PEFT) over the base model using the following network architecture parameters (a minimal configuration sketch follows the list):

  • Rank: 8
  • Alpha: 32
  • Targeted low rank adaptation modules: 'k_proj', 'q_proj', 'v_proj', 'o_proj', 'up_proj', 'down_proj', 'gate_proj'.
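
The following is a minimal sketch of this LoRA setup using the HuggingFace peft library. The base checkpoint id and the task type are assumptions inferred from this card, not a verbatim reproduction of NVIDIA's training code:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# LoRA hyperparameters as listed in this model card
lora_config = LoraConfig(
    r=8,                    # Rank
    lora_alpha=32,          # Alpha
    target_modules=["k_proj", "q_proj", "v_proj", "o_proj",
                    "up_proj", "down_proj", "gate_proj"],
    task_type="CAUSAL_LM",  # assumption: standard causal-LM fine-tuning
)

# Assumption: the base checkpoint is the public Meta release
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.1-8B-Instruct")
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # only the LoRA adapter weights are trainable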

Training Method:

The training method for Llama 3.1 NemoGuard 8B TopicControl involves the following concepts:

  • A system instruction acts as the topical instruction, defining the rules and context of the user-assistant interaction: the topics allowed or disallowed by the current task-oriented scenario, the conversation style and tone, and the expected conversation flows.
  • Any user message in the conversation that respects the topical instruction is considered on-topic, while a user message that contradicts at least one of the rules is a distractor or off-topic.
  • A synthetically generated dataset, called CantTalkAboutThis-Mixtral-1.0, of approximately 1,000 multi-turn conversations is used to instruction-tune the base model. Each conversation has a specific topical instruction from one of several broad domains (e.g. customer support, travel, legal) and contains an entirely on-topic conversation, together with several distractor user messages that replace some of the on-topic ones at specific key points in the conversation.
  • The model is instruction-tuned to detect whether a user message is on-topic or a distractor given the topical instruction for the current conversation, with the LLM behaving as a classifier (a rough sketch of this framing follows the list).
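
As a rough illustration of this classifier framing, the sketch below assembles one training example from a topical instruction, a conversation, and a label. The prompt template and the build_example helper are hypothetical; the exact training format is not published in this card:

# Hypothetical helper; the prompt template below is an assumption, not NVIDIA's.
def build_example(topical_instruction, conversation, label):
    dialogue = "\n".join(f"{m['role']}: {m['content']}" for m in conversation)
    prompt = (
        f"Topical instruction:\n{topical_instruction}\n\n"
        f"Conversation:\n{dialogue}\n\n"
        "Is the last user message on-topic or off-topic?"
    )
    return {"prompt": prompt, "completion": label}  # label: "on-topic" or "off-topic"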

Input:

Input Type(s): Text

Input Format(s): String

Input Parameters: 1D (One-Dimensional) List: System prompt with topical instructions, followed by a conversation structured as a list of user and assistant messages.

Other Properties Related to Input: The conversation should end with a user message, which is evaluated for topical moderation given the topical instruction and the context of the entire conversation (previous user and assistant turns). The input format for the system prompt and the conversation follows the OpenAI Chat specification, which is widely adopted in the industry, including by the NVIDIA AI API.

Sample input:

// User-LLM conversations in the industry-standard payload format for LLM systems:
[
   {
       "role": "system",
       "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"
   },
   {
       "role": "user",
       "content": "Hi there!"
   },
   {
       "role": "assistant",
       "content": "Hello! How can I help today?"
   },
   {
       "role": "user",
       "content": "Do you know which is the most popular beach in Barcelona?"
   }
]
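
A payload like the one above can be sent with any OpenAI-compatible client. The sketch below is one way to do so; the endpoint URL and model id are assumptions to be verified against the NVIDIA AI API catalog, and the API key is a placeholder:

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NVIDIA AI API endpoint
    api_key="YOUR_NVIDIA_API_KEY",                   # placeholder
)

completion = client.chat.completions.create(
    model="nvidia/llama-3.1-nemoguard-8b-topic-control",  # assumed model id
    messages=[
        {"role": "system", "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"},
        {"role": "user", "content": "Hi there!"},
        {"role": "assistant", "content": "Hello! How can I help today?"},
        {"role": "user", "content": "Do you know which is the most popular beach in Barcelona?"},
    ],
)
print(completion.choices[0].message.content)  # expected: "off-topic"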

Output:

Output Type(s): Text

Output Format: String

Output Parameters: 1D (One-Dimensional)

Other Properties Related to Output: The response is a binary string label indicating whether the last user turn in the input conversation respects the topical instruction. The label is either "on-topic" or "off-topic".

Example Model Input/Output:

Input

// User-LLM conversations in the industry-standard payload format for LLM systems:
[
   {
       "role": "system",
       "content": "In the next conversation always use a polite tone and do not engage in any talk about travelling and touristic destinations"
   },
   {
       "role": "user",
       "content": "Hi there!"
   },
   {
       "role": "assistant",
       "content": "Hello! How can I help today?"
   },
   {
       "role": "user",
       "content": "Do you know which is the most popular beach in Barcelona?"
   }
]

Output (Model Response)

off-topic

Software Integration:

Runtime Engine(s): PyTorch

Libraries: Meta's llama-recipes, HuggingFace transformers library, HuggingFace peft library

Supported Hardware Platform(s): NVIDIA Ampere (A100 80GB, A100 40GB)

Preferred/Supported Operating System(s): Linux (Ubuntu)

Model Version(s):

Llama 3.1 NemoGuard 8B TopicControl

Training, Testing, and Evaluation Datasets:

Training Dataset:

Link: CantTalkAboutThis dataset

Data Collection Method by dataset: Synthetic

Labeling Method by dataset: Synthetic

Properties: The CantTalkAboutThis topic-following dataset contains 1,080 on-topic multi-turn conversations built from 540 different topical instructions spanning various domains. For each on-topic conversation, we also generate off-topic/distractor turns at specific points in the conversation (about 4 distractors per conversation).

Testing Dataset:

The performance of the model is tested on a smaller, human-annotated subset of the synthetically created CantTalkAboutThis topic-following test set. The test set contains conversations in a domain (banking) that appears in neither the training nor the evaluation set. While the on-topic conversations are sampled similarly to the training dataset, the distractors are annotated by expert human annotators.

Link: CantTalkAboutThis topic-following dataset

Data Collection Method by dataset: Hybrid: Synthetic, Human

Labeling Method by dataset: Hybrid: Synthetic, Human

Properties: We select 20 random dialogues from the synthetic test domain and ask two experts in dialogue systems to manually create five distractors per conversation. Thus, we also provide a small human-annotated test set that is both more challenging and more reflective of realistic scenarios. The test set contains 100 human-annotated distractors plus the remaining on-topic turns, with about 11% of turns being distractors/off-topic.

Evaluation Dataset:

The evaluation set is similar to the training dataset: synthetically generated on-topic conversations and distractors, but in the travel domain (which is not part of the training set).

Link: CantTalkAboutThis evaluation set

Data Collection Method by dataset: Synthetic

Labeling Method by dataset: Synthetic

Properties: We generate 20 multi-turn conversations on 10 different scenarios in the travel domain, each conversation having about 20 turns.

Inference:

Engine: TensorRT-LLM, available via NVIDIA NIM on NGC.

Test Hardware: A100 80GB

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
For more detailed information on ethical considerations for this model, please see the Model Card++ Explainability, Bias, Safety & Security, and Privacy Subcards.
Please report security vulnerabilities or NVIDIA AI Concerns here.