nvidia / llama-3.2-nv-embedqa-1b-v1

Model Overview

Description

The NVIDIA Retrieval QA Llama3.2 1b Embedding Model is an embedding model optimized for multilingual and cross-lingual text question-answering retrieval. The model was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, and Turkish.

An embedding model is a crucial component of a text retrieval system because it transforms textual information into dense vector representations. Embedding models are typically transformer encoders that process the tokens of an input text (for example, a question or a passage) and output an embedding.

This model is ready for commercial use.

NVIDIA Retrieval QA Embedding Model is a part of NVIDIA NeMo Retriever, which provides state-of-the-art, commercially ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for domain-specific use cases, such as help assistants for Information Technology or Human Resources, and research assistants for Research & Development.

Intended use

The NVIDIA Retrieval QA Llama3.2 1b Embedding model is most suitable for users who want to build a multilingual question and answer application over a large text corpus, leveraging the latest dense retrieval technologies.

License/Terms of use

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement. Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

Model Architecture

Architecture Type: Transformer
Network Architecture: Fine-tuned Llama3.2 1b retriever

The NVIDIA Retrieval QA Embedding Model is a transformer encoder: a fine-tuned version of Llama3.2 1b with 16 layers and an embedding size of 2048, trained on public datasets. Training uses the AdamW optimizer with a learning rate of 5e-6, 100 warm-up steps, and the WarmupDecayLR scheduler. Embedding models for text retrieval are typically trained with a bi-encoder architecture: a pair of texts (for example, a query and a chunked passage) is encoded independently by the embedding model. Contrastive learning then maximizes the similarity between the query and the passage that contains the answer, while minimizing the similarity between the query and sampled negative passages that are not useful for answering the question.
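The contrastive objective described above can be sketched with in-batch negatives, where each query's positive passage sits on the diagonal of the similarity matrix and every other passage in the batch acts as a negative. This is a simplified NumPy illustration, not the model's actual training code, and the temperature value is an assumption:

```python
import numpy as np

def info_nce_loss(query_emb, passage_emb, temperature=0.05):
    """Contrastive loss over a batch of (query, positive passage) pairs.

    query_emb, passage_emb: (batch, dim) L2-normalized embeddings;
    passage_emb[i] is the positive for query_emb[i], and every other
    passage in the batch serves as an in-batch negative.
    """
    # Cosine similarity matrix scaled by temperature: (batch, batch)
    sims = query_emb @ passage_emb.T / temperature
    # Softmax cross-entropy with the diagonal as the correct class
    sims -= sims.max(axis=1, keepdims=True)  # numerical stability
    log_probs = sims - np.log(np.exp(sims).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

def l2norm(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

rng = np.random.default_rng(0)
q = l2norm(rng.standard_normal((4, 2048)))
p = l2norm(q + 0.01 * rng.standard_normal((4, 2048)))  # positives near queries
loss = info_nce_loss(q, p)  # small, since each positive dominates its row
```

Minimizing this loss pulls each query toward its answer passage while pushing it away from the other passages in the batch.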

Input

Input Type: Text
Input Format: List of strings
Input Parameter: 1D
Other Properties Related to Input: The model's maximum context length is 512 tokens. Texts longer than the maximum length must be either chunked or truncated.
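A minimal chunking sketch for texts that exceed the 512-token context, using whitespace tokens as a stand-in for the model's real tokenizer (subword counts differ from word counts, so budget conservatively in practice); the overlap size is an illustrative assumption:

```python
def chunk_text(text, max_tokens=512, overlap=64):
    """Split text into overlapping chunks that fit the model's context.

    Whitespace tokenization is a stand-in here; in practice, count
    tokens with the model's own tokenizer.
    """
    tokens = text.split()
    if len(tokens) <= max_tokens:
        return [text]
    chunks, step = [], max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break
    return chunks

doc = " ".join(f"w{i}" for i in range(1200))
chunks = chunk_text(doc)  # overlapping windows, each within the token budget
```

Overlapping windows reduce the chance that an answer span is split across a chunk boundary.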

Output

Output Type: Floats
Output Format: List of float arrays
Output: Model outputs embedding vectors of dimension 2048 for each text string.
Other Properties Related to Output: N/A
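Downstream, the 2048-dimensional output vectors are typically compared by cosine similarity to rank passages against a query. A minimal NumPy sketch using random stand-in vectors in place of real model outputs:

```python
import numpy as np

def top_k(query_vec, passage_vecs, k=5):
    """Rank passages by cosine similarity to the query embedding."""
    q = query_vec / np.linalg.norm(query_vec)
    p = passage_vecs / np.linalg.norm(passage_vecs, axis=1, keepdims=True)
    sims = p @ q                      # cosine similarity per passage
    order = np.argsort(-sims)[:k]     # indices of the k best matches
    return order, sims[order]

rng = np.random.default_rng(1)
passages = rng.standard_normal((100, 2048))                 # stand-in embeddings
query = passages[42] + 0.05 * rng.standard_normal(2048)     # near passage 42
idx, scores = top_k(query, passages)                        # passage 42 ranks first
```

At corpus scale the same computation is usually delegated to a vector index rather than a dense matrix product.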

Software Integration

Runtime Engine: NeMo Retriever Text Embedding NIM
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Ada Lovelace
Supported Operating System(s): Linux
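As a sketch, a request to a locally deployed Text Embedding NIM might be assembled as follows. The endpoint path, host, and the `input_type` field are assumptions based on the NIM's OpenAI-compatible embeddings API; verify the exact payload schema against the NIM documentation for your deployed version:

```python
import json

def build_embed_request(texts, input_type="query"):
    """Build a hypothetical JSON payload for the /v1/embeddings endpoint.

    input_type distinguishes queries from passages, since asymmetric
    retrieval models embed the two differently.
    """
    assert input_type in ("query", "passage")
    return json.dumps({
        "model": "nvidia/llama-3.2-nv-embedqa-1b-v1",
        "input": texts,
        "input_type": input_type,
    })

payload = build_embed_request(["How do I deploy a NIM?"], input_type="query")
```

The payload would then be POSTed to the NIM's embeddings endpoint (for example, `http://localhost:8000/v1/embeddings` on a default local deployment) with a JSON content type.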

Model Version(s)

NVIDIA Retrieval QA Llama 3.2 1B Embedding v1
Short name: llama-3.2-nv-embedqa-1B-v1

Training Dataset & Evaluation

Training Dataset

Properties: Semi-supervised pre-training on 100M samples from public datasets and fine-tuning on 1M samples from public datasets.

Data Collection Method by dataset: Automated, Unknown

Labeling Method by dataset: Automated, Unknown

Evaluation Results

Properties: We evaluated the NVIDIA Retrieval QA Embedding Model against open and commercial retriever models reported in the literature, on academic question-answering benchmarks: NQ, HotpotQA, and FiQA (Finance Q&A) from the BeIR benchmark, and the TechQA dataset. Note that the model was evaluated offline on A100 GPUs using the model's PyTorch checkpoint. The metric used in this benchmark is Recall@5.

Open & Commercial Retrieval Models | Average Recall@5 on NQ, HotpotQA, FiQA, TechQA datasets
llama-3.2-nv-embedqa-1B-v1 | 68.97%
NV-EmbedQA-Mistral-7B-v2 | 72.97%
NV-EmbedQA-Mistral-7B-v1 | 64.93%
NV-EmbedQA-E5-v5 | 62.07%
NV-EmbedQA-E5-v4 | 57.65%
E5-large-unsupervised | 48.03%
BM25 | 44.67%
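Recall@5 measures the fraction of relevant passages that appear among the top 5 retrieved results, averaged over queries. A minimal sketch for a single query:

```python
def recall_at_k(ranked_ids, relevant_ids, k=5):
    """Fraction of the relevant passages found in the top-k results."""
    hits = sum(1 for pid in ranked_ids[:k] if pid in relevant_ids)
    return hits / len(relevant_ids)

# One query with two relevant passages, one of which is retrieved in the top 5:
score = recall_at_k(["p9", "p2", "p7", "p1", "p4"], {"p2", "p8"})
```

Benchmark scores like those in the table above are averages of this per-query value over the full query set.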

We evaluated the multilingual capabilities on the academic benchmark MIRACL across 15 languages, and translated the English and Spanish versions of MIRACL into 11 additional languages. The reported scores are based on an internal version of MIRACL that selects hard negatives for each query to reduce the corpus size.

Open & Commercial Retrieval Models | Average Recall@5 on multilingual benchmarks
llama-3.2-nv-embedqa-1B-v1 | 60.07%
NV-EmbedQA-Mistral-7B-v2 | 50.42%
BM25 | 26.51%

We evaluated the cross-lingual capabilities on the academic benchmark MLQA across 7 languages (Arabic, Chinese, English, German, Hindi, Spanish, Vietnamese). We consider only evaluation sets in which the query and the documents are in different languages, and report the average Recall@5 across the 42 resulting language pairs.
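The count of 42 pairs follows from taking every ordered (query language, document language) combination of the 7 languages with the two sides different, i.e. 7 × 6:

```python
from itertools import permutations

langs = ["Arabic", "Chinese", "English", "German", "Hindi", "Spanish", "Vietnamese"]

# Ordered pairs with query language != document language: 7 * 6 = 42
pairs = list(permutations(langs, 2))
```

Each pair is evaluated separately, and the table below reports the mean Recall@5 over all of them.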

Open & Commercial Retrieval Models | Average Recall@5 on MLQA across language pairs
llama-3.2-nv-embedqa-1B-v1 | 78.77%
NV-EmbedQA-Mistral-7B-v2 | 68.38%
BM25 | 13.01%

Data Collection Method by dataset: Unknown

Labeling Method by dataset: Unknown

Properties: The evaluation datasets are the four public question-answering datasets described above (NQ, HotpotQA, and FiQA from MTEB/BeIR, plus TechQA), together with MIRACL. Corpus sizes range from the tens of thousands up to 5M, depending on the dataset.

Inference
Engine: TensorRT
Test Hardware: H100 PCIe/SXM, A100 PCIe/SXM, L40S, L4, and A10G

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ tab for the Explainability, Bias, Safety & Security, and Privacy subcards.

Please report security vulnerabilities or NVIDIA AI Concerns here.