nvidia / llama-3.2-nv-rerankqa-1b-v1

Model Overview

Description

The NVIDIA Retrieval QA Llama 1B Reranking Model is optimized to produce a logit score that represents how relevant a document is to a given query. The model was fine-tuned for multilingual and cross-lingual text question-answering retrieval. It was evaluated on 26 languages: English, Arabic, Bengali, Chinese, Czech, Danish, Dutch, Finnish, French, German, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Norwegian, Persian, Polish, Portuguese, Russian, Spanish, Swedish, Thai, Turkish.

The ranking model is a component of a text retrieval system that improves overall accuracy. A text retrieval system often uses an embedding model (dense) or a lexical search (sparse) index to return relevant text passages for a given input. A ranking model can then rerank the candidates into a final order. Because the ranking model takes question-passage pairs as input, it can apply cross-attention between the words of the question and the passage. Applying a ranking model to every document in the knowledge base is not feasible, so ranking models are usually deployed in combination with embedding models.
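The two-stage flow described above can be sketched as follows. This is a minimal illustration, not the NeMo Retriever API: `retrieve_then_rerank`, `overlap`, and `normalized_overlap` are hypothetical names, and the toy scoring functions only stand in for an embedding model and a reranker.

```python
from typing import Callable, List, Tuple


def retrieve_then_rerank(
    query: str,
    corpus: List[str],
    first_stage_score: Callable[[str, str], float],
    rerank_score: Callable[[str, str], float],
    top_k: int = 100,
    final_k: int = 5,
) -> List[Tuple[str, float]]:
    """Cheap scoring over the whole corpus, expensive reranking on top_k only."""
    # Stage 1: score every passage with the cheap retriever, keep the best top_k.
    candidates = sorted(
        corpus, key=lambda p: first_stage_score(query, p), reverse=True
    )[:top_k]
    # Stage 2: score only the shortlisted (query, passage) pairs with the reranker.
    rescored = [(p, rerank_score(query, p)) for p in candidates]
    rescored.sort(key=lambda item: item[1], reverse=True)
    return rescored[:final_k]


# Hypothetical stand-ins: word overlap plays the retriever, and overlap
# normalized by passage length plays the reranker.
def overlap(query: str, passage: str) -> float:
    return float(len(set(query.lower().split()) & set(passage.lower().split())))


def normalized_overlap(query: str, passage: str) -> float:
    return overlap(query, passage) / max(len(passage.split()), 1)


corpus = [
    "the gpu driver install guide for linux",
    "install the gpu driver",
    "cooking pasta at home",
]
ranked = retrieve_then_rerank(
    "install gpu driver", corpus, overlap, normalized_overlap, top_k=2, final_k=2
)
```

The reranker only ever sees `top_k` passages per query, which is what keeps the expensive pairwise scoring affordable on a large knowledge base.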

This model is ready for commercial use.

NVIDIA Retrieval QA Llama 1B Reranking Model is a part of NVIDIA NeMo Retriever, which provides state-of-the-art, commercially-ready models and microservices, optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for their domain-specific use cases, such as Information Technology, Human Resource help assistants, and Research & Development research assistants.

License/Terms of use

The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement. Llama 3.2 is licensed under the Llama 3.2 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.

Intended use

The NVIDIA Retrieval QA Ranking model is most suitable for users who want to improve their multilingual retrieval tasks by reranking a set of candidates for a given question.

Model Architecture: Llama-3.2 1B Ranker

Architecture Type: Transformer
Network Architecture: Fine-tuned meta-llama/Llama-3.2-1B

The NVIDIA Retrieval QA Ranking Model is a transformer model fine-tuned with contrastive learning. Although the underlying Llama 3.2 model is a decoder, we employ bi-directional attention when fine-tuning for higher accuracy. The last embedding output by the model is used with a mean pooling strategy, and a binary classification head is fine-tuned for the ranking task.
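To make the pooling and classification step concrete, here is a minimal pure-Python sketch. It assumes the head is a single linear layer producing one logit, which the card does not state; `mean_pool`, `relevance_logit`, and the toy values are illustrative only.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the embeddings of non-padding tokens (mask value 1)."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            count += 1
            for i, v in enumerate(vec):
                total[i] += v
    # Guard against an all-padding input.
    return [t / max(count, 1) for t in total]


def relevance_logit(pooled: List[float], weights: List[float], bias: float) -> float:
    """Assumed linear head: one relevance logit per (question, passage) pair."""
    return sum(p * w for p, w in zip(pooled, weights)) + bias


# Toy hidden states for three token positions; the last one is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
pooled = mean_pool(hidden, mask)
logit = relevance_logit(pooled, weights=[0.5, -1.0], bias=0.25)
```

Masking matters here: without it, padding positions would pull the pooled vector toward arbitrary values.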

Ranking models for text ranking are typically trained as cross-encoders for sentence classification. This involves predicting the relevance of a sentence pair (for example, a question and a chunked passage). The cross-entropy loss is used to maximize the likelihood for passages that contain information to answer the question and to minimize the likelihood for (negative) passages that do not.
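One plausible reading of this objective is a binary cross-entropy over the raw logits, with label 1 for passages that answer the question and 0 for negatives. The card does not say whether the loss is computed per pair or over groups of one positive and several negatives, so the sketch below is an assumption rather than the actual training code.

```python
import math
from typing import List


def binary_cross_entropy_with_logits(logits: List[float], labels: List[int]) -> float:
    """Mean binary cross-entropy, computed stably from raw logits."""
    losses = []
    for z, y in zip(logits, labels):
        # Stable form of -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
        losses.append(max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z))))
    return sum(losses) / len(losses)


# One positive passage (label 1) and one negative passage (label 0).
labels = [1, 0]
good = binary_cross_entropy_with_logits([4.0, -4.0], labels)  # scores agree with labels
bad = binary_cross_entropy_with_logits([-4.0, 4.0], labels)   # scores contradict labels
```

The loss is small when positives get high logits and negatives get low ones, and grows roughly linearly in the logit when the ordering is wrong.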

We train the model on public datasets described in the Dataset and Training section.

Input

Input Type: Pair of Texts
Input Format: List of text pairs
Input Parameters: 1D
Other Properties Related to Input: The model's maximum context length is 512 tokens, and it was trained on question answering over text documents. Texts longer than the maximum length must be either chunked or truncated.
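The two options for over-length text can be sketched as below. Whitespace splitting is only a hypothetical stand-in for the model's real tokenizer, and the 400-token passage budget (leaving room for the question) and 50-token overlap are illustrative choices, not values from this card.

```python
from typing import List

MAX_TOKENS = 512  # maximum context length stated in this model card


def chunk_tokens(tokens: List[str], max_len: int = MAX_TOKENS, overlap: int = 0) -> List[List[str]]:
    """Split tokens into windows of at most max_len, sharing `overlap` tokens."""
    if max_len <= 0 or not 0 <= overlap < max_len:
        raise ValueError("need max_len > 0 and 0 <= overlap < max_len")
    step = max_len - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window already reaches the end of the text
    return chunks


def truncate_tokens(tokens: List[str], max_len: int = MAX_TOKENS) -> List[str]:
    """Keep only the first max_len tokens and drop the rest."""
    return tokens[:max_len]


# Hypothetical 1200-token passage, tokenized by whitespace for illustration.
passage_tokens = ("word " * 1200).split()
chunks = chunk_tokens(passage_tokens, max_len=400, overlap=50)
truncated = truncate_tokens(passage_tokens)
```

Chunking keeps all of the text at the cost of more pairs to score; truncation is cheaper but discards anything past the limit.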

Output
Output Type: Floats
Output Format: List of floats
Output Parameters: 1D
Other Properties Related to Output: Each float is the probability score (or raw logit) for the corresponding text pair. Users can choose to apply a sigmoid activation function to the logits in their usage of the model.
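As a small illustration of that optional step, the sketch below maps raw logits to values in (0, 1) with a numerically stable sigmoid. The logit values are made up for the example.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Made-up logits for three (question, passage) pairs.
logits: List[float] = [2.0, 0.0, -3.0]
probs = [sigmoid(z) for z in logits]

# Sigmoid is monotonic, so ranking by logit or by probability gives the same order.
order_by_logit = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
order_by_prob = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
```

Because the ordering is unchanged, the sigmoid is only needed when a bounded score is wanted, for example to apply a fixed relevance threshold.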

Software Integration

Runtime: NeMo Retriever Text Reranking NIM
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace
Supported Operating System(s): Linux

Model Version(s)

NVIDIA Retrieval QA Llama 3.2 1B Reranking v1
Short name: llama-3.2-nv-rerankqa-1B-v1

Training Dataset & Evaluation

Training Dataset

Data Collection Method by dataset: Automated, Unknown

Labeling Method by dataset: Automated, Unknown

Properties: This model was trained on 800k samples from public datasets.

Evaluation Results

We evaluated the pipelines on a set of evaluation benchmarks, applying the ranking model to the candidates retrieved by a retrieval embedding model.

Overall, the pipeline llama-3.2-nv-embedqa-1B-v1 + llama-3.2-nv-rerankqa-1B-v1 provides high BEIR+TechQA accuracy with multilingual and crosslingual support. The llama-3.2-nv-rerankqa-1B-v1 ranking model is 3.5x smaller than the NV-RerankQA-Mistral-4B-v3 model.

We evaluated the NVIDIA Retrieval QA Embedding Model against open and commercial retriever models from the literature on academic question-answering benchmarks: NQ, HotpotQA, and FiQA (Finance Q&A) from the BEIR benchmark, and the TechQA dataset. The metric used in this benchmark was Recall@5. As described above, the ranking model must be applied to the output of an embedding model.

Open & Commercial Reranker Models	Average Recall@5 on NQ, HotpotQA, FiQA, TechQA dataset
llama-3.2-nv-embedqa-1B-v1 + llama-3.2-nv-rerankqa-1B-v1	72.16%
llama-3.2-nv-embedqa-1B-v1	68.97%
NV-EmbedQA-E5-v5 + NV-RerankQA-Mistral-4B-v3	75.45%
NV-EmbedQA-E5-v5	62.07%
NV-EmbedQA-E5-v4	57.65%
E5-Large_unsupervised	48.03%
BM25	44.67%
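For reference, Recall@5 as commonly defined for retrieval benchmarks is the fraction of a query's relevant documents that appear in the top 5 results, averaged over queries. The sketch below follows that common definition; the exact evaluation code behind the reported numbers is not given here, and the document IDs are made up.

```python
from typing import Dict, List, Set


def recall_at_k(ranked_ids: List[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of the relevant documents found in the top-k ranked results."""
    if not relevant_ids:
        return 0.0
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def mean_recall_at_k(runs: Dict[str, List[str]], qrels: Dict[str, Set[str]], k: int = 5) -> float:
    """Average Recall@k over all queries that have relevance judgments."""
    return sum(recall_at_k(runs[q], qrels[q], k) for q in qrels) / len(qrels)


# Made-up rankings and relevance judgments for two queries.
runs = {
    "q1": ["d3", "d1", "d7", "d2", "d9", "d4"],  # d4 is relevant but ranked 6th
    "q2": ["d5", "d6", "d8", "d0", "d2", "d1"],
}
qrels = {"q1": {"d1", "d4"}, "q2": {"d5"}}
score = mean_recall_at_k(runs, qrels, k=5)
```

A reranker improves this metric when it moves relevant passages such as `d4` from just below the cutoff into the top 5.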

We evaluated the model’s multilingual capabilities on the MIRACL academic benchmark, a multilingual retrieval dataset, across 15 languages, and on an additional 11 languages that were translated from the English and Spanish versions of MIRACL. The reported scores are based on a custom subsampled version of the benchmark, in which hard negatives were selected for each query to reduce the corpus size.

Open & Commercial Retrieval Models	Average Recall@5 on MIRACL multilingual datasets
llama-3.2-nv-embedqa-1B-v1 + llama-3.2-nv-rerankqa-1B-v1	66.21%
llama-3.2-nv-embedqa-1B-v1	60.07%
NV-EmbedQA-Mistral-7B-v2	50.42%
BM25	26.51%

We evaluated the cross-lingual capabilities on the academic benchmark MLQA, based on 7 languages (Arabic, Chinese, English, German, Hindi, Spanish, Vietnamese). We consider only evaluation datasets where the query and the documents are in different languages. We calculate the average Recall@5 across the 42 different language pairs.

Open & Commercial Retrieval Models	Average Recall@5 on MLQA dataset with different languages
llama-3.2-nv-embedqa-1B-v1 + llama-3.2-nv-rerankqa-1B-v1	85.77%
llama-3.2-nv-embedqa-1B-v1	78.77%
NV-EmbedQA-Mistral-7B-v2	68.38%
BM25	13.01%
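The count of 42 language pairs for MLQA follows from taking ordered (query language, document language) pairs over the 7 languages and excluding same-language pairs, that is, 7 × 6. A short check, with the pair ordering being an assumption about how the pairs were enumerated:

```python
from itertools import permutations

languages = ["Arabic", "Chinese", "English", "German", "Hindi", "Spanish", "Vietnamese"]

# Ordered (query_language, document_language) pairs with different languages.
cross_lingual_pairs = list(permutations(languages, 2))
num_pairs = len(cross_lingual_pairs)
```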

Data Collection Method by dataset:
Unknown

Labeling Method by dataset:
Unknown

Properties
The evaluation datasets are based on three MTEB/BEIR TextQA datasets, the TechQA dataset, and MIRACL multilingual retrieval datasets, all of which are public. Dataset sizes range from tens of thousands up to 5M, depending on the dataset.

Inference
Engine: TensorRT
Test Hardware: H100 PCIe/SXM, A100 PCIe/SXM, L40s, L4, and A10G

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

For more detailed information on ethical considerations for this model, please see the Model Card++ tab for the Explainability, Bias, Safety & Security, and Privacy subcards.

Please report security vulnerabilities or NVIDIA AI Concerns here.