The NVIDIA Retrieval QA Mistral 7B Embedding model is an embedding model optimized for text question-answering retrieval.
An embedding model is a crucial component of a text retrieval system, as it transforms textual information into dense vector representations. Such models are typically transformer encoders that process the tokens of an input text (for example, a question or a passage) and output an embedding.
This model is ready for commercial use.
The NVIDIA Retrieval QA Mistral 7B Embedding model is part of NVIDIA NeMo Retriever, which provides state-of-the-art, commercially ready models and microservices optimized for the lowest latency and highest throughput. It features a production-ready information retrieval pipeline with enterprise support. The models that form the core of this solution have been trained using responsibly selected, auditable data sources. With multiple pre-trained models available as starting points, developers can also readily customize them for their domain-specific use cases, such as Information Technology and Human Resources help assistants, and Research & Development research assistants.
The NVIDIA Retrieval QA Mistral 7B Embedding model is most suitable for users who want to build a question-answering application over a large text corpus, leveraging the latest dense retrieval technologies.
The use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement and the Apache License 2.0.
Technology can have a profound impact on people and the world, and NVIDIA is committed to enabling trust and transparency in AI development. NVIDIA encourages users to adopt principles of AI ethics and trustworthiness to guide their business decisions by following the guidelines in the NVIDIA AI Foundation Models Community License Agreement.
Architecture Type: Transformer
Network Architecture: Fine-tuned Mistral 7B foundation model
The NVIDIA Retrieval QA Mistral 7B Embedding model is a transformer encoder: a fine-tuned version of Mistral 7B with 32 layers and an embedding size of 4096, trained on public datasets. Mistral models are pre-trained with causal attention. Because our research demonstrated that bi-directional attention improved retrieval performance, we converted the model to bi-directional attention. Embedding models for text retrieval are typically trained using a bi-encoder architecture, in which the two members of a text pair (for example, a query and a chunked passage) are encoded independently by the same embedding model. Contrastive learning is then used to maximize the similarity between the query and the passage that contains the answer, while minimizing the similarity between the query and sampled negative passages that are not useful for answering the question.
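The contrastive objective described above can be illustrated with a small sketch. This is not the training code for this model. It is a minimal InfoNCE-style loss for a single query, written in plain Python to stay dependency-free. The use of cosine similarity, the `temperature` value of 0.05, and the function names are all assumptions made for illustration.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(
    query: Sequence[float],
    positive: Sequence[float],
    negatives: Sequence[Sequence[float]],
    temperature: float = 0.05,  # assumed value, not taken from the model card
) -> float:
    """InfoNCE-style loss: negative log-probability of the positive passage.

    The query, positive, and negative embeddings are assumed to come from the
    same encoder applied independently to each text (the bi-encoder setup).
    """
    # Similarity of the query to the positive (index 0) and to each negative.
    scores = [cosine(query, positive)] + [cosine(query, n) for n in negatives]
    logits = [s / temperature for s in scores]
    # Numerically stable log-sum-exp over all candidates.
    max_logit = max(logits)
    log_denominator = max_logit + math.log(
        sum(math.exp(x - max_logit) for x in logits)
    )
    return log_denominator - logits[0]
```

The loss approaches zero when the query is much closer to the positive passage than to any negative, and grows as a negative passage outranks the positive one.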
NVIDIA Retrieval QA Mistral 7B Embedding v2
Short name: NV-EmbedQA-Mistral-7B-v2
The development of large-scale public open-QA datasets has enabled tremendous progress in powerful embedding models. However, one popular dataset, MS MARCO, restricts commercial licensing, which limits the use of models trained on it in commercial settings. To address this, we created our own training dataset blend from public QA datasets, each of which is licensed for commercial applications. The pretrained Mistral-7B-v0.1 model was fine-tuned as an embedding model with contrastive learning, using the prefix “query:” for questions and “passage:” for context passages, on this mixture of commercially viable public datasets.
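As a rough illustration of the prefix convention, the sketch below prepares questions and passages before they are embedded. The helper names are hypothetical, and the single space after the colon is an assumption: the text above only states the prefixes “query:” and “passage:”. A serving runtime may also add these prefixes on the caller's behalf, in which case adding them manually would be redundant.

```python
from typing import Iterable, List

# Prefixes named in the model card; the trailing space is an assumption.
QUERY_PREFIX = "query: "
PASSAGE_PREFIX = "passage: "


def format_query(question: str) -> str:
    """Prepend the query prefix to a single question."""
    return QUERY_PREFIX + question.strip()


def format_passages(passages: Iterable[str]) -> List[str]:
    """Prepend the passage prefix to each context passage."""
    return [PASSAGE_PREFIX + p.strip() for p in passages]


if __name__ == "__main__":
    print(format_query("What is dense retrieval?"))
    print(format_passages(["Dense retrieval encodes text as vectors."]))
```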
The training dataset details are as follows:
Use Case: Information retrieval for question answering over text documents.
Data Sources: Public datasets licensed for commercial use.
Language: English (US), potential support for other languages (in research)
Volume: 600k samples from public datasets
Data Collection Method by dataset: Unknown
Labeling Method by dataset: Unknown
We evaluated the NVIDIA Retrieval QA Mistral 7B Embedding model against open and commercial retriever models from the literature on academic question-answering benchmarks: NQ, HotpotQA, and FiQA (Finance Q&A) from the BEIR benchmark, and the TechQA dataset. Note that the model was evaluated offline on A100 GPUs using the model's PyTorch checkpoint. The metric used in this benchmark was Recall@5.
Open & Commercial Retrieval Models | Average Recall@5 on the NQ, HotpotQA, FiQA, and TechQA datasets |
---|---|
NV-EmbedQA-Mistral-7B-v2 | 72.97% |
NV-EmbedQA-Mistral-7B-v1 | 64.93% |
NV-EmbedQA-E5-v5 | 62.07% |
NV-EmbedQA-E5-v4 | 57.65% |
E5-large-unsupervised | 48.03% |
BM25 | 44.67% |
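For readers unfamiliar with the metric in the table above, the sketch below shows one common definition of Recall@k: the fraction of a query's relevant passages that appear among the top k retrieved results, averaged over queries. The exact variant used to produce the reported numbers is not specified here, so this is an assumption, and the function names are illustrative.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def recall_at_k(ranked_ids: Sequence[str], relevant_ids: Set[str], k: int = 5) -> float:
    """Fraction of relevant passages found in the top-k ranked results."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant passage")
    top_k = set(ranked_ids[:k])
    return len(top_k & relevant_ids) / len(relevant_ids)


def mean_recall_at_k(
    runs: Iterable[Tuple[Sequence[str], Set[str]]], k: int = 5
) -> float:
    """Average Recall@k over (ranked_ids, relevant_ids) pairs, one per query."""
    scores: List[float] = [recall_at_k(ranked, relevant, k) for ranked, relevant in runs]
    if not scores:
        raise ValueError("no queries to evaluate")
    return sum(scores) / len(scores)
```

For example, a query with two relevant passages, only one of which is ranked in the top five, contributes a Recall@5 of 0.5 to the average.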
Data Collection Method by dataset: Unknown
Labeling Method by dataset: Unknown
Properties: The evaluation data consists of four public datasets: the NQ, HotpotQA, and FiQA datasets from MTEB/BEIR TextQA, and the TechQA dataset. Dataset sizes range from tens of thousands to 5M, depending on the dataset.
Input Type: Text
Input Format: List of strings
Other Properties Related to Input: The model was trained with input length up to 512 tokens, whereas the Mistral-7B model has a theoretical attention span of approximately 131K tokens.
Output Type: Floats
Output Format: List of float arrays
Other Properties Related to Output: Model outputs embedding vectors of dimension 4096 for each text string.
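To show how the output vectors are typically consumed, the sketch below ranks passage embeddings against a query embedding. It does not call the model. It assumes the 4096-dimensional embeddings have already been obtained as lists of floats, and it assumes cosine similarity as the scoring function, which the model card does not state explicitly. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple

EMBEDDING_DIM = 4096  # output dimension stated in the model card


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(
    query_embedding: Sequence[float],
    passage_embeddings: Sequence[Sequence[float]],
    top_k: int = 5,
) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the top_k most similar passages."""
    for vector in [query_embedding, *passage_embeddings]:
        if len(vector) != EMBEDDING_DIM:
            raise ValueError(f"expected {EMBEDDING_DIM} floats, got {len(vector)}")
    scores = [cosine_similarity(query_embedding, p) for p in passage_embeddings]
    # Sort passage indices by descending score; ties keep their input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(i, scores[i]) for i in order[:top_k]]
```

In a production system the exhaustive comparison shown here would usually be replaced by a vector index, but the scoring logic stays the same.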
Runtime: NeMo Retriever Text Embedding NIM
Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Hopper, NVIDIA Lovelace
Supported Operating System(s): Linux
Engine: TensorRT
Test Hardware: See Support Matrix from NIM documentation.
We evaluated the models optimized for different hardware on a small sample dataset of 600 queries.
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their supporting model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ tab for the Explainability, Bias, Safety & Security, and Privacy subcards. Please report security vulnerabilities or NVIDIA AI Concerns here.