baai / bge-m3

Model Overview

Description

BGE-M3 is distinguished by its versatility in multi-functionality, multi-linguality, and multi-granularity:

  • Multi-Functionality: It can simultaneously perform the three common retrieval functionalities of embedding models: dense retrieval, multi-vector retrieval, and sparse retrieval (see the sketch after this list).
  • Multi-Linguality: It supports more than 100 working languages.
  • Multi-Granularity: It can process inputs of different granularities, from short sentences to long documents of up to 8192 tokens.

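As a minimal sketch of the three functionalities, the FlagEmbedding library (linked under References) can return all three representations from a single encode call. The output keys below follow that library's README; treat the exact names as assumptions about your installed version:

```python
from FlagEmbedding import BGEM3FlagModel

# use_fp16 trades a little precision for faster inference
model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

sentences = ["What is BGE-M3?", "BM25 is a bag-of-words retrieval function."]
output = model.encode(
    sentences,
    return_dense=True,         # dense retrieval: one 1024-dim vector per input
    return_sparse=True,        # sparse retrieval: per-token lexical weights
    return_colbert_vecs=True,  # multi-vector retrieval: one vector per token
)

print(output["dense_vecs"].shape)       # (2, 1024)
print(output["lexical_weights"][0])     # {token_id: weight, ...}
print(output["colbert_vecs"][0].shape)  # (num_tokens, 1024)
```
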
Suggestions for a retrieval pipeline in RAG

The authors recommend the following pipeline: hybrid retrieval + re-ranking.

  • Hybrid retrieval leverages the strengths of multiple methods, offering higher accuracy and stronger generalization.
    A classic example: combining embedding retrieval with the BM25 algorithm.
    You can now use BGE-M3, which supports both embedding and sparse retrieval, for this purpose.
    It produces token weights (similar to BM25) at no additional cost when generating dense embeddings (see the first sketch after this list).
    To use hybrid retrieval, please refer to Vespa and Milvus.

  • As cross-encoder models, re-rankers demonstrate higher accuracy than bi-encoder embedding models.
    Applying a re-ranking model (e.g., bge-reranker, bge-reranker-v2) after retrieval can further filter the selected text (see the second sketch below).
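
A minimal sketch of hybrid scoring with BGE-M3, again using the FlagEmbedding library; the 0.6/0.4 weights are illustrative and should be tuned per task:

```python
from FlagEmbedding import BGEM3FlagModel

model = BGEM3FlagModel("BAAI/bge-m3", use_fp16=True)

query = "What is BGE-M3?"
docs = [
    "BGE-M3 is a multilingual embedding model supporting 100+ languages.",
    "BM25 is a classical bag-of-words ranking function.",
]

q = model.encode([query], return_dense=True, return_sparse=True)
d = model.encode(docs, return_dense=True, return_sparse=True)

for i, doc in enumerate(docs):
    # Dense vectors are normalized, so the dot product is a cosine score
    dense = float(q["dense_vecs"][0] @ d["dense_vecs"][i])
    # BM25-like lexical score from the token weights produced alongside
    # the dense embeddings, at no extra encoding cost
    sparse = model.compute_lexical_matching_score(
        q["lexical_weights"][0], d["lexical_weights"][i]
    )
    print(f"{0.6 * dense + 0.4 * sparse:.4f}  {doc[:50]}")
```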

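And a minimal re-ranking sketch with a cross-encoder; the bge-reranker-v2-m3 checkpoint is an assumption here, but any bge-reranker checkpoint works the same way:

```python
from FlagEmbedding import FlagReranker

reranker = FlagReranker("BAAI/bge-reranker-v2-m3", use_fp16=True)

query = "What is BGE-M3?"
candidates = [
    "BGE-M3 is a multilingual embedding model.",
    "BM25 is a classical ranking function.",
]

# A cross-encoder scores each (query, passage) pair jointly,
# which is slower but more accurate than a bi-encoder
scores = reranker.compute_score([[query, c] for c in candidates])
reranked = sorted(zip(scores, candidates), reverse=True)
print(reranked[0])  # best-scoring passage first
```
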
Specs

  • Model
| Model Name | Dimension | Sequence Length | Introduction |
|---|---|---|---|
| BAAI/bge-m3 | 1024 | 8192 | Multilingual; unified fine-tuning (dense, sparse, and ColBERT) from bge-m3-unsupervised |
| BAAI/bge-m3-unsupervised | 1024 | 8192 | Multilingual; contrastive learning from bge-m3-retromae |
| BAAI/bge-m3-retromae | -- | 8192 | Multilingual; extends the max_length of xlm-roberta to 8192, further pretrained via RetroMAE |
| BAAI/bge-large-en-v1.5 | 1024 | 512 | English model |
| BAAI/bge-base-en-v1.5 | 768 | 512 | English model |
| BAAI/bge-small-en-v1.5 | 384 | 512 | English model |

Terms of use

bge-m3 is licensed under the MIT License.

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

References

HuggingFace

GitHub

Model Architecture

Architecture Type: Transformer

Network Architecture: Fine-tuned XLMRobertaModel

Input

Input Type: Text

Input Format: List of strings

Output

Output Type: Floating Points

Output Format: List of float arrays

Other Properties Related to Output: Each array contains the embeddings for the corresponding input string.

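A hypothetical request/response sketch for a deployed endpoint. The route and JSON schema below assume an OpenAI-style /v1/embeddings API and are not taken from this page; adjust to your actual deployment:

```python
import requests

# Hypothetical endpoint and schema; both are assumptions
payload = {
    "model": "baai/bge-m3",
    "input": ["What is BGE-M3?", "Definition of BM25"],  # input: list of strings
}
resp = requests.post("http://localhost:8000/v1/embeddings", json=payload)
resp.raise_for_status()

# Output: one float array (a 1024-dim dense embedding) per input string
embeddings = [item["embedding"] for item in resp.json()["data"]]
print(len(embeddings), len(embeddings[0]))  # 2 1024
```
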
Model Version

BAAI/bge-m3

Supported Operating System(s):

  • Linux

Training Dataset:

| Dataset | Introduction |
|---|---|
| MLDR | Document retrieval dataset covering 13 languages |
| bge-m3-data | Fine-tuning data used by bge-m3 |

Inference:

Engine: TensorRT with Triton

Test Hardware: L40