Overview

NeMo Retriever NIM API endpoints provide access to models for semantic search over enterprise data. The APIs are organized as a collection of NIMs that developers can combine to build copilots, chatbots, and AI assistants. NeMo Retriever NIMs support text question-answering retrieval through embedding models and increase accuracy by reranking candidate passages against the query.
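To illustrate how embedding and reranking fit together, here is a minimal sketch of a two-stage search: rank passages by cosine similarity of their embedding vectors, then rescore the top candidates with a reranker. It makes no network calls. The functions `toy_embed` and `toy_rerank_score` are hypothetical stand-ins for an embedding endpoint and a reranking endpoint, and `retrieve_then_rerank` is an illustrative name, not part of any NIM API.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_then_rerank(
    query: str,
    passages: Sequence[str],
    embed: Callable[[str], Vector],
    rerank_score: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Stage 1: keep the top_k passages by embedding similarity.
    Stage 2: reorder those candidates by the reranker's score."""
    query_vec = embed(query)
    candidates = sorted(
        passages,
        key=lambda p: cosine_similarity(query_vec, embed(p)),
        reverse=True,
    )[:top_k]
    scored = [(p, rerank_score(query, p)) for p in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical stand-ins so the sketch runs offline. In practice these
# would be replaced by calls to an embedding model and a reranking model.
VOCAB = ["gpu", "memory", "install", "driver", "license"]


def toy_embed(text: str) -> List[float]:
    """Bag-of-words counts over a tiny fixed vocabulary (assumption)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in VOCAB]


def toy_rerank_score(query: str, passage: str) -> float:
    """Fraction of distinct query words that appear in the passage (assumption)."""
    query_words = set(query.lower().split())
    passage_words = set(passage.lower().split())
    return len(query_words & passage_words) / len(query_words)


if __name__ == "__main__":
    docs = [
        "Install the GPU driver before first use",
        "GPU memory limits per instance",
        "License terms and renewal",
    ]
    for passage, score in retrieve_then_rerank(
        "how to install gpu driver", docs, toy_embed, toy_rerank_score, top_k=2
    ):
        print(f"{score:.2f}  {passage}")
```

Splitting the work this way keeps the expensive pairwise scoring confined to a small candidate set, which is the usual reason to pair an embedding model with a reranker.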

Models

baai

Model	Endpoint
baai / bge-m3	Creates an embedding vector from the input text (bge-m3)
baai / bge-m3	Gets the result of an earlier function invocation request that returned a status of 202 (bge-m3)

nvidia

Model	Endpoint
nvidia / embed-qa-4	Create embedding vector (embed-qa-4)
nvidia / llama-3.2-nemoretriever-1b-vlm-embed-v1	Creates an embedding vector from the input text (llama-3.2-nemoretriever-1b-vlm-embed-v1)
nvidia / llama-3.2-nemoretriever-300m-embed-v1	Creates an embedding vector from the input text (llama-3.2-nemoretriever-300m-embed-v1)
nvidia / llama-3.2-nemoretriever-300m-embed-v2	Creates an embedding vector from the input text (llama-3.2-nemoretriever-300m-embed-v2)
nvidia / llama-3.2-nemoretriever-500m-rerank-v2	Rank passages by their relation to a query (llama-3.2-nemoretriever-500m-rerank-v2)
nvidia / llama-3.2-nv-embedqa-1b-v2	Creates an embedding vector from the input text (llama-3.2-nv-embedqa-1b-v2)
nvidia / llama-3.2-nv-rerankqa-1b-v1	Rank passages by their relation to a query (llama-3.2-nv-rerankqa-1b-v1)
nvidia / llama-3.2-nv-rerankqa-1b-v2	Rank passages by their relation to a query (llama-3.2-nv-rerankqa-1b-v2)
nvidia / llama-nemotron-embed-1b-v2	Creates an embedding vector from the input text (llama-nemotron-embed-1b-v2)
nvidia / llama-nemotron-embed-vl-1b-v2	Creates an embedding vector from the input text (llama-nemotron-embed-vl-1b-v2)
nvidia / llama-nemotron-rerank-1b-v2	Rank passages by their relation to a query (llama-nemotron-rerank-1b-v2)
nvidia / llama-nemotron-rerank-vl-1b-v2	Rank passages by their relation to a query (llama-nemotron-rerank-vl-1b-v2)
nvidia / nemotron-3-embed-1b	Creates an embedding vector from the input text (nemotron-3-embed-1b)
nvidia / nvclip	Creates an embedding vector representing the input text or image (nvclip)
nvidia / nv-embed-v1	Creates an embedding vector from the input text (nv-embed-v1)
nvidia / nv-embedcode-7b-v1	Creates an embedding vector from the input text (nv-embedcode-7b-v1)
nvidia / nv-embedqa-e5-v5	Creates an embedding vector from the input text (nv-embedqa-e5-v5)
nvidia / nv-rerankqa-mistral-4b-v3	Rank passages by their relation to a query (nv-rerankqa-mistral-4b-v3)
nvidia / rerank-qa-mistral-4b	Create ranking (rerank-qa-mistral-4b)

snowflake

Model	Endpoint
snowflake / arctic-embed-l	Creates an embedding vector from the input text (arctic-embed-l)
snowflake / arctic-embed-l	Gets the result of an earlier function invocation request that returned a status of 202 (arctic-embed-l)
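
The bge-m3 and arctic-embed-l entries each list an endpoint that gets the result of an earlier request that returned a status of 202. A client would typically handle this by polling until the result is ready. The sketch below shows that pattern under stated assumptions: `fetch_result` is a hypothetical callable that returns a `(status_code, body)` pair, and a status of 200 is assumed to mean the result is available. The actual request format, URLs, and completion status should be taken from each endpoint's reference page.

```python
import time
from typing import Any, Callable, Tuple


def poll_until_ready(
    fetch_result: Callable[[], Tuple[int, Any]],
    max_attempts: int = 10,
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch_result until it stops reporting 202 (still pending).

    fetch_result is a hypothetical stand-in for a request to the
    result-retrieval endpoint. Treating 200 as "done" is an assumption.
    """
    for attempt in range(max_attempts):
        status, body = fetch_result()
        if status == 200:
            return body
        if status != 202:
            raise RuntimeError(f"unexpected status {status}")
        # Still pending: wait before the next attempt, but not after the last.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"result not ready after {max_attempts} attempts")


if __name__ == "__main__":
    # Simulated responses: pending twice, then a finished result.
    responses = iter([(202, None), (202, None), (200, {"data": [0.1, 0.2]})])
    print(poll_until_ready(lambda: next(responses), delay_seconds=0.01))
```

Passing `sleep` as a parameter keeps the function easy to test without real delays; a production client might also add backoff between attempts.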