Creates an embedding vector from the input text.

Body Params
required

Input text to embed. Max length is 32k tokens.

string
required

ID of the embedding model.

string
enum

nvidia/llama-3.2-nemoretriever-300m-embed-v2 operates in either passage or query mode and therefore requires the input_type parameter. Use passage when generating embeddings during indexing, and query when generating embeddings during querying. It is very important to use the correct input_type; failure to do so will result in large drops in retrieval accuracy.

Allowed: passage, query
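The two modes above can be sketched as request bodies built client-side. This is a minimal sketch: only model, input, and input_type come from this reference; the helper name and surrounding code are illustrative assumptions.

```python
import json

MODEL = "nvidia/llama-3.2-nemoretriever-300m-embed-v2"

def embed_body(text: str, input_type: str) -> str:
    """Build a JSON request body for the embeddings endpoint (sketch).

    input_type must be "passage" (index time) or "query" (query time);
    mixing them up degrades retrieval accuracy.
    """
    if input_type not in ("passage", "query"):
        raise ValueError("input_type must be 'passage' or 'query'")
    return json.dumps({"model": MODEL, "input": text, "input_type": input_type})

# Index-time embedding of a document chunk:
doc_body = embed_body("The Eiffel Tower is in Paris.", "passage")
# Query-time embedding of a user question:
query_body = embed_body("Where is the Eiffel Tower?", "query")
```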
string
enum
Defaults to float

The format to return the embeddings in.

string
enum
Defaults to NONE

Specifies how inputs longer than the maximum token length of the model are handled. Passing START discards the start of the input, and END discards the end; in both cases, input is discarded until what remains is exactly the maximum input token length for the model. If NONE is selected, an error is returned when the input exceeds the maximum input token length.

Allowed: NONE, START, END
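The truncation behavior described above can be sketched over a plain token list. This is a simplified client-side model of the documented semantics; the actual tokenization and truncation happen server-side.

```python
def truncate_tokens(tokens, max_len, mode="NONE"):
    """Apply the documented truncate semantics to a token sequence.

    START drops tokens from the start and END from the end, until exactly
    max_len tokens remain; NONE raises when the input is too long.
    """
    if len(tokens) <= max_len:
        return tokens  # within the limit: nothing to discard
    if mode == "START":
        return tokens[len(tokens) - max_len:]   # discard the start
    if mode == "END":
        return tokens[:max_len]                 # discard the end
    raise ValueError(
        f"input has {len(tokens)} tokens, exceeding the {max_len}-token limit"
    )
```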
string

Not implemented, but provided for API compliance. This field is ignored.
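Putting the body params together, a full request might look like the following sketch. The base URL is a placeholder and the exact auth header format is an assumption (the page lists Bearer credentials); the body fields are the ones documented above.

```python
import json
import urllib.request

payload = {
    "model": "nvidia/llama-3.2-nemoretriever-300m-embed-v2",
    "input": "What is retrieval-augmented generation?",
    "input_type": "query",        # use "passage" at index time
    "encoding_format": "float",   # the documented default
    "truncate": "NONE",           # the documented default: error on over-long input
}

# Placeholder endpoint URL; substitute your deployment's embeddings route.
req = urllib.request.Request(
    "https://example.invalid/v1/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer $API_KEY",  # Bearer token, per this page
    },
    method="POST",
)
# response = urllib.request.urlopen(req)  # not executed in this sketch
```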

Responses

application/json