Creates a model response for the given chat conversation.

Given a list of messages comprising a conversation, the model returns a response. The endpoint is compatible with the OpenAI Chat Completions API. See https://platform.openai.com/docs/api-reference/chat/create

Body Params
model (string)
Defaults to nvidia/usdcode-llama-3.1-70b-instruct
max_tokens (integer, 1 to 2048)
Defaults to 1024

The maximum number of tokens to generate in any given call. Note that the model is not aware of this value, and generation will simply stop at the number of tokens specified.

stream (boolean)
Defaults to false

If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events (SSE) as they become available (JSON responses are prefixed by data: ), with the stream terminated by a data: [DONE] message.
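When streaming is enabled, the client consumes data-only SSE lines until the `data: [DONE]` sentinel. A minimal sketch of that loop, using canned lines in place of a live HTTP response (the helper name is illustrative):

```python
import json

def iter_sse_chunks(lines):
    """Yield parsed JSON payloads from data-only SSE lines.

    Stops when the terminating 'data: [DONE]' sentinel arrives.
    """
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines between events
        payload = line[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)

# Canned delta events standing in for a streamed response body:
lines = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(c["choices"][0]["delta"]["content"] for c in iter_sse_chunks(lines))
# text == "Hello"
```

In a real client the same loop would iterate over the lines of the HTTP response body instead of a hard-coded list.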

temperature (number, 0 to 1)
Defaults to 0.1

The sampling temperature to use for text generation. Higher temperature values produce less deterministic output. It is not recommended to modify both temperature and top_p in the same call.
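To illustrate why lower temperatures make output more deterministic, here is a generic temperature-scaled softmax sketch (not NVIDIA's implementation; sampling happens server-side):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before normalizing; as the
    # temperature shrinks, the distribution sharpens toward the argmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
cool = softmax_with_temperature(logits, 0.1)  # near one-hot
warm = softmax_with_temperature(logits, 1.0)  # closer to uniform
# cool[0] > warm[0]: the low-temperature distribution is more peaked.
```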

top_p (number, ≤ 1)
Defaults to 1

The top-p sampling mass used for text generation: at each step, sampling is restricted to the smallest set of most-likely tokens whose cumulative probability reaches top_p. For example, if top_p = 0.2, only the most likely tokens summing to 0.2 cumulative probability will be sampled. It is not recommended to modify both temperature and top_p in the same call.
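The nucleus selection described above can be sketched as follows (illustrative only; the actual sampling is performed server-side):

```python
def nucleus(probs, top_p):
    """Keep the most likely tokens whose cumulative probability
    reaches top_p; everything else is excluded from sampling."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append(token)
        cumulative += p
        if cumulative >= top_p:
            break
    return kept

probs = {"the": 0.5, "a": 0.3, "an": 0.15, "xyzzy": 0.05}
print(nucleus(probs, 0.2))  # ['the'] -- the top token alone already covers 0.2
```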

expert_type (enum)
Allowed: knowledge, code, helperfunction, auto
Defaults to auto

The type of expert to use for the completion. When knowledge is passed, the model answers as a USD knowledge expert. When code is selected, the model responds with vanilla OpenUSD code. If helperfunction is chosen, it uses high-level helper functions to produce the code response. When auto is set, the LLM determines which expert type to use. If not specified, the model's default expert is used.
messages (array of objects, required)

A list of messages comprising the conversation so far.
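A minimal request body using the parameters above might be built like this. This is a sketch only: the `expert_type` field name is an assumption and should be checked against the live API schema, and the example prompt is hypothetical.

```python
import json

payload = {
    "model": "nvidia/usdcode-llama-3.1-70b-instruct",
    "max_tokens": 1024,
    "temperature": 0.1,     # leave top_p at its default; avoid tuning both
    "stream": False,
    "expert_type": "auto",  # assumed name of the expert-selection field
    "messages": [
        {"role": "user", "content": "How do I create a Sphere prim in OpenUSD?"},
    ],
}
body = json.dumps(payload)
# Send `body` as the JSON request body, with an Authorization: Bearer <key> header.
```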

Responses

Authentication: Bearer token credentials.
Response content type: application/json.