Creates a model response for the given chat conversation.

post

https://integrate.api.nvidia.com/v1/chat/completions

Given a list of messages comprising a conversation, the model will return a response. Compatible with OpenAI. See https://platform.openai.com/docs/api-reference/chat/create

Recent Requests

Time	Status	User Agent
Retrieving recent requests…

Loading…

Body Params

model

string

Defaults to nvidia/llama-3.1-nemoguard-8b-content-safety

messages

array of objects

required

A list of messages comprising the conversation so far. The roles of the messages must be alternating between user and assistant. The last input message should have role user. A message with the the system role is optional, and must be the very first message if it is present; context is also optional, but must come before a user question.

Messages*

stream

boolean

Defaults to false

If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events (SSE) as they become available (JSON responses are prefixed by data: ), with the stream terminated by a data: [DONE] message.

Headers

string

enum

Defaults to application/json

Generated from available response content types

Allowed:

Responses

200Invocation is fulfilled

202Result is pending. Client should poll using the requestId.

422Validation failed, provided entity could not be processed.

500The invocation ended with an error.