Creates a model response for the given chat conversation.

Given a list of messages comprising a conversation, the model returns a response. This endpoint is compatible with the OpenAI Chat Completions API; see https://platform.openai.com/docs/api-reference/chat/create for details.
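Because the endpoint follows the OpenAI Chat Completions schema, it can be called with the official openai Python client. A minimal sketch, assuming a hypothetical base URL (https://api.example.com/v1) and an API key stored in a hypothetical NVIDIA_API_KEY environment variable:

```python
import os
from openai import OpenAI

# Point the OpenAI client at the OpenAI-compatible endpoint.
# The base_url is a placeholder; substitute your deployment's URL.
client = OpenAI(
    base_url="https://api.example.com/v1",   # hypothetical base URL
    api_key=os.environ["NVIDIA_API_KEY"],    # hypothetical env var name
)

response = client.chat.completions.create(
    model="nvidia/llama3-chatqa-1.5-8b",
    messages=[
        {
            "role": "user",
            "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?",
        }
    ],
    temperature=0.2,
    top_p=0.7,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```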

Body Params
model
string
Defaults to nvidia/llama3-chatqa-1.5-8b

The ID of the model to use.
messages
array of objects
required
length ≥ 1

A list of messages comprising the conversation so far. The conversation may optionally start with one or more context or context_asset_id messages. After that, message roles must alternate between user and assistant. The last input message should have role user with content populated.

Message object
role
string
required

The role of the message author.

content
string
Defaults to "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?"

The contents of the message.
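A sketch of a valid messages array under the rules above; the context content is a placeholder for your own retrieved text:

```python
# Illustrative shape only: an optional context message first, then
# alternating user/assistant turns, ending with a user message.
messages = [
    {"role": "context", "content": "<retrieved document text>"},
    {
        "role": "user",
        "content": "what is the percentage change of the net income from Q4 FY23 to Q4 FY24?",
    },
]
```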

temperature
number
0 to 1
Defaults to 0.2

The sampling temperature to use for text generation. Higher values make the output less deterministic. It is not recommended to modify both temperature and top_p in the same call.

top_p
number
0 to 1
Defaults to 0.7

The top-p sampling mass used for text generation: at sampling time, only the most likely tokens whose cumulative probability sums to top_p are considered. For example, if top_p = 0.2, only the most likely tokens summing to 0.2 cumulative probability are sampled. It is not recommended to modify both temperature and top_p in the same call.
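For instance, to get more deterministic output you might lower temperature and leave top_p at its default, reusing the client and messages sketched above:

```python
# Adjust temperature only; per the note above, avoid changing
# temperature and top_p in the same call.
response = client.chat.completions.create(
    model="nvidia/llama3-chatqa-1.5-8b",
    messages=messages,
    temperature=0.0,  # 0 to 1; lower is more deterministic
)
```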

frequency_penalty
number
Defaults to 0

How much to penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim. Higher values increase the penalty. See https://platform.openai.com/docs/api-reference/completions/create for details.

presence_penalty
number
Defaults to 0

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

max_tokens
integer
1 to 1024
Defaults to 1024

The maximum number of tokens to generate in any given call. Note that the model is not aware of this value; generation simply stops once the specified number of tokens is reached.

seed
integer
Defaults to null

If specified, our system will make a best effort to sample deterministically, such that repeated requests with the same seed and parameters should return the same result.
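A sketch of best-effort reproducibility with a fixed seed, reusing the client and messages from the earlier examples:

```python
# Two calls with the same seed and parameters should (best effort)
# produce the same completion.
kwargs = dict(model="nvidia/llama3-chatqa-1.5-8b", messages=messages, seed=42)
first = client.chat.completions.create(**kwargs)
second = client.chat.completions.create(**kwargs)
print(first.choices[0].message.content == second.choices[0].message.content)  # usually True
```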

stream
boolean
Defaults to false

If set, partial message deltas will be sent. Tokens are sent as data-only server-sent events (SSE) as they become available (each JSON payload is prefixed by data: ), and the stream is terminated by a data: [DONE] message.
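A minimal streaming sketch with the same client; the openai SDK parses the data: events and stops at the data: [DONE] sentinel, yielding one chunk per delta:

```python
# Print tokens as they arrive over SSE.
stream = client.chat.completions.create(
    model="nvidia/llama3-chatqa-1.5-8b",
    messages=messages,
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
```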

Responses

Responses are returned as application/json, or as text/event-stream when stream is true.