Creates a model response for the given chat conversation.

Given a list of messages comprising a conversation, the model returns a response. This endpoint is compatible with the OpenAI Chat Completions API. See https://platform.openai.com/docs/api-reference/chat/create
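A minimal non-streaming request, sketched in Python. The base URL and the NVAPI_KEY environment variable are assumptions for illustration; substitute the host and credentials for your deployment.

```python
import os

import requests

response = requests.post(
    "https://integrate.api.nvidia.com/v1/chat/completions",  # assumed base URL
    headers={"Authorization": f"Bearer {os.environ['NVAPI_KEY']}"},  # assumed credential variable
    json={
        "model": "meta/llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": "What is nucleus sampling?"}],
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```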

Body Params
model
string
Defaults to meta/llama-3.3-70b-instruct

The ID of the model to use.
messages
array of objects
required

A list of messages comprising the conversation so far. Message roles must alternate between user and assistant, and the last message must have the role user. A message with the system role is optional; if present, it must be the very first message. Context is also optional, but must come before a user question. The ordering rules are illustrated in the example below.

Messages*

role
string
required

The role of the message author.

content
required

The contents of the message.

tool_call_id
string

The ID of the tool call.

tool_calls
array of objects

The tool(s) called by the model.
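For illustration, a conversation that satisfies the ordering rules above: an optional system message first, then strictly alternating user and assistant turns, ending with a user message.

```python
# A minimal, valid messages array per the ordering rules above.
messages = [
    {"role": "system", "content": "You are a concise assistant."},  # optional; must be first
    {"role": "user", "content": "What does top_p control?"},
    {"role": "assistant", "content": "The nucleus-sampling probability mass."},
    {"role": "user", "content": "And temperature?"},  # last message must be user
]
```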

temperature
number
0 to 1
Defaults to 0.2

The sampling temperature to use for text generation. Higher temperature values make the output less deterministic. Modifying both temperature and top_p in the same call is not recommended.

top_p
number
≤ 1
Defaults to 0.7

The top-p sampling mass used for text generation: sampling is restricted to the most likely tokens whose cumulative probability reaches top_p. For example, if top_p = 0.2, only the most likely tokens summing to 0.2 cumulative probability are sampled. Modifying both temperature and top_p in the same call is not recommended.
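A request body that follows the recommendation above to tune one sampling knob at a time. The payload shape follows the OpenAI chat schema; the values are illustrative, not recommendations.

```python
# Sketch: lower temperature for more deterministic output, leaving top_p
# at its default rather than modifying both in the same call.
payload = {
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "Write a haiku about GPUs."}],
    "temperature": 0.1,   # 0 to 1; defaults to 0.2
    # "top_p": 0.7,       # default; not recommended to change alongside temperature
}
```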

frequency_penalty
number
-2 to 2
Defaults to 0

Indicates how much to penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim.

presence_penalty
number
-2 to 2
Defaults to 0

Positive values penalize new tokens based on whether they have appeared in the text so far, increasing the model's likelihood of talking about new topics.
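The two penalties can be combined; a sketch with arbitrary illustrative values, not recommendations.

```python
# frequency_penalty scales with how often a token has already appeared;
# presence_penalty applies once a token has appeared at all.
payload = {
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "List synonyms for 'fast'."}],
    "frequency_penalty": 0.5,  # -2 to 2; discourages verbatim repetition
    "presence_penalty": 0.3,   # -2 to 2; nudges toward new topics
}
```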

max_tokens
integer
1 to 4096
Defaults to 1024

The maximum number of tokens to generate in a single call. Note that the model is not aware of this value; generation simply stops once the specified number of tokens is reached.
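Because the model is unaware of the cap, truncation is only detectable after the fact. In the OpenAI response schema, which this endpoint follows, a capped reply surfaces as finish_reason == "length" (an assumption carried over from the OpenAI spec):

```python
def hit_token_cap(response_body: dict) -> bool:
    """Return True if the first choice stopped because max_tokens was reached."""
    return response_body["choices"][0].get("finish_reason") == "length"
```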

stream
boolean
Defaults to false

If set, partial message deltas are sent as data-only server-sent events (SSE) as tokens become available. Each JSON chunk is prefixed by data: , and the stream is terminated by a data: [DONE] message.
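A minimal SSE consumer, assuming the same base URL and NVAPI_KEY environment variable as the first sketch (both are assumptions; substitute your deployment's host and credentials):

```python
import json
import os

import requests

with requests.post(
    "https://integrate.api.nvidia.com/v1/chat/completions",  # assumed base URL
    headers={"Authorization": f"Bearer {os.environ['NVAPI_KEY']}"},  # assumed credential variable
    json={
        "model": "meta/llama-3.3-70b-instruct",
        "messages": [{"role": "user", "content": "Count to five."}],
        "stream": True,
    },
    stream=True,
    timeout=60,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines(decode_unicode=True):
        # SSE events arrive as "data: <json>" lines; the stream ends with "data: [DONE]".
        if not line or not line.startswith("data: "):
            continue
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        delta = json.loads(data)["choices"][0]["delta"]
        print(delta.get("content") or "", end="", flush=True)
```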

stop
string or array of strings

A string or a list of strings at which the API will stop generating further tokens. The returned text will not contain the stop sequence.
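For example (illustrative values; the matched stop sequence itself is never returned):

```python
payload = {
    "model": "meta/llama-3.3-70b-instruct",
    "messages": [{"role": "user", "content": "List three colors, one per line."}],
    "stop": ["\n\n", "END"],  # halt at a blank line or the literal string "END"
}
```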

Responses

application/json (non-streaming responses)
text/event-stream (streaming responses, when stream is true)