Create Completion

POST

https://integrate.api.nvidia.com/v1/completions

Creates a completion for a given prompt. This API mimics the OpenAI Completion API.

See https://platform.openai.com/docs/api-reference/completions/create
for the API specification.
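As a rough illustration of how a client might call this endpoint, the sketch below builds and sends a POST request using only the Python standard library. The helper names (`build_completion_request`, `create_completion`) are hypothetical, and the bearer-token `Authorization` header is an assumption carried over from the OpenAI API that this endpoint mimics; it is not stated on this page.

```python
import json
import urllib.request

API_URL = "https://integrate.api.nvidia.com/v1/completions"
DEFAULT_MODEL = "nvidia/mistral-nemo-minitron-8b-base"


def build_completion_request(prompt, api_key, **params):
    """Build (but do not send) a POST request for the completions endpoint."""
    body = {"model": DEFAULT_MODEL, "prompt": prompt}
    body.update(params)  # e.g. max_tokens, temperature, top_p, stop, seed
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Accept": "application/json",
            # Assumption: bearer-token authentication, as in the OpenAI API.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def create_completion(prompt, api_key, **params):
    """Send the request and return the decoded JSON response body."""
    request = build_completion_request(prompt, api_key, **params)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling `create_completion("Once upon a time", api_key, max_tokens=64)` would then perform the actual network request.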

Body Params

model

string

Defaults to nvidia/mistral-nemo-minitron-8b-base

Name of target model.

prompt

User prompt.

max_tokens

1 to 1024

Maximum number of tokens to generate.

temperature

0 to 2

Controls randomness by scaling the logits before sampling: higher values produce more varied output, and lower values make sampling less diverse.
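The effect of the scaling can be sketched locally. The function below divides each logit by the temperature and applies a softmax; the function name is hypothetical, and treating a temperature of 0 as greedy selection is an assumption about how the boundary value is handled, not something this page confirms.

```python
import math


def token_probabilities(logits, temperature):
    """Softmax over logits that have been divided by the temperature."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumption: treat 0 as greedy decoding (all mass on the top logit).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

With the same logits, a temperature of 0.5 concentrates probability on the top token, while a temperature of 2 spreads it more evenly across candidates.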

top_p

0 to 1

Also known as nucleus sampling: the cumulative probability cutoff for token selection. A lower value means sampling from a smaller set of candidates. Tokens are sorted by probability and collected one by one until their cumulative probability exceeds the top_p value.
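The candidate selection described above can be sketched as follows. This is a minimal illustration over a plain list of probabilities, not the service's implementation; the function name is hypothetical, and stopping once the running sum reaches (rather than strictly exceeds) top_p is a simplifying assumption.

```python
def nucleus_candidates(probs, top_p):
    """Return the indices of the tokens kept by nucleus (top_p) filtering."""
    # Rank token indices from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for index, prob in ranked:
        kept.append(index)
        cumulative += prob
        if cumulative >= top_p:
            break  # the cutoff has been reached; drop the remaining tail
    return kept
```

Sampling would then be restricted to the returned indices, with their probabilities renormalized.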

stop

Up to 4 sequences where the API will stop generating further tokens. The returned text will not contain the stop sequence.
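To make the stop behavior concrete, the sketch below reproduces it on the client side: the text is cut at the earliest occurrence of any stop sequence, and the sequence itself is excluded. The function name is hypothetical, and the server presumably stops during generation rather than truncating afterwards.

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest stop sequence, excluding the sequence itself."""
    if len(stop) > 4:
        raise ValueError("at most 4 stop sequences are allowed")
    cut = len(text)
    for sequence in stop:
        if not sequence:
            continue  # ignore empty strings, which would match at position 0
        position = text.find(sequence)
        if position != -1:
            cut = min(cut, position)
    return text[:cut]
```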

frequency_penalty

-2 to 2

Indicates how much to penalize new tokens based on their existing frequency in the text so far. Positive values decrease the model's likelihood of repeating the same line verbatim, and higher values increase the penalty. See https://platform.openai.com/docs/api-reference/completions/create for details.

presence_penalty

-2 to 2

Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
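The two penalties can be sketched together. The adjustment below follows the formula given in the OpenAI documentation, where each logit is reduced by the token's count times the frequency penalty, plus the presence penalty if the token has appeared at all. Whether this endpoint applies exactly the same formula is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """Return logits adjusted for tokens that were already generated.

    logits maps each candidate token to its raw logit.
    """
    counts = Counter(generated_tokens)
    adjusted = {}
    for token, logit in logits.items():
        count = counts[token]
        # Frequency penalty grows with every repetition of the token.
        logit -= count * frequency_penalty
        # Presence penalty is applied once if the token has appeared at all.
        if count > 0:
            logit -= presence_penalty
        adjusted[token] = logit
    return adjusted
```

Negative penalty values reverse the effect and make previously seen tokens more likely.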

seed

0 to 18446744073709552000

Generation is random. Changing only the seed produces a different response with similar characteristics, while fixing the seed reproduces the same results, provided all other parameters are also fixed.
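The role of the seed can be illustrated with a local analogy. The sketch below draws tokens from a fixed distribution using Python's own random number generator; it only demonstrates the principle that a fixed seed makes sampling repeatable, and says nothing about the generator the service actually uses. The function name and the toy distribution are hypothetical.

```python
import random


def sample_tokens(probs, count, seed):
    """Draw `count` tokens from a {token: probability} mapping using a seed."""
    rng = random.Random(seed)  # independent generator fixed by the seed
    tokens = list(probs)
    weights = [probs[token] for token in tokens]
    return rng.choices(tokens, weights=weights, k=count)


distribution = {"cat": 0.5, "dog": 0.3, "bird": 0.2}
first = sample_tokens(distribution, 10, seed=42)
second = sample_tokens(distribution, 10, seed=42)
# The same seed with the same inputs yields the same sequence.
assert first == second
```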

Responses

200 Successful Response

422 Validation Error
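A client could reduce the chance of a 422 response by checking the documented ranges before sending a request. The sketch below is one possible approach; the function name is hypothetical, the bounds are assumed to be inclusive, and the seed upper bound of 2**64 - 1 is an assumption based on the documented value looking like a rounded rendering of that number.

```python
# Documented ranges for the numeric body parameters (assumed inclusive).
RANGES = {
    "max_tokens": (1, 1024),
    "temperature": (0, 2),
    "top_p": (0, 1),
    "frequency_penalty": (-2, 2),
    "presence_penalty": (-2, 2),
    "seed": (0, 2**64 - 1),  # assumption: documented bound is rounded
}


def validate_params(params):
    """Return a list of human-readable problems; empty if none were found."""
    errors = []
    for name, value in params.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                errors.append(f"{name}={value} is outside [{low}, {high}]")
    stop = params.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        errors.append("stop accepts at most 4 sequences")
    return errors
```

An empty result does not guarantee a 200 response, since the server may apply checks that are not listed on this page.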