Create a chat completion

Given a list of messages comprising a conversation, the model returns a response. This endpoint is compatible with the OpenAI Chat Completions API. See https://platform.openai.com/docs/api-reference/chat/create
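A request to this endpoint can be sketched in Python as follows. The base URL and API key below are placeholders, not values taken from this page; only the body fields mirror the parameters documented further down.

```python
import json

# Placeholder endpoint and key -- substitute your deployment's actual values.
BASE_URL = "https://your-endpoint.example.com/v1/chat/completions"
API_KEY = "YOUR_API_KEY"

# Request body mirroring the documented parameters and their defaults.
payload = {
    "model": "mistralai/mixtral-8x22b-instruct-v0.1",
    "messages": [
        {"role": "user", "content": "Write a haiku about GPUs."},
    ],
    "max_tokens": 1024,
    "temperature": 0.5,
    "top_p": 1,
    "stream": False,
}

body = json.dumps(payload)
# POST `body` to BASE_URL with headers:
#   Authorization: Bearer <API_KEY>
#   Content-Type: application/json
```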


Model Overview

Description:

Mixtral 8x22B is Mistral AI's latest open model. It sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size.

Mixtral 8x22B comes with the following strengths:

  • It is fluent in English, French, Italian, German, and Spanish
  • It has strong mathematics and coding capabilities
  • It is natively capable of function calling; along with the constrained output mode implemented on la Plateforme, this enables application development and tech stack modernisation at scale
  • Its 64K-token context window allows precise information recall from large documents

Third-Party Community Consideration:

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the Mixtral 8x22B Model Card.

Terms of use

By using this software or model, you are agreeing to the terms and conditions of the license, acceptable use policy, and Mistral's privacy policy. Mixtral 8x22B is released under the Apache 2.0 license.

Reference(s):

Mixtral 8x22B Instruct Model Card on Hugging Face

Cheaper, Better, Faster, Stronger | Mistral AI

Model Architecture:

Architecture Type: Transformer

Network Architecture: Sparse Mixture of GPT-based experts

Model Version: 0.1

Input:

Input Format: Text

Input Parameters: Temperature, Top P, Max Output Tokens

Output:

Output Format: Text

Output Parameters: None

Software Integration:

Supported Hardware Platform(s): Hopper, Ampere, Turing, Ada

Supported Operating System(s): Linux

Inference:

Engine: Triton

Test Hardware: Other

Body Params

model
string
Defaults to mistralai/mixtral-8x22b-instruct-v0.1

The ID of the model to use for this request.
max_tokens
integer
≥ 1
Defaults to 1024

The maximum number of tokens to generate in a single call. The model is not aware of this value; generation simply stops once the specified number of tokens is reached.

stream
boolean
Defaults to false

If set, partial message deltas are sent as data-only server-sent events (SSE) as tokens become available (each JSON payload is prefixed by data: ), with the stream terminated by a data: [DONE] message.
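When streaming is enabled, the client consumes those data: lines incrementally. A minimal parser sketch, assuming the chunks follow the OpenAI-style delta schema (choices[0].delta.content):

```python
import json

def parse_sse_chunks(lines):
    """Collect content deltas from data-only SSE lines until data: [DONE]."""
    pieces = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # server signals end of stream
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"].get("content", "")
        pieces.append(delta)
    return "".join(pieces)

# Example stream of data-only SSE lines.
stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(parse_sse_chunks(stream))  # Hello
```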

temperature
number
0 to 1
Defaults to 0.5

The sampling temperature to use for text generation. Higher temperature values make the output less deterministic. Modifying both temperature and top_p in the same call is not recommended.
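Temperature works by rescaling the model's logits before sampling. The sketch below illustrates the general technique with a plain softmax (an illustration only, not this service's internals):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Lower temperature sharpens the distribution; higher flattens it."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(logits, 0.2)  # near-deterministic: top token dominates
warm = softmax_with_temperature(logits, 1.5)  # closer to uniform: more variety
```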

top_p
number
0 to 1
Defaults to 1

The top-p (nucleus) sampling mass used for text generation. Sampling is restricted to the smallest set of most likely tokens whose cumulative probability reaches top_p; for example, with top_p = 0.2, only the most likely tokens summing to 0.2 cumulative probability are sampled. Modifying both temperature and top_p in the same call is not recommended.
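The nucleus-filtering step can be sketched as follows (an illustration of the general top-p technique, not this service's internals):

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p,
    then renormalize over that set; all other tokens get probability 0."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

probs = [0.5, 0.3, 0.15, 0.05]
# With top_p = 0.2, the single most likely token already exceeds the mass,
# so only index 0 survives.
kept_probs = top_p_filter(probs, 0.2)
```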

stop
string or array of strings

A string or a list of strings at which the API will stop generating further tokens. The returned text will not contain the stop sequence.
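The server applies this truncation itself; the semantics can be sketched locally like so (the helper name is illustrative):

```python
def truncate_at_stop(text, stop):
    """Cut generated text at the earliest stop sequence; the stop itself is dropped."""
    if isinstance(stop, str):
        stop = [stop]
    cut = len(text)
    for s in stop:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Answer: 42\nUser:", ["\nUser:", "###"]))  # Answer: 42
```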

frequency_penalty
number
-2 to 2
Defaults to 0

Penalizes new tokens based on their existing frequency in the text so far, decreasing the model's likelihood of repeating the same line verbatim.

presence_penalty
number
-2 to 2
Defaults to 0

Positive values penalize new tokens that have already appeared in the text so far, increasing the model's likelihood of talking about new topics.
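Both penalties adjust token logits before sampling. A sketch of the standard OpenAI-style adjustment, where each token's repeat count scales frequency_penalty and mere presence triggers presence_penalty:

```python
from collections import Counter

def apply_penalties(logits, generated_tokens,
                    frequency_penalty=0.0, presence_penalty=0.0):
    """logit[t] -= count(t) * frequency_penalty + presence_penalty (if t was seen)."""
    counts = Counter(generated_tokens)
    adjusted = list(logits)
    for tok, n in counts.items():
        adjusted[tok] -= n * frequency_penalty + presence_penalty
    return adjusted

logits = [2.0, 2.0, 2.0]
# Token 0 appeared twice, token 1 once, token 2 never:
# token 0 is penalized most, token 2 is untouched.
out = apply_penalties(logits, [0, 0, 1],
                      frequency_penalty=0.5, presence_penalty=0.2)
```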

seed
integer
Defaults to 0

Generation is stochastic by default: changing the seed alone produces a different response with similar characteristics. Fixing the seed (with all other hyperparameters also fixed) makes results reproducible.

messages
array of objects
required

A list of messages comprising the conversation so far.

Responses

application/json