mistralai / mixtral-8x7b-instruct

Model Overview

Description:

Mixtral 8x7B Instruct is a language model that can follow instructions, complete requests, and generate creative text formats. Mixtral 8x7B is a high-quality sparse mixture-of-experts (SMoE) model with open weights.

This model has been optimized through supervised fine-tuning and direct preference optimization (DPO) for careful instruction following. On MT-Bench, it reaches a score of 8.30, making it the best open-source model, with performance comparable to GPT-3.5.

Mixtral outperforms Llama 2 70B on most benchmarks with 6x faster inference. It is the strongest open-weight model with a permissive license and the best model overall regarding cost/performance trade-offs. In particular, it matches or outperforms GPT-3.5 on most standard benchmarks.

Mixtral has the following capabilities:

  • It gracefully handles a context of 32k tokens.
  • It handles English, French, Italian, German, and Spanish.
  • It shows strong performance in code generation.
  • It can be fine-tuned into an instruction-following model that achieves a score of 8.3 on MT-Bench.
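As an illustration of how the instruction-tuned variant is typically prompted, the sketch below builds a prompt in the [INST] instruction template described on Mistral's 8x7B Instruct Hugging Face model card. The helper function name is our own, and in practice the tokenizer's chat template would usually handle this formatting.

```python
def build_mixtral_prompt(turns):
    """Build a prompt in the [INST] instruction format from Mistral's
    Hugging Face model card.

    `turns` is a list of (user, assistant) pairs; the final user turn
    may carry `None` as its assistant reply.
    """
    prompt = "<s>"
    for user, assistant in turns:
        prompt += f"[INST] {user} [/INST]"
        if assistant is not None:
            # Completed assistant turns are closed with the EOS token.
            prompt += f" {assistant}</s>"
    return prompt


# Single-turn instruction:
prompt = build_mixtral_prompt(
    [("Explain sparse mixture of experts in one sentence.", None)]
)
```

For multi-turn chat, earlier (user, assistant) pairs are simply prepended, so the model sees the full conversation history in one string.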

Third-Party Community Consideration:

This model is not owned or developed by NVIDIA. This model has been developed and built to a third party's requirements for this application and use case; see Mistral's 8x7B Instruct Hugging Face Model Card.

Terms of use

By using this software or model, you are agreeing to the terms and conditions of the license, the acceptable use policy, and Mistral's privacy policy. Mixtral-8x7B is released under the Apache 2.0 license.

Reference(s):

Mixtral 8x7B Instruct Model Card on Hugging Face

Mixtral of experts | Mistral AI | Open source models

Model Architecture:

Architecture Type: Transformer

Network Architecture: Sparse Mixture of GPT-based experts

Model Version: 0.1

Input:

Input Format: Text

Input Parameters: Temperature, Top P, Max Output Tokens
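To make the listed input parameters concrete, here is a minimal sketch of a request body for an OpenAI-compatible chat endpoint. The endpoint, field names, and values are assumptions for illustration; only the three exposed knobs (temperature, top-p, max output tokens) come from the card.

```python
import json

# Hypothetical request body; field names follow the common
# OpenAI-compatible convention and are not specified by this card.
payload = {
    "model": "mistralai/mixtral-8x7b-instruct",
    "messages": [
        {"role": "user", "content": "Write a haiku about open weights."}
    ],
    "temperature": 0.7,  # sampling temperature
    "top_p": 0.9,        # nucleus-sampling cutoff
    "max_tokens": 256,   # maximum output tokens
}

body = json.dumps(payload)
```

The serialized `body` would then be POSTed to the serving endpoint; lower temperature and top-p values make outputs more deterministic.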

Output:

Output Format: Text

Output Parameters: None

Software Integration:

Supported Hardware Platform(s): Hopper, Ampere, Turing, Ada

Supported Operating System(s): Linux

Inference:

Engine: Triton

Test Hardware: Other