databricks / dbrx-instruct

Model Information

Description:

DBRX is a transformer-based, decoder-only large language model (LLM) trained with next-token prediction. It uses a fine-grained mixture-of-experts (MoE) architecture with 132B total parameters, of which 36B are active on any given input.

Compared to other open MoE models such as Mixtral-8x7B and Grok-1, DBRX is fine-grained, meaning it uses a larger number of smaller experts: DBRX has 16 experts and routes each token to 4 of them, while Mixtral-8x7B and Grok-1 have 8 experts and choose 2. This provides 65x more possible combinations of experts, which we found improves model quality. DBRX uses rotary position encodings (RoPE), gated linear units (GLU), and grouped-query attention (GQA). It uses a converted version of the GPT-4 tokenizer as defined in the tiktoken repository. We made these choices based on exhaustive evaluation and scaling experiments.
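As a quick check of the 65x figure, here is a minimal sketch (plain Python, not part of the original card) comparing the number of possible expert subsets per routing decision:

```python
from math import comb

# DBRX routes each token to 4 of its 16 experts;
# Mixtral-8x7B and Grok-1 route to 2 of 8 experts.
dbrx_subsets = comb(16, 4)    # 1820 possible expert combinations
coarse_subsets = comb(8, 2)   # 28 possible expert combinations

print(dbrx_subsets // coarse_subsets)  # 65 -> the "65x" quoted above
```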

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the Non-NVIDIA DBRX Model Card.

License and Terms of Use

GOVERNING TERMS: Your use of this API is governed by the NVIDIA API Trial Service Terms of Use. Use of this model is governed by the NVIDIA AI Foundation Models Community License and the Databricks Open Model License.

Reference(s):

Blog post

Model Architecture:

Architecture Type: Transformer

Network Architecture: Fine-grained Mixture of Experts (MoE)

Input:

Input Format: Text

Input Parameters: Temperature, Top P, Max Output Tokens

Output:

Output Format: Text
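
To illustrate how the input parameters listed above (temperature, top-p, max output tokens) map onto a request, here is a minimal sketch using the OpenAI-compatible Python client against the NVIDIA API catalog endpoint. The base URL, model identifier, and environment-variable name are assumptions for illustration, not taken from this card:

```python
import os
from openai import OpenAI  # pip install openai

# Assumed OpenAI-compatible endpoint for the NVIDIA API catalog.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.environ["NVIDIA_API_KEY"],  # hypothetical env var holding your API key
)

completion = client.chat.completions.create(
    model="databricks/dbrx-instruct",
    messages=[{"role": "user",
               "content": "Explain mixture-of-experts routing in two sentences."}],
    temperature=0.5,   # Temperature
    top_p=1.0,         # Top P
    max_tokens=1024,   # Max Output Tokens
)

# The response comes back as text, matching the output format above.
print(completion.choices[0].message.content)
```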

Software Integration:

  • Supported Hardware Platform(s): Hopper

Supported Operating System(s):

  • Linux

Training, Testing, and Evaluation Datasets:

Training Dataset:

Properties (Quantity, Dataset Descriptions, Sensor(s)): Pre-trained on 12T tokens of text and code data.

Inference:

Engine: Triton, TRT-LLM

Test Hardware: H100