zyphra/zamba2-7b-instruct

Model Overview

Description:

Zamba2-7B is a hybrid model composed of state-space (Mamba) and transformer blocks. It follows the Zamba architecture, which consists of a Mamba backbone alternating with shared transformer blocks. Zamba2-7B introduces four major improvements over Zamba1:
1.) Mamba1 blocks have been replaced with Mamba2 blocks.
2.) We apply a LoRA projector to each shared MLP and attention block, which allows the network to specialize at each invocation of the shared transformer layer across depth. LoRA enables us to add depth-specialization for only a minimal increase in total parameter count.
3.) We utilize two alternating shared attention blocks.
4.) We utilize rotary position embeddings in the shared attention layer.
We found that while hybrid SSM-transformer models are perfectly capable of performing well without position embeddings, adding rotary embeddings to the shared attention block slightly improved performance. We also found that two alternating shared attention blocks slightly outperform a single shared block at a fixed parameter budget; the LoRA-specialized shared block is sketched below.
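To make the parameter-sharing scheme concrete, here is a minimal PyTorch sketch, not Zyphra's implementation, of a shared MLP that is reused at several depths while a small per-depth LoRA projector lets each invocation specialize. All class names, dimensions, and the choice of activation are illustrative assumptions:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A shared linear layer plus a per-depth low-rank (LoRA) update."""
    def __init__(self, shared: nn.Linear, rank: int = 8):
        super().__init__()
        self.shared = shared  # the same module instance is reused at every depth
        self.lora_a = nn.Linear(shared.in_features, rank, bias=False)   # per-depth
        self.lora_b = nn.Linear(rank, shared.out_features, bias=False)  # per-depth
        nn.init.zeros_(self.lora_b.weight)  # LoRA update starts at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shared(x) + self.lora_b(self.lora_a(x))

class SharedMLPWithLoRA(nn.Module):
    """One set of MLP weights invoked at n_depths call sites,
    each call site owning its own LoRA projector."""
    def __init__(self, d_model: int, n_depths: int, rank: int = 8):
        super().__init__()
        up = nn.Linear(d_model, 4 * d_model)    # shared across all depths
        down = nn.Linear(4 * d_model, d_model)  # shared across all depths
        self.per_depth = nn.ModuleList(
            nn.Sequential(LoRALinear(up, rank), nn.GELU(), LoRALinear(down, rank))
            for _ in range(n_depths)
        )

    def forward(self, x: torch.Tensor, depth: int) -> torch.Tensor:
        return x + self.per_depth[depth](x)  # residual around the shared block

# Example: the same shared weights serve depths 0..5 with distinct LoRA updates.
block = SharedMLPWithLoRA(d_model=512, n_depths=6)
h = torch.randn(2, 16, 512)
for d in range(6):
    h = block(h, depth=d)
```

Because the LoRA rank is small relative to the model dimension, the per-depth updates add only a small fraction of the shared block's parameter count, which is the trade-off described in the second improvement above.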
Zamba2-7B uses the Mistral v0.1 tokenizer and was pre-trained on 3T tokens of text and code data sourced from open web datasets, including Zyda. Subsequently, in a second phase, Zamba2-7B was annealed on a mixture of approximately 100B high-quality tokens.

This model is ready for commercial and non-commercial use.
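For local experimentation, the model can be loaded through Hugging Face transformers. The sketch below follows the standard transformers chat workflow and is an assumption on our part (Zamba2 support may require a recent transformers release or Zyphra's fork; see the model card referenced below); the prompt and generation settings are illustrative:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the instruct model and its (Mistral v0.1-based) tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-7B-Instruct", device_map="cuda", torch_dtype=torch.bfloat16
)

# Format a single user turn with the model's chat template, then generate.
chat = [{"role": "user", "content": "Explain state-space models in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    chat, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```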

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see Zyphra's Hugging Face Zamba2-7B model card.

License/Terms of Use:

GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service; and the use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement. ADDITIONAL INFORMATION: Apache 2.0 License.

References:

Zamba2-7B model card on Hugging Face

Zamba2-7B blog post

Model Architecture:

Architecture Type: Hybrid SSM Transformer

Network Architecture: Zamba2

Input:

Input Type(s): Text

Input Format(s): String

Input Parameters: max_tokens, temperature, top_p

Other Properties Related to Input: None
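As an illustration of the input parameters above, the sketch below calls the model through an OpenAI-compatible client, a common pattern for NVIDIA API catalog endpoints. The endpoint URL, model identifier, and prompt are assumptions rather than guaranteed values:

```python
from openai import OpenAI

# Assumed NVIDIA API catalog endpoint and model id; supply your own API key.
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key="YOUR_NVIDIA_API_KEY",
)

completion = client.chat.completions.create(
    model="zyphra/zamba2-7b-instruct",
    messages=[{"role": "user",
               "content": "Give a two-sentence summary of hybrid SSM-transformer models."}],
    max_tokens=512,    # input parameter: maximum number of tokens to generate
    temperature=0.2,   # input parameter: sampling temperature
    top_p=0.7,         # input parameter: nucleus-sampling threshold
)
print(completion.choices[0].message.content)  # output is a text string
```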

Output:

Output Type(s): Text

Output Format: String

Output Parameters: None

Other Properties Related to Output: None

Supported Hardware Microarchitecture Compatibility:

NVIDIA Ampere

NVIDIA Hopper

Supported Operating System(s):

Linux

Model Version(s):

The instruction-tuned 7B Zamba2 model, Zamba2-7B-Instruct

Inference:

Engine: Triton

Test Hardware:

Hopper

Ethical Considerations And Limitations:

Zamba2-7B is a large language model trained on highly diverse internet corpora. As such, despite our best efforts, it has likely been exposed to, and may reproduce, factually inaccurate information, hate speech, profanity, sexually explicit content, and other harmful content. It may also confabulate incorrect or non-factual answers to queries. Please treat the model's output with a warranted degree of caution.

Benchmarks

| Model | PIQA (0-shot) | ARC-Easy (0-shot) | ARC-Challenge (25-shot) | BoolQ (0-shot) | Winogrande (0-shot) | HellaSwag (0-shot) | OpenBookQA (0-shot) | MMLU (5-shot) |
|---|---|---|---|---|---|---|---|---|
| Zamba2-7B | 83 | 81.9 | 68.09 | 86 | 76.9 | 81.2 | 47 | 67.2 |
| Mistral-7B-v0.1 | 82.26 | 79.59 | 61.43 | 83.64 | 73.88 | 81.07 | 44.2 | 62.2 |
| Gemma 7B | 81.12 | 80.77 | 61.09 | 83.12 | 73.8 | 80.46 | 45.2 | 62.9 |
| Llama3.1-8B | 81.2 | 81.6 | 57.85 | 82.1 | 73.6 | 78.9 | 44.6 | 65.2 |