Model Overview
Description:
Zamba2-7B is a hybrid model composed of state-space (Mamba) and transformer blocks. It follows the Zamba architecture, which consists of a Mamba backbone alternating with shared transformer blocks. Zamba2-7B introduces four major improvements over Zamba1:
1.) Mamba1 blocks have been replaced with Mamba2 blocks.
2.) We apply a LoRA projector to each shared MLP and attention block, which allows the network to specialize at each invocation of the shared transformer layer across depth. LoRA enables us to add depth-specialization for only a minimal increase in total parameter count.
3.) We utilize two alternating shared attention blocks.
4.) We utilize rotary position embeddings in the shared attention layer.
We found that while hybrid SSM-transformer models are perfectly capable of performing well without position embeddings, adding rotary embeddings to the shared attention block slightly improved performance. We also found that using two alternating shared attention blocks slightly improves performance over a single shared block at a fixed parameter budget. A conceptual sketch of this shared-block design is shown below.
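The following PyTorch snippet is a minimal, illustrative sketch of that design, not Zyphra's implementation: the simplified SSM stand-in, the module names, and all dimensions are assumptions, and rotary embeddings are omitted for brevity. It shows how two alternating shared transformer blocks, each paired with per-invocation LoRA projectors, allow the network to specialize per depth at a small parameter cost.

```python
# Conceptual sketch of a Mamba-style backbone with two alternating shared
# transformer blocks and per-invocation LoRA projectors (illustrative only;
# the SSM stand-in, sizes, and module names are assumptions, not Zyphra's code).
import torch
import torch.nn as nn

class LoRAProjector(nn.Module):
    """Low-rank adapter that specializes a shared layer at one depth."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
        nn.init.zeros_(self.up.weight)  # start as a no-op on top of shared weights

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.up(self.down(x))

class SharedTransformerBlock(nn.Module):
    """One attention + MLP block whose weights are reused across depth."""
    def __init__(self, dim: int, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x, lora_attn: LoRAProjector, lora_mlp: LoRAProjector):
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, need_weights=False)
        x = x + attn_out + lora_attn(h)    # depth-specific low-rank correction
        h = self.norm2(x)
        x = x + self.mlp(h) + lora_mlp(h)  # depth-specific low-rank correction
        return x

class HybridBackbone(nn.Module):
    """SSM backbone (stand-in) interleaved with two alternating shared blocks."""
    def __init__(self, dim: int = 256, n_ssm_layers: int = 6, share_every: int = 2):
        super().__init__()
        # Stand-in for Mamba2 blocks; a real model would use actual SSM layers.
        self.ssm_layers = nn.ModuleList(
            nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.SiLU())
            for _ in range(n_ssm_layers)
        )
        # Two shared transformer blocks, used alternately at each shared position.
        self.shared = nn.ModuleList(SharedTransformerBlock(dim) for _ in range(2))
        n_shared_calls = n_ssm_layers // share_every
        # One pair of LoRA projectors per invocation -> depth specialization.
        self.lora_attn = nn.ModuleList(LoRAProjector(dim) for _ in range(n_shared_calls))
        self.lora_mlp = nn.ModuleList(LoRAProjector(dim) for _ in range(n_shared_calls))
        self.share_every = share_every

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        call = 0
        for i, ssm in enumerate(self.ssm_layers):
            x = x + ssm(x)
            if (i + 1) % self.share_every == 0:
                block = self.shared[call % 2]  # alternate between the two shared blocks
                x = block(x, self.lora_attn[call], self.lora_mlp[call])
                call += 1
        return x

if __name__ == "__main__":
    model = HybridBackbone()
    print(model(torch.randn(1, 16, 256)).shape)  # torch.Size([1, 16, 256])
```

Because only the small LoRA projectors are unique to each depth while the shared attention and MLP weights are stored once, the parameter overhead of depth specialization stays minimal.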
Zamba2-7B uses the Mistral v0.1 tokenizer and was pre-trained on 3T tokens of text and code data sourced from open web datasets, including Zyda. Subsequently, in a second phase, Zamba2-7B was annealed on a mixture of approximately 100B high-quality tokens.
This model is ready for commercial and non-commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see Zyphra's Huggingface Zamba2-7B model card.
License/Terms of Use:
GOVERNING TERMS: The trial service is governed by the NVIDIA API Trial Terms of Service; and the use of this model is governed by the NVIDIA AI Foundation Models Community License Agreement. ADDITIONAL INFORMATION: Apache 2.0 License.
References:
Zamba2-7b model card on Huggingface
Zamba2-7b blog post
Model Architecture:
Architecture Type: Hybrid SSM Transformer
Network Architecture: Zamba2
Input:
Input Type(s): Text
Input Format(s): String
Input Parameters: max_tokens, temperature, top_p
Other Properties Related to Input: None
Output:
Output Type(s): Text
Output Format(s): String
Output Parameters: None
Other Properties Related to Output: None
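As a hedged illustration of how the input parameters above (max_tokens, temperature, top_p) map onto a standard text-generation call that returns a text string, the snippet below uses the Hugging Face transformers API with the checkpoint name from Zyphra's Hugging Face model card; the prompt and sampling values are examples only, and Zamba2 support requires a sufficiently recent transformers release.

```python
# Example generation call; checkpoint name follows Zyphra's Hugging Face model
# card, and the prompt and sampling values are illustrative, not prescriptive.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Zyphra/Zamba2-7B-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "Zyphra/Zamba2-7B-Instruct", device_map="auto", torch_dtype=torch.bfloat16
)

prompt = "Explain the difference between a state-space model and a transformer."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=256,   # corresponds to the max_tokens input parameter
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```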
Supported Hardware Microarchitecture Compatibility:
NVIDIA Ampere
NVIDIA Hopper
Supported Operating System(s):
Linux
Model Version(s):
The instruction-tuned 7B Zamba2 model, Zamba2-7B-Instruct
Inference:
Engine: Triton
Test Hardware:
Hopper
Ethical Considerations And Limitations:
Zamba2-7B is a large language model trained on highly diverse internet corpora. As such, despite our best efforts, it has likely been exposed to, and may reproduce, factually inaccurate information, hate speech, profanity, sexually explicit content, and other harmful content. Additionally, it may confabulate incorrect or non-factual answers to queries. Please treat the model's output with a warranted degree of caution.
Benchmarks
| Model | PIQA (0-shot) | ARC-Easy (0-shot) | ARC-Challenge (25-shot) | BoolQ (0-shot) | Winogrande (0-shot) | HellaSwag (0-shot) | OpenBookQA (0-shot) | MMLU (5-shot) |
|---|---|---|---|---|---|---|---|---|
| Zamba2-7B | 83 | 81.9 | 68.09 | 86 | 76.9 | 81.2 | 47 | 67.2 |
| Mistral-7B-v0.1 | 82.26 | 79.59 | 61.43 | 83.64 | 73.88 | 81.07 | 44.2 | 62.2 |
| Gemma 7B | 81.12 | 80.77 | 61.09 | 83.12 | 73.8 | 80.46 | 45.2 | 62.9 |
| Llama3.1-8B | 81.2 | 81.6 | 57.85 | 82.1 | 73.6 | 78.9 | 44.6 | 65.2 |