Marin 8B Instruct Overview
Description:
Marin 8B Instruct is a Transformer-style autoregressive language model fine-tuned from marin-8b-base to follow instructions and engage in dialogue. It is intended for tasks such as question answering, summarization, code generation, and multi-turn conversation.
- Developed by: The Marin team at Stanford CRFM.
- Model type: a Transformer-style autoregressive language model.
- Knowledge Cutoff: ~July 2024
- Language(s) (NLP): English
- License: The code and model are released under Apache 2.0.
- Contact:
dlwh at stanford.edu
This model is ready for non-commercial/research use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see the Non-NVIDIA marin-8b-instruct Model Card linked in the References below.
License and Terms of use:
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Community Model License. Additional Information: Apache 2.0.
Deployment Geography:
Global
Use Case:
The Marin 8B Instruct model is designed for tasks requiring instruction comprehension and generation, such as question answering, summarization, code generation, and dialogue. It is positioned as a research artifact or a foundational instruct model upon which others can build and implement their own safety protocols.
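The snippet below is a minimal inference sketch using the Hugging Face Transformers library and the marin-community/marin-8b-instruct repository linked in the References. The chat-template usage and sampling settings are illustrative assumptions, not the Marin team's recommended configuration.

```python
# Minimal sketch: chat-style generation with Hugging Face Transformers.
# Sampling parameters are illustrative; adjust them for your use case.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "marin-community/marin-8b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {"role": "user", "content": "Summarize the benefits of open-source language models in two sentences."}
]
# Assumes the repository ships a chat template; fall back to plain-text prompts otherwise.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(
    input_ids, max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9
)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```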
Release Date:
- Build.nvidia.com: May 2025
- Hugging Face: May 2025
Reference(s):
- Marin Community Hugging Face: https://huggingface.co/marin-community
- marin-8b-instruct Hugging Face: https://huggingface.co/marin-community/marin-8b-instruct
- Stanford CRFM: https://crfm.stanford.edu/
- Levanter GitHub: https://github.com/stanford-crfm/levanter
Model Architecture:
- Architecture Type: Transformer (Autoregressive Language Model)
- Network Architecture: Llama 3 8B
- This model was developed based on marin-8b-base.
- This model has 8.03 billion parameters.
- Hidden Size: 4096
- Feedforward Size: 14336
- Number of Layers: 32
- Number of Attention Heads: 32
- Number of Key-Value (KV) Heads: 8 (Grouped-Query Attention)
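As a sketch only, the hyperparameters listed above can be mapped onto a Hugging Face LlamaConfig as shown below. The vocabulary size is an assumption based on the Llama 3 tokenizer family; consult the released config.json for authoritative values.

```python
# Illustrative mapping of the listed hyperparameters onto a Llama-style config.
from transformers import LlamaConfig

config = LlamaConfig(
    hidden_size=4096,              # Hidden Size
    intermediate_size=14336,       # Feedforward Size
    num_hidden_layers=32,          # Number of Layers
    num_attention_heads=32,        # Number of Attention Heads
    num_key_value_heads=8,         # Grouped-Query Attention: 8 KV heads
    max_position_embeddings=4096,  # 4K context window (see Input below)
    vocab_size=128256,             # assumption: Llama 3 tokenizer vocabulary
)
print(f"{config.model_type}: {config.num_hidden_layers} layers, "
      f"{config.num_attention_heads} heads / {config.num_key_value_heads} KV heads")
```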
Input:
- Input Type(s): Text
- Input Format(s): String
- Input Parameters: One-Dimensional (1D)
- Other Properties Related to Input: 4K (4,096-token) context window, shared between the prompt and the generated completion; see the token-budgeting sketch after the Output section.
Output:
- Output Type(s): Text
- Output Format: String
- Output Parameters: One-Dimensional (1D)
- Other Properties Related to Output: Generates text based on input instructions. Knowledge cutoff is around July 2024.
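Because the prompt and the generated reply share the 4K context window, callers may want to budget tokens before generation. The sketch below is one hypothetical way to do that with the instruct repository's tokenizer; the 512-token generation budget is an arbitrary example value.

```python
# Sketch: check that a chat-formatted prompt leaves room for the reply
# within the 4,096-token context window.
from transformers import AutoTokenizer

MAX_CONTEXT = 4096       # 4K context window shared by prompt and completion
GENERATION_BUDGET = 512  # example reservation for generated tokens

tokenizer = AutoTokenizer.from_pretrained("marin-community/marin-8b-instruct")

def fits_in_context(user_prompt: str) -> bool:
    """Return True if the formatted prompt leaves GENERATION_BUDGET tokens free."""
    prompt_ids = tokenizer.apply_chat_template(
        [{"role": "user", "content": user_prompt}],
        add_generation_prompt=True,
    )
    return len(prompt_ids) <= MAX_CONTEXT - GENERATION_BUDGET

print(fits_in_context("Summarize the Marin 8B model card in three bullet points."))
```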
Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
Supported Operating System(s):
- Linux
- Windows
- macOS (via Hugging Face Transformers library compatibility)
Model Version(s):
marin-8b-instruct v1.0
Training, Testing, and Evaluation Datasets:
Training Dataset:
The Marin 8B Instruct model was adapted from marin-8b-base through Supervised Fine-Tuning (SFT) on an additional 5.3 billion tokens; a hedged data-loading sketch follows the dataset lists below.
- Data Collection Method by dataset: Hybrid: Automated, Human
- Labeling Method by dataset: Hybrid: Automated, Human
- Properties: Undisclosed
Datasets used in Marin 8B Base
- DCLM Baseline
- Starcoder Data
- Proofpile 2
- FineMath 3+
- Dolma (selected subsets)
- Dolmino-Mix-1124, including their versions of:
  - FLAN
  - CodeSearchNet (with OWM Filter)
  - GSM8K
  - MetaMath
  - MathCoder2 Synthetic
A full report is available on our ReadTheDocs site.
Datasets used in Marin 8B Instruct
Marin 8B Instruct is currently an SFT-only model. It was trained on the following datasets:
- TIGER-Lab/AceCode-89K
- bespokelabs/Bespoke-Stratos-17k
- cognitivecomputations/dolphin-r1 (includes both nonreasoning and reasoning subsets)
- tuenguyen/dolphin_r1_reasoning
- facebook/natural_reasoning
- open-r1/OpenThoughts-114k-math
- HuggingFaceTB/smoltalk
- allenai/tulu-3-sft-mixture
- PrimeIntellect/verifiable-math-problems
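For readers who want to inspect these sources, the sketch below streams one of the listed datasets with the Hugging Face datasets library. It only illustrates data access; the actual mixing, formatting, and packing that produced the 5.3 billion SFT tokens is described in the Marin reports, not here.

```python
# Sketch: stream one of the listed SFT datasets and inspect its schema.
# Column names and config names vary across the datasets above.
from datasets import load_dataset

ds = load_dataset("allenai/tulu-3-sft-mixture", split="train", streaming=True)
example = next(iter(ds))
print(sorted(example.keys()))  # inspect available fields (e.g., the conversation column)
```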
Testing Dataset:
- Data Collection Method: Undisclosed
- Labeling Method: Undisclosed
- Properties: Undisclosed
Evaluation Dataset:
- Data Collection Method: Undisclosed
- Labeling Method: Undisclosed
- Properties: Undisclosed
Base Model Evaluation Results
We ran a suite of standard benchmarks to compare our model with Llama 3.1 8B and the open-source 7-8B models OLMo 2 7B and MAP NEO 7B.
For all benchmarks, we used LM Eval Harness with the default setup for each task. (These numbers may differ from previously reported results due to differences in setup; LM Eval Harness is usually somewhat stricter than other harnesses.) An example harness invocation is sketched after the table.
| Model | Average | AGI Eval LSAT-AR | ARC Easy | ARC Challenge | BBH | BoolQ | CommonSense QA | COPA | GPQA | HellaSwag 0-shot | HellaSwag 10-shot | lambada_openai | MMLU 5-shot | MMLU 0-shot | MMLU Pro | OpenBookQA | PIQA | WinoGrande | WSC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Marin 8B Base (Starling) | 68.3 | 20.9 | 86.5 | 63.1 | 50.6 | 85.9 | 79.1 | 92.0 | 30.3 | 82.3 | 83.6 | 74.7 | 67.6 | 65.9 | 36.5 | 44.2 | 84.4 | 74.5 | 82.1 |
| Llama 3.1 Base | 67.0 | 20.4 | 85.8 | 58.9 | 46.4 | 84.2 | 75.2 | 92.0 | 32.3 | 79.4 | 81.9 | 74.7 | 66.4 | 65.5 | 33.3 | 45.8 | 82.9 | 74.4 | 83.5 |
| OLMo 2 Base | 66.7 | 17.4 | 85.0 | 60.7 | 44.4 | 85.5 | 75.4 | 89.0 | 26.8 | 80.5 | 81.7 | 73.1 | 63.9 | 61.9 | 30.6 | 46.2 | 82.5 | 74.3 | 86.1 |
| MAP NEO 7B | 62.2 | 23.0 | 81.1 | 52.0 | 42.4 | 84.7 | 81.7 | 82.0 | 27.8 | 72.5 | 73.3 | 64.6 | 58.2 | 56.4 | TODO | 39.4 | 79.0 | 66.1 | 73.3 |
Marin 8B Base fares well on most tasks.
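A rough example of how such numbers can be reproduced with LM Eval Harness is sketched below. The task names, few-shot setting, and batch size are illustrative choices rather than the exact configuration used for the table, and the base repository name is assumed from the marin-community naming in the References.

```python
# Sketch: evaluate the base model on a few of the benchmarks above
# with lm-evaluation-harness (LM Eval Harness).
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=marin-community/marin-8b-base,dtype=bfloat16",
    tasks=["arc_easy", "arc_challenge", "hellaswag", "winogrande"],
    num_fewshot=0,
    batch_size=8,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```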
Inference:
- Engine: TensorRT-LLM
- Test Hardware: NVIDIA L40S
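The listing below is a minimal sketch of offline generation through TensorRT-LLM's high-level Python LLM API; the exact API surface varies by TensorRT-LLM release, and the model identifier and sampling settings are placeholders rather than a tested deployment recipe.

```python
# Sketch: offline generation with TensorRT-LLM's high-level LLM API.
# API details differ across TensorRT-LLM releases; consult your installed version's docs.
from tensorrt_llm import LLM, SamplingParams

llm = LLM(model="marin-community/marin-8b-instruct")  # builds/loads an engine for the checkpoint
params = SamplingParams(max_tokens=256, temperature=0.7, top_p=0.9)

outputs = llm.generate(["Explain what an instruction-tuned model is."], params)
print(outputs[0].outputs[0].text)
```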
Additional Information
- Developed by: The Marin Project / Marin Community, closely associated with Stanford University's Center for Research on Foundation Models (CRFM).
- Primary Contact: David Hall (dlwh at stanford.edu) is listed as the primary contact for the Marin 8B models on their Hugging Face model cards.
- Training Framework: Developed using the stanford-crfm/levanter training framework, which uses JAX and Named Tensors.
- Training Logs: Public Weights & Biases (W&B) logs are available for the Marin 8B training runs.
- Tokenizer: stanford-crfm/marin-tokenizer (a variant of the Llama 3 tokenizer); see the encoding sketch after this list.
- Philosophy: The Marin Community operates as "an open lab for building foundation models collaboratively," emphasizing open sharing of source code, datasets, experimental methodologies, and mistakes.
- Distinction: Marin Community (AI research project) is distinct from Marin Software (digital advertising company).
- Training Checkpoints (for base model): Kestrel, Ocelot, Jellyfish, Phoenix, Starling, and deeper-starling (13.7T tokens).
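As a small illustration of the tokenizer noted above, the sketch below loads stanford-crfm/marin-tokenizer and encodes a sentence; the example text is arbitrary.

```python
# Sketch: load the Marin tokenizer (a Llama 3 tokenizer variant) and encode text.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("stanford-crfm/marin-tokenizer")
ids = tok("An open lab for building foundation models collaboratively.")["input_ids"]
print(len(ids), "tokens:", tok.convert_ids_to_tokens(ids)[:8], "...")
```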
Bias, Risks, and Limitations
Like any base or fine-tuned language model without safety filtering, these models can easily be prompted to generate harmful or sensitive content. Such content may also be produced unintentionally, especially in cases involving bias, so we recommend that users consider the risks when applying this technology. Additionally, statements from Marin, as from any LLM, can be inaccurate, so responses should be verified.
Marin 8B has not undergone any safety tuning or evaluation. We strongly recommend that users use this model with caution and consider the risks when applying this technology. In particular, this model is not intended for fully autonomous use.
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.