deepseek-ai / deepseek-r1-distill-llama-8b

Model Overview

Description:

DeepSeek-R1-Distill-Llama-8B is a distilled version of the DeepSeek-R1 series, built upon the Llama3.1-8B-Instruct architecture. This model is designed to deliver efficient performance for reasoning, math, and code tasks while maintaining high accuracy. By distilling knowledge from the larger DeepSeek-R1 model, it provides state-of-the-art performance with reduced computational requirements.

This model is ready for both research and commercial use.
For more details, visit the DeepSeek website.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third-party's requirements for this application and use case; see the Non-NVIDIA DeepSeek-R1 Model Card.

License/Terms of Use

GOVERNING TERMS: The NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for AI Products; and the use of this model is governed by the NVIDIA Community Model License. Additional Information: MIT License; Meta Llama 3.1 Community License Agreement. Built with Llama.

References:

Model Architecture:

Architecture Type: Dense decoder-only Transformer (Llama architecture), distilled from the DeepSeek-R1 Mixture of Experts (MoE) model

Base Model: Llama3.1-8B-Instruct

Input:

Input Type(s): Text

Input Format(s): String

Input Parameters: One-Dimensional (1D)

Other Properties Related to Input:

DeepSeek recommends the following configurations when using the DeepSeek-R1 series models, including for benchmarking, to achieve the expected performance:

  1. Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
  2. Avoid adding a system prompt; all instructions should be contained within the user prompt.
  3. For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
  4. When evaluating model performance, it is recommended to conduct multiple tests and average the results.
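The recommendations above can be sketched as a request builder for an OpenAI-compatible chat endpoint, such as the one a NIM container exposes. The endpoint URL and model identifier below are illustrative assumptions; adjust them to your deployment.

```python
import json

# Assumed values for an OpenAI-compatible server (e.g., a NIM container);
# substitute the URL and model name from your own deployment.
ENDPOINT = "http://localhost:8000/v1/chat/completions"
MODEL = "deepseek-ai/deepseek-r1-distill-llama-8b"


def build_request(question: str) -> dict:
    """Build a chat-completions payload following DeepSeek's recommendations."""
    # Recommendation 3: for math problems, ask for a boxed final answer.
    directive = "Please reason step by step, and put your final answer within \\boxed{}."
    return {
        "model": MODEL,
        # Recommendation 2: no system message; all instructions go in the user turn.
        "messages": [{"role": "user", "content": f"{question}\n{directive}"}],
        # Recommendation 1: temperature in 0.5-0.7 (0.6 recommended).
        "temperature": 0.6,
        "max_tokens": 4096,
    }


payload = build_request("What is 7 * 6?")
print(json.dumps(payload, indent=2))
```

Per recommendation 4, send this payload several times and average the scored results rather than relying on a single completion.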

Output:

Output Type(s): Text

Output Format: String

Output Parameters: One-Dimensional (1D)

Software Integration:

Runtime Engine(s): TensorRT-LLM

Supported Hardware Microarchitecture Compatibility: NVIDIA Ampere, NVIDIA Blackwell, NVIDIA Jetson, NVIDIA Hopper, NVIDIA Ada Lovelace, NVIDIA Pascal, NVIDIA Turing, and NVIDIA Volta architectures

Supported Operating System(s): Linux

Model Version(s):

DeepSeek-R1-Distill-Llama-8B

Training, Testing, and Evaluation Datasets:

Training Dataset:

Data Collection Method by dataset: Hybrid: Human, Automated

Labeling Method by dataset: Hybrid: Human, Automated

Testing Dataset:

Data Collection Method by dataset: Hybrid: Human, Automated

Labeling Method by dataset: Hybrid: Human, Automated

Evaluation Dataset:

Data Collection Method by dataset: Hybrid: Human, Automated

Labeling Method by dataset: Hybrid: Human, Automated

Inference:

Engine: TensorRT-LLM

Test Hardware: NVIDIA Hopper

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.

Model Limitations:

The DeepSeek-R1-Distill model may struggle with open-ended or complex tasks, such as mathematical problems, if a directive is not included in the prompt to reason step by step and put the final answer within a boxed notation. Additionally, the model may face challenges with benchmarks requiring sampling if the temperature, top-p value, and number of responses per query are not set correctly.

The base model was trained on data that contains toxic language and societal biases originally crawled from the internet. The model may therefore amplify those biases and return toxic responses, especially when prompted with toxic prompts. It may also generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable output even if the prompt itself does not contain anything explicitly offensive.