Model Overview
Background
DeepSeek-R1-Zero, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, demonstrated remarkable reasoning performance. Through RL, DeepSeek-R1-Zero naturally developed numerous powerful and interesting reasoning behaviors. However, it encountered challenges such as endless repetition, poor readability, and language mixing. DeepSeek-R1 addresses these issues and further enhances reasoning performance by incorporating cold-start data before RL, achieving performance comparable to OpenAI-o1 across math, code, and reasoning tasks.
Description
DeepSeek-R1-Distill-Qwen-7B is distilled from DeepSeek-R1 based on Qwen2.5-Math-7B. The reasoning patterns of larger models, DeepSeek-R1 in this case, can be distilled into smaller models, yielding better performance than reasoning patterns discovered through RL on small models directly. Dense models widely used in the research community can be fine-tuned on reasoning data generated by DeepSeek-R1, and the evaluation results demonstrate that these distilled smaller dense models perform exceptionally well on benchmarks.
This model is ready for commercial use.
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see link to the DeepSeek-R1-Distill-Qwen-7B Model Card.
License/Terms of Use
The NIM container is governed by the NVIDIA Software License Agreement and Product-Specific Terms for AI Products; and the use of this model is governed by the NVIDIA Community Model License. ADDITIONAL INFORMATION: MIT License and Apache 2.0 License.
Model Developer: DeepSeek-AI
Model Architecture
Architecture Type: Transformer
Network Architecture: Qwen
Version: 2.5
Input
Input Type: Text
Input Format: String
Input Parameters: 1D
Other Properties Related to Input:
DeepSeek recommends adhering to the following configurations when utilizing the DeepSeek-R1 series models, including benchmarking, to achieve the expected performance:
- Set the temperature within the range of 0.5-0.7 (0.6 is recommended) to prevent endless repetitions or incoherent outputs.
- Avoid adding a system prompt; all instructions should be contained within the user prompt.
- For mathematical problems, it is advisable to include a directive in your prompt such as: "Please reason step by step, and put your final answer within \boxed{}."
- When evaluating model performance, it is recommended to conduct multiple tests and average the results.
Additionally, the DeepSeek-R1 series models tend to bypass the thinking pattern (i.e., output "<think>\n\n</think>") on certain queries, which can adversely affect performance. To ensure the model engages in thorough reasoning, DeepSeek recommends forcing the model to begin every response with "<think>\n".
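The recommendations above can be sketched as a request builder for an OpenAI-compatible chat-completions endpoint. This is a minimal illustration, not an official client: the model identifier, parameter names, and the step-by-step directive wording follow common conventions and are assumptions here; forcing the "<think>\n" prefix is backend-dependent and is shown only as a comment.

```python
import json

def build_r1_request(question: str) -> dict:
    """Build a chat-completions payload following DeepSeek's recommended
    settings for the R1 series (hypothetical helper for illustration)."""
    # For math problems, append the recommended directive to the user prompt.
    prompt = (
        question
        + "\nPlease reason step by step, and put your final answer within \\boxed{}."
    )
    return {
        # Assumed model id; substitute the id your deployment exposes.
        "model": "deepseek-ai/deepseek-r1-distill-qwen-7b",
        # No system message: all instructions go in the user prompt.
        "messages": [{"role": "user", "content": prompt}],
        # Recommended range 0.5-0.7; 0.6 avoids repetition/incoherence.
        "temperature": 0.6,
        "max_tokens": 4096,
        # Note: to force the response to start with "<think>\n", some
        # backends accept a partial assistant turn or a prompt-prefix
        # option; consult your serving stack's documentation.
    }

payload = build_r1_request("What is the sum of the first 100 positive integers?")
print(json.dumps(payload, indent=2))
```

When benchmarking, the card also suggests averaging over multiple sampled runs rather than relying on a single generation.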
Output
Output Type: Text
Output Format: String
Output Parameters: 1D
Software Integration
Runtime Engine: TensorRT-LLM
Supported Hardware Microarchitecture Compatibility: NVIDIA Hopper, NVIDIA Lovelace
Preferred/Supported Operating System(s): Linux
Model Version: 2.5
Training, Testing, and Evaluation Dataset:
Training Dataset
Data Collection Method by dataset: Automated. Reasoning data generated by DeepSeek-R1.
Labeling Method by dataset: Automated
Evaluation Dataset
Please see the Evaluation section of the DeepSeek-R1-Distill-Qwen-7B Model Card for more information.
Data Collection Method by dataset: Hybrid: Human, Automated
Labeling Method by dataset: Hybrid: Human, Automated
Inference
Engine: TensorRT-LLM
Test Hardware: L20, H20
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.
Model Limitations: The base model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses, especially when prompted with toxic prompts. The model may generate answers that are inaccurate, omit key information, or include irrelevant or redundant text, producing socially unacceptable or undesirable output even if the prompt itself contains nothing explicitly offensive.
You are responsible for ensuring that your use of NVIDIA AI Foundation Models complies with all applicable laws.