DeepSeek-V3.1 Overview

Description

DeepSeek-V3.1 is a hybrid model that supports both thinking and non-thinking modes. Compared to the previous version, this upgrade brings improvements in multiple aspects:

Hybrid thinking mode: One model supports both modes by changing the chat template.
Smarter tool calling: Through post-training optimization, the model's performance in tool usage and agent tasks has significantly improved.

This model is ready for commercial/non-commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. It has been developed and built to a third-party’s requirements for this application and use case; see the non-NVIDIA DeepSeek-V3.1 Model Card.

License and Terms of Use:

GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Community Model License. Additional Information: MIT License.

Deployment Geography:

Global

Use Case:

Designed to handle general instruction-following tasks, DeepSeek-V3.1 can be integrated into AI assistants across various domains, including business applications.

Supported Languages: Primarily English and Chinese, with multilingual capabilities.

Extended long-context tasks (up to 128K tokens):
- Long-form document summarization
- Whole-book comprehension
- Codebase and legal document analysis
Complex reasoning and problem-solving:
- Multi-step logic and mathematical reasoning
- Scientific and analytical writing assistance
- Puzzle solving and structured decision-making
Code generation and software development:
- Live coding and debugging support
- Generating or completing scripts and software logic
- Automated documentation and code analysis
Tool-augmented and agent-based applications:
- Intelligent agents with dynamic tool invocation
- Interactive chatbots with hybrid “thinking” mode
- Use in systems that call external APIs or utilities

Release Date:

Build.NVIDIA.com: 08/26/2025

Hugging Face: 08/20/2025

Model Architecture:

Architecture Type: Transformer (Decoder-only)
Parameter Count: 671B (Total), 37B (Activated)
Notable Architectural Features: Hybrid thinking mode
Base Model: DeepSeek-V3.1-Base
Additional Notes: Trained using the UE8M0 FP8 scale data format

Input

Input Type(s): Text

Input Formats: String

Input Parameters: One-Dimensional (1D)

Other Properties Related to Input: Chat Template for different modes, Tool descriptions. Context Length: Supports up to 128K tokens

Output

Output Type(s): Text

Output Formats: String

Output Parameters: One-Dimensional (1D)

Other Properties Related to Output: Special Features: Supports both thinking and non-thinking response modes.

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration

Runtime Engine(s): SGLang

Supported Hardware Microarchitecture Compatibility:

NVIDIA Blackwell
NVIDIA Hopper

Preferred/Supported Operating System(s): Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version:

DeepSeek-V3.1

Training, Testing, and Evaluation Datasets:

Training Dataset:

Data Collection Method: Undisclosed
Labeling Method: Undisclosed
Properties:
- Expanded dataset with additional long documents.
- Training Format: UE8M0 FP8 scale data format.
- Two-phase long-context extension: a 32K extension phase with 630B tokens (10x expansion), followed by a 128K extension phase with 209B tokens (3.3x expansion).

Testing Dataset:

Data Collection Method by dataset: Undisclosed
Labeling Method by dataset: Undisclosed
Properties: Undisclosed

Evaluation Benchmark Results:

Please see the Evaluation section of the Hugging Face DeepSeek-V3.1 Model Card for more information.

Data Collection Method: Hybrid: Human, Automated
Labeling Method: Hybrid: Human, Automated
Properties: Evaluation on General, Search Agent, Code, Code Agent, and Math benchmarks.

Category	Benchmark (Metric)	DeepSeek V3.1-NonThinking	DeepSeek V3 0324	DeepSeek V3.1-Thinking	DeepSeek R1 0528
General	MMLU-Redux (EM)	91.8	90.5	93.7	93.4
General	MMLU-Pro (EM)	83.7	81.2	84.8	85.0
General	GPQA-Diamond (Pass@1)	74.9	68.4	80.1	81.0
General	Humanity's Last Exam (Pass@1)	-	-	15.9	17.7
Search Agent	BrowseComp	-	-	30.0	8.9
Search Agent	BrowseComp_zh	-	-	49.2	35.7
Search Agent	Humanity's Last Exam (Python + Search)	-	-	29.8	24.8
Search Agent	SimpleQA	-	-	93.4	92.3
Code	LiveCodeBench (2408-2505) (Pass@1)	56.4	43.0	74.8	73.3
Code	Codeforces-Div1 (Rating)	-	-	2091	1930
Code	Aider-Polyglot (Acc.)	68.4	55.1	76.3	71.6
Code Agent	SWE Verified (Agent mode)	66.0	45.4	-	44.6
Code Agent	SWE-bench Multilingual (Agent mode)	54.5	29.3	-	30.5
Code Agent	Terminal-bench (Terminus 1 framework)	31.3	13.3	-	5.7
Math	AIME 2024 (Pass@1)	66.3	59.4	93.1	91.4
Math	AIME 2025 (Pass@1)	49.8	51.3	88.4	87.5
Math	HMMT 2025 (Pass@1)	33.5	29.2	84.2	79.4

Note:

Search agents are evaluated with our internal search framework, which uses a commercial search API + webpage filter + 128K context window. Search agent results of R1-0528 are evaluated with a pre-defined workflow.
SWE-bench is evaluated with our internal code agent framework.
HLE is evaluated with the text-only subset.

Inference:

Acceleration Engine: SGLang

Test Hardware: NVIDIA B200

Additional Details:

The model uses different chat templates for its operational modes. It also supports tool calls and agent functionality with specific formatting requirements.

Chat Template

The details of our chat template are described in tokenizer_config.json and assets/chat_template.jinja. Here is a brief description.

Non-Thinking

First-Turn

Prefix:
<｜begin▁of▁sentence｜>{system prompt}<｜User｜>{query}<｜Assistant｜></think>

With the given prefix, DeepSeek V3.1 generates responses to queries in non-thinking mode. Unlike DeepSeek V3, it introduces an additional token </think>.

Multi-Turn

Context:
<｜begin▁of▁sentence｜>{system prompt}<｜User｜>{query}<｜Assistant｜></think>{response}<｜end▁of▁sentence｜>...<｜User｜>{query}<｜Assistant｜></think>{response}<｜end▁of▁sentence｜>

Prefix:
<｜User｜>{query}<｜Assistant｜></think>

By concatenating the context and the prefix, we obtain the correct prompt for the query.
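The concatenation described above can be sketched in plain Python. This is a minimal illustration only: the helper name `build_non_thinking_prompt` and the `(query, response)` tuple layout are assumptions made for this example, not part of the model's tooling. In practice the tokenizer's chat template should be preferred.

```python
# Special tokens copied from the templates above.
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"
THINK_END = "</think>"


def build_non_thinking_prompt(system_prompt, history, query):
    """Assemble a non-thinking prompt (hypothetical helper).

    history is a list of (query, response) pairs from earlier turns.
    """
    # Context: system prompt followed by every completed turn.
    context = BOS + system_prompt
    for past_query, past_response in history:
        context += USER + past_query + ASSISTANT + THINK_END + past_response + EOS
    # Prefix: the new query, ending with </think> to select non-thinking mode.
    prefix = USER + query + ASSISTANT + THINK_END
    return context + prefix


prompt = build_non_thinking_prompt(
    "You are a helpful assistant",
    [("Who are you?", "I am DeepSeek")],
    "1+1=?",
)
print(prompt)
```

With an empty `history`, the function returns exactly the first-turn prefix.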

Thinking

First-Turn

Prefix:
<｜begin▁of▁sentence｜>{system prompt}<｜User｜>{query}<｜Assistant｜><think>

The thinking-mode prefix is similar to that of DeepSeek-R1.

Multi-Turn

Prefix:
<｜User｜>{query}<｜Assistant｜><think>

The multi-turn context is the same as in the non-thinking multi-turn chat template. This means the thinking content of earlier turns is dropped, but the </think> token is retained in every turn of the context.
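As a rough sketch of this rule, the snippet below removes the reasoning from earlier assistant messages before building a thinking-mode prompt. The helper names `strip_reasoning` and `build_thinking_prompt` are hypothetical, and treating everything up to the last `</think>` as reasoning is an assumption of this example.

```python
# Special tokens copied from the templates above.
BOS = "<｜begin▁of▁sentence｜>"
EOS = "<｜end▁of▁sentence｜>"
USER = "<｜User｜>"
ASSISTANT = "<｜Assistant｜>"
THINK_START = "<think>"
THINK_END = "</think>"


def strip_reasoning(assistant_text):
    """Keep only the answer part of an earlier assistant message."""
    # Assumption: everything up to and including the last </think> is reasoning.
    # If there is no </think>, rpartition returns the whole text as the last item.
    return assistant_text.rpartition(THINK_END)[2]


def build_thinking_prompt(system_prompt, history, query):
    """Assemble a thinking-mode prompt from (query, response) pairs."""
    context = BOS + system_prompt
    for past_query, past_response in history:
        # Earlier turns keep </think> but lose their reasoning content.
        answer = strip_reasoning(past_response)
        context += USER + past_query + ASSISTANT + THINK_END + answer + EOS
    # The new turn ends with <think> so the model starts by reasoning.
    return context + USER + query + ASSISTANT + THINK_START


prompt = build_thinking_prompt(
    "You are a helpful assistant",
    [("Who are you?", "<think>Hmm</think>I am DeepSeek")],
    "1+1=?",
)
print(prompt)
```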

ToolCall

Tool calling is supported in non-thinking mode. The format is:

<｜begin▁of▁sentence｜>{system prompt}{tool_description}<｜User｜>{query}<｜Assistant｜></think>

where tool_description is:

## Tools
You have access to the following tools:

### {tool_name1}
Description: {description}

Parameters: {json.dumps(parameters)}

IMPORTANT: ALWAYS adhere to this exact format for tool use:
<｜tool▁calls▁begin｜><｜tool▁call▁begin｜>tool_call_name<｜tool▁sep｜>tool_call_arguments<｜tool▁call▁end｜>{{additional_tool_calls}}<｜tool▁calls▁end｜>

Where:
- `tool_call_name` must be an exact match to one of the available tools
- `tool_call_arguments` must be valid JSON that strictly follows the tool's Parameters Schema
- For multiple tool calls, chain them directly without separators or spaces
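The tool listing at the top of this template can be generated from a list of tool definitions. The sketch below is illustrative only: the function names, the dictionary keys (`name`, `description`, `parameters`), and the blank line placed between consecutive tool entries are assumptions, since the template above shows a single tool. The fixed instruction text that starts with "IMPORTANT" is not generated here and would be appended verbatim.

```python
import json

TOOLS_HEADER = "## Tools\nYou have access to the following tools:\n"


def render_tool_entry(name, description, parameters):
    """Render one tool entry in the layout shown above."""
    return (
        f"### {name}\n"
        f"Description: {description}\n\n"
        f"Parameters: {json.dumps(parameters)}\n"
    )


def render_tools_section(tools):
    """Join the header and all tool entries.

    Assumption: entries are separated by one blank line.
    """
    entries = [
        render_tool_entry(t["name"], t["description"], t["parameters"])
        for t in tools
    ]
    return TOOLS_HEADER + "\n" + "\n".join(entries)


# Hypothetical tool definition used only for illustration.
tools = [
    {
        "name": "get_weather",
        "description": "Get the weather for a city",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
        },
    }
]
section = render_tools_section(tools)
print(section)
```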

Code-Agent

We support various code agent frameworks. Please refer to the tool call format above to create your own code agents. An example is shown in assets/code_agent_trajectory.html.
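A custom agent needs to extract tool calls from the model output before it can execute them. The following is a minimal sketch of that step using a regular expression over the tool call format above. The names `parse_tool_calls` and `run_tool_calls` and the toy `add` tool are hypothetical. How tool results are passed back to the model is not covered by this sketch; see the trajectory example for that.

```python
import json
import re

# One tool call: <｜tool▁call▁begin｜>name<｜tool▁sep｜>arguments<｜tool▁call▁end｜>
TOOL_CALL_PATTERN = re.compile(
    r"<｜tool▁call▁begin｜>(.*?)<｜tool▁sep｜>(.*?)<｜tool▁call▁end｜>",
    re.DOTALL,
)


def parse_tool_calls(text):
    """Return a list of (tool_name, arguments_dict) found in model output."""
    calls = []
    for name, raw_args in TOOL_CALL_PATTERN.findall(text):
        # Arguments must be valid JSON according to the format rules.
        calls.append((name.strip(), json.loads(raw_args)))
    return calls


def run_tool_calls(text, tools):
    """Dispatch each parsed call to a Python callable registered in tools."""
    results = []
    for name, args in parse_tool_calls(text):
        if name not in tools:
            raise KeyError(f"unknown tool: {name}")
        results.append(tools[name](**args))
    return results


# Hand-written model output containing two chained calls.
output = (
    "<｜tool▁calls▁begin｜>"
    '<｜tool▁call▁begin｜>add<｜tool▁sep｜>{"a": 1, "b": 2}<｜tool▁call▁end｜>'
    '<｜tool▁call▁begin｜>add<｜tool▁sep｜>{"a": 3, "b": 4}<｜tool▁call▁end｜>'
    "<｜tool▁calls▁end｜>"
)
results = run_tool_calls(output, {"add": lambda a, b: a + b})
print(results)
```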

Search-Agent

We designed a specific tool call format for search in thinking mode to support search agents.

For complex questions that require accessing external or up-to-date information, DeepSeek-V3.1 can leverage a user-provided search tool through a multi-turn tool-calling process.

Please refer to assets/search_tool_trajectory.html and assets/search_python_tool_trajectory.html for the detailed template.

Usage Example

import transformers

# Load the tokenizer, which carries the chat template.
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V3.1")

messages = [
    {"role": "system", "content": "You are a helpful assistant"},
    {"role": "user", "content": "Who are you?"},
    {"role": "assistant", "content": "<think>Hmm</think>I am DeepSeek"},
    {"role": "user", "content": "1+1=?"}
]

# Thinking mode: the prompt ends with <think>; reasoning from the earlier turn is dropped.
tokenizer.apply_chat_template(messages, tokenize=False, thinking=True, add_generation_prompt=True)
# '<｜begin▁of▁sentence｜>You are a helpful assistant<｜User｜>Who are you?<｜Assistant｜></think>I am DeepSeek<｜end▁of▁sentence｜><｜User｜>1+1=?<｜Assistant｜><think>'

# Non-thinking mode: the prompt ends with </think>.
tokenizer.apply_chat_template(messages, tokenize=False, thinking=False, add_generation_prompt=True)
# '<｜begin▁of▁sentence｜>You are a helpful assistant<｜User｜>Who are you?<｜Assistant｜></think>I am DeepSeek<｜end▁of▁sentence｜><｜User｜>1+1=?<｜Assistant｜></think>'
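
Because the thinking-mode prompt ends with `<think>`, a completion in that mode is expected to contain the reasoning, then `</think>`, then the final answer. The sketch below separates the two parts. The helper name `split_thinking_output` is hypothetical, and treating a completion without `</think>` as unfinished reasoning is an assumption of this example.

```python
THINK_END = "</think>"


def split_thinking_output(completion):
    """Split a thinking-mode completion into (reasoning, answer)."""
    reasoning, sep, answer = completion.partition(THINK_END)
    if not sep:
        # Assumption: no </think> means generation stopped while still reasoning.
        return completion, ""
    return reasoning, answer


reasoning, answer = split_thinking_output("Hmm</think>I am DeepSeek")
print(reasoning)
print(answer)

# Rebuild an assistant message in the same form as the messages list above.
history_message = {
    "role": "assistant",
    "content": "<think>" + reasoning + THINK_END + answer,
}
```

In non-thinking mode the prompt already ends with `</think>`, so the completion is the answer itself and no splitting is needed.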

How to Run Locally

The model structure of DeepSeek-V3.1 is the same as that of DeepSeek-V3. Please visit the DeepSeek-V3 repo for more information about running this model locally.

Ethical Considerations:

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. Please report security vulnerabilities or NVIDIA AI Concerns here.