Qwen3-Coder-480B-A35B-Instruct
Model Overview
Description:
Qwen3-Coder-480B-A35B-Instruct is a state-of-the-art large language model designed for code generation and agentic coding tasks. It is a mixture-of-experts (MoE) model with 480B total parameters and 35B activated parameters, featuring a native context length of 262,144 tokens, extendable to 1M tokens with YaRN.
This model achieves state-of-the-art results among open models on Agentic Coding, Agentic Browser-Use, and other foundational coding tasks, with performance comparable to Claude Sonnet. It supports function calling and tool choice, making it well suited to complex coding workflows and agentic applications.
This model is ready for commercial use.
License/Terms of Use
GOVERNING TERMS: This trial service is governed by the NVIDIA API Trial Terms of Service. Use of this model is governed by the NVIDIA Community Model License. Additional Information: Apache 2.0.
Deployment Geography
Global
Use Cases
- Code Generation: Generate high-quality code from natural language descriptions
- Agentic Coding: Execute complex coding workflows with function calling
- Repository Understanding: Process large codebases with long-context capabilities
- Tool Integration: Interface with development tools and APIs
- Code Review and Analysis: Analyze and improve existing code
- Documentation Generation: Create code documentation and comments
- Browser Automation: Agentic browser-use scenarios
- Function Calling: Structured tool execution and API integration (see the sketch after this list)
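As a minimal sketch of the function-calling use case, assuming the model is served behind an OpenAI-compatible endpoint (for example, a local vLLM server; the endpoint URL, served model name, and get_weather tool below are illustrative assumptions, not part of this card):

```python
# Minimal function-calling sketch against an OpenAI-compatible endpoint.
# Assumptions (not from this card): a local vLLM server on port 8000 and a
# hypothetical get_weather tool, both used purely for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
model_name = "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8"  # adjust to the served name

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical example tool
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Berlin right now?"}]
response = client.chat.completions.create(
    model=model_name,
    messages=messages,
    tools=tools,
    tool_choice="auto",  # let the model decide whether to call the tool
)

msg = response.choices[0].message
if msg.tool_calls:  # tool calls arrive as structured fields, not free text
    for call in msg.tool_calls:
        print(call.function.name, call.function.arguments)
else:
    print(msg.content)
```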
 
Release Information
Release Date: 08/22/2025 
Build.NVIDIA.com: Available via link 
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. This model has been developed by Qwen (Alibaba Cloud). This model has been developed and built to a third party's requirements for this application and use case; see link to Qwen3-Coder-480B-A35B-Instruct.
References
- Qwen3-Coder: A Large Language Model for Code Generation
- Qwen3-Coder GitHub Repository
- Qwen Documentation
- Hugging Face Model Page
- Qwen3 Technical Report (arXiv:2505.09388)
 
Model Architecture
Architecture Type: Mixture-of-Experts (MoE) with Sparse Activation 
Network Architecture: Qwen3MoeForCausalLM (Transformer-based decoder-only) 
Parameter Count: 480B total parameters with 35B activated parameters (see the consistency check below) 
Expert Configuration: 160 experts, 8 activated per token 
Attention Mechanism: Grouped Query Attention (GQA) with 96 query heads and 8 KV heads 
Number of Layers: 62 
Hidden Size: 6144 
Head Dimension: 128 
Intermediate Size: 8192 
MoE Intermediate Size: 2560 
Context Length: 262,144 tokens (native), extendable to 1M with YaRN 
Vocabulary Size: 151,936 
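As a rough consistency check on the hyperparameters above, the sketch below re-derives the total and activated parameter counts, assuming the standard Qwen3 MoE layout (gate/up/down projections per expert, GQA attention, untied input and output embeddings) and ignoring router and normalization weights:

```python
# Back-of-envelope check that the listed hyperparameters are consistent with
# ~480B total / ~35B activated parameters. Assumes gate/up/down projections
# per expert; router and norm weights are ignored as negligible.
hidden, head_dim = 6144, 128
q_heads, kv_heads = 96, 8
layers, vocab = 62, 151_936
experts, active_experts, moe_inter = 160, 8, 2560

per_expert = 3 * hidden * moe_inter            # gate + up + down projections
attn = hidden * (q_heads * head_dim)           # Q projection
attn += 2 * hidden * (kv_heads * head_dim)     # K and V projections (GQA)
attn += (q_heads * head_dim) * hidden          # output projection

total = layers * (experts * per_expert + attn) + 2 * vocab * hidden
active = layers * (active_experts * per_expert + attn) + 2 * vocab * hidden

print(f"total  ~ {total / 1e9:.0f}B")   # ~480B
print(f"active ~ {active / 1e9:.0f}B")  # ~35B
```

The estimate lands near 480B total and 35B activated, matching the advertised counts.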
Input
Input Type(s): Text, Code, Function calls 
Input Format(s): Natural language prompts, code snippets, structured function calls 
Input Parameters: 
- Max input length: 262,144 tokens (native), up to 1M with YaRN
- Support for function calling format
- Tool choice enabled
- trust_remote_code enabled at load time
- Custom tool call parser (qwen3_coder); see the launch sketch after this list
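The last three items correspond to deployment options for vLLM's OpenAI-compatible server. A launch sketch follows; the FP8 checkpoint name and the 8-way tensor parallelism are assumptions for a single Hopper node, not requirements from this card:

```python
# Sketch: launching vLLM's OpenAI-compatible server with the options listed
# above. Checkpoint name and tensor-parallel size are assumptions for a
# single 8-GPU Hopper node; adjust to your deployment.
import subprocess

subprocess.run([
    "vllm", "serve", "Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
    "--tensor-parallel-size", "8",
    "--max-model-len", "262144",          # native context length
    "--trust-remote-code",
    "--enable-auto-tool-choice",          # tool choice enabled
    "--tool-call-parser", "qwen3_coder",  # custom tool call parser
])
```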
 
Output
Output Type(s): Text, Code, Function responses 
Output Format(s): Natural language responses, code generation, structured function outputs 
Output Parameters: One-Dimensional (1D) 
- Max output length: Configurable based on remaining context
- Function call responses in structured format (see the sketch after this list)
Other Properties Related to Output: 
- Non-thinking mode (no <think></think> blocks)
- Auto tool choice responses
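Continuing the client sketch from the Use Cases section, a minimal example of consuming a structured tool-call response and returning the tool result to the model (run_tool is a hypothetical dispatcher, not part of any real API):

```python
# Sketch: execute a returned tool call and feed the result back to the model.
# Continues the earlier client sketch; run_tool is a hypothetical dispatcher.
import json

def run_tool(name: str, arguments: str) -> str:
    args = json.loads(arguments)
    if name == "get_weather":                # hypothetical tool from earlier
        return json.dumps({"city": args["city"], "temp_c": 21})
    raise ValueError(f"unknown tool: {name}")

msg = response.choices[0].message
messages.append(msg)                         # keep the assistant turn in history
for call in msg.tool_calls or []:
    messages.append({
        "role": "tool",
        "tool_call_id": call.id,
        "content": run_tool(call.function.name, call.function.arguments),
    })

final = client.chat.completions.create(model=model_name, messages=messages)
print(final.choices[0].message.content)      # plain text, no <think></think> blocks
```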
 
Software Integration
Runtime Engine: vLLM, Transformers (4.51.0+) 
Supported Hardware Platform(s): NVIDIA Hopper 
Supported Operating System(s): Linux 
Data Type: FP8 
Data Modality: Text 
Model Version: v1.0 
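For the Transformers path, a minimal generation sketch (requires transformers 4.51.0 or later); the device placement is an assumption, and the full 480B model needs a multi-GPU node rather than a single device:

```python
# Minimal Transformers sketch (transformers>=4.51.0). The 480B model will not
# fit on one GPU; device_map="auto" shards it across available devices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen3-Coder-480B-A35B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```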
Training, Testing, and Evaluation Datasets
Training Dataset
- Data Collection Method by dataset: The model was trained on a diverse dataset including code repositories, documentation, and natural language text related to programming
- Labeling Method by dataset: Supervised fine-tuning with instruction-following data
- Properties: Multi-language code support, instruction-following capabilities, function calling training
 
Testing Dataset
- Data Collection Method by dataset: Standard benchmarks for code generation and agentic tasks
- Labeling Method by dataset: Automated evaluation metrics
- Properties: HumanEval, MBPP, Agentic coding benchmarks
 
Evaluation Dataset
- Data Collection Method by dataset: Public benchmarks and custom evaluation sets
- Labeling Method by dataset: Automated metrics and human evaluation
- Properties: Code generation quality, function calling accuracy, agentic task performance
 
Benchmark Results
The model achieves state-of-the-art results among open models on:
- Agentic Coding tasks
- Agentic Browser-Use scenarios
- Foundational coding benchmarks
Across these tasks it reports results comparable to Claude Sonnet.
 
Inference
Acceleration Engine: vLLM 
Test Hardware: NVIDIA Hopper 
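To go past the native 262,144-token window, YaRN rope scaling can be supplied at load time. Below is a sketch using vLLM's offline API; the scaling factor of 4 (262,144 x 4 covers roughly 1M tokens), the rope_scaling argument shape, and the FP8 checkpoint name are assumptions to validate against the model's documentation:

```python
# Sketch: extending context with YaRN via vLLM's offline API. Treat the exact
# argument names and values as assumptions; check the model docs before use.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8",
    tensor_parallel_size=8,                # assumption: one 8-GPU Hopper node
    max_model_len=1_000_000,               # target ~1M-token window
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,
        "original_max_position_embeddings": 262144,
    },
)

out = llm.generate(
    ["# Summarize the following repository:\n..."],
    SamplingParams(max_tokens=256),
)
print(out[0].outputs[0].text)
```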
Ethical Considerations
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
Please report security vulnerabilities or NVIDIA AI Concerns here.
