Model Overview
Description:
DeepSeek Coder is a series of code language models trained from scratch on 2T tokens, comprising 87% code and 13% natural language in English and Chinese. The models are available in sizes from 1.3B to 33B parameters and are designed to support project-level code completion and infilling. The 6.7B-parameter variant, deepseek-coder-6.7b-instruct, is fine-tuned on 2B tokens of instruction data and achieves state-of-the-art performance across multiple programming languages and benchmarks.
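As a minimal usage sketch (not part of this card's specification), the model can be run with the Hugging Face transformers library; the checkpoint id, dtype, and generation settings below are assumptions and should be adapted to the target deployment.

```python
# Hedged sketch: running deepseek-coder-6.7b-instruct with Hugging Face transformers.
# The checkpoint id and generation settings are assumptions, not part of this card.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a quicksort function in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```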
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the deepseek-coder-6.7b-instruct model card on Hugging Face.
Terms of Use
GOVERNING TERMS: The use of this model is subject to the MIT License and DeepSeek AI Model Agreement.
Model Architecture:
Architecture Type: Generative Pre-Trained Transformer (GPT)-based
Network Architecture: Decoder-only Transformer, pre-trained on a project-level code corpus with a 16K context window and an additional fill-in-the-blank (infilling) objective (see the infilling sketch below)
Model Version: 1.0
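Because of the fill-in-the-blank pre-training objective noted above, the model family can infill code between a given prefix and suffix. The sketch below illustrates one way to prompt for infilling; the base checkpoint id and the fill-in-the-middle sentinel strings are assumptions and should be verified against the released tokenizer.

```python
# Hedged infilling sketch. The checkpoint id and FIM sentinel strings below are
# assumptions; verify them against the tokenizer's special tokens before use.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/deepseek-coder-6.7b-base"  # assumed id; infilling is usually shown on the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prefix = "def fib(n):\n    "
suffix = "\n    return fib(n - 1) + fib(n - 2)\n"
# Assumed fill-in-the-middle prompt layout: <begin> prefix <hole> suffix <end>
prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the infilled span produced after the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```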
Input:
Input Type: Text
Input Format: String
Input Parameters: Temperature, Top K, Top P, Max Output Tokens
Output:
Output Type: Text
Output Format: String
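The parameters listed under Input above (Temperature, Top K, Top P, Max Output Tokens) correspond to standard sampling controls at generation time. A hedged sketch of how they might be expressed with a transformers GenerationConfig follows; the specific values are illustrative only.

```python
# Hedged sketch: the card's input parameters map onto standard sampling controls
# in a transformers GenerationConfig. Values are illustrative only.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    temperature=0.2,      # Temperature
    top_k=50,             # Top K
    top_p=0.95,           # Top P
    max_new_tokens=1024,  # Max Output Tokens
    do_sample=True,
)
# Pass to model.generate(..., generation_config=generation_config) at inference time.
```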
Software Integration:
Supported Hardware Platform(s): NVIDIA L4 GPUs
Supported Operating System(s): Linux
Inference:
Engine: Triton
Test Hardware: Other
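With Triton as the inference engine, a client request might look like the sketch below using the tritonclient HTTP API; the server URL, deployed model name, and tensor names (text_input, text_output) are assumptions that depend on the actual backend configuration.

```python
# Hedged sketch of a Triton Inference Server request via the tritonclient HTTP API.
# The URL, model name, and tensor names are assumptions that depend on how the
# model is actually deployed (e.g., the backend's config.pbtxt).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed endpoint

prompt = np.array([b"Write a quicksort function in Python."], dtype=object)
text_input = httpclient.InferInput("text_input", [1], "BYTES")  # assumed tensor name
text_input.set_data_from_numpy(prompt)

result = client.infer(
    model_name="deepseek-coder-6.7b-instruct",  # assumed deployed model name
    inputs=[text_input],
    outputs=[httpclient.InferRequestedOutput("text_output")],  # assumed tensor name
)
print(result.as_numpy("text_output"))
```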