Model Overview
Description:
DeepSeek Coder is a series of code language models trained from scratch on 2T tokens, comprising 87% code and 13% natural language in English and Chinese. The models are available in sizes from 1.3B to 33B parameters and are designed to support project-level code completion and infilling. The 6.7B-parameter variant, deepseek-coder-6.7b-instruct, is fine-tuned on 2B tokens of instruction data and achieves state-of-the-art performance across multiple programming languages and benchmarks.
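As a minimal usage sketch (not part of this card's specification), the model can be run with the Hugging Face transformers library; the checkpoint id, dtype, and generation settings below are assumptions and should be adapted to the target deployment.

```python
# Hedged sketch: running deepseek-coder-6.7b-instruct with Hugging Face transformers.
# The checkpoint id and generation settings are assumptions, not part of this card.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/deepseek-coder-6.7b-instruct"  # assumed Hugging Face id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

messages = [{"role": "user", "content": "Write a quicksort function in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=512,
    do_sample=False,
    eos_token_id=tokenizer.eos_token_id,
)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```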
Third-Party Community Consideration
This model is not owned or developed by NVIDIA. It has been developed and built to a third party's requirements for this application and use case; see the deepseek-coder-6.7b-instruct model card on Hugging Face.
Terms of Use
GOVERNING TERMS: The use of this model is subject to the MIT License and DeepSeek AI Model Agreement.
Model Architecture:
Architecture Type: Generative Pre-Trained Transformer (GPT)-based
Network Architecture: Decoder-only Transformer, pre-trained on a project-level code corpus with a 16K context window and an additional fill-in-the-blank (infilling) objective (see the infilling sketch below)
Model Version: 1.0
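Because of the fill-in-the-blank pre-training objective noted above, the model family can infill code between a given prefix and suffix. The sketch below illustrates one way to prompt for infilling; the base checkpoint id and the fill-in-the-middle sentinel strings are assumptions and should be verified against the released tokenizer.

```python
# Hedged infilling sketch. The checkpoint id and FIM sentinel strings below are
# assumptions; verify them against the tokenizer's special tokens before use.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "deepseek-ai/deepseek-coder-6.7b-base"  # assumed id; infilling is usually shown on the base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prefix = "def fib(n):\n    "
suffix = "\n    return fib(n - 1) + fib(n - 2)\n"
# Assumed fill-in-the-middle prompt layout: <begin> prefix <hole> suffix <end>
prompt = f"<｜fim▁begin｜>{prefix}<｜fim▁hole｜>{suffix}<｜fim▁end｜>"

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
# Decode only the infilled span produced after the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```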
Input:
Input Type: Text
Input Format: String
Input Parameters: Temperature, Top K, Top P, Max Output Tokens
Output:
Output Type: Text
Output Format: String
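The parameters listed under Input above (Temperature, Top K, Top P, Max Output Tokens) correspond to standard sampling controls at generation time. A hedged sketch of how they might be expressed with a transformers GenerationConfig follows; the specific values are illustrative only.

```python
# Hedged sketch: the card's input parameters map onto standard sampling controls
# in a transformers GenerationConfig. Values are illustrative only.
from transformers import GenerationConfig

generation_config = GenerationConfig(
    temperature=0.2,      # Temperature
    top_k=50,             # Top K
    top_p=0.95,           # Top P
    max_new_tokens=1024,  # Max Output Tokens
    do_sample=True,
)
# Pass to model.generate(..., generation_config=generation_config) at inference time.
```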
Software Integration:
Supported Hardware Platform(s): NVIDIA L4 GPUs
Supported Operating System(s): Linux
Inference:
Engine: Triton
Test Hardware: Other
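With Triton as the inference engine, a client request might look like the sketch below using the tritonclient HTTP API; the server URL, deployed model name, and tensor names (text_input, text_output) are assumptions that depend on the actual backend configuration.

```python
# Hedged sketch of a Triton Inference Server request via the tritonclient HTTP API.
# The URL, model name, and tensor names are assumptions that depend on how the
# model is actually deployed (e.g., the backend's config.pbtxt).
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")  # assumed endpoint

prompt = np.array([b"Write a quicksort function in Python."], dtype=object)
text_input = httpclient.InferInput("text_input", [1], "BYTES")  # assumed tensor name
text_input.set_data_from_numpy(prompt)

result = client.infer(
    model_name="deepseek-coder-6.7b-instruct",  # assumed deployed model name
    inputs=[text_input],
    outputs=[httpclient.InferRequestedOutput("text_output")],  # assumed tensor name
)
print(result.as_numpy("text_output"))
```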