Overview

The Large Language Model (LLM) NIM API endpoints provide simple access to use natural language based generative AI. This single API endpoint provides access to top models for use in a wide range of tasks including: chat, instruction following, question answering, summarization, creative text generation, and code generation.

NOTE: Select models are available as downloadable container images and supported with an NVIDIA AI Enterprise entitlement. These select models have additional OpenAI API spec details for running self-hosted localized NIMs. Please refer to the Downloadable NIM documentation for additional information.

URL: https://integrate.api.nvidia.com

Endpoint: POST /v1/chat/completions

Models

abacusai

Model	Endpoint
abacusai / dracarys-llama-3.1-70b-instruct	Creates a model response for the given chat conversation. (dracarys-llama-3.1-70b-instruct)

bytedance

Model	Endpoint
bytedance / seed-oss-36b-instruct	Creates a model response for the given chat conversation. (seed-oss-36b-instruct)

deepseek-ai

Model	Endpoint
deepseek-ai / deepseek-v4-flash	Creates a model response for the given chat conversation. (deepseek-v4-flash)
deepseek-ai / deepseek-v4-pro	Creates a model response for the given chat conversation. (deepseek-v4-pro)

google

Model	Endpoint
google / codegemma-7b	Create a chat completion (codegemma-7b)
google / gemma-2-2b-it	Creates a model response for the given chat conversation. (gemma-2-2b-it)
google / gemma-7b	Create a chat completion (gemma-7b)

Model	Endpoint
meta / llama2-70b	Create a chat completion (llama2-70b)
meta / llama-3.1-8b-instruct	Creates a model response for the given chat conversation. (llama-3.1-8b-instruct)
meta / llama-3.1-70b-instruct	Creates a model response for the given chat conversation. (llama-3.1-70b-instruct)
meta / llama-3.2-1b-instruct	Creates a model response for the given chat conversation. (llama-3.2-1b-instruct)
meta / llama-3.2-3b-instruct	Creates a model response for the given chat conversation. (llama-3.2-3b-instruct)
meta / llama-3.3-70b-instruct	Creates a model response for the given chat conversation. (llama-3.3-70b-instruct)

microsoft

Model	Endpoint
microsoft / phi-4-mini-instruct	Creates a model response for the given chat conversation. (phi-4-mini-instruct)
microsoft / phi-4-mini-flash-reasoning	Creates a model response for the given chat conversation. (phi-4-mini-flash-reasoning)

minimaxai

Model	Endpoint
minimaxai / minimax-m2.5	Creates a model response for the given chat conversation. (minimax-m2.5)
minimaxai / minimax-m2.7	Creates a model response for the given chat conversation. (minimax-m2.7)

mistralai

Model	Endpoint
mistralai / mistral-nemotron	Creates a model response for the given chat conversation. (mistral-nemotron)
mistralai / mixtral-8x7b-instruct	Create a chat completion (mixtral-8x7b-instruct)
mistralai / mixtral-8x22b-instruct	Create a chat completion (mixtral-8x22b-instruct)

moonshotai

Model	Endpoint
moonshotai / kimi-k2-instruct	Creates a model response for the given chat conversation. (kimi-k2-instruct)
moonshotai / kimi-k2-thinking	Creates a model response for the given chat conversation. (kimi-k2-thinking)

nvidia

Model	Endpoint
nvidia / gliner-pii	Extract named entities from text using GLiNER PII model (gliner-pii)
nvidia / llama-3.1-nemoguard-8b-content-safety	Creates a model response for the given chat conversation. (llama-3.1-nemoguard-8b-content-safety)
nvidia / llama-3.1-nemoguard-8b-topic-control	Creates a model response for the given chat conversation. (llama-3.1-nemoguard-8b-topic-control)
nvidia / nemotron-3-ultra-550b-a55b	Creates a model response for the given chat conversation (nemotron-3-ultra-550b-a55b)
nvidia / llama-3.1-nemotron-nano-8b-v1	Creates a model response for the given chat conversation. (llama-3.1-nemotron-nano-8b-v1)
nvidia / llama-3.1-nemotron-safety-guard-8b-v3	Creates a model response for the given chat conversation. (llama-3.1-nemotron-safety-guard-8b-v3)
nvidia / llama-3.3-nemotron-super-49b-v1	Creates a model response for the given chat conversation. (llama-3.3-nemotron-super-49b-v1)
nvidia / llama-3.3-nemotron-super-49b-v1.5	Creates a model response for the given chat conversation. (llama-3.3-nemotron-super-49b-v1.5)
nvidia / llama-3.1-nemotron-ultra-253b-v1	Create a model response for a given chat (nvidia-llama-3.1-nemotron-ultra-253b-v1)
nvidia / nemoguard-jailbreak-detect	Classify text for jailbreak attempt (nemoguard-jailbreak-detect)
nvidia / nemotron-3-nano-30b-a3b	Creates a model response for the given chat conversation. (nemotron-3-nano-30b-a3b)
nvidia / nemotron-3-super-120b-a12b	Creates a model response for the given chat conversation. (nemotron-3-super-120b-a12b)
nvidia / nemotron-content-safety-reasoning-4b	Creates a model response for the given chat conversation. (nemotron-content-safety-reasoning-4b)
nvidia / nemotron-mini-4b-instruct	Creates a model response for the given chat conversation. (nemotron-mini-4b-instruct)
nvidia / nvidia-nemotron-nano-9b-v2	Creates a model response for the given chat conversation. (nvidia-nemotron-nano-9b-v2)
nvidia / riva-translate-4b-instruct-v1_1	Creates a model response for the given chat conversation. (riva-translate-4b-instruct-v1_1)
nvidia / usdcode	Creates a model response for the given chat conversation. (usdcode)

openai

Model	Endpoint
openai / gpt-oss-20b	Creates a model response for the given chat conversation. (gpt-oss-20b)
openai / gpt-oss-120b	Creates a model response for the given chat conversation. (gpt-oss-120b)

poolside

Model	Endpoint
poolside / laguna-xs-2-1	Creates a model response for the given chat conversation. (laguna-xs-2-1)

qwen

Model	Endpoint
qwen / qwen2.5-coder-32b-instruct	Creates a model response (qwen2.5-coder-32b-instruct)
qwen / qwen3.5-122b-a10b	Request response from the model (qwen3.5-122b-a10b)
qwen / qwen3-coder-480b-a35b-instruct	Creates a model response for the given chat conversation. (qwen3-coder-480b-a35b-instruct)
qwen / qwen3-next-80b-a3b-instruct	Creates a model response for the given chat conversation. (qwen3-next-80b-a3b-instruct)
qwen / qwen3-next-80b-a3b-thinking	Creates a model response for the given chat conversation. (qwen3-next-80b-a3b-thinking)
qwen / qwq-32b	Creates a model response for the given chat conversation. (qwq-32b)

sarvamai

Model	Endpoint
sarvamai / sarvam-m	Creates a model response for the given chat conversation. (sarvam-m)

stepfun-ai

Model	Endpoint
stepfun-ai / step-3-5-flash	Creates a model response for the given chat conversation. (step-3-5-flash)

stockmark

Model	Endpoint
stockmark / stockmark-2-100b-instruct	Creates a model response for the given chat conversation. (stockmark-2-100b-instruct)

thinking machines

Model	Endpoint
thinking machines / inkling	Creates a model response for the given chat conversation. (inkling)

upstage

Model	Endpoint
upstage / solar-10.7b-instruct	Creates a model response for the given chat conversation. (solar-10.7b-instruct)

z-ai

Model	Endpoint
z-ai / glm4.7	Creates a model response for the given chat conversation. (glm4.7)
z-ai / glm5.1	Creates a model response for the given chat conversation. (glm5.1)
z-ai / glm-5.2	Creates a model response for the given chat conversation (glm5.2)