What is NIM

NVIDIA NIM™ are performance-optimized, portable inference microservices designed to accelerate and simplify the deployment of AI models. NIM are containerized, so you can self-host GPU-accelerated pretrained, fine-tuned, and customized models in the cloud, in the data center, or on your own workstation.

The ever-growing catalog of NIM microservices contains models for a wide range of AI use cases, from chatbot assistants to computer vision models for video processing. The image below shows some of the NIM microservices, organized by use case.

For Developers

NIM provides a simplified way to integrate AI models into your applications by exposing industry-standard APIs and abstracting away model inference internals such as the execution engine and runtime operations. Each NIM runs on a high-performance inference backend, such as NVIDIA TensorRT-LLM or vLLM, matched to the model and the underlying hardware. With NIM, you can use AI models by simply calling their hosted endpoints, as in the sketch below.
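
For example, a NIM that serves a large language model exposes an OpenAI-compatible API, so the standard openai Python client works against it. The sketch below queries a locally deployed NIM; the base URL, port, and model name are assumptions to replace with the values for your own deployment.

```python
# Minimal sketch: calling a locally deployed NIM through its
# OpenAI-compatible chat completions endpoint.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local NIM address; adjust for your deployment
    api_key="not-used",                   # a local deployment may not require a real key
)

response = client.chat.completions.create(
    model="meta/llama3-8b-instruct",      # example model name; use the model your NIM serves
    messages=[{"role": "user", "content": "What is a NIM microservice?"}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```

Because the API follows the OpenAI convention, existing applications built against that convention can be pointed at a NIM by changing only the base URL and model name.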

For IT Professionals and DevOps

On the infrastructure side, NIM make it easy to deploy, manage, and optimize AI infrastructure, reducing the engineering resources required to set up and maintain accelerated models.
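
As a minimal operational sketch, the readiness probe below checks whether a NIM is ready to serve traffic, for example before registering it with a load balancer. It assumes a local deployment on port 8000 and the /v1/health/ready route exposed by NIM LLM containers; verify both against your deployment.

```python
# Sketch of a readiness probe for a running NIM (assumed endpoint and port).
import requests

def nim_is_ready(base_url: str = "http://localhost:8000") -> bool:
    """Return True if the NIM reports it is ready to serve inference."""
    try:
        resp = requests.get(f"{base_url}/v1/health/ready", timeout=5)
        return resp.status_code == 200
    except requests.RequestException:
        # Treat connection errors and timeouts as "not ready".
        return False

if __name__ == "__main__":
    print("ready" if nim_is_ready() else "not ready")
```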

Use Cases

NIM microservices expose industry-standard API endpoints that let you test and develop against them quickly. Whether you want to work with a specific model or explore every model available for your use case, models are categorized by use case and ship with full API reference documentation.
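
Because the endpoints follow the OpenAI API convention, you can also discover which model a running NIM serves before writing any application code. A minimal sketch, assuming a local deployment on port 8000:

```python
# List the models served by a running NIM via the OpenAI-compatible
# /v1/models endpoint (local URL is an assumption).
import requests

models = requests.get("http://localhost:8000/v1/models", timeout=5).json()
for model in models.get("data", []):
    print(model["id"])
```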


What’s Next