General NIM FAQ

What is NVIDIA NIM?

NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to accelerate deployment of generative AI across your enterprise. These prebuilt containers support a broad spectrum of AI models—from open-source community models to NVIDIA AI Foundation models, as well as custom AI models. NIM microservices are deployed with a single command for easy integration into enterprise-grade AI applications using standard APIs and just a few lines of code. Built on robust foundations including inference engines like Triton Inference Server, TensorRT, TensorRT-LLM, and PyTorch, NIM is engineered to facilitate seamless AI inferencing at scale, ensuring that you can deploy AI applications anywhere with confidence. Whether on-premises or in the cloud, NIM is the fastest way to achieve accelerated generative AI inference at scale.

What are the benefits of NIM?

With NVIDIA NIM you can:

Maintain security and control of generative AI applications and data with
self-hosted deployment of the latest AI models in your choice of infrastructure,
on-premises or in the cloud.
Speed time to market with prebuilt, cloud-native microservices that are
continuously maintained to deliver optimized inference on NVIDIA accelerated
infrastructure.
Empower enterprise developers with industry standard APIs and tools tailored for
Enterprise environments.
Improve TCO with low latency, high throughput AI inference that scales with cloud
Achieve best accuracy with support for pre-tuned models out of the box.
Leverage enterprise-grade software with dedicated feature branches, rigorous
validation processes, and support including direct access to NVIDIA AI experts and
defined service-level agreements.

How do I get started with NVIDIA NIM?

To get started, you can experience accelerated generative AI models in NVIDIA’s API catalog. Here you can interact with the latest NVIDIA AI Foundation Models through a browser and build POCs with model APIs. After prototyping is complete, you may want to transition AI models over to your own compute environment, mitigating the risk of data IP leakage, and fine-tuning a model. Models from NVIDIA’s API catalog can be downloaded for self-hosting with NVIDIA NIM, included with NVIDIA AI Enterprise, giving enterprise developers ownership of their customizations, infrastructure choices, and full control of their IP and AI application.

What is in a NIM?

Each NIM is its own Docker container with a model, such as meta/llama3-8b-instruct, and
the runtime capable of running the model on any supported NVIDIA GPU.

NIM containers include:

Optimized AI models
APIs conforming to domain-specific industry standards
Optimized inference engines

Are self-hosted NIM microservices only compatible with accelerated infrastructure (GPUs)

Yes, NIM is designed to run on CUDA infrastructure that is a NVIDIA-Certified System.

What if I do not currently have an NVIDIA-Certified System, - is there another way to try out NIM?

If you do not have available GPU infrastructure, check out NVIDIA Brev if you are an individual, or NVIDIA LaunchPad if you are part of an enterprise. Additional documentation on NVIDIA-Certified Systems can be found here.

How do I get started evaluating and deploying AI models?

You can get started by visiting build.nvidia.com where you can discover the latest AI models and learn about NVIDIA NIM. There, you can explore and interact with more AI models through the browser to access NVIDIA-hosted endpoints for application prototyping in the NVIDIA API catalog. To deploy AI models on your preferred NVIDIA accelerated infrastructure, you can pull models from a docker registry, by following instructions in the API catalog. You will be enrolled in the NVIDIA Developer program to access NIM for development and testing, but to take it to production you will need an NVIDIA AI Enterprise License. You can sign up for a free 90-day evaluation license from the NVIDIA AI Enterprise website.

How much consistency is there in the NIM APIs?

OpenAPI has been the industry standard; it has set the standard for Embedding, but for the rest of the microservices, NVIDIA has set the standard. NVIDIA has invested in much research and thoughtful customization to align to industry standards. Every model has different parameters, and NVIDIA is making sure to adapt to those standards.

Account a FAQ

How do I access NVIDIA NIM via the NVIDIA Developer Program?

Members of the NVIDIA Developer Program have free access to NIM API endpoints for prototyping, and to downloadable NIM microservices for research, application development, and experimentation on up to 16 GPUs on any infrastructure—cloud, data center, or personal workstation. Access to NIM microservices is available through the duration of program membership.

How can developers register for the NVIDIA Developer Program and access NIM microservices?

Developers can join the free NVIDIA Developer Program and access NIM at any time via the [NVIDIA API catalog](https://build.nvidia.com/explore/discover) at build.nvidia.com. Users looking for enterprise-grade security, support, and API stability can select the option to access NIM via a free 90-day NVIDIA AI Enterprise Trial.

What does NIM access via the NVIDIA Developer Program include?

Access to NIM microservices API endpoints accelerated by DGX Cloud, available at build.nvidia.com
Ability to download and self-host NIM microservices on your own infrastructure
License to use your self-hosted NIM for research, application development, and experimentation on up to 16 GPUs
Community support via the NVIDIA Developer Forum

How does access to NIM via NVIDIA Developer Program compare to the 90-Day NVIDIA AI Enterprise Trial?

The NVIDIA AI Enterprise 90-Day Trial is designed for production deployments and includes a license for commercial use, enterprise-grade security, support, and API stability. NIM access through the NVIDIA Developer Program is for prototyping, research, development and testing purposes only and does not include enterprise features or support.

What does “production” mean?

Production use involves any use of NIM for purposes other than development, testing, research or evaluation such as conducting business transactions and any non-testing activity including activity serving real end-users. Using NIM in production requires an NVIDIA AI Enterprise license.

What is the pricing model for NIM?

To use NIM In production, your organization must have an NVIDIA AI Enterprise license. These licenses start at $4500 per GPU per year or ~ $1 per GPU per hour in the cloud. Pricing is based on the number of GPUs, not the number of NIMs. Pricing is the same regardless of GPU size.

What does NVIDIA support in regard to NIM through NVIDIA AI Enterprise?

NVIDIA AI Enterprise supports the optimized inference engine and runtime of the container. It does not support output that may be generated by the models or the models themselves. For more information, visit the NVIDIA AI Enterprise Support overview page.