1. What is NVIDIA NIM?

NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to accelerate deployment of generative AI across your enterprise. These prebuilt containers support a broad spectrum of AI models—from open-source community models to NVIDIA AI Foundation models, as well as custom AI models. NIM microservices are deployed with a single command for easy integration into enterprise-grade AI applications using standard APIs and just a few lines of code. Built on robust foundations including inference engines like Triton Inference Server, TensorRT, TensorRT-LLM, and PyTorch, NIM is engineered to facilitate seamless AI inferencing at scale, ensuring that you can deploy AI applications anywhere with confidence. Whether on-premises or in the cloud, NIM is the fastest way to achieve accelerated generative AI inference at scale.
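
For example, once a NIM container is running, an application can reach it through its OpenAI-compatible API in just a few lines of code. The snippet below is a minimal sketch, assuming a meta/llama3-8b-instruct NIM is already serving locally on port 8000 (the container default):

    # Minimal sketch: call a locally hosted NIM through its OpenAI-compatible API.
    # Assumes the meta/llama3-8b-instruct NIM container is serving on port 8000.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

    completion = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=[{"role": "user", "content": "What is NVIDIA NIM?"}],
        max_tokens=64,
    )
    print(completion.choices[0].message.content)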

2. What are the benefits of NIM?

  • Maintain security and control of generative AI applications and data with
    self-hosted deployment of the latest AI models in your choice of infrastructure,
    on-premises or in the cloud.
  • Speed time to market with prebuilt, cloud-native microservices that are
    continuously maintained to deliver optimized inference on NVIDIA accelerated
    infrastructure.
  • Empower enterprise developers with industry-standard APIs and tools tailored for
    enterprise environments.
  • Improve TCO with low-latency, high-throughput AI inference that scales with the
    cloud.
  • Achieve best accuracy with support for pre-tuned models out of the box.
  • Leverage enterprise-grade software with dedicated feature branches, rigorous
    validation processes, and support including direct access to NVIDIA AI experts and
    defined service-level agreements.

3. How do I get started with NVIDIA NIM?

To get started, users can experience accelerated generative AI models in NVIDIA’s API catalog, where they can interact with the latest NVIDIA AI Foundation Models through a browser and build POCs with model APIs. After prototyping is complete, users often want to transition AI models to their own compute environment, both to mitigate the risk of data and IP leakage and to fine-tune models. Models from NVIDIA’s API catalog can be downloaded for self-hosting with NVIDIA NIM, included with NVIDIA AI Enterprise, giving enterprise developers ownership of their customizations, infrastructure choices, and full control of their IP and AI applications.
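
Because NVIDIA-hosted endpoints and self-hosted NIMs expose the same OpenAI-compatible interface, code written during prototyping carries over largely unchanged. The snippet below is a minimal sketch of a prototype call against a hosted endpoint; the API key is a placeholder for one generated at build.nvidia.com:

    # Minimal sketch: prototype against an NVIDIA-hosted endpoint in the API catalog.
    # The API key below is a placeholder; generate a real key at build.nvidia.com.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://integrate.api.nvidia.com/v1",
        api_key="nvapi-...",  # placeholder
    )
    completion = client.chat.completions.create(
        model="meta/llama3-8b-instruct",
        messages=[{"role": "user", "content": "Summarize NVIDIA NIM in one sentence."}],
        max_tokens=64,
    )
    print(completion.choices[0].message.content)

Moving this code to a self-hosted NIM is, in the simplest case, a matter of pointing base_url at your own deployment.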

4. Are self-hosted NIMs only compatible with accelerated infrastructure (GPUs)?

Yes, NIM is designed to run on CUDA infrastructure that is an NVIDIA-Certified System.

5. What if I do not currently have an NVIDIA-Certified System? Is there another way to try out NIM?

If you do not have available GPU infrastructure, check out NVIDIA LaunchPad. Additional documentation on NVIDIA-Certified Systems can be found here.

6. How do I get started evaluating and deploying AI models?

You can get started by visiting build.nvidia.com, where you can discover the latest AI models and learn about NVIDIA NIM. Then, you can explore and interact with more AI models through the browser or sign up for free credits to access NVIDIA-hosted endpoints for application prototyping in the NVIDIA API catalog. To deploy AI models on your preferred NVIDIA accelerated infrastructure, you will be prompted, while interacting with downloadable models at ai.nvidia.com or in the API catalog, to sign up for a 90-day NVIDIA AI Enterprise evaluation license.

7. I signed up at build.nvidia.com and now have credits for API calls. How do I use them?

API call credits are not deducted when interacting with models on build.nvidia.com through the browser. Remote API calls to NVIDIA-hosted endpoints count against trial API credits.

8. What is contained within a NIM?

Each NIM is its own Docker container with a model, such as meta/llama3-8b-instruct, and
the runtime capable of running the model on any NVIDIA GPU.

NIM containers include:

  • Optimized AI models
  • APIs conforming to domain-specific industry standards (see the sketch after this list)
  • Optimized inference engines
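
As an illustration of the standards-conforming APIs above, LLM NIMs expose OpenAI-compatible routes such as /v1/models and /v1/chat/completions. A minimal sketch, assuming a NIM container serving locally on port 8000:

    # Minimal sketch: list the model(s) served by a local NIM container.
    # Assumes a NIM is serving on port 8000; /v1/models is the
    # OpenAI-compatible model-listing route it conforms to.
    import requests

    resp = requests.get("http://localhost:8000/v1/models", timeout=10)
    resp.raise_for_status()
    for model in resp.json()["data"]:
        print(model["id"])  # e.g. meta/llama3-8b-instruct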

9. What is the value of using NIM microservices?

NVIDIA NIM is NVIDIA’s opinionated approach to building enterprise LLM applications: it packages optimized models, inference engines, and industry-standard APIs into supported, ready-to-deploy containers.

10. What is the pricing model for NIM?

NIM is available through an NVIDIA AI Enterprise license at $4,500 per GPU per year, or $1 per GPU per hour in the cloud. Pricing is based on the number of GPUs, not the number of NIMs; for example, a server with eight GPUs is licensed at 8 × $4,500 = $36,000 per year, regardless of how many NIM containers run on it.

11. Regarding the licensing, is it flat or does it change based on the size of the GPU?

Pricing follows the NVIDIA AI Enterprise per-GPU pricing structure; it is not a license per NIM.

12. What does NVIDIA support in regard to NIM through NVIDIA AI Enterprise?

NVIDIA AI Enterprise supports the optimized inference engine and runtime of the container. It does not support what is generated by the models or the models themselves. There is too much variance in terms of the sources of the model or data incorporated through RAG for NVIDIA to assume responsibility.

13. How does NIM work in CSP environments, including CSP MLOps (SageMaker, Azure AI Studio, Vertex AI) and CSP managed Kubernetes solutions (EKS, AKS, and GKE)?

NIM is containerized and is deployable out of the box on CSP managed Kubernetes solutions like AKS, GKE, and EKS. A reference Helm chart is available here. For deploying NIM on CSP MLOps platforms like SageMaker, Azure AI Studio, and Vertex AI, an additional ‘shim’ is needed on top of NIM. NVIDIA is continuing to collaborate with these partners on NIM integration.