General & Credit FAQ

General NIM FAQ

What is NVIDIA NIM?

NVIDIA NIM, part of NVIDIA AI Enterprise, is a set of easy-to-use microservices designed to accelerate deployment of generative AI across your enterprise. These prebuilt containers support a broad spectrum of AI models—from open-source community models to NVIDIA AI Foundation models, as well as custom AI models. NIM microservices are deployed with a single command for easy integration into enterprise-grade AI applications using standard APIs and just a few lines of code. Built on robust foundations including inference engines like Triton Inference Server, TensorRT, TensorRT-LLM, and PyTorch, NIM is engineered to facilitate seamless AI inferencing at scale, ensuring that you can deploy AI applications anywhere with confidence. Whether on-premises or in the cloud, NIM is the fastest way to achieve accelerated generative AI inference at scale.
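As a rough illustration of what "a few lines of code" against the standard APIs can look like, the sketch below assembles an OpenAI-style chat completion request for a NIM endpoint. The host, port, and model name are illustrative assumptions, not values from this FAQ; adjust them to your own deployment.

```python
import json

def build_chat_request(model: str, prompt: str) -> dict:
    """Assemble an OpenAI-style /v1/chat/completions payload for a NIM endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 64,
    }

# The model name and endpoint below are illustrative; adjust to your deployment.
payload = build_chat_request("meta/llama3-8b-instruct", "What is NVIDIA NIM?")
body = json.dumps(payload)

# To send the request against a running NIM container:
#   import requests
#   resp = requests.post("http://localhost:8000/v1/chat/completions",
#                        data=body, headers={"Content-Type": "application/json"})
#   print(resp.json()["choices"][0]["message"]["content"])
```

Because the request and response shapes follow the OpenAI convention, existing OpenAI-compatible client libraries can typically be reused by pointing them at the NIM base URL.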

What are the benefits of NIM?

With NVIDIA NIM you can:

  • Maintain security and control of generative AI applications and data with
    self-hosted deployment of the latest AI models in your choice of infrastructure,
    on-premises or in the cloud.
  • Speed time to market with prebuilt, cloud-native microservices that are
    continuously maintained to deliver optimized inference on NVIDIA accelerated
    infrastructure.
  • Empower enterprise developers with industry standard APIs and tools tailored for
    Enterprise environments.
  • Improve TCO with low latency, high throughput AI inference that scales with cloud.
  • Achieve best accuracy with support for pre-tuned models out of the box.
  • Leverage enterprise-grade software with dedicated feature branches, rigorous
    validation processes, and support including direct access to NVIDIA AI experts and
    defined service-level agreements.

How do I get started with NVIDIA NIM?

To get started, you can experience accelerated generative AI models in NVIDIA’s API catalog. Here you can interact with the latest NVIDIA AI Foundation Models through a browser and build POCs with model APIs. After prototyping is complete, you may want to transition AI models to your own compute environment, mitigating the risk of data and IP leakage and enabling fine-tuning. Models from NVIDIA’s API catalog can be downloaded for self-hosting with NVIDIA NIM, included with NVIDIA AI Enterprise, giving enterprise developers ownership of their customizations, infrastructure choices, and full control of their IP and AI application.

What is in a NIM?

Each NIM is its own Docker container with a model, such as meta/llama3-8b-instruct, and
the runtime capable of running the model on any supported NVIDIA GPU.

NIM containers include:

  • Optimized AI models
  • APIs conforming to domain-specific industry standards
  • Optimized inference engines
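To make the "APIs conforming to domain-specific industry standards" point concrete, the sketch below maps the OpenAI-style routes an LLM NIM container commonly exposes. The exact paths and port are assumptions based on the OpenAI convention, not a definitive list; check the documentation for the specific NIM you deploy.

```python
def nim_endpoints(base_url: str) -> dict:
    """Map common OpenAI-style routes exposed by an LLM NIM container.

    The routes below follow the OpenAI convention and are illustrative,
    not exhaustive; they can vary by NIM.
    """
    base = base_url.rstrip("/")
    return {
        "ready": f"{base}/v1/health/ready",     # readiness probe
        "models": f"{base}/v1/models",          # list served models
        "chat": f"{base}/v1/chat/completions",  # chat completions
    }

# Assumes a container serving on localhost:8000.
endpoints = nim_endpoints("http://localhost:8000/")
```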

Are self-hosted NIM microservices only compatible with accelerated infrastructure (GPUs)?

Yes. NIM is designed to run on CUDA-capable infrastructure that is an NVIDIA-Certified System.

What if I do not currently have an NVIDIA-Certified System? Is there another way to try out NIM?

If you do not have available GPU infrastructure, check out NVIDIA Brev if you are an individual, or NVIDIA LaunchPad if you are part of an enterprise. Additional documentation on NVIDIA-Certified Systems can be found here.

How do I get started evaluating and deploying AI models?

You can get started by visiting build.nvidia.com where you can discover the latest AI models and learn about NVIDIA NIM. There, you can explore and interact with more AI models through the browser or sign up for free credits to access NVIDIA-hosted endpoints for application prototyping in the NVIDIA API catalog. To deploy AI models on your preferred NVIDIA accelerated infrastructure, you can pull models from a Docker registry by following the instructions in the API catalog. To do this, you will need an NVIDIA AI Enterprise license. You can sign up for a free 90-day evaluation license from the NVIDIA AI Enterprise website.

How much consistency is there in the NIM APIs?

The OpenAI API has been the industry standard; it set the standard for embeddings, but for the rest of the microservices, NVIDIA has set the standard. NVIDIA has invested significant research and thoughtful customization in aligning with industry standards. Every model has different parameters, and NVIDIA ensures each NIM adapts to those standards.

Account and API Credit FAQ

I signed up at build.nvidia.com and now have credits for API calls. How do I use them?

API call credits are not deducted when interacting with models on build.nvidia.com through the browser. Remote API calls to NVIDIA-hosted endpoints count against trial API credits.

How long do my API credits last?

The first 1,000 credits expire 30 days after you create an account. Additional credits expire 60 days after you sign up for an NVIDIA AI Enterprise trial license with a business email, which you can do at account creation or later when requesting additional credits.

How do I get additional API credits?

To obtain more credits, log in to build.nvidia.com and click on your profile, then select ‘Request More’. If you signed up to use the API catalog with a personal email address, you will be asked to provide a business email to activate a free 90-day NVIDIA AI Enterprise license and unlock an additional 4,000 credits.

How do I continue using NIM after I use up my credits?

To continue using NIM after you’ve used up your credits or they have expired, you have these options:

  1. Self-host the API on your cloud provider or on-prem. Research and test use is free under the NVIDIA Developer Program. Please note that your organization must have an NVIDIA AI Enterprise license for production use.
  2. Use the serverless NIM API on Hugging Face with pay-per-use pricing. The NVIDIA AI Enterprise license is included with this option, so you don’t need a separate license.

What is the pricing model for NIM?

To use NIM in production, your organization must have an NVIDIA AI Enterprise license. These licenses start at $4,500 per GPU per year, or approximately $1 per GPU per hour in the cloud. Pricing is based on the number of GPUs, not the number of NIMs, and is the same regardless of GPU size.

What does NVIDIA support in regard to NIM through NVIDIA AI Enterprise?

NVIDIA AI Enterprise supports the optimized inference engine and runtime of the container. It does not support output that may be generated by the models or the models themselves. For more information, visit the NVIDIA AI Enterprise Support overview page.