USD Search
Model Overview
USD Search is an AI-powered search service for OpenUSD data, three-dimensional (3D) models, images, and assets that accepts text or image-based inputs. It leverages NVCLIP, an NVIDIA commercial version of the Contrastive Language-Image Pre-Training (CLIP) model, which maps images and text into a shared embedding space so that the two can be compared directly.
References:
- Radford, Alec, et al. "Learning transferable visual models from natural language supervision." International conference on machine learning. PMLR, 2021.
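To make the embedding-based search idea concrete, the following is a minimal sketch of ranking assets by cosine similarity between a query embedding and precomputed asset embeddings. It is not the USD Search implementation: the function names, the asset names, and the three-dimensional toy vectors are all hypothetical, and a real CLIP-style encoder produces much longer vectors.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank_assets(query_embedding, asset_embeddings):
    """Return (asset_name, score) pairs, most similar first."""
    scored = [
        (name, cosine_similarity(query_embedding, embedding))
        for name, embedding in asset_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy index: made-up vectors standing in for real embeddings.
toy_index = {
    "chair.usd": [0.9, 0.1, 0.0],
    "table.usd": [0.1, 0.9, 0.0],
    "lamp.usd": [0.0, 0.2, 0.9],
}
# Hypothetical query embedding (from either a text or an image query).
toy_query = [1.0, 0.2, 0.0]

ranking = rank_assets(toy_query, toy_index)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Because text and images share one embedding space in a CLIP-style model, the same ranking function can serve both query types in this sketch.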
Model Architecture:
Architecture Type: Transformer-based architecture
Input:
Input Type(s): Text or Image
Input Format(s): Text, or Red, Green, Blue (RGB) image
Other Properties Related to Input: The model accepts either text or image input, but not both simultaneously
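The either-or input constraint can be sketched as a small validation step on the client side. This is an illustrative assumption about how a caller might enforce the rule; the `SearchQuery` class and its field names are hypothetical and are not part of any USD Search API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchQuery:
    """Hypothetical query container: exactly one field must be set."""
    text: Optional[str] = None
    image_rgb: Optional[bytes] = None  # raw RGB image bytes (assumed encoding)

    def __post_init__(self):
        has_text = self.text is not None
        has_image = self.image_rgb is not None
        # Reject both-set and neither-set; only one input type is allowed.
        if has_text == has_image:
            raise ValueError("Provide exactly one of text or image_rgb")

    @property
    def kind(self) -> str:
        return "text" if self.text is not None else "image"

text_query = SearchQuery(text="red office chair")
print(text_query.kind)
```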
Output:
Output Type(s): List
Output Format: Rendered thumbnails, asset metadata
Other Properties Related to Output:
The output of this model is a list of OpenUSD assets sorted by relevance. Each entry contains a rendered thumbnail and associated metadata, including a URL that points to the location of the asset in the backend database.
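As an illustration of the output shape described above, here is a minimal sketch of a result record and a relevance sort. The `SearchResult` class, its field names, the presence of a numeric `score`, and the example URLs are assumptions made for this sketch; the actual response schema is not specified here.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SearchResult:
    """Hypothetical result entry; field names are illustrative only."""
    asset_url: str    # location of the asset in the backend database
    thumbnail: bytes  # rendered thumbnail image data
    score: float      # assumed relevance score, higher is more relevant

def sort_by_relevance(results: List[SearchResult]) -> List[SearchResult]:
    """Return a new list ordered from most to least relevant."""
    return sorted(results, key=lambda result: result.score, reverse=True)

# Placeholder data; the URLs below are made up for the example.
unsorted_results = [
    SearchResult("storage://assets/table.usd", b"", 0.30),
    SearchResult("storage://assets/chair.usd", b"", 0.99),
    SearchResult("storage://assets/lamp.usd", b"", 0.04),
]

sorted_results = sort_by_relevance(unsorted_results)
for result in sorted_results:
    print(result.asset_url, result.score)
```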
Software Integration:
Runtime Engine(s):
- TensorRT
Supported Hardware Architecture(s):
- NVIDIA Ampere
- NVIDIA Hopper
- NVIDIA Ada Lovelace
Supported Operating System(s):
- Linux
Model Version(s):
- nv_clip_224_vit_h - NVCLIP ViT-H with 224 resolution.
Training & Evaluation:
No training or evaluation was performed beyond what was done for the underlying NVCLIP model.
These models must be used with NVIDIA hardware and software. For hardware, the models can run on NVIDIA GPUs of the Ampere generation or newer.
Training Dataset:
Data Collection Method by dataset:
- Automated
Labeling Method by dataset:
- Automated
Properties:
Dataset | No. of Images |
---|---|
NV Internal Data | 700M |
Evaluation Dataset:
Link: https://www.image-net.org/
Data Collection Method by dataset:
- Unknown
Labeling Method by dataset:
- Unknown
Properties:
50,000 validation images from ImageNet dataset
The performance details of the underlying NVCLIP model are noted below.
Methodology and KPI
Performance is reported as the zero-shot top-1 accuracy of NVCLIP on the ImageNet validation dataset.
Model | Top-1 Accuracy |
---|---|
ViT-H-336 | 0.7786 |
ViT-L-336 | 0.7629 |
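For readers unfamiliar with the metric, top-1 accuracy is the fraction of images whose highest-scoring predicted label matches the ground-truth label. The sketch below shows the calculation on made-up labels; the function name and sample data are hypothetical and do not reproduce the evaluation pipeline used for NVCLIP.

```python
def top1_accuracy(predicted_labels, true_labels):
    """Fraction of samples whose top prediction equals the true label."""
    if len(predicted_labels) != len(true_labels):
        raise ValueError("predicted_labels and true_labels must have equal length")
    if not true_labels:
        raise ValueError("at least one sample is required")
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)

# Made-up example: three of four predictions are correct.
predictions = ["cat", "dog", "car", "tree"]
ground_truth = ["cat", "dog", "car", "boat"]

accuracy = top1_accuracy(predictions, ground_truth)
print(accuracy)
```

Read this way, and assuming all 50,000 validation images are scored, a top-1 accuracy of 0.7786 corresponds to 38,930 correctly classified images.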
Inference:
Engine: TensorRT
Test Hardware:
- A100
- L40
Ethical Considerations:
NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse. For more detailed information on ethical considerations for this model, please see the Model Card++ Promise and the Explainability, Bias, Safety & Security, and Privacy Subcards.