Model Overview

Description:

The Google PaliGemma-3B-mix model is a one-shot visual language understanding solution for image-to-text generation. This model is ready for commercial use.

Third-Party Community Consideration

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see Google's PaliGemma Model Card.

License, Acceptable Use, and Research Privacy Policy

By using this model, you are agreeing to the terms and conditions of the License, Acceptable Use Policy, and Google Research Privacy Policy.

Reference(s):

Model Architecture:

Architecture Type: Transformer

Network Architecture: SigLIP + Gemma

Input:

Input Format: Image + Text

Input Parameters: Image: Red, Green, and Blue (RGB); Text: String

Other Properties Related to Input: The text prompt is either an instruction to caption the image or a question about the image.

Output:

Output Format: Text

Output Parameters: temperature, top_p, max_tokens

Other Properties Related to Output: Stream
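
The input and output fields above can be combined into a single request body. The following is a minimal sketch only: the function name build_request, the JSON field names "image" and "prompt", the base64 image encoding, and the default values are all assumptions made for illustration and are not taken from this model card. Only temperature, top_p, max_tokens, and streaming are named in the card itself.

```python
import base64
import json


def build_request(image_bytes, prompt, temperature=0.2, top_p=0.7,
                  max_tokens=256, stream=False):
    """Assemble a JSON-serializable request body.

    The schema here is hypothetical; check the serving endpoint's
    documentation for the real field names and value ranges.
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the range (0, 1]")
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    return {
        # Assumption: the RGB image is sent as base64-encoded file bytes.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        # The prompt is a caption instruction or a question (a string).
        "prompt": prompt,
        # Sampling controls listed under Output Parameters.
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        # When True, the text output is expected to arrive incrementally.
        "stream": stream,
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file's contents.
    body = build_request(b"\x89PNG placeholder", "What is in this image?")
    print(json.dumps(body, indent=2))
```

Validating the sampling parameters before sending keeps obviously malformed requests from reaching the inference server; the bounds used here are common conventions, not limits documented for this model.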

Supported Operating System(s):

Linux

Inference:

Engine: Triton

Test Hardware: Other