Model Overview
Description:
The Google PaliGemma model is a one-shot visual language understanding solution for image-to-text generation.
Terms of use
By using this model, you are agreeing to the terms and conditions of the license, the acceptable use policy, and the Google Research privacy policy.
Reference(s):
Model Architecture:
Architecture Type: Transformer
Network Architecture: SigLIP + Gemma
Input:
Input Format: Red, Green, Blue (RGB) Image + Text
Input Parameters: None
Other Properties Related to Input: None
Output:
Output Format: Text
Output Parameters: temperature, top_p, max_tokens
Other Properties Related to Output: stream
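The input and output fields above can be illustrated with a minimal sketch of how a client might assemble a request body. Only temperature, top_p, max_tokens, and stream come from this card. The function name build_request_payload, the field names image_b64 and prompt, the default values, and the range checks are assumptions made for illustration; the actual request schema of a given deployment may differ.

```python
import base64
import json


def build_request_payload(image_bytes, prompt, temperature=0.2, top_p=0.7,
                          max_tokens=256, stream=False):
    """Assemble a hypothetical JSON-serializable request body.

    The image is expected to be an encoded RGB image (for example PNG or
    JPEG bytes). Field names "image_b64" and "prompt" are illustrative
    assumptions, not a documented schema.
    """
    # Range checks reflect common sampling conventions (an assumption,
    # not limits stated by this model card).
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in the interval (0, 1]")
    if max_tokens < 1:
        raise ValueError("max_tokens must be at least 1")

    return {
        # Base64 keeps the binary image safe inside a JSON document.
        "image_b64": base64.b64encode(image_bytes).decode("ascii"),
        "prompt": prompt,
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "stream": stream,
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real image file's contents.
    payload = build_request_payload(b"\x00\x01\x02", "Describe the image.")
    print(json.dumps(payload, indent=2))
```

Validating the sampling parameters on the client side is a design choice that surfaces mistakes before a request is sent; a real service may apply its own limits.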
Supported Operating System(s):
Linux
Inference:
Engine: Triton
Test Hardware: Other