google / paligemma

Model Overview

Description:

The Google PaLIGemma model is a one-shot visual language understanding solution for image-to-text generation.

Terms of use

By using this model, you are agreeing to the terms and conditions of the
license,
acceptable use policy and
Google Research privacy policy.

References(s):

Model Architecture:

Architecture Type: Transformer

Network Architecture: SigLIP + Gemma

Input:

Input Format: Red, Green, Blue (RGB) Image + Text

Input Parameters: None

Other Properties Related to Input: None

Output:

Output Format: Text

Output Parameters: temperature, top_p, max_tokens

Other Properties Related to Output: stream

Supported Operating System(s):

Linux

Inference:

Engine: Triton

Test Hardware: Other