adept / fuyu-8b

Model Overview

Description:

Fuyu-8B is a multi-modal transformer introduced by Adept AI. It can perform a wide range of tasks,
including image understanding, text generation, and code generation.
Architecturally, Fuyu is a vanilla decoder-only transformer: there is no separate image encoder.
Image patches are instead linearly projected into the first layer of the transformer, bypassing the embedding lookup.
The transformer decoder is simply treated like an image transformer, albeit one that uses causal attention and has no pooling.
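
The patch handling described above can be illustrated with a minimal, dependency-free sketch. It is not Adept's implementation: the function names (`patchify`, `linear_project`, `causal_mask`), the toy sizes, and the random weights are all assumptions chosen for illustration, and this card does not state the real patch size or hidden size. The sketch only shows the idea that flattened image patches go through a single linear layer (no image encoder, no embedding lookup), are placed in the same sequence as text token embeddings, and are attended to causally.

```python
import random


def patchify(image, patch):
    """Split an H x W x C image (nested lists) into flattened patches in raster order."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "image must tile evenly"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    flat.extend(image[y][x])  # append this pixel's channel values
            patches.append(flat)
    return patches


def linear_project(vectors, weight, bias):
    """Apply y = W x + b to each vector; weight has shape (d_model, d_in)."""
    return [
        [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in vectors
    ]


def causal_mask(length):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(length)] for i in range(length)]


# Toy sizes, assumed for illustration only.
PATCH, CHANNELS, D_MODEL = 2, 3, 4
rng = random.Random(0)

# A 4 x 4 RGB image with values in [0, 1).
image = [[[rng.random() for _ in range(CHANNELS)] for _ in range(4)] for _ in range(4)]

d_in = PATCH * PATCH * CHANNELS
weight = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL

# Image patches enter the sequence through one linear layer.
image_tokens = linear_project(patchify(image, PATCH), weight, bias)

# Stand-in for text token embeddings, which would come from an embedding lookup.
text_tokens = [[0.0] * D_MODEL for _ in range(3)]

# One sequence, one decoder: image positions first, then text, with causal attention.
sequence = image_tokens + text_tokens
mask = causal_mask(len(sequence))
```

With these toy sizes the 4 x 4 image yields four patch positions followed by three text positions. A real implementation would use a tensor library and learned weights rather than nested lists.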

Note

This API is used in conjunction with the NVCF large assets API.

Terms of use

By accessing this model, you agree to the Fuyu-8B terms and conditions of the CC BY-NC license.

Third-Party Community Consideration:

This model is not owned or developed by NVIDIA. It has been developed and built to a third party’s requirements for this application and use case; see Fuyu's Hugging Face Model Card.

Reference(s):

Model Architecture:

Architecture Type: Transformer

Network Architecture: Fuyu-8B

Model Version: N/A

Input:

Input Format: Red, Green, Blue (RGB) Image + Text

Input Parameters: None

Output:

Output Format: Text

Output Parameters: None

Software Integration:

Supported Hardware Platform(s): Hopper, Ampere/Turing

Supported Operating System(s): Linux

Inference:

Engine: Triton

Test Hardware: Other