Model Overview

Description:

Fuyu-8B is a multi-modal transformer introduced by Adept AI. It can perform a wide range of tasks, including image understanding, text generation, and code generation.
Architecturally, Fuyu is a vanilla decoder-only transformer: there is no image encoder.
Instead, image patches are linearly projected into the first layer of the transformer, bypassing the embedding lookup.
The transformer decoder is therefore treated like an image transformer, except that it applies no pooling and uses causal attention.
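The input path described above can be sketched in a few lines of pure Python. This is a minimal illustration under stated assumptions, not Adept's implementation: the patch size, model width, random weights, the hypothetical helper names (`patchify`, `build_input_sequence`, `causal_mask`), and the choice to place image patches before the text tokens are all illustrative. It shows image patches entering the decoder through a single linear projection (no image encoder, no embedding lookup), text tokens using an ordinary embedding table, and a causal mask over the combined sequence.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[Vector]


def patchify(image: List[List[Vector]], patch: int) -> Matrix:
    """Split an H x W x 3 RGB image into non-overlapping patch x patch tiles.

    Tiles are returned in raster order, each flattened to patch * patch * 3 values.
    Assumption: H and W are multiples of `patch` (this sketch does no padding).
    """
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "sketch assumes no padding is needed"
    patches = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            patches.append([
                channel
                for row in image[top:top + patch]
                for pixel in row[left:left + patch]
                for channel in pixel
            ])
    return patches


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    """Compute W x + b, where `weight` has shape (d_model, d_in)."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


def build_input_sequence(image, token_ids, patch, proj_w, proj_b, embed_table) -> Matrix:
    """Hypothetical helper: build the decoder's input vectors.

    Image patches go through one linear projection (no image encoder and no
    embedding lookup); text tokens use an ordinary embedding table.
    Putting the image before the text is an assumption of this sketch.
    """
    image_part = [linear(p, proj_w, proj_b) for p in patchify(image, patch)]
    text_part = [embed_table[t] for t in token_ids]
    return image_part + text_part


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


# Toy sizes and random weights, chosen only for illustration.
random.seed(0)
patch, d_model, vocab = 2, 8, 16
d_in = patch * patch * 3
image = [[[random.random() for _ in range(3)] for _ in range(4)] for _ in range(4)]  # 4 x 4 RGB
proj_w = [[random.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_model)]
proj_b = [0.0] * d_model
embed_table = [[random.gauss(0.0, 0.02) for _ in range(d_model)] for _ in range(vocab)]

seq = build_input_sequence(image, [3, 7, 1], patch, proj_w, proj_b, embed_table)
mask = causal_mask(len(seq))
print(len(seq), len(seq[0]))  # 4 image patches + 3 text tokens, width 8 -> "7 8"
```

The sketch stops at the decoder inputs because that is where Fuyu departs from encoder-based multi-modal models; the remaining layers would be a standard decoder-only stack applying `mask` in every self-attention layer.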

Note: This API is used in conjunction with the NVCF (NVIDIA Cloud Functions) large assets API.

Terms of use

By accessing this model, you agree to the Fuyu-8B terms and conditions under the CC BY-NC license.

Third-Party Community Consideration:

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see Fuyu's Hugging Face Model Card.

Reference(s):

Fuyu-8B Blog Post by adept.ai
Fuyu-8B Model Card on Hugging Face

Model Architecture:

Architecture Type: Transformer

Network Architecture: Fuyu-8B

Model Version: N/A

Input:

Input Format: Red, Green, Blue (RGB) Image + Text

Input Parameters: None

Output:

Output Format: Text

Output Parameters: None

Software Integration:

Supported Hardware Platform(s): Hopper, Ampere/Turing

Supported Operating System(s): Linux

Inference:

Engine: Triton

Test Hardware: Other
