Model Overview
Description:
Fuyu-8B is a multi-modal transformer introduced by Adept AI. It can perform a wide range of tasks,
including image understanding, text generation, and code generation.
Architecturally, Fuyu is a vanilla decoder-only transformer - there is no image encoder.
Image patches are instead linearly projected into the first layer of the transformer, bypassing the embedding lookup.
The transformer decoder is simply treated like an image transformer, albeit one that uses causal attention and applies no pooling.
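The input path described above can be illustrated with a minimal sketch. This is not Adept's implementation: the function names (patchify, linear_project), the toy sizes (PATCH, D_MODEL, a 6 x 4 image, a 10-entry vocabulary), and the random weights are all assumptions chosen for illustration. It shows only the idea that image patches are flattened and passed through one linear layer, while text tokens go through an ordinary embedding lookup, and both end up in the same sequence for the decoder.

```python
import random


def patchify(image, patch):
    """Split an H x W x 3 image (nested lists) into flattened patches, raster order."""
    height, width = len(image), len(image[0])
    assert height % patch == 0 and width % patch == 0, "sketch assumes exact tiling"
    patches = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            flat = []
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    flat.extend(image[y][x])  # append the R, G, B values of one pixel
            patches.append(flat)
    return patches


def linear_project(vectors, weight, bias):
    """Apply y = W x + b to each vector; weight has shape d_model x d_in."""
    return [
        [sum(w * v for w, v in zip(row, vec)) + b for row, b in zip(weight, bias)]
        for vec in vectors
    ]


# Hypothetical toy sizes, NOT the real model's dimensions.
PATCH, D_MODEL, VOCAB = 2, 4, 10
rng = random.Random(0)

# A 6 x 4 RGB image with values in [0, 1).
image = [[[rng.random() for _ in range(3)] for _ in range(4)] for _ in range(6)]

# Image path: flatten each patch, then one linear layer (no embedding lookup).
d_in = PATCH * PATCH * 3
weight = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(D_MODEL)]
bias = [0.0] * D_MODEL
patches = patchify(image, PATCH)
image_tokens = linear_project(patches, weight, bias)

# Text path: ordinary embedding lookup by token id.
embedding_table = [[rng.gauss(0.0, 0.02) for _ in range(D_MODEL)] for _ in range(VOCAB)]
text_ids = [3, 7, 1]
text_tokens = [embedding_table[i] for i in text_ids]

# Both kinds of token share one sequence that the decoder would consume.
sequence = image_tokens + text_tokens
print(len(patches), len(sequence))
```

In this sketch there is no separate image encoder: the only image-specific parameters are the weight and bias of the single projection, and everything after it would be the same decoder stack used for text.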
Note
This API is used in conjunction with the NVCF large assets API.
Terms of use
By accessing this model, you agree to the Fuyu-8B terms and conditions under the CC BY-NC license.
Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. It was developed and built to a third party’s requirements for this application and use case; see the Fuyu-8B Model Card on Hugging Face.
Reference(s):
- Fuyu-8B Blog Post by adept.ai
- Fuyu-8B Model Card on Hugging Face
Model Architecture:
Architecture Type: Transformer
Network Architecture: Fuyu-8B
Model Version: N/A
Input:
Input Format: Red, Green, Blue (RGB) Image + Text
Input Parameters: None
Output:
Output Format: Text
Output Parameters: None
Software Integration:
Supported Hardware Platform(s): Hopper, Ampere/Turing
Supported Operating System(s): Linux
Inference:
Engine: Triton
Test Hardware: Other