Model Overview
Note: You need to request the model checkpoint and license from Stability AI
Description:
SDXL is a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL uses a three-times-larger UNet backbone: the increase in model parameters comes mainly from additional attention blocks and a larger cross-attention context, since SDXL conditions on a second text encoder.
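This card does not include usage code; as an illustrative sketch, the model can be driven through the Hugging Face diffusers library (an assumption, not part of this card), which exposes both text encoders as components of a single pipeline:

```python
# Illustrative sketch only; the diffusers library and the
# stabilityai/stable-diffusion-xl-base-1.0 checkpoint are assumptions,
# not part of this model card.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

# The second text encoder that enlarges SDXL's cross-attention context is
# visible as a separate pipeline component.
print(type(pipe.text_encoder).__name__)    # CLIP-based text encoder
print(type(pipe.text_encoder_2).__name__)  # second (OpenCLIP-based) text encoder

image = pipe(prompt="a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```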
Model Card
Terms of use
By accessing this model, you are agreeing to the SDXL 1.0 terms and conditions of the license, the acceptable use policy, and the stability.ai privacy policy.
Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see Stability-AI's SDXL Model Card.
Reference(s):
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis paper
- Stability-AI repo
- Stability-AI's SDXL Model Card webpage
Model Architecture:
Architecture Type: Transformer and Convolutional Neural Network (CNN)
Network Architecture: UNet + attention blocks
Model Version: SDXL 1.0
Input:
Input Format: Text
Input Parameters: scheduler type, number of denoising steps, classifier-free guidance scale
Output:
Output Format: Red, Green, Blue (RGB) Image
Output Parameters: 2D
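The input parameters listed above correspond to arguments set at inference time. As a minimal sketch, again assuming the diffusers pipeline rather than anything mandated by this card, the scheduler type, denoising steps, and classifier-free guidance scale can be passed explicitly, with the 2D RGB image returned as output:

```python
# Sketch of how the listed input parameters are typically exposed at inference
# time; argument names follow the diffusers API, which this card does not mandate.
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Scheduler type: swap the default sampler for an Euler discrete scheduler.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    num_inference_steps=30,   # denoising steps
    guidance_scale=7.5,       # classifier-free guidance scale
).images[0]                   # PIL RGB image (2D output)
image.save("lighthouse.png")
```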
Software Integration:
Supported Hardware Platform(s): Hopper, Ampere, Turing
Supported Operating System(s): Linux
Training & Finetuning:
Dataset:
Link: LAION-5B
Properties (Quantity, Dataset Descriptions, Sensor(s)):
The dataset consists of 5.85 billion CLIP-filtered image-text pairs.
Inference:
Engine: Triton
Test Hardware: Other
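The card names Triton as the inference engine but gives no serving details. The following is a hypothetical client-side sketch using the tritonclient HTTP API; the model name "sdxl" and the tensor names "PROMPT" and "IMAGE" are placeholders that would be defined by the actual deployment's model repository configuration.

```python
# Hypothetical Triton client sketch; the model name "sdxl" and the tensor
# names "PROMPT"/"IMAGE" are placeholders, not defined by this model card.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

prompt = np.array([b"a photograph of an astronaut riding a horse"], dtype=object)
text_in = httpclient.InferInput("PROMPT", [1], "BYTES")
text_in.set_data_from_numpy(prompt)

image_out = httpclient.InferRequestedOutput("IMAGE")
response = client.infer(model_name="sdxl", inputs=[text_in], outputs=[image_out])

image = response.as_numpy("IMAGE")  # expected: H x W x 3 RGB array
print(image.shape)
```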