Model Overview
Note: You need to request the model checkpoint and license from Stability AI
Description:
SDXL is a latent diffusion model for text-to-image synthesis. Compared to previous versions of Stable Diffusion, SDXL uses a three-times-larger UNet backbone: the increase in model parameters comes mainly from additional attention blocks and a larger cross-attention context, since SDXL conditions on a second text encoder.
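This card does not include usage code; as an illustrative sketch, the model can be driven through the Hugging Face diffusers library (an assumption, not part of this card), which exposes both text encoders as components of a single pipeline:

```python
# Illustrative sketch only; the diffusers library and the
# stabilityai/stable-diffusion-xl-base-1.0 checkpoint are assumptions,
# not part of this model card.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
).to("cuda")

# The second text encoder that enlarges SDXL's cross-attention context is
# visible as a separate pipeline component.
print(type(pipe.text_encoder).__name__)    # CLIP-based text encoder
print(type(pipe.text_encoder_2).__name__)  # second (OpenCLIP-based) text encoder

image = pipe(prompt="a photograph of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```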
Model Card
Terms of use
By accessing this model, you are agreeing to the SDXL 1.0 terms and conditions of the license, the acceptable use policy, and the stability.ai privacy policy.
Third-Party Community Consideration:
This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party’s requirements for this application and use case; see Stability-AI's SDXL Model Card.
Reference(s):
- SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis paper
- Stability-AI repo
- Stability-AI's SDXL Model Card webpage
Model Architecture:
Architecture Type: Transformer and Convolutional Neural Network (CNN)
Network Architecture: UNet + attention blocks
Model Version: SDXL 1.0
Input:
Input Format: Text
Input Parameters: scheduler type, number of denoising steps, classifier-free guidance scale
Output:
Output Format: Red, Green, Blue (RGB) Image
Output Parameters: 2D
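The input parameters listed above correspond to arguments set at inference time. As a minimal sketch, again assuming the diffusers pipeline rather than anything mandated by this card, the scheduler type, denoising steps, and classifier-free guidance scale can be passed explicitly, with the 2D RGB image returned as output:

```python
# Sketch of how the listed input parameters are typically exposed at inference
# time; argument names follow the diffusers API, which this card does not mandate.
import torch
from diffusers import StableDiffusionXLPipeline, EulerDiscreteScheduler

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# Scheduler type: swap the default sampler for an Euler discrete scheduler.
pipe.scheduler = EulerDiscreteScheduler.from_config(pipe.scheduler.config)

image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    num_inference_steps=30,   # denoising steps
    guidance_scale=7.5,       # classifier-free guidance scale
).images[0]                   # PIL RGB image (2D output)
image.save("lighthouse.png")
```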
Software Integration:
Supported Hardware Platform(s): Hopper, Ampere, Turing
Supported Operating System(s): Linux
Training & Finetuning:
Dataset:
Link: LAION-5B
Properties (Quantity, Dataset Descriptions, Sensor(s)):
The dataset consists of 5.85 billion CLIP-filtered image-text pairs.
Inference:
Engine: Triton
Test Hardware: Other
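The card names Triton as the inference engine but gives no serving details. The following is a hypothetical client-side sketch using the tritonclient HTTP API; the model name "sdxl" and the tensor names "PROMPT" and "IMAGE" are placeholders that would be defined by the actual deployment's model repository configuration.

```python
# Hypothetical Triton client sketch; the model name "sdxl" and the tensor
# names "PROMPT"/"IMAGE" are placeholders, not defined by this model card.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

prompt = np.array([b"a photograph of an astronaut riding a horse"], dtype=object)
text_in = httpclient.InferInput("PROMPT", [1], "BYTES")
text_in.set_data_from_numpy(prompt)

image_out = httpclient.InferRequestedOutput("IMAGE")
response = client.infer(model_name="sdxl", inputs=[text_in], outputs=[image_out])

image = response.as_numpy("IMAGE")  # expected: H x W x 3 RGB array
print(image.shape)
```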